RDFa Templating

Kjetil Kjernsmo <kjetil@kjernsmo.net>

Status of this document

This is at present a braindump of an idea I had: to extend RDFa for use as a templating language for building web pages. It is by no means finished or implemented, but it is a starting point for further discussion of how such a thing should work.

Introduction

RDFa is a W3C Recommendation for embedding RDF in XHTML. Since RDF represents structured data, we can utilise it to represent both invariant data and variables. The invariant data can be used to control a backend web application. By connecting invariant data and variables, data we wish to use to populate a document can be retrieved. This is the motivation for a new templating language, RDFa Templates.

We note that RDF and RDFa alike can contain an XML Literal, which is a balanced XML fragment. This makes it possible to legally encode XML fragments in the RDF model that can represent, e.g., variables. Thus, by parsing the XHTML, the RDF model can be built, and by parsing any XML Literals in a specified namespace, special constructs, such as SPARQL variables, can be found. It is therefore a design goal of RDFa Templates that they should be parseable with standard XML, RDFa and RDF tools, with very few extensions.

Additionally, the document is divided into parts by wrapping each part in a graph element. Each such part is an RDF graph, identified by an xml:id attribute, which a processor also uses to name the graph. Configuration information can be supplied as attributes on the graph element.

Two modes of operation are envisioned:

  1. The RDF graph is used to create the WHERE clause of a SPARQL query. XML Literals are parsed to find the SPARQL variables as well as other constructs, such as FILTERs. Configuration information, such as the URL of a SPARQL endpoint, is retrieved from the graph element. The query is then executed and variables are populated by iterating over the results.
  2. The RDFa Template is used in a Model-View-Controller paradigm framework. The RDF model is submitted to a Controller, which is free to process it as it sees fit, and the corresponding View should return the variables that the template asks for.
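
As a rough illustration of the second mode, the following Python sketch shows a Controller that answers the template's triple patterns from data held in memory instead of sending a SPARQL query. Everything here is an assumption made for illustration: the function names unify and match, the ex: prefix, the sample data, and the convention that terms starting with "?" are variables. It is one possible backend, not a prescribed one.

```python
def unify(term, value, binding):
    """Bind a variable (a term starting with '?') or compare a constant."""
    if term.startswith("?"):
        # setdefault stores the value if the variable is unbound,
        # and returns the earlier binding otherwise.
        return binding.setdefault(term, value) == value
    return term == value


def match(patterns, data):
    """Return every set of bindings under which all patterns occur in data."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in data:
                candidate = dict(binding)
                if all(unify(p, t, candidate) for p, t in zip(pattern, triple)):
                    extended.append(candidate)
        solutions = extended
    return solutions


# Hypothetical in-memory data and the patterns a template might submit.
data = [
    ("ex:rdf", "rdfs:label", "Resource Description Framework"),
    ("ex:rdf", "rdfs:comment", "A framework for describing resources."),
    ("ex:owl", "rdfs:label", "Web Ontology Language"),
]
patterns = [
    ("?resource", "rdfs:label", "Resource Description Framework"),
    ("?resource", "rdfs:comment", "?comment"),
]
solutions = match(patterns, data)
```

Each dictionary in solutions holds the values that the View would hand back to the template for one iteration.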

Examples

Retrieve a single record

The concept can be best explained by example. Consider the following extended XHTML+RDFa document:

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"
      xmlns:sub="http://www.kjetil.kjernsmo.net/software/rat/substitutions#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      rat:doctype-public="-//W3C//DTD XHTML+RDFa 1.0//EN"
      rat:doctype-system="http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"
      xml:lang="en">
  <head>
  </head>
  <body>
    <rat:graph xml:id="query1" endpoint="http://dbpedia.org/sparql">
      <div about="sub:resource">
	<div property="rdfs:label">Resource Description Framework</div>
	<div property="rdfs:comment" datatype="rdf:XMLLiteral"><rat:variable name="sub:comment"/></div>
      </div>
    </rat:graph>
  </body>
</html>

Let us walk through the example. It first declares the needed namespace prefixes and sets the document language. Two namespaces are defined for use by RDFa Templates: the namespace assigned to the rat: prefix is used for the XML elements and attributes that appear inline in the XHTML document, whereas sub: is used to prefix the document author's variables, so that each variable is denoted by a URI. The html element also carries two attributes that set the document type of the result document. The head element is irrelevant to the example and is therefore left empty for brevity.

The rat:graph element delimits a named graph. It can be understood as a unit that may result in a single query in the first mode of operation above. In the following, let us consider only this case; the processing in the second mode is very much up to the backend.

The xml:id attribute causes a named graph to be created, using the base URI of the document with the attribute content as a fragment identifier. All triples that have a predicate within this element will be added to this graph. It is important to understand that the named graph is used to refer to the graph in the context of the document, but it is not necessarily used in any generated SPARQL query.
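
The naming rule can be sketched in a few lines of Python. The function name graph_name and the example.org base URI are hypothetical; only the rule itself (base URI plus the xml:id as fragment identifier) comes from the text above.

```python
from urllib.parse import urljoin


def graph_name(base_uri, xml_id):
    """Resolve the xml:id as a fragment identifier against the base URI."""
    return urljoin(base_uri, "#" + xml_id)


# Hypothetical base URI for the template document.
name = graph_name("http://example.org/templates/rdf.xhtml", "query1")
```

With this base URI, the graph of the example would be named http://example.org/templates/rdf.xhtml#query1.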

The element then has an optional endpoint attribute, which contains the URL of the SPARQL endpoint against which the query built from the triples of that graph will be run.
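
How a processor contacts the endpoint is not specified here. One plausible approach, sketched below in Python, is to follow the SPARQL Protocol and pass the query text in the query parameter of an HTTP GET request. The function name query_url is hypothetical, and the sketch only builds the request URL; it does not perform the request.

```python
from urllib.parse import urlencode


def query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL for the given endpoint and query."""
    # Assumption: append with '&' if the endpoint URL already has a query part.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


url = query_url("http://dbpedia.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
```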

Finally, there are the triples that make up the query. This example says "give me the URI and the rdfs:comment where the rdfs:label is 'Resource Description Framework' written in English".

Specifically, this SPARQL query will be generated:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?resource ?comment 
WHERE {
	?resource rdfs:label "Resource Description Framework"@en ;
		  rdfs:comment ?comment . 
}

Let's look more carefully into what happened here: First, the namespace declarations of the original document were added to the query, except the XHTML namespace and the two RDFa Template-specific namespaces, since they are known not to influence the generation of the query. Then, the two variables declared in the sub:-prefixed namespace were added to the SELECT clause.
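
The prefix handling could be sketched as follows in Python, using only the standard library. The function name prefix_lines is hypothetical, and the sample document is a trimmed version of the example above; the three skipped namespace URIs are the ones used in this document.

```python
import io
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RAT_NS = "http://www.kjetil.kjernsmo.net/software/rat/xmlns"
SUB_NS = "http://www.kjetil.kjernsmo.net/software/rat/substitutions#"


def prefix_lines(xml_text):
    """Turn namespace declarations into PREFIX lines, skipping known ones."""
    skipped = {XHTML_NS, RAT_NS, SUB_NS}
    lines = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # The "start-ns" event yields one (prefix, uri) pair per declaration.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if prefix and uri not in skipped:
            lines.append("PREFIX %s: <%s>" % (prefix, uri))
    return lines


DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"
      xmlns:sub="http://www.kjetil.kjernsmo.net/software/rat/substitutions#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xml:lang="en"><body/></html>"""

prefixes = prefix_lines(DOC)
```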

Then, we note that in the original file "sub:resource" was the subject set by the about attribute, and this has resulted in the ?resource variable being the subject in the SPARQL WHERE clause. The two property attributes set the predicates. The object of the "rdfs:label" predicate is a plain literal. Plain literals are used to set the object to a specified value to be searched for.
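
A minimal Python sketch of this mapping follows. It assumes that an RDFa parser has already produced the triples, that variables appear as full URIs in the sub: namespace, and that all other terms are already written in SPARQL syntax. The names term and build_query are hypothetical.

```python
SUB_NS = "http://www.kjetil.kjernsmo.net/software/rat/substitutions#"


def term(value):
    """Render one term: URIs in the sub: namespace become SPARQL variables."""
    if value.startswith(SUB_NS):
        return "?" + value[len(SUB_NS):]
    return value  # assumed to be a prefixed name or literal in SPARQL syntax


def build_query(triples, prefixes):
    """Build a SELECT query whose WHERE clause mirrors the template triples."""
    variables, patterns = [], []
    for triple in triples:
        rendered = [term(t) for t in triple]
        patterns.append("  " + " ".join(rendered) + " .")
        for t in rendered:
            if t.startswith("?") and t not in variables:
                variables.append(t)
    lines = ["PREFIX %s: <%s>" % (p, u) for p, u in prefixes.items()]
    lines.append("SELECT " + " ".join(variables))
    lines.append("WHERE {")
    lines.extend(patterns)
    lines.append("}")
    return "\n".join(lines)


triples = [
    (SUB_NS + "resource", "rdfs:label", '"Resource Description Framework"@en'),
    (SUB_NS + "resource", "rdfs:comment", SUB_NS + "comment"),
]
query = build_query(triples, {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"})
```

The sketch writes one full triple pattern per line rather than using the ";" abbreviation of the query shown earlier; the two forms are equivalent.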

An XML Literal, when in the rat:-prefixed namespace, has a special meaning. In this case, we see that the rat:variable element is used to create the ?comment variable, which appears in the object of rdfs:comment.
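
Recognising such a literal could look roughly like the Python sketch below. It assumes that the XML Literal carries its own namespace declaration for rat:, as a serialised literal normally would, and that the prefix mapping needed to resolve the CURIE in the name attribute is passed in separately. The function name variable_from_literal is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RAT_NS = "http://www.kjetil.kjernsmo.net/software/rat/xmlns"
SUB_NS = "http://www.kjetil.kjernsmo.net/software/rat/substitutions#"


def variable_from_literal(fragment: str, prefixes: dict) -> Optional[str]:
    """Return the SPARQL variable named by a rat:variable element, or None."""
    root = ET.fromstring(fragment)
    if root.tag != "{%s}variable" % RAT_NS:
        return None  # an ordinary XML Literal, not a template construct
    prefix, _, local = root.get("name", "").partition(":")
    if prefixes.get(prefix) != SUB_NS:
        return None  # the name is not in the variable namespace
    return "?" + local


literal = ('<rat:variable xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"'
           ' name="sub:comment"/>')
variable = variable_from_literal(literal, {"sub": SUB_NS})
```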

Retrieve multiple records

Often, you want to retrieve multiple records, and a processor should iterate over all query solutions. Consider this example:

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"
      xmlns:sub="http://www.kjetil.kjernsmo.net/software/rat/substitutions#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:dbp="http://dbpedia.org/property/"
      xmlns:dbo="http://dbpedia.org/ontology/"
      rat:doctype-public="-//W3C//DTD XHTML+RDFa 1.0//EN"
      rat:doctype-system="http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"
      xml:lang="en">
  <head>
  </head>
  <body>
    <h1>Aircraft produced in 1973</h1>
    <table>
      <tr><th>Name</th><th>Produced</th><th>First flight</th></tr>
      <rat:graph xml:id="query2" endpoint="http://dbpedia.org/sparql">
	<tr about="sub:resource">
	  <td property="foaf:name" datatype="rdf:XMLLiteral"><rat:variable name="sub:name"/></td>
	  <td property="dbo:produced">1973</td>
	  <td property="dbp:firstFlight" datatype="rdf:XMLLiteral"><rat:variable name="sub:first"/></td>
	</tr>
      </rat:graph>
    </table>
  </body>
</html>

In this case, we will get more than one result as the query has more than one solution. The processor should in this case iterate over the elements in the rat:graph element and insert the values for the corresponding row on each iteration, thus producing the following XHTML+RDFa:
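
The iteration itself could be sketched as follows in Python. The sketch assumes that the query solutions are available as dictionaries keyed by variable name, that the sub: prefix is bound to the variable namespace (prefix resolution is omitted), and that a filled-in cell drops its datatype attribute, as the result above suggests. The names local_name and expand are hypothetical, and the template is a trimmed version of the example.

```python
import copy
import xml.etree.ElementTree as ET

RAT = "{http://www.kjetil.kjernsmo.net/software/rat/xmlns}"


def local_name(curie):
    """'sub:name' -> 'name'; assumes sub: is the variable prefix."""
    return curie.split(":", 1)[1]


def expand(graph, solutions):
    """Return one filled-in copy of the graph's children per solution."""
    rows = []
    for solution in solutions:
        for child in graph:
            row = copy.deepcopy(child)
            # Iterate over a snapshot, since elements are removed below.
            for el in list(row.iter()):
                about = el.get("about", "")
                if about.startswith("sub:"):
                    el.set("about", solution[local_name(about)])
                var = el.find(RAT + "variable")
                if var is not None:
                    el.text = solution[local_name(var.get("name"))]
                    el.remove(var)
                    el.attrib.pop("datatype", None)
            rows.append(row)
    return rows


TEMPLATE = """<rat:graph xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"
           xmlns="http://www.w3.org/1999/xhtml" xml:id="query2">
  <tr about="sub:resource">
    <td property="foaf:name" datatype="rdf:XMLLiteral"><rat:variable name="sub:name"/></td>
    <td property="dbo:produced">1973</td>
  </tr>
</rat:graph>"""

solutions = [
    {"resource": "http://dbpedia.org/resource/Dassault/Dornier_Alpha_Jet",
     "name": "Alpha Jet"},
    {"resource": "http://dbpedia.org/resource/Piper_PA-36_Pawnee_Brave",
     "name": "PA-36 Pawnee Brave"},
]
rows = expand(ET.fromstring(TEMPLATE), solutions)
```

The rat:graph wrapper itself is not copied, so the resulting rows can be placed directly where the wrapper stood in the table.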

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:dbp="http://dbpedia.org/property/"
      xmlns:dbo="http://dbpedia.org/ontology/"
      xml:lang="en">
  <head>
  </head>
  <body>
    <h1>Aircraft produced in 1973</h1>
    <table>
      <tr><th>Name</th><th>Produced</th><th>First flight</th></tr>  
      <tr about="http://dbpedia.org/resource/Dassault/Dornier_Alpha_Jet">
	<td property="foaf:name">Alpha Jet</td>
	<td property="dbo:produced">1973</td>
	<td property="dbp:firstFlight">1973-10-26</td>
      </tr>
      <tr about="http://dbpedia.org/resource/Piper_PA-36_Pawnee_Brave">
	<td property="foaf:name">PA-36 Pawnee Brave</td>
	<td property="dbo:produced">1973</td>
	<td property="dbp:firstFlight">1969</td>
      </tr>    
    </table>
  </body>
</html>

We saw that the invariant property dbo:produced, with its literal value, resulted in a SPARQL query that finds everything first produced in 1973 (apparently, this property is only used for aircraft in DBpedia). The sub:-prefixed URIs in the input example were used to create three variables, ?resource, ?name and ?first, which were used to populate the template.

Scope

The scope of RDFa Templates is an open question. The author currently feels that only issues relevant to RDF and SPARQL are within the scope of RDFa Templates. There are many well-established and widespread templating systems for writing web pages, e.g. PHP and Template Toolkit. Also, for advanced transforms, XSLT is a useful and widespread tool. XSLT is ill suited for working on RDF graphs, however, so RDFa Templates and XSLT complement each other.

Therefore, the author envisions RDFa Templates as a part of a greater system that employs several techniques to build web pages. One may for example use Template Toolkit to build the page, insert query variables, etc., and then use RDFa Templates to populate the page with data from a backend.

This is, as stated, an open question, and the author is open to other views.

Weaknesses

Currently, all variables are denoted by URIs in one specified namespace. As a consequence, the same URI is used in different applications for different variables. Some way should be found for application authors to choose this namespace while retaining portability.

TODO

Acknowledgements

First and foremost a thank-you to Toby Inkster for writing the RDF::RDFa::Parser, and for several suggestions in the early thinking around this idea.