RDFa Templating

Kjetil Kjernsmo <kjetil@kjernsmo.net>

Status of this document

This is at present a braindump of an idea I had to extend RDFa for use as a templating language for building web pages. It is by no means finished, but it does have a partial implementation in the Perl distribution RDF::RDFa::Template. First and foremost, this document and the module serve as a starting point for further discussion.

Introduction

RDFa is a W3C Recommendation for embedding RDF in XHTML. Since RDF represents structured data, we can use it to represent both invariant data and variables. The invariant data can be used to control a backend web application. By connecting invariant data with variables, we can retrieve the data we wish to use to populate a document. This is the motivation for a new templating language, RDFa Templates.

We note that both RDF and RDFa can contain an XML Literal, which is a balanced XML fragment. This makes it possible to legally encode XML fragments, which can represent e.g. variables, into the RDF model. Thus, by parsing the XHTML, an RDF model can be built. By parsing any XML Literals in a specified namespace, special constructs such as SPARQL variables can be found, thus creating a Basic Graph Pattern. A design goal of RDFa Templates is therefore that templates should be parseable with standard XML, RDFa and RDF tools with very few extensions. It is not a goal that the templating language itself be RDFa; it is more important that template writers feel at home in their current HTML/XML-centric world.

Additionally, the document will be divided into parts by wrapping each part in a graph element. Each part is an RDF graph, and it is planned that the author will have different ways to name this graph, but in the example implementation this is very limited; see below. Configuration information can be supplied as attributes on the graph element.
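To make the partitioning concrete, here is a minimal Python sketch, not taken from RDF::RDFa::Template, that finds every rat:graph element in a template and collects its g:graph name and its configuration attributes. It assumes, as in the examples below, that configuration attributes such as endpoint carry no namespace, and it uses only the standard library's ElementTree; the function name graph_parts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in the examples of this document.
RAT = "http://www.kjetil.kjernsmo.net/software/rat/xmlns"
G = "http://example.org/graph#"

EXAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"
      xmlns:g="http://example.org/graph#">
  <body>
    <rat:graph g:graph="query1" endpoint="http://dbpedia.org/sparql">
      <div about="sub:resource"/>
    </rat:graph>
  </body>
</html>"""


def graph_parts(xhtml: str) -> list[dict]:
    """Return one entry per rat:graph element (hypothetical helper): its
    g:graph name and its configuration, i.e. the attributes that are not
    in any namespace."""
    root = ET.fromstring(xhtml)
    parts = []
    for graph in root.iter(f"{{{RAT}}}graph"):
        # ElementTree reports namespaced attributes as "{uri}local" keys,
        # so un-namespaced configuration attributes are those without "{".
        config = {k: v for k, v in graph.attrib.items() if not k.startswith("{")}
        parts.append({"name": graph.get(f"{{{G}}}graph"), "config": config})
    return parts


print(graph_parts(EXAMPLE))
# → [{'name': 'query1', 'config': {'endpoint': 'http://dbpedia.org/sparql'}}]
```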

Two modes of operation are envisioned:

The RDF graph is used to create the WHERE clause of a SPARQL query. XML Literals are parsed to find the SPARQL variables as well as other constructs, such as FILTERs. Configuration information, such as the URL of a SPARQL endpoint, is retrieved from the graph element. The query is then executed, and variables are populated by iterating over the results.
The RDFa Template is used in a framework following the Model-View-Controller paradigm. The RDF model is submitted to a Controller, which is free to process it as it sees fit, and the corresponding View should return the variables that the template asks for.

Examples

Retrieve a single record

The concept can be best explained by example. Consider the following extended XHTML+RDFa document:

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:g="http://example.org/graph#"
      xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"
      xmlns:sub="http://www.kjetil.kjernsmo.net/software/rat/substitutions#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dbp="http://dbpedia.org/property/"
      rat:doctype-public="-//W3C//DTD XHTML+RDFa 1.0//EN"
      rat:doctype-system="http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"
      xml:lang="en">
  <head>
    <title>Range of a Cessna Citation Mustang</title>
  </head>
  <body>
    <h1>Range of a Cessna Citation Mustang</h1>
    <rat:graph g:graph="query1" endpoint="http://dbpedia.org/sparql">
      <div about="sub:resource">
	<div property="rdfs:label">Cessna Citation Mustang</div>
	<div property="dbp:rangeAlt" datatype="rdf:XMLLiteral"><rat:variable name="sub:range"/></div>
      </div>
    </rat:graph>
  </body>
</html>

Let us walk through the example. It first defines the needed namespace prefixes and sets the document language. Two namespaces are defined for use by RDFa Templates: the namespace assigned to the rat: prefix is used for the XML elements and attributes used inline in the XHTML document, whereas sub: is used to prefix the document author's variables, so that each variable is denoted by a URI. The root element also has two attributes, rat:doctype-public and rat:doctype-system, which set the document type of the result document. In the current implementation, these prefixes are hardcoded, but this will change in upcoming releases.
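As an illustration of what the two doctype attributes are for, the following Python sketch (a hypothetical helper, not the module's code) removes rat:doctype-public and rat:doctype-system from the root element and turns them into the DOCTYPE declaration seen at the top of the result document further below. It assumes the root element's local name ("html") is the document type name.

```python
import xml.etree.ElementTree as ET

RAT = "http://www.kjetil.kjernsmo.net/software/rat/xmlns"


def take_doctype(root: ET.Element) -> str:
    """Remove rat:doctype-public / rat:doctype-system from the root element
    and return the DOCTYPE declaration they describe ("" if none)."""
    public = root.attrib.pop(f"{{{RAT}}}doctype-public", None)
    system = root.attrib.pop(f"{{{RAT}}}doctype-system", None)
    name = root.tag.rpartition("}")[2]  # local name of the root, e.g. "html"
    if system is None:
        # XML requires a system identifier whenever PUBLIC is used,
        # so a lone public identifier cannot be emitted.
        return ""
    if public is None:
        return f'<!DOCTYPE {name} SYSTEM "{system}">'
    return f'<!DOCTYPE {name} PUBLIC "{public}"\n"{system}">'


SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"'
    ' rat:doctype-public="-//W3C//DTD XHTML+RDFa 1.0//EN"'
    ' rat:doctype-system="http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"/>'
)
root = ET.fromstring(SAMPLE)
print(take_doctype(root))
```

Popping the attributes, rather than just reading them, keeps the rat: attributes out of the serialized result, which matches the output example where they no longer appear.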

The rat:graph element delimits a named graph. It can be understood as a unit that may result in a single query in the first usage scenario above. In the following, we consider only this case; in the second usage scenario, processing is very much up to the backend.

The g:graph attribute causes a named graph to be created, using the base URI of the document and the attribute content as a fragment identifier. All triples that have a predicate within this element will be added to this graph. It is important to understand that the named graph is used to refer to the graph in the context of the document, but it is not necessarily used in any generated SPARQL query. In the current implementation, the namespace URI and qname of this attribute are hardcoded, so exactly this attribute must be used in any experiments, but this will change in a full implementation of the ideas in this document.
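The naming rule above can be sketched in a few lines of Python. The base URI used here, http://example.org/aircraft.xhtml, is hypothetical, and the sketch assumes that any fragment already present on the base URI is replaced rather than extended.

```python
from urllib.parse import urldefrag


def graph_name(base_uri: str, g_graph: str) -> str:
    """URI of the named graph: the document's base URI, with any fragment
    it already has removed, plus the g:graph value as the fragment."""
    base, _fragment = urldefrag(base_uri)
    return f"{base}#{g_graph}"


print(graph_name("http://example.org/aircraft.xhtml", "query1"))
# → http://example.org/aircraft.xhtml#query1
```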

The element then has an optional endpoint attribute, containing the URL of the SPARQL endpoint against which the query built from that graph's triples will be run.
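One way a processor might run the generated query against the endpoint is the HTTP GET binding of the SPARQL Protocol, where the query text is sent URL-encoded in the query parameter. The Python sketch below assumes that binding and the SPARQL JSON results format; the function names are hypothetical, and run_select needs network access, so only the URL construction is exercised here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def query_url(endpoint: str, query: str) -> str:
    """GET URL for a SPARQL Protocol query: the query text goes,
    URL-encoded, into the 'query' parameter of the endpoint URL."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


def run_select(endpoint: str, query: str) -> list[dict]:
    """Run a SELECT query and return the 'bindings' list of the
    SPARQL JSON results (requires network access)."""
    request = Request(query_url(endpoint, query),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


print(query_url("http://dbpedia.org/sparql", "SELECT * WHERE { ?s ?p ?o }"))
```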

Finally come the triples that make up the query. This example says "give me the URI and the dbp:rangeAlt value of the resource whose rdfs:label is 'Cessna Citation Mustang' written in English".

Specifically, this SPARQL query will be generated:

SELECT * WHERE { 
  ?resource <http://dbpedia.org/property/rangeAlt> ?range .
  ?resource <http://www.w3.org/2000/01/rdf-schema#label> "Cessna Citation Mustang"@en . 
}

We note that in the original file "sub:resource" was the subject set by the about attribute, and this has resulted in the ?resource variable being the subject in the SPARQL WHERE clause. The two property attributes set the predicates. The object of the rdfs:label predicate is a plain literal. A plain literal fixes the object to a specific value that the query must match; since the document declares xml:lang="en", the literal carries the language tag @en.
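The term mapping just described can be sketched in Python: URIs in the sub: namespace become variables, other URIs are written in angle brackets, and plain literals keep their language tag. This is an illustrative sketch, not the module's code; it assumes an RDFa parser has already produced the triples, and that the XML Literal object of dbp:rangeAlt has already been resolved to the sub:range URI. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union

SUB = "http://www.kjetil.kjernsmo.net/software/rat/substitutions#"


@dataclass(frozen=True)
class Literal:
    """A plain literal, optionally with a language tag (from xml:lang)."""
    text: str
    lang: Optional[str] = None


Term = Union[str, Literal]  # a str is a URI


def sparql_term(term: Term) -> str:
    if isinstance(term, Literal):
        # Escape characters that would end or break a short string literal.
        escaped = (term.text.replace("\\", "\\\\")
                   .replace('"', '\\"').replace("\n", "\\n"))
        return f'"{escaped}"' + (f"@{term.lang}" if term.lang else "")
    if term.startswith(SUB):
        # sub:-prefixed URIs become SPARQL variables.
        return "?" + term[len(SUB):]
    return f"<{term}>"


def select_query(triples: list[tuple[Term, Term, Term]]) -> str:
    """One triple pattern per RDF triple, in input order."""
    patterns = [" ".join(sparql_term(t) for t in triple) + " ." for triple in triples]
    return "SELECT * WHERE {\n" + "".join(f"  {p}\n" for p in patterns) + "}"


triples = [
    (SUB + "resource", "http://dbpedia.org/property/rangeAlt", SUB + "range"),
    (SUB + "resource", "http://www.w3.org/2000/01/rdf-schema#label",
     Literal("Cessna Citation Mustang", "en")),
]
print(select_query(triples))
```

Run on the two triples from the first example, this prints the same WHERE clause as the query shown above.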

An XML Literal has a special meaning when its content is an element in the rat: namespace. In this case, the rat:variable element creates the ?range variable, which appears as the object of dbp:rangeAlt.
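Recognizing such a literal could look like the Python sketch below. It assumes the RDFa parser hands over the XML Literal as a string with the rat: namespace declared on it, and that the prefix mappings in scope in the template are passed in separately, because the name attribute holds a prefixed name ("sub:range") that ElementTree does not resolve. The function name literal_variable is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RAT = "http://www.kjetil.kjernsmo.net/software/rat/xmlns"
SUB = "http://www.kjetil.kjernsmo.net/software/rat/substitutions#"


def literal_variable(xml_literal: str, prefixes: dict[str, str]) -> Optional[str]:
    """Return the SPARQL variable an XML Literal stands for, or None if the
    literal is not a single rat:variable element."""
    try:
        element = ET.fromstring(xml_literal)
    except ET.ParseError:
        return None  # e.g. plain text or several top-level elements
    if element.tag != f"{{{RAT}}}variable":
        return None
    # Expand the prefixed name, e.g. "sub:range", with the template's prefixes.
    prefix, _, reference = element.get("name", "").partition(":")
    uri = prefixes.get(prefix, "") + reference
    if not uri.startswith(SUB):
        return None
    return "?" + uri[len(SUB):]


LITERAL = f'<rat:variable xmlns:rat="{RAT}" name="sub:range"/>'
print(literal_variable(LITERAL, {"sub": SUB}))  # → ?range
```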

Retrieve multiple records

Often, you want to retrieve multiple records, and a processor should iterate over all query solutions. Consider this example:

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:g="http://example.org/graph#"
      xmlns:rat="http://www.kjetil.kjernsmo.net/software/rat/xmlns"
      xmlns:sub="http://www.kjetil.kjernsmo.net/software/rat/substitutions#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:dbp="http://dbpedia.org/property/"
      xmlns:dbo="http://dbpedia.org/ontology/"
      rat:doctype-public="-//W3C//DTD XHTML+RDFa 1.0//EN"
      rat:doctype-system="http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"
      xml:lang="en">
  <head>
  </head>
  <body>
    <h1>Aircraft produced in 1973</h1>
    <table>
      <tr><th>Name</th><th>Produced</th><th>First flight</th></tr>
      <rat:graph g:graph="query2" endpoint="http://dbpedia.org/sparql">
	<tr about="sub:resource">
	  <td property="foaf:name" datatype="rdf:XMLLiteral"><rat:variable name="sub:name"/></td>
	  <td property="dbo:produced">1973</td>
	  <td property="dbp:firstFlight" datatype="rdf:XMLLiteral"><rat:variable name="sub:first"/></td>
	</tr>
      </rat:graph>
    </table>
  </body>
</html>

In this case, we get more than one result, as the query has more than one solution. The processor should then repeat the content of the rat:graph element once for each solution, inserting that solution's values on each iteration, thus producing the following XHTML+RDFa:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:dbp="http://dbpedia.org/property/"
      xmlns:dbo="http://dbpedia.org/ontology/"
      xml:lang="en">
  <head>
  </head>
  <body>
    <h1>Aircraft produced in 1973</h1>
    <table>
      <tr><th>Name</th><th>Produced</th><th>First flight</th></tr>  
      <tr about="http://dbpedia.org/resource/Dassault/Dornier_Alpha_Jet">
	<td property="foaf:name">Alpha Jet</td>
	<td property="dbo:produced">1973</td>
	<td property="dbp:firstFlight">1973-10-26</td>
      </tr>
      <tr about="http://dbpedia.org/resource/Piper_PA-36_Pawnee_Brave">
	<td property="foaf:name">PA-36 Pawnee Brave</td>
	<td property="dbo:produced">1973</td>
	<td property="dbp:firstFlight">1969</td>
      </tr>    
    </table>
  </body>
</html>

This is not implemented yet, and the query in this example no longer returns any results.

We saw that the invariant dbo:produced property, with the literal value 1973, made the generated SPARQL query find everything first produced in 1973. The sub:-prefixed URIs in the input example were used to create three variables, ?resource, ?name and ?first, which were used to populate the template.
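Since this iteration is not implemented yet, the following Python sketch only illustrates one possible approach, using ElementTree: each rat:graph element is replaced by one filled copy of its children per query solution. It makes several assumptions not stated above: solutions arrive as plain dicts from variable name to value (already flattened from the SPARQL results), only the about attribute is substituted, each rat:variable is the sole content of its element, rat:graph elements are not nested, and the datatype attribute is dropped so that the output contains plain literals, as in the result document. The names fill and expand are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

RAT = "http://www.kjetil.kjernsmo.net/software/rat/xmlns"
GRAPH = f"{{{RAT}}}graph"
VARIABLE = f"{{{RAT}}}variable"


def fill(element: ET.Element, solution: dict[str, str]) -> ET.Element:
    """Substitute one query solution into a copied template subtree."""
    for node in list(element.iter()):
        about = node.get("about", "")
        if about.startswith("sub:") and about[4:] in solution:
            node.set("about", solution[about[4:]])
        variable = node.find(VARIABLE)
        if variable is not None:
            # Assumes the rat:variable is the element's only content.
            node.remove(variable)
            name = variable.get("name", "").removeprefix("sub:")
            node.text = solution.get(name, "")
            node.attrib.pop("datatype", None)  # plain literal in the output
    return element


def expand(root: ET.Element, solutions: list[dict[str, str]]) -> None:
    """Replace each rat:graph element, in place, by one filled copy of
    its children per solution."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for graph in list(root.iter(GRAPH)):
        parent = parents[graph]
        position = list(parent).index(graph)
        parent.remove(graph)
        for solution in solutions:
            for child in graph:
                parent.insert(position, fill(copy.deepcopy(child), solution))
                position += 1


TEMPLATE = f"""<table xmlns:rat="{RAT}">
  <rat:graph>
    <tr about="sub:resource">
      <td property="foaf:name" datatype="rdf:XMLLiteral"><rat:variable name="sub:name"/></td>
      <td property="dbo:produced">1973</td>
    </tr>
  </rat:graph>
</table>"""
SOLUTIONS = [
    {"resource": "http://dbpedia.org/resource/Dassault/Dornier_Alpha_Jet", "name": "Alpha Jet"},
    {"resource": "http://dbpedia.org/resource/Piper_PA-36_Pawnee_Brave", "name": "PA-36 Pawnee Brave"},
]
table = ET.fromstring(TEMPLATE)
expand(table, SOLUTIONS)
print(ET.tostring(table, encoding="unicode"))
```

Copying the template subtree for every solution keeps the original template untouched, so the same parsed template could be reused for another result set.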

Scope

The scope of RDFa Templates is an open question. The author currently feels that only issues relevant to RDF and SPARQL are within the scope of RDFa Templates. There are many well-established and widespread templating languages for writing web pages, e.g. PHP, Template Toolkit, etc. Also, for advanced transforms, XSLT is a useful and widespread tool. XSLT is ill suited for working on RDF graphs, so RDFa Templates and XSLT complement each other.

Therefore, the author envisions RDFa Templates as a part of a greater system that employs several techniques to build web pages. One may for example use Template Toolkit to build the page, insert query variables, etc., and then use RDFa Templates to populate the page with data from a backend.

This is, as stated, an open question, and the author is open to other views.

Weaknesses

Currently, all variables are denoted by a single, specified namespace URI. As a result, the same URI is used by different applications for different variables. Some way for application authors to choose this namespace, yet retain portability, should be found.

TODO

This is an early draft, and there are a number of issues with this draft as well as with the implementation. The TODO list for the software is in the distribution; this is a list of issues with this document:

Resolve weaknesses, or decide that they are really not important.
How will URIs appear in objects?
How to use FILTERs? (use XSPARQL?)
How to use freetext index matching?
How to create SPARQL UNIONs and OPTIONALs?
Other ways to assign graph names?

Acknowledgements

First and foremost, thanks to Toby Inkster for writing RDF::RDFa::Parser, and for several suggestions in the early thinking around this idea.