
XML Path Language

XML Path Language (XPath) is a simple language for retrieving XML elements from a single XML document. XPath is used in several XML technologies: on its own, as a simple query language for XML; in XQuery, to retrieve XML elements that may be processed further to answer a query; in XSLT, to select the elements to which template rules are applied when transforming an XML document; in W3C XML Schema, to locate keys and key references; and in XPointer, to point to particular XML elements in a linked XML document.
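As a small illustration of XPath used on its own as a query language, the following Python sketch runs two path expressions with the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath. The library fragment and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, invented for this illustration.
doc = ET.fromstring(
    "<library>"
    "<book year='1936'><title>On Computable Numbers</title></book>"
    "<book year='1950'><title>Computing Machinery and Intelligence</title></book>"
    "</library>"
)

# Path expression: every title element that is a child of a book element.
titles = [t.text for t in doc.findall("./book/title")]

# Path expression with a predicate on an attribute value.
from_1950 = [b.find("title").text for b in doc.findall("./book[@year='1950']")]

print(titles)
print(from_1950)
```

A full XPath processor accepts many more expressions than this subset, but the idea is the same: a path expression selects nodes of the document tree.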

To understand XPath semantics, it is necessary to understand the XPath data model, that is, how XPath views an XML document. The XPath data model represents each XML document as a tree of nodes. Each node has one of these types:

Root
The root node is a virtual node that does not correspond to any component in the XML document. It has one comment child for each comment outside the document element, one processing instruction child for each processing instruction outside the document element, and a unique element child that corresponds to the document element. The root node has no parent. Its string value is the string value of the document element node.
Element
An element node corresponds to an XML element and is labelled with the XML element tag. It always has a parent (either the root node or an element node) and it may have children of type element, comment, processing instruction, and text. Attributes and namespaces associated with the element are not children of the element node. The string value of an element node is the text contained between the start and end tags of the element, excluding all tags, comments, and processing instructions.
Attribute
An attribute node corresponds to an attribute. Its parent is the element node that contains the attribute (however, the attribute is not a child of its parent!). The string value of an attribute node is the attribute value.
Namespace
A namespace node represents a namespace. Like attribute nodes, namespace nodes have a parent but are not children of their parent. The string value of a namespace node is the namespace URI.
Text
A text node represents a maximal string of text between tags, comments, and processing instructions. It has a parent node but no children. Its string value is the text of the node.
Processing instruction
A processing instruction node represents a processing instruction. It has a parent and no children. The string value of a processing instruction node is the content of the instruction excluding the target.
Comment
A comment node represents a comment. It has a parent and no children. The string value of a comment node is the content of the comment.
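The string-value rules above can be sketched in Python with the standard library's xml.dom.minidom. This is only an approximation under stated assumptions: the DOM is not the XPath data model (for instance, it has no namespace nodes, so they are omitted here), and the helper name string_value and the sample fragment are invented for illustration.

```python
from xml.dom import Node, minidom

def string_value(node):
    """Approximate the XPath string value of a DOM node (namespace nodes omitted)."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data                      # the text itself
    if node.nodeType == Node.ATTRIBUTE_NODE:
        return node.value                     # the attribute value
    if node.nodeType in (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE):
        return node.data                      # content without delimiters or target
    if node.nodeType in (Node.ELEMENT_NODE, Node.DOCUMENT_NODE):
        # Concatenate the text found below the node, in document order,
        # skipping comments and processing instructions.
        wanted = (Node.ELEMENT_NODE, Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
        return "".join(string_value(c) for c in node.childNodes
                       if c.nodeType in wanted)
    return ""

# Hypothetical fragment: a comment and a processing instruction inside an element.
SAMPLE = ('<name id="n1"><!-- given name first --><first>Alan</first>'
          '<?note checked?><last>Turing</last></name>')

doc = minidom.parseString(SAMPLE)
name = doc.documentElement

values = {
    "root": string_value(doc),
    "element": string_value(name),
    "attribute": string_value(name.getAttributeNode("id")),
    "comment": string_value(name.childNodes[0]),
    "processing instruction": string_value(name.childNodes[2]),
}
for kind, value in values.items():
    print(kind, repr(value))
```

In this sketch the root and the element yield the same string, since the string value of the root is defined as that of the document element.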

In particular, the XPath data model does not consider the XML and DTD declarations. Moreover, all entity references, character references, and CDATA sections are resolved before the XML tree is built. As an example, consider the following XML document:

<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "Turing.dtd">
<?xml-stylesheet type="text/css" href="Turing.css"?>
<!-- Alan Turing was the first computer scientist -->
<person born="23/06/1912" died="07/06/1954">
   <name>
      <first>Alan</first>
      <last>Turing</last>
   </name>
   <profession>computer scientist</profession> 
   As a computer scientist, he is best-known for the Turing Test 
   and the Turing Machine.
   <profession>mathematician</profession>
   <profession>cryptographer</profession>
</person>
<!-- He committed suicide on June 7, 1954. -->

For the sake of simplicity, in this example we ignore text nodes whose values consist only of white space. The root node for this document has two comment children corresponding to the two comments of the document, one processing instruction child for the stylesheet processing instruction, and one element child for the person element. There is no child associated with the XML and DTD declarations. The person element has the root node as its parent, and it has five children in this order: the name element node, the first profession element node, the text node whose value is all the text between the first and the second profession elements, and the last two profession element nodes. The attribute nodes born and died are not children of the person node, although the person node is their parent. The name element has the person element node as its parent and the first and last element nodes as its children. Its string value is AlanTuring. The parent of the first element is the name element node. It has a text child with value Alan.
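The description of the person element can be checked with a short Python sketch based on the standard library's xml.dom.minidom. It rests on a few assumptions: only the person element is parsed (the prolog and the comments outside it are left out), white-space-only text nodes are dropped as in the discussion above, and the helper names significant_children and describe are invented here. Note also that the DOM exposes the element owning an attribute as ownerElement, where the XPath data model speaks of the attribute's parent.

```python
from xml.dom import Node, minidom

# The person element of the example document (prolog omitted in this sketch).
DOC = """<person born="23/06/1912" died="07/06/1954">
   <name>
      <first>Alan</first>
      <last>Turing</last>
   </name>
   <profession>computer scientist</profession>
   As a computer scientist, he is best-known for the Turing Test
   and the Turing Machine.
   <profession>mathematician</profession>
   <profession>cryptographer</profession>
</person>"""

def significant_children(node):
    """Children of node, ignoring text nodes made only of white space."""
    return [c for c in node.childNodes
            if not (c.nodeType == Node.TEXT_NODE and c.data.strip() == "")]

def describe(node):
    """Short label for an element or text node."""
    if node.nodeType == Node.ELEMENT_NODE:
        return "element:" + node.tagName
    if node.nodeType == Node.TEXT_NODE:
        return "text"
    return "other"

person = minidom.parseString(DOC).documentElement
children = significant_children(person)
kinds = [describe(c) for c in children]

# Attributes are reachable from the element but are not among its children.
attribute_names = sorted(person.attributes.keys())
born = person.getAttributeNode("born")
attributes_among_children = any(
    c.nodeType == Node.ATTRIBUTE_NODE for c in person.childNodes)

# Children of the name element, again ignoring white-space text nodes.
name_kinds = [describe(c) for c in significant_children(children[0])]

print(kinds)
print(attribute_names, born.ownerElement is person, attributes_among_children)
print(name_kinds)
```

The five entries of kinds follow the order given in the text, and the born attribute points back to person without appearing in its list of children.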

Caffè XML - Massimo Franceschet