The XML Path Language (XPath) is a simple language for retrieving XML elements from a single XML document. XPath is used in several XML technologies: by itself as a simple query language for XML, in XQuery to retrieve XML elements that may be processed further in order to answer a query, in XSLT to select the elements to which template rules are applied when transforming an XML document, in W3C XML Schema to locate keys and key references, and in XPointer to point to particular XML elements in a linked XML document.
To understand XPath semantics, it is necessary to understand the XPath data model, that is, how XPath views an XML document. The XPath data model represents each XML document as a tree of nodes. Each node has one of these types: root, element, attribute, text, comment, processing instruction, or namespace.
In particular, the XPath data model does not consider the XML and DTD declarations. Moreover, all entity references, character references, and CDATA sections are resolved before the XML tree is built. As an example, consider the following XML document:
<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "Turing.dtd">
<?xml-stylesheet type="text/css" href="Turing.css"?>
<!-- Alan Turing was the first computer scientist -->
<person born="23/06/1912" died="07/06/1954">
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  As a computer scientist, he is best-known for the Turing Test and the Turing Machine.
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
<!-- He committed suicide on June 7, 1954. -->
For the sake of simplicity, in this example we ignore text nodes whose values consist only of whitespace. The root node of this document has two comment children corresponding to the two comments of the document, one processing instruction child for the stylesheet processing instruction, and one element child for the person element. There is no child associated with the XML and DTD declarations. The person element has the root node as its parent, and it has five children in this order: the name element node, the first profession element node, a text node whose value is all the text between the first and the second profession elements, and the last two profession element nodes. The attribute nodes born and died are not children of the person node, although the person node is their parent. The name element has the person element node as its parent and the first and last element nodes as its children. Its string value is AlanTuring. The parent of the first element is the name element node. It has one text child with value Alan.
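The tree described above can be checked with a short script. The following is a minimal sketch, not an XPath implementation: it assumes Python's standard xml.dom.minidom parser, and the helper names xpath_children, describe, and string_value are hypothetical names introduced only for this illustration. Because the DOM keeps the DTD declaration and CDATA sections as separate nodes, the helpers filter or merge them to approximate the XPath view, and they skip whitespace-only text nodes as the text does.

```python
from xml.dom import Node, minidom

# The example document from the text; the DTD file is referenced but never loaded.
TURING_XML = """<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "Turing.dtd">
<?xml-stylesheet type="text/css" href="Turing.css"?>
<!-- Alan Turing was the first computer scientist -->
<person born="23/06/1912" died="07/06/1954">
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  As a computer scientist, he is best-known for the Turing Test and the Turing Machine.
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
<!-- He committed suicide on June 7, 1954. -->
"""

# DOM node types that count as text in the XPath view (CDATA is just text there).
TEXT_LIKE = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def xpath_children(node):
    """Approximate the XPath children of a DOM node (hypothetical helper).

    Drops the DTD declaration, which the DOM keeps but XPath ignores, and
    whitespace-only text nodes, which the text ignores for simplicity.
    """
    children = []
    for child in node.childNodes:
        if child.nodeType == Node.DOCUMENT_TYPE_NODE:
            continue
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            continue
        children.append(child)
    return children


def describe(node):
    """Return a short label giving the node kind and, where it has one, its name."""
    if node.nodeType == Node.ELEMENT_NODE:
        return "element:" + node.tagName
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return "processing-instruction:" + node.target
    if node.nodeType == Node.COMMENT_NODE:
        return "comment"
    return "text"


def string_value(node):
    """Concatenate descendant text in document order (whitespace-only text skipped)."""
    if node.nodeType in TEXT_LIKE:
        return node.data
    return "".join(
        string_value(child)
        for child in xpath_children(node)
        if child.nodeType == Node.ELEMENT_NODE or child.nodeType in TEXT_LIKE
    )


doc = minidom.parseString(TURING_XML)   # plays the role of the XPath root node
person = doc.documentElement
name = xpath_children(person)[0]

print([describe(n) for n in xpath_children(doc)])     # children of the root node
print([describe(n) for n in xpath_children(person)])  # children of person
print(sorted(person.attributes.keys()))               # attributes are not children
print(string_value(name))                             # string value of name

# Entity references, character references, and CDATA sections all end up as text.
note = minidom.parseString("<note>AT&amp;T <![CDATA[<R&D>]]> &#65;</note>").documentElement
print(string_value(note))
```

If the sketch behaves as intended, the first list contains the processing instruction, the first comment, the person element, and the second comment, with nothing for the XML or DTD declarations; the second list contains the five children of person in the order given above. The born and died attributes appear only through person.attributes, never among the children.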