Each XPath expression (or XPath query) is evaluated at an XML tree node (called the context node) and returns an object of one of four types: Boolean, number, string, or node set (not surprisingly, a set of nodes). The most important kind of XPath expression is the (location) path, whose value is always a node set. A path is composed of (location) steps. Each step has the form axis::test[filter], where the [filter] part is optional.
We will learn XPath by example taking advantage of the following alphabet tree that corresponds to the alphabet XML document:
The document order is a total order defined on the XML elements of a document. An element A precedes an element B in the document order if the start tag of A comes before the start tag of B when reading the document from top to bottom. Notice that the document order corresponds to the preorder on the XML tree. For instance, if you read the alphabet document in document order, or the alphabet tree in preorder, you get the English alphabet in alphabetical order.
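The correspondence between document order and preorder can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree module and a small hypothetical document (not the alphabet document); the function names document_order and preorder are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-document (an assumption, not the alphabet document).
DOC = "<a><b><c/><d/></b><e><f/></e></a>"

def document_order(xml_text):
    """Return element names in the order their start tags appear."""
    root = ET.fromstring(xml_text)
    # Element.iter() visits an element before its children, left to right,
    # which is a preorder traversal of the tree.
    return [elem.tag for elem in root.iter()]

def preorder(elem):
    """Explicit recursive preorder, for comparison with iter()."""
    names = [elem.tag]
    for child in elem:
        names.extend(preorder(child))
    return names

print(document_order(DOC))            # ['a', 'b', 'c', 'd', 'e', 'f']
print(preorder(ET.fromstring(DOC)))   # ['a', 'b', 'c', 'd', 'e', 'f']
```

Both traversals list the elements in the same sequence as their start tags occur in the text of the document.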
Here is a first example of a location path (in fact, a single location step): child::*. The result of this expression is the set of element child nodes of the context node, regardless of their names (tags). Here is a graphical example: the context node is the red node and the result node set contains the yellow nodes:
The step parent::* retrieves the element parent node of the context node. A graphical example follows (the context node is always the red node and the result node set always contains the yellow nodes):
The step descendant::* retrieves the element descendant nodes of the context node. A graphical example follows:
The step ancestor::* retrieves the element ancestor nodes of the context node. A graphical example follows:
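The four vertical axes seen so far can be read as simple tree traversals. The following is a minimal sketch of that reading, not an XPath engine: it uses Python's standard xml.dom.minidom module, a hypothetical mini-document, and helper names (child, parent, descendant, ancestor, names) chosen only for illustration.

```python
from xml.dom import minidom

# Hypothetical mini-document; the element names are illustrative only.
DOC = "<a><b><c/><d/></b><e><f/></e></a>"

def is_element(node):
    return node.nodeType == node.ELEMENT_NODE

def child(ctx):
    """child::* : element children of the context node."""
    return [n for n in ctx.childNodes if is_element(n)]

def parent(ctx):
    """parent::* : the element parent, if there is one."""
    p = ctx.parentNode
    return [p] if p is not None and is_element(p) else []

def descendant(ctx):
    """descendant::* : element descendants, in document order."""
    result = []
    for c in child(ctx):
        result.append(c)
        result.extend(descendant(c))
    return result

def ancestor(ctx):
    """ancestor::* : element ancestors, nearest first."""
    result = []
    p = ctx.parentNode
    while p is not None:
        if is_element(p):
            result.append(p)
        p = p.parentNode
    return result

def names(nodes):
    return [n.tagName for n in nodes]

dom = minidom.parseString(DOC)
b = dom.getElementsByTagName("b")[0]
c = dom.getElementsByTagName("c")[0]
print(names(child(b)))                        # ['c', 'd']
print(names(parent(b)))                       # ['a']
print(names(descendant(dom.documentElement))) # ['b', 'c', 'd', 'e', 'f']
print(names(ancestor(c)))                     # ['b', 'a']
```

Note that parent applied to the document element returns an empty list here, because its parent is the root of the document, which is not an element and so does not match the * test.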
The step following-sibling::* retrieves the element right sibling nodes of the context node. A graphical example follows:
The step preceding-sibling::* retrieves the element left sibling nodes of the context node. A graphical example follows:
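The two sibling axes can be sketched in the same spirit, by walking right or left from the context node. This is an illustrative sketch using Python's standard xml.dom.minidom module on a hypothetical flat document; the function names are assumptions made for the example.

```python
from xml.dom import minidom

# Hypothetical flat document: four sibling elements under one parent.
DOC = "<a><b/><c/><d/><e/></a>"

def following_sibling(ctx):
    """following-sibling::* : element siblings to the right of ctx."""
    result = []
    node = ctx.nextSibling
    while node is not None:
        if node.nodeType == node.ELEMENT_NODE:
            result.append(node)
        node = node.nextSibling
    return result

def preceding_sibling(ctx):
    """preceding-sibling::* : element siblings to the left of ctx."""
    result = []
    node = ctx.previousSibling
    while node is not None:
        if node.nodeType == node.ELEMENT_NODE:
            result.append(node)
        node = node.previousSibling
    result.reverse()  # report the siblings in document order
    return result

dom = minidom.parseString(DOC)
c = dom.getElementsByTagName("c")[0]
print([n.tagName for n in following_sibling(c)])  # ['d', 'e']
print([n.tagName for n in preceding_sibling(c)])  # ['b']
```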
The step following::* retrieves the element nodes that follow the context node with respect to the document order, excluding the descendant nodes. A graphical example follows:
The step preceding::* retrieves the element nodes that precede the context node with respect to the document order, excluding the ancestor nodes. A graphical example follows:
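The definitions of following and preceding can be sketched directly from the document order: take the elements after (or before) the context node and drop its descendants (or ancestors). The code below is a minimal illustration with Python's standard xml.dom.minidom module on a hypothetical document; all function names are assumptions for the example.

```python
from xml.dom import minidom

# Hypothetical mini-document; the element names are illustrative only.
DOC = "<a><b><c/><d/></b><e><f/><g/></e><h/></a>"

def descendants(node):
    """Element descendants of node, in document order (preorder)."""
    result = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            result.append(child)
            result.extend(descendants(child))
    return result

def ancestors(node):
    """Element ancestors of node, nearest first."""
    result = []
    parent = node.parentNode
    while parent is not None:
        if parent.nodeType == parent.ELEMENT_NODE:
            result.append(parent)
        parent = parent.parentNode
    return result

def position(ctx, order):
    """Index of ctx in the document-order list (identity comparison)."""
    return next(i for i, n in enumerate(order) if n is ctx)

def following(ctx, doc):
    """following::* : later elements, excluding descendants of ctx."""
    order = descendants(doc)  # every element, in document order
    excluded = {id(n) for n in descendants(ctx)}
    return [n for n in order[position(ctx, order) + 1:] if id(n) not in excluded]

def preceding(ctx, doc):
    """preceding::* : earlier elements, excluding ancestors of ctx."""
    order = descendants(doc)
    excluded = {id(n) for n in ancestors(ctx)}
    return [n for n in order[:position(ctx, order)] if id(n) not in excluded]

dom = minidom.parseString(DOC)
e = dom.getElementsByTagName("e")[0]
print([n.tagName for n in following(e, dom)])  # ['h']
print([n.tagName for n in preceding(e, dom)])  # ['b', 'c', 'd']
```

With e as the context node, the elements split into five disjoint groups: e itself, its ancestors (a), its descendants (f, g), the preceding elements (b, c, d), and the following elements (h).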
Moreover, the step descendant-or-self::* retrieves the element descendant nodes of the context node plus the context node itself, ancestor-or-self::* retrieves the element ancestor nodes of the context node plus the context node itself, self::* retrieves the context node itself, attribute::* retrieves the attribute nodes of the context node, and finally namespace::* retrieves the namespace nodes of the context node.
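So far we have used only the * test, which matches any element node regardless of name (or any attribute node on the attribute axis). Other relevant tests include a name such as profession, which matches only the nodes with that name; node(), which matches any node; text(), which matches text nodes; comment(), which matches comment nodes; and processing-instruction(), which matches processing instruction nodes.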
For instance, consider again the following example:
<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "Turing.dtd">
<?xml-stylesheet type="text/css" href="Turing.css"?>
<!-- Alan Turing was the first computer scientist -->
<person born="23/06/1912" died="07/06/1954">
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  As a computer scientist, he is best-known for the Turing Test and the Turing Machine.
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
<!-- He committed suicide on June 7, 1954. -->
Let us set the context node to the person node. Then child::node() matches all child nodes, that is, the name and profession element children and the text child node (ignoring whitespace-only text nodes); child::* matches only the element children; child::profession matches only the profession element children; child::text() matches the text node; attribute::* matches both attributes; and attribute::born matches the born attribute. Moreover, if the context node is the root, then child::comment() selects the two comments, which are children of the root rather than of the document element, and child::processing-instruction('xml-stylesheet') selects the stylesheet processing instruction.
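These node tests can be tried out on the Turing document with a short sketch. It uses Python's standard xml.dom.minidom module rather than an XPath engine, so each test is emulated by filtering child nodes on their node type. The DOCTYPE line is left out of the sample (an assumption made to keep the example independent of Turing.dtd), and the helper names are illustrative only.

```python
from xml.dom import minidom

# The Turing document from the text, without the DOCTYPE declaration.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="Turing.css"?>
<!-- Alan Turing was the first computer scientist -->
<person born="23/06/1912" died="07/06/1954">
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  As a computer scientist, he is best-known for the Turing Test and the Turing Machine.
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
<!-- He committed suicide on June 7, 1954. -->
"""

def child_elements(ctx, name=None):
    """child::* when name is None, child::name otherwise."""
    return [n for n in ctx.childNodes
            if n.nodeType == n.ELEMENT_NODE and name in (None, n.tagName)]

def child_text(ctx):
    """child::text(), skipping whitespace-only text nodes."""
    return [n for n in ctx.childNodes
            if n.nodeType == n.TEXT_NODE and n.data.strip()]

def child_comments(ctx):
    """child::comment()"""
    return [n for n in ctx.childNodes if n.nodeType == n.COMMENT_NODE]

def child_pis(ctx, target=None):
    """child::processing-instruction() with an optional target name."""
    return [n for n in ctx.childNodes
            if n.nodeType == n.PROCESSING_INSTRUCTION_NODE
            and target in (None, n.target)]

root = minidom.parseString(DOC)   # plays the role of the XPath root
person = root.documentElement     # the person element

print([n.tagName for n in child_elements(person)])
print([n.tagName for n in child_elements(person, "profession")])
print(sorted(person.attributes.keys()))   # attribute::*
print(person.getAttribute("born"))        # attribute::born
print(len(child_comments(root)))          # comments are children of the root
print([n.target for n in child_pis(root, "xml-stylesheet")])
```

Calling child_comments(person) returns an empty list, which shows that both comments belong to the root and not to the person element.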
The last component of a location step is the optional filter. Since filters may contain location paths, we postpone their introduction until after the discussion of location paths.