XPath location steps

Each XPath expression (or XPath query) is evaluated at an XML tree node (called the context node) and returns an object of one of four types: Boolean, number, string, or node set (a set of nodes). The most important kind of XPath expression is the (location) path, whose value is always a node set. A path is composed of (location) steps. Each step has the form axis::test[filter], where the [filter] part is optional.

We will learn XPath by example, using the following alphabet tree, which corresponds to the alphabet XML document:

A tree representing the English alphabet if read in preorder

The document order is a total order defined on the XML elements of a document. An element A precedes an element B in the document order if the start tag of A comes before the start tag of B when reading the document from top to bottom. Notice that the document order corresponds to the preorder on the XML tree. For instance, if you read the alphabet document in document order, or the alphabet tree in preorder, you get the English alphabet in its usual order, from A to Z.
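To make the link between document order and preorder concrete, here is a minimal Python sketch. It assumes a tiny hypothetical document with six elements (not the full alphabet tree shown in the figure) and uses the standard-library module xml.etree.ElementTree, whose iter() method visits elements in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the alphabet document: the start tags
# appear in the order A, B, C, D, E, F when read top to bottom.
DOC = "<A><B><C/><D/></B><E><F/></E></A>"

def preorder(element):
    """Return the tags of element and its descendants in preorder."""
    tags = [element.tag]              # visit the node first ...
    for child in element:             # ... then each subtree, left to right
        tags.extend(preorder(child))
    return tags

root = ET.fromstring(DOC)

# iter() walks the element and everything below it in document order.
document_order = [e.tag for e in root.iter()]

print(preorder(root))
print(document_order)
```

Both lists contain the same tags in the same order, which is exactly the statement that document order coincides with preorder.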

Here is a first example of a location path (in fact, a single location step): child::*. The result of this expression is the set of element child nodes of the context node, regardless of their names (tags). Here is a graphical example: the context node is the red node and the result node set contains the yellow nodes:

The step parent::* retrieves the element parent node of the context node. A graphical example follows (the context node is always the red node and the result node set always contains the yellow nodes):

The step descendant::* retrieves the element descendant nodes of the context node. A graphical example follows:

The step ancestor::* retrieves the element ancestor nodes of the context node. A graphical example follows:
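The four axes seen so far can also be tried in code. The following Python sketch emulates child::*, parent::*, descendant::* and ancestor::* by hand on a small hypothetical document. It is not an XPath engine: the helper functions and the parent_of table are names introduced only for this illustration, and the results are returned as lists, whereas XPath node sets are unordered.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; reading the start tags top to bottom gives A..G.
DOC = "<A><B><C><D/></C><E/></B><F><G/></F></A>"
root = ET.fromstring(DOC)

# ElementTree elements do not record their parent, so build a lookup table.
parent_of = {c: p for p in root.iter() for c in p}

def child(node):
    """child::* : the element children of node."""
    return list(node)

def parent(node):
    """parent::* : the element parent of node, if it has one."""
    return [parent_of[node]] if node in parent_of else []

def descendant(node):
    """descendant::* : iter() yields node itself first, so drop it."""
    return list(node.iter())[1:]

def ancestor(node):
    """ancestor::* : follow parent links upwards (nearest ancestor first)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def tags(nodes):
    return [n.tag for n in nodes]

b = root.find("B")       # context node B
d = root.find(".//D")    # context node D

print(tags(child(b)))        # ['C', 'E']
print(tags(descendant(b)))   # ['C', 'D', 'E']
print(tags(parent(b)))       # ['A']
print(tags(ancestor(d)))     # ['C', 'B', 'A']
```

Notice that descendant::* at B contains D, which child::* at B does not, since D is a grandchild of B.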

The step following-sibling::* retrieves the element right sibling nodes of the context node. A graphical example follows:

The effect of the following-sibling axis

The step preceding-sibling::* retrieves the element left sibling nodes of the context node. A graphical example follows:

The effect of the preceding-sibling axis

The step following::* retrieves the element nodes that follow the context node with respect to the document order, excluding the descendant nodes. A graphical example follows:

The step preceding::* retrieves the element nodes that precede the context node with respect to the document order, excluding the ancestor nodes. A graphical example follows:

Finally, the step descendant-or-self::* retrieves the element descendant nodes of the context node plus the context node itself, ancestor-or-self::* retrieves the element ancestor nodes of the context node plus the context node itself, self::* retrieves the context node itself (provided it is an element), attribute::* retrieves the attribute nodes of the context node, and namespace::* retrieves the namespace nodes of the context node.
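The sibling, following and preceding axes can be emulated in the same hand-written style. The Python sketch below again assumes a small hypothetical document and illustrative helper names; it is meant only to make the definitions concrete. It also checks a useful fact about XPath: for elements, the ancestor, descendant, following, preceding and self axes partition the document, that is, every element falls in exactly one of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; document order of the elements is A, B, ..., I.
DOC = "<A><B><C/><D/></B><E><F/><G/></E><H><I/></H></A>"
root = ET.fromstring(DOC)

order = list(root.iter())                            # elements in document order
parent_of = {c: p for p in root.iter() for c in p}   # child -> parent lookup

def ancestors(node):
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def descendants(node):
    return list(node.iter())[1:]

def following_sibling(node):
    """following-sibling::* : siblings to the right of node."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

def preceding_sibling(node):
    """preceding-sibling::* : siblings to the left of node."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)]

def following(node):
    """following::* : later in document order, descendants excluded."""
    excluded = descendants(node)
    return [n for n in order[order.index(node) + 1:] if n not in excluded]

def preceding(node):
    """preceding::* : earlier in document order, ancestors excluded."""
    excluded = ancestors(node)
    return [n for n in order[:order.index(node)] if n not in excluded]

def tags(nodes):
    return [n.tag for n in nodes]

e = root.find("E")   # context node E
print(tags(following_sibling(e)))   # ['H']
print(tags(preceding_sibling(e)))   # ['B']
print(tags(following(e)))           # ['H', 'I']
print(tags(preceding(e)))           # ['B', 'C', 'D']

# The five axes together cover every element exactly once.
parts = ancestors(e) + descendants(e) + following(e) + preceding(e) + [e]
print(sorted(tags(parts)) == sorted(tags(order)))   # True
```

In this example following::* at E contains I, which following-sibling::* at E does not, because I is not a sibling of E but a child of its right sibling H.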

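So far we have used only the * test, which matches any node of the principal type of the axis (elements on most axes, attributes on the attribute axis, namespaces on the namespace axis) regardless of name. Other relevant tests include: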

name: Along all axes but attribute and namespace, it matches all elements with the specified tag. Along the attribute axis, it matches all attributes with the given name. Along the namespace axis, it matches all namespaces with the given prefix.
node(): It matches all nodes regardless of type. Notice, however, that child::node() selects all children of the context node but not its attributes and namespaces, since, according to the XPath data model, they are not children of the context node. To select all attributes of the context node, use attribute::*; for its namespaces, use namespace::*.
text(): It matches all text nodes.
comment(): It matches all comment nodes.
processing-instruction(): It matches all processing instruction nodes. With a string as argument, it selects all processing instructions with that target.

For instance, consider again the following example:

<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "Turing.dtd">
<?xml-stylesheet type="text/css" href="Turing.css"?>
<!-- Alan Turing was the first computer scientist -->
<person born="23/06/1912" died="07/06/1954">
   <name>
      <first>Alan</first>
      <last>Turing</last>
   </name>
   <profession>computer scientist</profession> 
   As a computer scientist, he is best-known for the Turing Test 
   and the Turing Machine.
   <profession>mathematician</profession>
   <profession>cryptographer</profession>
</person>
<!-- He committed suicide on June 7, 1954. -->

Let us set the context node to the person node. Then child::node() matches all child nodes, that is, the name and profession element children and the text child (ignoring whitespace-only text nodes); child::* matches only the element children; child::profession matches only the profession element children; child::text() matches the text node; attribute::* matches both attributes; and attribute::born matches the born attribute. Moreover, if the context node is the root, then child::comment() selects the two comments, which are children of the root and siblings of the document element, and child::processing-instruction('xml-stylesheet') selects the stylesheet processing instruction.
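These node tests can be checked with a short Python sketch based on the standard-library module xml.dom.minidom, whose Document object plays the role of the XPath root node here. Several things are assumptions made for the illustration: the children helper is a hypothetical name that emulates child::test for a few tests only, the DOCTYPE line is left out so that the sketch does not depend on the file Turing.dtd, and whitespace-only text nodes are filtered out explicitly because the parser keeps them.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# The Turing document from above, without the DOCTYPE declaration.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="Turing.css"?>
<!-- Alan Turing was the first computer scientist -->
<person born="23/06/1912" died="07/06/1954">
   <name>
      <first>Alan</first>
      <last>Turing</last>
   </name>
   <profession>computer scientist</profession>
   As a computer scientist, he is best-known for the Turing Test
   and the Turing Machine.
   <profession>mathematician</profession>
   <profession>cryptographer</profession>
</person>
<!-- He committed suicide on June 7, 1954. -->
"""

root = minidom.parseString(DOC)    # stands in for the XPath root node
person = root.documentElement      # the person element

def children(node, node_type=None, name=None):
    """Emulate child::test, filtering by node type and, optionally, name."""
    result = []
    for c in node.childNodes:
        if node_type is not None and c.nodeType != node_type:
            continue
        if name is not None and c.nodeName != name:
            continue
        result.append(c)
    return result

# Context node: person
elements = children(person, Node.ELEMENT_NODE)                    # child::*
professions = children(person, Node.ELEMENT_NODE, "profession")   # child::profession
texts = [t for t in children(person, Node.TEXT_NODE)
         if t.data.strip()]                  # child::text(), whitespace-only dropped
attributes = sorted(person.attributes.keys())                     # attribute::*

# Context node: root
comments = children(root, Node.COMMENT_NODE)                      # child::comment()
stylesheets = children(root, Node.PROCESSING_INSTRUCTION_NODE,
                       "xml-stylesheet")     # child::processing-instruction('xml-stylesheet')

print([e.tagName for e in elements])
print(attributes)
print(len(texts), len(comments), len(stylesheets))
```

Calling children(person, Node.COMMENT_NODE) returns an empty list, which confirms that the two comments are children of the root and not of the person element.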

The last component of a location step is the optional filter. Since filters may contain location paths, we postpone their introduction until after the discussion of location paths.