XPath operators and functions

The fragment of XPath that we have described so far is usually called navigational or core XPath. It provides essentially all the features needed to navigate the XML tree. Full XPath offers more, including:

Comparison operators, like =, !=, <, <=, >, >=;
Number operators, like +, -, *, div (division), mod (remainder);
Functions.

Consider the following XML document containing an excerpt of a bibliography:

<?xml version="1.0"?>
<!DOCTYPE biblio SYSTEM "biblio.dtd">

<biblio>
   <inproceedings key="M4M05" cite="JLLI05">
      <author>M. Franceschet</author>
      <author>E. Zimuel</author>
      <title>Modal logic and navigational XPath: an experimental comparison</title>
      <booktitle>Workshop Methods for Modalities</booktitle>
      <pages>156-172</pages>
      <year>2005</year>
      <url>http://www.sci.unich.it/~francesc/pubs/m4m05.pdf</url>
   </inproceedings>
   
   <article key="JLLI05" cite="M4M05">
      <author>M. Franceschet</author>
      <author>B. ten Cate</author>
      <title>Guarded fragments with constants</title>
      <journal>Journal of Logic, Language and Information</journal>
      <volume>14</volume>
      <number>3</number>
      <pages>281-288</pages>
      <year>2005</year>
      <url>http://www.sci.unich.it/~francesc/pubs/jlli05.pdf</url>
      <price>15</price>
   </article>
</biblio>

A simple DTD for the above document follows:

<!ELEMENT biblio (article | inproceedings)*>

<!ELEMENT article (author+,title,journal,volume,number,pages,year,url,price)>
<!ATTLIST article key ID #REQUIRED
                  cite IDREFS #IMPLIED>

<!ELEMENT inproceedings (author+,title,booktitle,pages,year,url)>
<!ATTLIST inproceedings key ID #REQUIRED
                        cite IDREFS #IMPLIED>

<!ELEMENT author    (#PCDATA)>
<!ELEMENT title     (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT pages     (#PCDATA)>
<!ELEMENT year      (#PCDATA)>
<!ELEMENT journal   (#PCDATA)>
<!ELEMENT volume    (#PCDATA)>
<!ELEMENT number    (#PCDATA)>
<!ELEMENT url       (#PCDATA)>
<!ELEMENT price     (#PCDATA)>

Comparison operators can be used to compare the string value of a node with a string or a number. For instance, the query:

/child::biblio/child::*[child::author = "E. Zimuel"]

retrieves all bibliography items that have at least one author named E. Zimuel. Strings must be enclosed in single or double quotes. The following query selects all articles published after the year 2000:

/child::biblio/child::article[child::year > 2000]

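Since 2000 is not quoted, it is treated as a number, and the > sign is interpreted as the greater-than operator on numbers. If you write child::year > "2000" instead, the outcome depends on the XPath version. In XPath 1.0 the operators <, <=, >, >= always convert both operands to numbers, so the quotes make no difference. In XPath 2.0, 2000 is regarded as a string, and the > sign is interpreted as the greater-than operator on strings (lexicographic order).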
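The two queries above can be reproduced outside an XPath engine. The following is a minimal sketch in Python, assuming the standard xml.etree.ElementTree module, which implements only a small subset of XPath; the numeric comparison is therefore written in plain Python. The document is a trimmed copy of the bibliography, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibliography: only the fields used below are kept.
root = ET.fromstring("""
<biblio>
  <inproceedings key="M4M05">
    <author>M. Franceschet</author><author>E. Zimuel</author>
    <year>2005</year>
  </inproceedings>
  <article key="JLLI05">
    <author>M. Franceschet</author><author>B. ten Cate</author>
    <year>2005</year>
  </article>
</biblio>
""")

# /child::biblio/child::*[child::author = "E. Zimuel"]
# The filter holds if SOME author child has that string value.
by_author = [e.get("key") for e in root.findall("*[author='E. Zimuel']")]

# /child::biblio/child::article[child::year > 2000]
# ElementTree has no ">" operator, so the numeric test is done in Python.
after_2000 = [
    e.get("key")
    for e in root.findall("article")
    if any(float(y.text) > 2000 for y in e.findall("year"))
]

# Numeric and lexicographic order can disagree on the same values.
numeric = 999 > 2000        # compared as numbers
lexical = "999" > "2000"    # compared as strings: "9" sorts after "2"

print(by_author, after_2000, numeric, lexical)
```

The last two lines show why it matters whether an operand is treated as a number or as a string: the same pair of values is ordered differently under the two interpretations.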

XPath defines a number of functions that you may use in filters (the usual case) or in standalone expressions. Each function returns a value of one of the four basic types: string, number, Boolean, or node set. The most relevant Boolean function is not(), which negates its argument. Other relevant functions follow:

Node set functions

These functions either operate on node sets or return information about them. The most relevant are:

position()

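It returns the position of the context node within the context node set. Positions start at 1 and follow the document order for forward axes such as child (for reverse axes such as ancestor they follow the reverse document order). For instance, the next query returns the first author of the bibliography entry with key M4M05: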

/child::biblio/child::*[attribute::key = "M4M05"]/child::author[position() = 1]

last()

It returns the position of the last node in the context node set, that is, the cardinality of the context node set. For instance, the next query returns the last author of the bibliography entry with key M4M05:

/child::biblio/child::*[attribute::key = "M4M05"]/child::author[position() = last()]
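As a rough illustration, the two positional queries can be tried in Python. This sketch assumes the standard xml.etree.ElementTree module, which accepts only the abbreviated predicates [1] and [last()]; these stand for [position() = 1] and [position() = last()]. The document is a trimmed copy of the M4M05 entry.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the entry with key M4M05.
root = ET.fromstring("""
<biblio>
  <inproceedings key="M4M05">
    <author>M. Franceschet</author>
    <author>E. Zimuel</author>
  </inproceedings>
</biblio>
""")

# .../child::author[position() = 1], abbreviated as author[1]
first_author = root.find("*[@key='M4M05']/author[1]").text

# .../child::author[position() = last()], abbreviated as author[last()]
last_author = root.find("*[@key='M4M05']/author[last()]").text

print(first_author)
print(last_author)
```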

count(path)

It returns the cardinality of the node set resulting from the evaluation of the argument path. For instance, the next query returns the number of authors of the bibliography entry with key M4M05:

count(/child::biblio/child::*[attribute::key = "M4M05"]/child::author)

while the next one selects the conference papers with more than 3 authors:

/child::biblio/child::inproceedings[count(child::author) > 3]
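The effect of count() can be sketched in Python as the length of the list of selected nodes. The example assumes the standard xml.etree.ElementTree module, which does not provide count() itself, so len() plays that role here; the document is a trimmed copy of the bibliography.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibliography: only the authors are kept.
root = ET.fromstring("""
<biblio>
  <inproceedings key="M4M05">
    <author>M. Franceschet</author><author>E. Zimuel</author>
  </inproceedings>
  <article key="JLLI05">
    <author>M. Franceschet</author><author>B. ten Cate</author>
  </article>
</biblio>
""")

# count(/child::biblio/child::*[attribute::key = "M4M05"]/child::author)
n_authors = len(root.findall("*[@key='M4M05']/author"))

# /child::biblio/child::inproceedings[count(child::author) > 3]
crowded = [
    e.get("key")
    for e in root.findall("inproceedings")
    if len(e.findall("author")) > 3
]

print(n_authors, crowded)
```

On this excerpt the second query selects nothing, since no conference paper has more than 3 authors.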

id(string)

It takes as input a string containing one or more IDs separated by whitespace and returns a node set containing all the nodes in the document that have those IDs. The input string can be a static string or a dynamic string obtained as a result of the evaluation of an XPath query. For instance, the next query returns the entry with ID M4M05:

id("M4M05")

while the next one selects the entries that are cited by the entry with ID M4M05:

id(/child::biblio/child::*[attribute::key = "M4M05"]/attribute::cite)

In XPath 2.0, where a function call may appear as a step of a path, you may also write the last one as:

/child::biblio/child::*[attribute::key = "M4M05"]/id(attribute::cite)

Notice that the id() function works only if you have properly declared the ID and IDREF attributes in the document DTD.
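The behaviour of id() can be emulated in Python to see what it computes. This is only a sketch: the standard xml.etree.ElementTree module does not use the DTD to identify ID attributes, so the code assumes by hand that key is the ID attribute, as the DTD above declares. The helper id_lookup is hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibliography: only keys and citations are kept.
root = ET.fromstring("""
<biblio>
  <inproceedings key="M4M05" cite="JLLI05"/>
  <article key="JLLI05" cite="M4M05"/>
</biblio>
""")

def id_lookup(ids):
    """Hypothetical stand-in for id(): takes a whitespace-separated
    list of IDs and returns the matching elements in document order.
    Assumes 'key' is the attribute declared as ID in the DTD."""
    wanted = set(ids.split())
    return [e for e in root.iter() if e.get("key") in wanted]

# id("M4M05")
entry = id_lookup("M4M05")

# id(/child::biblio/child::*[attribute::key = "M4M05"]/attribute::cite)
cited = id_lookup(entry[0].get("cite", ""))

print([e.tag for e in entry], [e.get("key") for e in cited])
```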

String functions

The most relevant functions that operate on strings are:

contains(string1,string2)

It returns true if the first string contains the second; the match is case-sensitive. Both strings can result from the evaluation of an XPath expression. For instance, the following query retrieves all entries whose title contains the string logic:

/child::biblio/child::*[contains(child::title,"logic")]

starts-with(string1,string2)

It returns true if the first string starts with the second. Both strings can result from the evaluation of an XPath expression.
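Both string functions have direct counterparts in Python, which makes their behaviour easy to check. The sketch below assumes the standard xml.etree.ElementTree module, which offers neither function, so the tests are written with the in operator and str.startswith. The starts-with query in the second comment is an illustrative example, not one taken from the text above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibliography: only the titles are kept.
root = ET.fromstring("""
<biblio>
  <inproceedings key="M4M05">
    <title>Modal logic and navigational XPath: an experimental comparison</title>
  </inproceedings>
  <article key="JLLI05">
    <title>Guarded fragments with constants</title>
  </article>
</biblio>
""")

# /child::biblio/child::*[contains(child::title,"logic")]
with_logic = [e.get("key") for e in root if "logic" in e.findtext("title", "")]

# /child::biblio/child::*[starts-with(child::title,"Guarded")]  (illustrative)
guarded = [
    e.get("key") for e in root if e.findtext("title", "").startswith("Guarded")
]

# The match is case-sensitive: no title contains "Logic" with a capital L.
with_capital = [e.get("key") for e in root if "Logic" in e.findtext("title", "")]

print(with_logic, guarded, with_capital)
```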

Number functions

The most relevant function that operates on numbers is sum(path), which takes a node set as its argument and sums the values of all nodes in the set after converting them into numbers. For instance, the next query returns the total price of all articles published in 2005:

sum(/child::biblio/child::article[child::year = "2005"]/child::price)
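The same total can be computed in Python as a check. This sketch assumes the standard xml.etree.ElementTree module, which can select the price nodes but has no sum() function, so the conversion to numbers and the addition are done in Python. The document is a trimmed copy of the bibliography.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibliography: only year and price are kept.
root = ET.fromstring("""
<biblio>
  <inproceedings key="M4M05">
    <year>2005</year>
  </inproceedings>
  <article key="JLLI05">
    <year>2005</year>
    <price>15</price>
  </article>
</biblio>
""")

# Select the price children of the articles whose year is "2005",
# convert each string value to a number, then add them up.
prices = [
    float(p.text)
    for a in root.findall("article[year='2005']")
    for p in a.findall("price")
]
total = sum(prices)

print(total)
```

Only the article carries a price element in this excerpt, so the total is the price of that single article.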