An XML document is built from text. The text is marked up with tags, which are enclosed in angle brackets. Here is a very simple yet complete XML document:
<person> Alan Turing </person>
In the above document there is a single element named person. The element is delimited by the start-tag <person> and the end-tag </person>. The tags are a form of markup. Everything between the start-tag and the end-tag is called the content of the element. In this case, the content of the person element is the string Alan Turing.
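As a minimal sketch of the terms just introduced, the document can be read with Python's standard xml.etree.ElementTree module (the choice of Python and the variable names are assumptions made for illustration, not part of the original text). The element name and its content are then available separately:

```python
import xml.etree.ElementTree as ET

# The complete one-element document from the text.
doc = "<person> Alan Turing </person>"

# Parse the text; the result is the single element of the document.
root = ET.fromstring(doc)

element_name = root.tag   # the name given in the start-tag and end-tag
content = root.text       # everything between the start-tag and the end-tag

print(element_name)
print(repr(content))
```

Note that the parser keeps the spaces around the name, since everything between the two tags counts as content; calling content.strip() removes them if they are not wanted.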
XML tags resemble HTML ones. However, there are two major differences. In XML you can invent your own tags, while the tag set of HTML is fixed. More importantly, XML tags are meant to describe the type of content rather than formatting or layout. In other words, in XML you don't say that something is italicized or indented; you say that something is a person, a person's name, or a person's address.
XML elements can contain unmarked text (called character data) or other XML elements. Here is a more involved example:
<person>
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
The person element contains a name subelement and three profession subelements. Moreover, the name element contains two child elements, first and last. The profession, first and last elements contain only character data (that is, text without markup). An XML document must adhere to the following rules:
As a result, an XML document can be represented as a tree structure: elements are mapped to nodes (the document element corresponding to the tree root), and subelements correspond to child nodes. For instance, the above XML document is mapped to the following tree:
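One way to make this tree visible is to walk the parsed document and print each element name indented by its depth. The following is only a sketch using Python's standard xml.etree.ElementTree module; the helper name outline is hypothetical and introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# The person document with nested subelements, as given in the text.
doc = """<person>
  <name>
    <first>Alan</first>
    <last>Turing</last>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>"""


def outline(element, depth=0):
    """Return one line per element, indented two spaces per tree level."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(doc)  # the document element, i.e. the tree root
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The root node is person; name and the three profession elements are its children, and first and last are children of name.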
Elements can also mix character data and subelements as in the following example:
<person> <first-name>Alan</first-name> <last-name>Turing</last-name> is mainly known as a <profession>computer scientist</profession>. However, he was also an accomplished <profession>mathematician</profession> and a <profession>cryptographer</profession>. </person>
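To illustrate mixed content, the sketch below parses the example with Python's standard xml.etree.ElementTree module and recovers both the running text and the marked-up professions. The variable names are assumptions made for illustration. In this particular library, character data that follows a subelement is stored on that subelement's tail attribute, so itertext() is used to collect all text in document order.

```python
import xml.etree.ElementTree as ET

# The mixed-content document from the text: character data and
# subelements appear side by side inside person.
doc = (
    "<person> <first-name>Alan</first-name> <last-name>Turing</last-name> "
    "is mainly known as a <profession>computer scientist</profession>. "
    "However, he was also an accomplished "
    "<profession>mathematician</profession> and a "
    "<profession>cryptographer</profession>. </person>"
)

root = ET.fromstring(doc)

# All character data in document order, with runs of whitespace collapsed.
sentence = " ".join("".join(root.itertext()).split())

# Only the text that was marked up as a profession.
professions = [e.text for e in root.findall("profession")]

print(sentence)
print(professions)
```

The markup adds information without disturbing the sentence: the plain text still reads naturally, while the tagged parts can be extracted by name.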
Empty elements, that is, elements with no content, are also possible. An empty element called address can be abbreviated as follows: <address/>. Finally, note that XML is case sensitive, hence address and Address are different tags.
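Both points can be checked with a parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper names is_empty and parses_ok are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """True if the element has neither child elements nor character data."""
    return len(element) == 0 and not element.text


def parses_ok(text):
    """True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The abbreviated form and the start-tag/end-tag form are both empty.
short_form = ET.fromstring("<address/>")
long_form = ET.fromstring("<address></address>")
print(is_empty(short_form), is_empty(long_form))

# Case matters: </Address> does not close <address>.
print(parses_ok("<address></Address>"))
print(parses_ok("<Address></Address>"))
```

The first mismatch case is rejected because the parser treats address and Address as two different names, so the start-tag is never closed.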