
Element definitions

Every element used in a valid document must be declared in the DTD with an element definition of the following form:

<!ELEMENT name content>

where name is the element name and content specifies which children the element may or must have, and in what order. The content specification may be one of the following:

Parsed character data
This is the simplest content specification. It says that an element may contain only text, including entity and character references, and no subelements:
<!ELEMENT e-mail (#PCDATA)>
Child element
This says an element must contain exactly one child element of the given type:
<!ELEMENT contact (e-mail)>
Choice
This says an element must contain exactly one child, of one type or the other, but not both:
<!ELEMENT contact (e-mail | phone)>
Sequence
This says that an element must contain the listed child elements, each exactly once, in the given order:
<!ELEMENT name (first,last)>
Empty content
This says an element must have no content at all (but it may have attributes):
<!ELEMENT image EMPTY>
Any content
This says an element may have any content (however, its children, if any, must themselves be declared in the DTD):
<!ELEMENT image ANY>
Iteration
Three suffixes can be appended to names, sequences, and choices to specify how many occurrences are allowed. Without a suffix, exactly one occurrence is required. The suffixes are:
*
Zero or more instances are allowed
+
One or more instances are allowed
?
Zero or one instance is allowed
For instance, the following definition says that name must have zero or more first children, possibly followed by a middle child, followed by one or more last children:
<!ELEMENT name (first*,middle?,last+)>
Given this definition, all the following name elements are valid:
<name>
   <first>Samuel</first>
   <middle>Lee</middle>
   <last>Jackson</last>
</name>

<name>
   <first>Samuel</first>
   <first>Michael</first>
   <last>Jackson</last>
</name>

<name>
   <last>Jackson</last>
   <last>Keaton</last>
</name>
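By contrast, the following name elements do not satisfy the same definition. They are an illustrative sketch, not taken from any real document; each comment points to the part of the content model that is violated.

```xml
<!-- Not valid: at least one last child is required (last+) -->
<name>
   <first>Samuel</first>
   <middle>Lee</middle>
</name>

<!-- Not valid: middle may not come before first -->
<name>
   <middle>Lee</middle>
   <first>Samuel</first>
   <last>Jackson</last>
</name>

<!-- Not valid: at most one middle child is allowed (middle?) -->
<name>
   <first>Samuel</first>
   <middle>Lee</middle>
   <middle>Leroy</middle>
   <last>Jackson</last>
</name>
```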
The following definition says that name may have any number of first, middle, and last children in any order:
<!ELEMENT name (first | middle | last)*>
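As a small sketch of what this starred choice accepts and rejects (the instances are made up for illustration), consider:

```xml
<!-- Valid: children may appear in any order and any number of times -->
<name>
   <last>Jackson</last>
   <first>Samuel</first>
   <first>Lee</first>
</name>

<!-- Valid: zero children also satisfy the starred choice -->
<name></name>

<!-- Not valid: text is not allowed, since the model lists only elements -->
<name>Samuel <last>Jackson</last></name>
```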
The following is a notable example of mixed content, that is, text interleaved with markup, which is typical of narrative documents. It specifies that name may have any number of first, middle, and last children, in any order, possibly interleaved with parsed character data:
<!ELEMENT name (#PCDATA | first | middle | last)*>
Given this definition, the following name element is valid:
<name>
   First comes the first name: <first>Samuel</first>
   Then the middle one: <middle>Lee</middle>
   Last comes the last name: <last>Jackson</last> 
   Not very surprising indeed!
</name>
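To try this out, the declaration and the instance can be combined in one self-contained document by placing the DTD in the internal subset of the document type declaration. The sketch below assumes that first, middle, and last hold plain text; those three declarations are not given above and are added only so that the document can be validated.

```xml
<?xml version="1.0"?>
<!DOCTYPE name [
  <!-- Mixed content: #PCDATA first, then the allowed child elements -->
  <!ELEMENT name (#PCDATA | first | middle | last)*>
  <!-- Assumed for this sketch: the three children hold plain text -->
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
]>
<name>
   First comes the first name: <first>Samuel</first>
   Then the middle one: <middle>Lee</middle>
   Last comes the last name: <last>Jackson</last>
   Not very surprising indeed!
</name>
```

If libxml2 is installed, a command such as xmllint --valid --noout name.xml (name.xml being a hypothetical file name) should print nothing for this document, which means it is valid.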

Note that this is the only way to declare mixed content: you can only say that an element contains parsed character data together with any number of elements from a list, in any order. You cannot constrain the order or the number of those child elements. Moreover, the keyword #PCDATA must come first in the list. Finally, consider the following alternative definition for name:
<!ELEMENT name (first | last | (first,last))>
This apparently innocuous definition is in fact not allowed. If you try to validate a document against it, you get an error like this: Content model of name is not determinist.

The error is in the DTD, not in the XML. The content model of every element declared in a DTD must be deterministic. In this case it is not: when the validator reads a first element in the XML, there are two possible ways forward. One is that the content of name is complete (the first disjunct in the definition); the other is that a last element is expected (the last disjunct in the definition). Hence, on reading the same symbol the validator can reach two different states, which is typical nondeterministic behavior. The rationale for this restriction is to make DTD processors easier to implement.
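One way to repair the definition, offered as a sketch rather than as the only possible rewrite, is to factor out the common first so that every element read leaves a single way forward. The model below accepts the same three contents (first alone, last alone, or first followed by last):

```xml
<!-- Deterministic: after first, an optional last; otherwise a single last -->
<!ELEMENT name ((first, last?) | last)>
```

After reading first, the validator is in exactly one state, where either the content ends or a last follows, so no lookahead is needed.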
Caffè XML - Massimo Franceschet