
Entity definitions

An entity is a shortcut for a piece of data. The data is not necessarily in XML format. We can have different types of entities: internal or external, predefined or defined by the user. We already discussed predefined entities in the XML chapter. Here, we introduce user-defined entities. Internal user-defined entities are defined in the DTD as follows:

<!ENTITY name "text">

where ENTITY is the definition keyword, name is the entity name, and text is the entity replacement text. The text value is a string possibly containing well-formed markup. Here are a couple of examples:

<!ENTITY XML "Extensible Markup Language">
<!ENTITY footer "<author>Massimo Franceschet</author>
                 <date>16 February 2005</date>">

External user-defined entities are defined in the DTD as follows:

<!ENTITY name SYSTEM "URI">

where URI identifies the document that contains the replacement text of the entity name. For instance:

<!ENTITY footer SYSTEM "footer.xml">
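For illustration, here is what the footer.xml file named in the declaration above might contain (the content is an assumption, mirroring the internal footer entity). It begins with a text declaration and holds two sibling elements without a common root element, both of which are allowed in an external entity:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<author>Massimo Franceschet</author>
<date>16 February 2005</date>
```

With this file, a reference &footer; inside an element expands to the author and date elements, just as with the internal footer entity.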

The entity replacement text must be well-formed, but it does not need to be wrapped in a single root element. An external entity may begin with a text declaration, which is similar to an XML declaration except that the version attribute is optional and the encoding attribute is required. The replacement text may contain other entity references, but self-referential and circular references are forbidden. User-defined entities are referenced just like predefined ones: for instance, to use the entity called XML you write &XML; in the XML document, and the parser replaces the reference with the entity's replacement text.
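The rule on nested and circular references can be sketched with a few declarations (the entity names dtd, intro, loop and cycle are made up for this example). The first pair is legal; the second pair forms a cycle, so a document that references either loop or cycle is not well-formed:

```xml
<!-- Legal: the replacement text of intro refers to another entity -->
<!ENTITY dtd   "Document Type Definition">
<!ENTITY intro "An introduction to the &dtd;">

<!-- Forbidden: loop refers to cycle, which refers back to loop -->
<!ENTITY loop  "see &cycle;">
<!ENTITY cycle "see &loop;">
```

Referencing &intro; in the document yields the text "An introduction to the Document Type Definition".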

One use of entities is to avoid typing the same text many times in an XML document. Here is an example:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

<!DOCTYPE slides [
   <!ELEMENT slides     (slide*)>
   <!ELEMENT slide      (course,university,image,title,content,author,date,about)>
   <!ELEMENT title      (#PCDATA)>
   <!ELEMENT content    (#PCDATA)>
   <!ELEMENT author     (#PCDATA)>
   <!ELEMENT date       (#PCDATA)>
   <!ELEMENT about      (#PCDATA)>
   <!ELEMENT course     (#PCDATA)>
   <!ELEMENT university (#PCDATA)>
   <!ELEMENT image      EMPTY>
   <!ATTLIST image source CDATA #REQUIRED>
   <!ENTITY  XML "Extensible Markup Language">
   <!ENTITY  UdA "Università &quot;G. D'Annunzio&quot;">
   <!ENTITY  footer "<author>Massimo Franceschet</author>
                     <date>16 February 2005</date>
                     <about>&XML;</about>">
   <!ENTITY  prolog SYSTEM "prolog.txt">
]>

<slides>
   <slide>
      &prolog;
      <title>&XML; fundamentals</title>
      <content>The fundamentals about &XML;</content>
      &footer;
   </slide>
   <slide>
      &prolog;
      <title>&XML; schema languages</title>
      <content>The schema languages for the &XML;</content>
      &footer;
   </slide>
</slides>

where prolog.txt contains the following text:

<?xml encoding="ISO-8859-1"?>
<course>Semistructured data: representation and query languages.</course>
<university>&UdA;</university>
<image source="logo.jpg"/>

After the parser has replaced all entity references, the document looks like this (the document type declaration is omitted):

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<slides>
   <slide>
      <course>Semistructured data: representation and query languages.</course>
      <university>Università "G. D'Annunzio"</university>
      <image source="logo.jpg"/>
      <title>Extensible Markup Language fundamentals</title>
      <content>The fundamentals about Extensible Markup Language</content>
      <author>Massimo Franceschet</author>
      <date>16 February 2005</date>
      <about>Extensible Markup Language</about>
   </slide>
   <slide>
      <course>Semistructured data: representation and query languages.</course>
      <university>Università "G. D'Annunzio"</university>
      <image source="logo.jpg"/>
      <title>Extensible Markup Language schema languages</title>
      <content>The schema languages for the Extensible Markup Language</content>
      <author>Massimo Franceschet</author>
      <date>16 February 2005</date>
      <about>Extensible Markup Language</about>
   </slide>
</slides>

Finally, a parameter entity is a shortcut for a piece of DTD (not for a piece of XML data). Like general entities, parameter entities can be internal or external; their declarations carry a % sign before the entity name. Here is an example:

<!ENTITY % body "<!ELEMENT title (#PCDATA)>
                 <!ELEMENT content (#PCDATA)>">
<!ENTITY % prolog SYSTEM "prolog.dtd">

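To reference the parameter entity called body, you write %body; in the DTD. An interesting feature is that a parameter entity declared in the external DTD subset can be redeclared in the internal DTD subset of a document. In this case the internal declaration counts: the internal subset is read before the external one, and when an entity is declared more than once, the first declaration is binding.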
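Here is a sketch of such a redeclaration, split over two hypothetical files, slides.dtd and slide.xml (the file names, the em element and the slide text are invented for this example). The external subset declares body and references it; the document redeclares body in its internal subset, so the reference %body; in slides.dtd expands to the internal replacement text and a validating parser accepts the em element inside content:

```xml
<!-- slides.dtd: external DTD subset (hypothetical file) -->
<!ENTITY % body "<!ELEMENT title (#PCDATA)>
                 <!ELEMENT content (#PCDATA)>">
<!ELEMENT slide (title, content)>
%body;

<!-- slide.xml: redeclares body in its internal subset (hypothetical file) -->
<!DOCTYPE slide SYSTEM "slides.dtd" [
   <!ENTITY % body "<!ELEMENT title (#PCDATA)>
                    <!ELEMENT content (#PCDATA | em)*>
                    <!ELEMENT em (#PCDATA)>">
]>
<slide>
   <title>Parameter entities</title>
   <content>The internal declaration <em>wins</em></content>
</slide>
```

Without the internal declaration, content could hold only text, and the em element would make the document invalid.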

Parameter entities are often used to modularize DTDs. A large DTD is typically split into several modules, so that another DTD can include only the modules it needs. For instance, the following declarations include the declarations contained in the body.dtd document in the current DTD:

<!ENTITY % body SYSTEM "body.dtd">
%body;
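As a sketch, suppose body.dtd holds the declarations for title and content (its content is assumed here, and the slide text is invented). A document can then pull them into its internal DTD subset through the external parameter entity, obtaining the same DTD as if the declarations had been written inline:

```xml
<!-- body.dtd (hypothetical content) -->
<!ELEMENT title   (#PCDATA)>
<!ELEMENT content (#PCDATA)>

<!-- a document whose internal DTD subset includes body.dtd -->
<!DOCTYPE slides [
   <!ELEMENT slides (slide*)>
   <!ELEMENT slide  (title, content)>
   <!ENTITY % body SYSTEM "body.dtd">
   %body;
]>
<slides>
   <slide>
      <title>Parameter entities</title>
      <content>Modular DTDs</content>
   </slide>
</slides>
```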

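Modularization often exploits the IGNORE and INCLUDE directives, which introduce conditional sections. Conditional sections may appear only in the external DTD subset (or in external parameter entities), not in the internal subset of a document. IGNORE is used to comment out a section of declarations: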

<![IGNORE[
  <!ELEMENT note (#PCDATA)>
]]>

The parser skips the element declaration, as if it were not in the DTD. INCLUDE, on the other hand, marks a section of declarations to be used:

<![INCLUDE[
  <!ELEMENT note (#PCDATA)>
]]>

The effect is to use the element declaration in the current DTD, exactly as if it were written outside the section. Parameter entities and these directives can be combined to include declarations conditionally. Suppose we define the following parameter entity:

<!ENTITY % switch_note "INCLUDE">

Now we can replace the keyword INCLUDE with a reference to this parameter entity:

<![%switch_note;[
  <!ELEMENT note (#PCDATA)>
]]>

The parameter entity switch_note can then be redeclared in the internal DTD subset of a document, which takes precedence over the external declaration, so each document can switch the note element declaration on (INCLUDE) or off (IGNORE).
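Putting the pieces together, here is a sketch with two hypothetical files: notes.dtd, an external DTD that switches the note declaration on by default, and slide.xml, which switches it off by redeclaring switch_note in its internal subset. The content model of slide may mention note even when its declaration is ignored; the XML specification allows this, and a parser may at most issue a warning:

```xml
<!-- notes.dtd: external DTD subset (hypothetical file) -->
<!ENTITY % switch_note "INCLUDE">
<!ELEMENT slide (title, note?)>
<!ELEMENT title (#PCDATA)>
<![%switch_note;[
  <!ELEMENT note (#PCDATA)>
]]>

<!-- slide.xml: switches the note declaration off (hypothetical file) -->
<!DOCTYPE slide SYSTEM "notes.dtd" [
   <!ENTITY % switch_note "IGNORE">
]>
<slide>
   <title>Conditional sections</title>
</slide>
```

With IGNORE in effect, note has no declaration, so adding a note child to the slide would make slide.xml invalid; removing the internal declaration restores the default INCLUDE.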

As a final remark, notice that Mozilla does not load external entities (nor external DTDs). You can use xmllint with the --noent option, which substitutes entity references with their replacement text, to parse an XML document with any kind of entities.

Caffè XML - Massimo Franceschet