The XML declaration
XML documents can have an XML declaration. If present, the XML declaration must be the very first thing in the document: nothing, not even whitespace, may precede it. It starts with <?xml, ends with ?>, must contain the pseudo-attribute version, and may also contain the pseudo-attributes encoding and standalone, in that order. The meaning of the pseudo-attributes is the following:
- the version pseudo-attribute contains the XML version in which the document is written;
- the encoding pseudo-attribute specifies the character encoding used in the document. If the server or the file system provides meta-information about the encoding, then this external encoding takes priority over the XML declaration. Without such information and without an encoding pseudo-attribute, an XML document is assumed to be encoded in UTF-8 (or in UTF-16, if it starts with a byte order mark). A character set maps particular characters, like Z, to particular numbers, like 90. These numbers are called code points. A character encoding determines how those code points are represented in bytes. Unicode is an international standard character set that can be used to write documents in almost any language you are likely to speak. UTF-8 is probably the most broadly supported character encoding of Unicode. It is a variable-length encoding: code points 0 through 127 are encoded in 1 byte each (as in ASCII), those from 128 to 2047 use 2 bytes each, and higher code points use 3 or 4 bytes;
- the standalone pseudo-attribute may have the value yes or no. If the value is yes, then the document is self-contained, meaning that all its values are present in the document itself. If the value is no, then some value of the document may be specified in an external DTD (for instance, a default value for an attribute).
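The distinction between code points and their byte representation can be checked with a few lines of Python. This is only a minimal sketch: the helper name describe and the choice of sample characters are assumptions made for illustration.

```python
# Sample characters, written as escapes so the file itself stays ASCII:
# Z, e with acute accent, Greek small sigma, euro sign.
samples = ["Z", "\u00e9", "\u03c3", "\u20ac"]


def describe(ch):
    """Return (code point, number of UTF-8 bytes) for a single character."""
    return ord(ch), len(ch.encode("utf-8"))


for ch in samples:
    code_point, n_bytes = describe(ch)
    print(f"U+{code_point:04X} = {code_point}: {n_bytes} byte(s) in UTF-8")

# The same character can need a different number of bytes in another encoding.
latin1_bytes = "\u00e9".encode("iso-8859-1")  # single byte 0xE9
utf8_bytes = "\u00e9".encode("utf-8")         # two bytes 0xC3 0xA9
```

The code point of a character is the same in every encoding of Unicode; only the number and value of the bytes change.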
As an example, the following line declares a standalone XML document in version 1.0 with an ISO-8859-1 encoding (ASCII plus the characters for most Western European languages):
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
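How a parser uses such a declaration can be sketched with Python's standard xml.etree.ElementTree module. The element name word and its content are hypothetical and serve only as a document body for the declaration; other parsers may report errors differently.

```python
import xml.etree.ElementTree as ET

declaration = '<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>'
# <word> is a hypothetical element used only for this illustration.
text = declaration + "<word>caf\u00e9</word>"

# Encode the document the way the declaration says it is encoded.
data = text.encode("iso-8859-1")

# The parser reads the declared encoding and decodes byte 0xE9 as U+00E9.
root = ET.fromstring(data)
decoded = root.text

# A single space before the declaration makes the document ill-formed.
try:
    ET.fromstring(b" " + data)
    leading_space_ok = True
except ET.ParseError:
    leading_space_ok = False
```

The second parse fails because the declaration is no longer the first thing in the document.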
Since XML documents are written in Unicode, you can use XML to write multilingual documents. In such documents, it is useful to identify the language in which a particular section is written. For instance, a multilingual spell-checker might use this information to check the spelling of sections written in different languages. Each XML element may have an xml:lang attribute that specifies the language of the content of the element. Language codes are defined in ISO 639. An example follows:
<paragraph xml:lang="en">
The following is a Greek maxim saying: "The wise man knows himself".
</paragraph>
<maxim xml:lang="el">
σοφός
έαυτόν
γιγνώσκει
</maxim>
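A program such as the spell-checker mentioned above could read the xml:lang attribute as in the following sketch, which again assumes Python's xml.etree.ElementTree. The root element text is hypothetical (it is added only so that the two elements form a single well-formed document), and the element contents are shortened.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound by definition to this namespace URI, and
# ElementTree exposes the attribute under the qualified name below.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# <text> is a hypothetical root element wrapping the two example elements.
document = (
    "<text>"
    '<paragraph xml:lang="en">The following is a Greek maxim.</paragraph>'
    '<maxim xml:lang="el">\u03c3\u03bf\u03c6\u03cc\u03c2</maxim>'
    "</text>"
)

root = ET.fromstring(document)

# Map each child element's tag to its declared language code.
languages = {child.tag: child.get(XML_LANG) for child in root}
```

An element without an xml:lang attribute of its own would yield None here; looking up the value inherited from an ancestor is left out of this sketch.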