W3C XML Schema
DTD has some major drawbacks, including the following:
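- it has essentially no notion of data type. The contents of leaf elements can be any character data or none at all, and attributes are limited to a few token-like types, so one cannot require, say, an integer or a date. The assignment of a type to an element or an attribute adds semantics to that element or attribute;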
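- it offers only coarse control over the number of occurrences of elements (the operators ?, * and +). For instance, saying that an element must occur at most three times requires spelling out each optional occurrence, which quickly becomes impractical for larger bounds. Also, it is hard to specify unordered contents;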
- the ID/IDREF mechanism is too simple. For instance, it is not possible to restrict the scope of uniqueness for ID attributes to a fragment of the document. Also, only individual attributes can be used as keys, not elements or combinations of values;
- it does not support namespaces, because it predates the W3C recommendation for them;
- it provides limited support for modularity, reuse, and evolution of schemas. This makes it hard to maintain large interrelated schemas;
- it is not written in XML notation. An XML syntax would have made it possible to manipulate schemas with standard XML tools, e.g., to check that a schema is well-formed or to query it.
XML Schema is a W3C recommendation that addresses these problems. In particular, it provides a powerful type system that allows one to define simple and complex types and to derive new types from existing ones, in the style of inheritance in object-oriented programming languages. Types can be attached to elements and attributes, which adds meaning to their content. Unfortunately, this comes at a price: XML Schema is complicated to understand and hard for non-experts to use (in fact, the W3C specification is difficult to read even for XML experts!). In the following, we try to distill the useful core of the XML Schema language.
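As a first taste, here is a minimal sketch of how a schema might address several of the drawbacks listed above. The element and type names (readingList, book, BookType, EditionType) are hypothetical and chosen only for illustration, and the target namespace is omitted for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The schema is itself an XML document, so XML tools can process it. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical root element: a list holding at most three books. -->
  <xs:element name="readingList">
    <xs:complexType>
      <xs:sequence>
        <!-- Occurrence bounds are stated directly. -->
        <xs:element name="book" type="BookType"
                    minOccurs="0" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
    <!-- A key scoped to this readingList element only, built from
         an element value combined with an attribute value. -->
    <xs:key name="bookKey">
      <xs:selector xpath="book"/>
      <xs:field xpath="title"/>
      <xs:field xpath="@edition"/>
    </xs:key>
  </xs:element>

  <!-- A complex type whose children may appear in any order. -->
  <xs:complexType name="BookType">
    <xs:all>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="year" type="xs:gYear"/>
    </xs:all>
    <xs:attribute name="edition" type="EditionType" use="required"/>
  </xs:complexType>

  <!-- A simple type derived from a built-in type by restriction. -->
  <xs:simpleType name="EditionType">
    <xs:restriction base="xs:positiveInteger">
      <xs:maxInclusive value="50"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

Under these assumptions, a document with four book children, a year that is not a valid xs:gYear, or two books sharing both title and edition would be rejected by a validating processor. The verbosity of even this small example also hints at why the language is considered hard to use.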