
ID and IDREF attributes

ID and IDREF attributes are the XML counterparts of keys and foreign keys in relational databases. The examples that follow show some typical use cases for these attributes.

Imagine a banking scenario in which there are customers and accounts. Each customer may have zero or more accounts, and each account belongs to exactly one customer. Here is a valid solution that uses ID attributes but avoids IDREF ones:

<?xml version="1.0"?>

<!DOCTYPE banking [
  <!ELEMENT banking    (customer*)>
  <!ELEMENT customer   (name, account*)>
  <!ATTLIST customer    id ID #REQUIRED>
  <!ELEMENT account    (bank,number)>
  <!ATTLIST account     id ID #REQUIRED>
  <!ELEMENT name       (#PCDATA)>
  <!ELEMENT bank       (#PCDATA)>
  <!ELEMENT number     (#PCDATA)>
]>

<banking>
   <customer id="C1">
      <name>Massimo Franceschet</name>
      <account id="A1">
         <bank>Fineco</bank>
         <number>34567</number>
      </account>
      <account id="A2">
         <bank>ABN AMRO</bank>
         <number>98672</number>
      </account>
   </customer>

   <customer id="C2">
      <name>Enrico Zimuel</name>
      <account id="A3">
         <bank>ING Bank</bank>
         <number>8909</number>
      </account>
   </customer>
</banking>

Each customer and each account has an id attribute of type ID. Since no account is owned by more than one customer, we can list each customer's accounts as children of the customer element without duplicating any ID value.
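
The uniqueness requirement on ID values can be illustrated with a small check. The following is only a sketch in Python using the standard xml.etree.ElementTree module, which parses but does not validate. The function name duplicate_ids is hypothetical, the DTD is left out of the embedded document, and the code simply assumes that every attribute named id is declared with type ID.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Instance from the example above; the DTD is omitted because
# ElementTree does not validate against it anyway.
DOC = """
<banking>
   <customer id="C1">
      <name>Massimo Franceschet</name>
      <account id="A1"><bank>Fineco</bank><number>34567</number></account>
      <account id="A2"><bank>ABN AMRO</bank><number>98672</number></account>
   </customer>
   <customer id="C2">
      <name>Enrico Zimuel</name>
      <account id="A3"><bank>ING Bank</bank><number>8909</number></account>
   </customer>
</banking>
"""

def duplicate_ids(xml_text):
    """Return the sorted id values that appear on more than one element."""
    root = ET.fromstring(xml_text)
    # Assumption: every attribute named "id" has type ID in the DTD.
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

print(duplicate_ids(DOC))  # prints [] because no ID value is repeated
```

A validating parser reports the same kind of problem on its own; the sketch only makes the rule explicit.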

Now consider a different scenario in which a customer may have zero or more accounts and an account may be owned by one or more customers. The following solution is invalid:

<?xml version="1.0"?>

<!DOCTYPE banking [
  <!ELEMENT banking    (customer*)>
  <!ELEMENT customer   (name, account*)>
  <!ATTLIST customer    id ID #REQUIRED>
  <!ELEMENT account    (bank,number)>
  <!ATTLIST account     id ID #REQUIRED>
  <!ELEMENT name       (#PCDATA)>
  <!ELEMENT bank       (#PCDATA)>
  <!ELEMENT number     (#PCDATA)>
]>

<banking>
   <customer id="C1">
      <name>Massimo Franceschet</name>
      <account id="A1">
         <bank>Fineco</bank>
         <number>34567</number>
      </account>
      <account id="A2">
         <bank>ABN AMRO</bank>
         <number>98672</number>
      </account>
   </customer>
   
   <customer id="C2">
      <name>Enrico Zimuel</name>
      <account id="A2">
         <bank>ABN AMRO</bank>
         <number>98672</number>
      </account>
   </customer>
</banking>

The solution is invalid because the account identified by A2 is owned by both customers, and hence its ID value is duplicated. A valid solution is to list customers and accounts separately and to represent their relationships with IDREF(S) attributes (notice that attributes of type ID do not need to be named id, and attributes of type IDREF(S) do not need to be named idref(s)):

<?xml version="1.0"?>

<!DOCTYPE banking [
  <!ELEMENT banking    (customer | account)*>
  <!ELEMENT customer   (name, accounts?)>
  <!ATTLIST customer    id ID #REQUIRED>
  <!ELEMENT name       (#PCDATA)>
  <!ELEMENT accounts   EMPTY>
  <!ATTLIST accounts   idrefs IDREFS #REQUIRED>

  <!ELEMENT account    (bank,number,owners)>
  <!ATTLIST account     id ID #REQUIRED>
  <!ELEMENT bank       (#PCDATA)>
  <!ELEMENT number     (#PCDATA)>
  <!ELEMENT owners     EMPTY>
  <!ATTLIST owners     idrefs IDREFS #REQUIRED>
]>

<banking>
   <customer id="C1">
      <name>Massimo Franceschet</name>
      <accounts idrefs="A1 A2"/>
   </customer>
   
   <customer id="C2">
      <name>Enrico Zimuel</name>
      <accounts idrefs="A2"/>
   </customer>
   
   <account id="A1">
      <bank>Fineco</bank>
      <number>34567</number>
      <owners idrefs="C1"/>
   </account>
   
   <account id="A2">
      <bank>ABN AMRO</bank>
      <number>98672</number>
      <owners idrefs="C1 C2"/>
   </account>
</banking>

In general, n:m relationships must be encoded using ID and IDREFS attributes, while 1:n relationships can avoid them. However, in an n:m relationship it is not necessary to represent both the direct and the inverse relationship (accounts and owners in our example): one of them is sufficient. Nevertheless, representing both may simplify queries. For instance, suppose that we represent only the accounts relationship in our banking example. In that case, retrieving the accounts owned by a given customer is easy, but going the other way (selecting the owners of a given account) is more involved. The opposite holds if we represent only the owners relationship.
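
To make the asymmetry concrete, here is a minimal sketch in Python (standard xml.etree.ElementTree module) over a variant of the banking document that keeps only the accounts relationship. The function names accounts_of and owners_of are hypothetical, and the DTD is omitted since ElementTree does not use it. The first lookup reads one IDREFS list; the second has to scan every customer.

```python
import xml.etree.ElementTree as ET

# The banking document with only the accounts relationship represented
# (the owners elements are dropped for this illustration).
DOC = """
<banking>
   <customer id="C1">
      <name>Massimo Franceschet</name>
      <accounts idrefs="A1 A2"/>
   </customer>
   <customer id="C2">
      <name>Enrico Zimuel</name>
      <accounts idrefs="A2"/>
   </customer>
   <account id="A1"><bank>Fineco</bank><number>34567</number></account>
   <account id="A2"><bank>ABN AMRO</bank><number>98672</number></account>
</banking>
"""

def accounts_of(root, customer_id):
    """Easy direction: read the IDREFS list stored on the customer."""
    for customer in root.findall("customer"):
        if customer.get("id") == customer_id:
            refs = customer.find("accounts")
            return refs.get("idrefs", "").split() if refs is not None else []
    return []

def owners_of(root, account_id):
    """Involved direction: scan all customers and test each IDREFS list."""
    owners = []
    for customer in root.findall("customer"):
        refs = customer.find("accounts")
        if refs is not None and account_id in refs.get("idrefs", "").split():
            owners.append(customer.get("id"))
    return owners

root = ET.fromstring(DOC)
print(accounts_of(root, "C1"))  # ['A1', 'A2']
print(owners_of(root, "A2"))    # ['C1', 'C2']
```

Storing the owners relationship as well would turn the second function into the same kind of direct read as the first, at the price of keeping the two lists consistent by hand.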

Note that it is invalid to use the same value for any two ID attributes in a document. It is also invalid to use a value for an IDREF(S) attribute that is not the value of some ID attribute in the document. Everything else is accepted by the validator, even if it is logically wrong. In particular, the validator does not check whether the idrefs values of owners elements are in fact customer identifiers, or whether the idrefs values of accounts elements are in fact account identifiers. Nor does it check the inverse relationship constraint, which requires, for instance, that if customer C1 lists account A1 among its accounts, then account A1 lists customer C1 among its owners.
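
Since the validator leaves these logical constraints unchecked, an application has to enforce them itself. The following Python sketch (standard xml.etree.ElementTree module) shows one possible way to do so. The names logical_errors and idrefs_of are hypothetical, and the embedded document is a made-up instance that would be valid against the DTD of the last example yet breaks both constraints.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: valid for the DTD, but logically wrong in two ways.
# C1 lists customer C2 as an account, and A1 names C2 as an owner although
# C2 does not list A1.
BAD = """
<banking>
   <customer id="C1">
      <name>Massimo Franceschet</name>
      <accounts idrefs="A1 C2"/>
   </customer>
   <customer id="C2">
      <name>Enrico Zimuel</name>
   </customer>
   <account id="A1">
      <bank>Fineco</bank>
      <number>34567</number>
      <owners idrefs="C1 C2"/>
   </account>
</banking>
"""

def idrefs_of(parent, child_name):
    """Return the tokens of the idrefs attribute on the named child, if any."""
    child = parent.find(child_name)
    return child.get("idrefs", "").split() if child is not None else []

def logical_errors(xml_text):
    """Report wrongly typed references and one-directional ownerships."""
    root = ET.fromstring(xml_text)
    # Map every ID value to the name of the element that carries it.
    kind = {el.get("id"): el.tag for el in root if el.get("id") is not None}
    owns, owned_by, errors = set(), set(), []
    for customer in root.findall("customer"):
        for ref in idrefs_of(customer, "accounts"):
            if kind.get(ref) == "account":
                owns.add((customer.get("id"), ref))
            else:
                errors.append(f"customer {customer.get('id')}: {ref} is not an account")
    for account in root.findall("account"):
        for ref in idrefs_of(account, "owners"):
            if kind.get(ref) == "customer":
                owned_by.add((ref, account.get("id")))
            else:
                errors.append(f"account {account.get('id')}: {ref} is not a customer")
    # Pairs present in exactly one of the two sets violate the inverse constraint.
    for customer_id, account_id in sorted(owns ^ owned_by):
        errors.append(f"{customer_id} and {account_id}: ownership recorded in one direction only")
    return errors

for message in logical_errors(BAD):
    print(message)
```

Run on the valid banking document of the last example, the function would return an empty list, since every reference there points to an element of the expected kind and each ownership appears in both directions.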

Caffè XML - Massimo Franceschet