D. J. Bernstein
Internet mail
SMTP: Simple Mail Transfer Protocol

Encoded addresses

Address restrictions

RFC 821 puts restrictions on the Internet mail addresses that can be expressed in SMTP.

It requires that the domain part of each address be a sequence of elements separated by dots, where each element is either a component, a sequence of digits preceded by #, or a dotted-decimal IP address surrounded by brackets. The only allowable characters in components are letters, digits, and dashes; Klensin specifically requires that underscores be rejected. Every component must have at least one character.

In practice, servers have to be prepared to handle domains ending with dots, and components containing underscores.
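As a rough illustration of the domain syntax described above, here is a minimal Python sketch. The names domain_pattern and is_valid_domain are hypothetical, as is the lenient flag, which switches on the two practical allowances (a trailing dot, and underscores in components). The sketch does not check IP octet ranges or length limits.

```python
import re

def domain_pattern(lenient: bool) -> re.Pattern:
    # A component is one or more letters, digits, or dashes;
    # lenient mode also tolerates underscores.
    component = r"[A-Za-z0-9_-]+" if lenient else r"[A-Za-z0-9-]+"
    # A sequence of digits preceded by '#'.
    number = r"#[0-9]+"
    # A dotted-decimal IP address surrounded by brackets
    # (octet ranges are NOT checked in this sketch).
    ip_literal = r"\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\]"
    element = f"(?:{component}|{number}|{ip_literal})"
    # Lenient mode also tolerates a single trailing dot.
    trailing = r"\.?" if lenient else ""
    return re.compile(f"{element}(?:\\.{element})*{trailing}")

def is_valid_domain(domain: str, lenient: bool = True) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return domain_pattern(lenient).fullmatch(domain) is not None
```

A server that wants to interoperate would presumably call is_valid_domain(domain) with the default lenient=True; the strict mode is there only to show what the letter of the rule excludes.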

RFC 821 also requires that each component have at least three bytes; that the first byte be a letter; and that the last byte be a letter or digit. However, these restrictions are completely out of whack with reality. As far as I know, no server enforces them.

RFC 821 prohibits non-ASCII characters, empty box parts, box parts longer than 64 bytes, and domain parts longer than 64 bytes. These requirements are occasionally disobeyed in practice.

RFC 821 requires that servers be able to handle box parts as long as 64 bytes and domain parts as long as 64 bytes.

The characters \0 and \012 are unsafe in practice. Klensin prohibits all ASCII control characters.

Routes

An at-domain is an @ followed by a domain. A route is one or more at-domains, separated by commas. For example,
     @heaven.af.mil,@uucp.local
is a route.

Routes were heavily discouraged in RFC 1123. Their function is actively subverted by most Internet mailers. Clients should not generate them. However, they still show up occasionally; servers have to be able to parse the route syntax.
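Since servers still have to parse routes, here is a minimal Python sketch that splits a route into its domains. The function name parse_route is hypothetical, and the sketch does not validate the domains themselves.

```python
from typing import List

def parse_route(route: str) -> List[str]:
    """Split a route such as '@heaven.af.mil,@uucp.local' into its domains."""
    domains = []
    for at_domain in route.split(","):
        # Each item must be '@' followed by a nonempty domain.
        if len(at_domain) < 2 or not at_domain.startswith("@"):
            raise ValueError("not an at-domain: %r" % at_domain)
        domains.append(at_domain[1:])
    return domains
```

For the route shown above, parse_route returns the two domains in order; most servers would then simply discard them.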

Encoded box parts

SMTP offers three ways to encode a character inside an address:

An encoded box part is either (1) a sequence of one or more slashed or safe characters or (2) a double quote, a sequence of zero or more slashed or quoted characters, and a double quote. It represents the concatenation of the characters encoded inside it.

For example, the encoded box parts

     angels
     \a\n\g\e\l\s
     "\a\n\g\e\l\s"
     "angels"
     "ang\els"
all represent the 6-byte string "angels", and the encoded box parts
     a\,comma
     \a\,\c\o\m\m\a
     "a,comma"
all represent the 7-byte string "a,comma".
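
The decoding rule can be sketched in Python as follows. The function name decode_box_part is hypothetical. The sketch only undoes backslashes and surrounding quotes; it does not check which characters are allowed unescaped in each form.

```python
def decode_box_part(encoded: str) -> str:
    """Return the string represented by an encoded box part."""
    if not encoded:
        raise ValueError("empty encoded box part")
    quoted = encoded[0] == '"'
    body = encoded[1:] if quoted else encoded
    closed = not quoted          # unquoted form needs no closing quote
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A slashed character: backslash plus the character it represents.
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            out.append(body[i + 1])
            i += 2
        elif ch == '"':
            # A bare quote is only legal as the final byte of the quoted form.
            if not quoted or i != len(body) - 1:
                raise ValueError("unexpected double quote")
            closed = True
            i += 1
        else:
            out.append(ch)
            i += 1
    if not closed:
        raise ValueError("missing closing double quote")
    return "".join(out)
```

Applied to the examples above, every spelling in the first group decodes to angels and every spelling in the second group decodes to a,comma.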

How to create an encoded box part

Here is some advice to clients on how to represent a string as an encoded box part.

Copy the string without quoting if

Otherwise, insert a backslash before each unsafe character, and surround the string with quotes.
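
A deliberately conservative Python sketch of the quoting branch follows. The function name encode_box_part is hypothetical. Two assumptions are baked in: the sketch always uses the quoted form, and it treats every byte other than an ASCII letter or digit as needing a backslash, which is broader than necessary but still decodes to the same string. It also refuses control characters and non-ASCII characters outright, as a cautious choice.

```python
import string

# Assumption: only letters and digits are copied without a backslash.
_PLAIN = set(string.ascii_letters + string.digits)

def encode_box_part(box: str) -> str:
    """Represent box as a quoted encoded box part."""
    for ch in box:
        # Cautious choice: refuse control characters and non-ASCII.
        if ord(ch) < 32 or ord(ch) >= 127:
            raise ValueError("unsupported character: %r" % ch)
    escaped = "".join(ch if ch in _PLAIN else "\\" + ch for ch in box)
    return '"' + escaped + '"'
```

The output is longer than it strictly needs to be, but over-escaping is harmless because a backslash followed by a character always represents that character.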

Encoded addresses

An encoded address contains
  1. the byte <;
  2. optionally, a route followed by a colon;
  3. an encoded box part, the byte @, and a domain; and
  4. the byte >.
It represents an Internet mail address, given by concatenating the string represented by the encoded box part, the byte @, and the domain.

For example, the encoded addresses

     <God@heaven.af.mil>
     <\God@heaven.af.mil>
     <"God"@heaven.af.mil>
     <@gateway.af.mil,@uucp.local:"\G\o\d"@heaven.af.mil>
all represent the Internet mail address "God@heaven.af.mil". Beware, however, that sendmail interprets these as different addresses.

How to read an encoded address

Here is some advice to servers on how to parse an encoded address.

Keep track of whether you are inside quotes or outside quotes; initially you are outside quotes. Skip the starting <. If the next character is @, skip all characters through the next colon. Now perform the following procedure:

  1. Read a character.
  2. If it's a backslash: Read another character. Append that to the address. Go back to step 1.
  3. If it's a double quote: Toggle the quoting state. If you were inside quotes, you are now outside quotes, and vice versa. Go back to step 1.
  4. If it's > and you are outside quotes: Stop. The address is complete.
  5. Append this character to the address. Go back to step 1.
This allows any number of quoted pieces of the encoded address.
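
The procedure above can be sketched in Python as follows. The function name parse_encoded_address is hypothetical, and the error handling (raising ValueError on truncated input) is an assumption, since the steps do not say what to do when the input runs out.

```python
def parse_encoded_address(text: str) -> str:
    """Return the Internet mail address represented by an encoded address."""
    if not text.startswith("<"):
        raise ValueError("encoded address must start with '<'")
    pos = 1                                  # skip the starting '<'
    if pos < len(text) and text[pos] == "@":
        # A route is present: skip everything through the next colon.
        colon = text.find(":", pos)
        if colon == -1:
            raise ValueError("route without a terminating colon")
        pos = colon + 1
    inside_quotes = False                    # initially outside quotes
    address = []
    while pos < len(text):
        ch = text[pos]                       # step 1: read a character
        pos += 1
        if ch == "\\":                       # step 2: backslash
            if pos >= len(text):
                raise ValueError("dangling backslash")
            address.append(text[pos])
            pos += 1
        elif ch == '"':                      # step 3: toggle quoting state
            inside_quotes = not inside_quotes
        elif ch == ">" and not inside_quotes:
            return "".join(address)          # step 4: address is complete
        else:
            address.append(ch)               # step 5: ordinary character
    raise ValueError("missing closing '>'")
```

Anything after the closing > is ignored by this sketch; a real server would go on to parse it as the rest of the command.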

Beware that some clients (e.g., Neologic SMTPD, Worldgroup SMTP) incorrectly include one or more spaces before each encoded address:

     RCPT TO: <incorrect.spaces@heaven.af.mil>   (WRONG)
Some clients (e.g., Windows CE) fail to enclose addresses in brackets:
     RCPT TO: missing.brackets@heaven.af.mil     (WRONG)
In some cases a client will omit the domain:
     RCPT TO:<root>