Methods of automatic discovery of address book entries

This article addresses the following problem: if all you have is someone's domain name or email address, how are you supposed to find their complete contact information? This document specifies a few conventions for publishers to follow, and heuristics for clients to try, that can help achieve this. That way, you can publish your information so that suitably smart software can find it, and clients can be developed to take advantage of it.

Two competing standards

Sometimes two closely related standards that accomplish similar tasks co-exist. For example, OpenPGP and S/MIME are two similar standards that specify methods for digital signatures and end-to-end encryption of email. With address books and directory entries, the situation is similar. There are two standards we need to be concerned with: LDAP, the Lightweight Directory Access Protocol, which tends to be used by organizations and businesses that need to keep an internal directory, and vCard, which tends to be used by individuals to share their contact information with each other. Both are specified in standards published by the Internet Engineering Task Force, and we shall examine both. If you are an individual interested in publishing your contact information so anyone can find it, it would be wise to publish it according to the following conventions for both LDAP and vCard.

Publishing information with LDAP

LDAP and the related X.500 standards together specify both a standard schema for how contact information should be represented and a protocol for actually accessing this data. LDAP follows a traditional client-server model: a client looking for information about someone connects to a server that it reasonably believes to hold the data, and then specifies a filter to search for entries matching its criteria. For example, a filter might say "Give me all entries that either have the email address jscott@posteo.net or have the first name 'John'." Every entry returned to the client is made up of attributes: details about the person such as their name, email address, encryption keys, contact photos, and anything else somebody might want to know. Put another way, a filter selects the entries you are interested in by testing some of their attributes, and the entries that come back may carry many other attributes that you want to know about.
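To make the filter example concrete, here is a minimal sketch of how such a filter could be written in LDAP's string form. The helper names are hypothetical, and the attribute names mail and givenName are an assumption about the directory's schema (they are common, but a given server may use others).

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that are reserved inside an LDAP filter value."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def equals(attribute: str, value: str) -> str:
    """Build a single equality test such as (mail=someone@example.org)."""
    return f"({attribute}={escape_filter_value(value)})"


def any_of(*filters: str) -> str:
    """Combine filters with OR, which LDAP writes as a leading '|'."""
    return "(|" + "".join(filters) + ")"


# "Either has this email address or has the first name John."
search_filter = any_of(
    equals("mail", "jscott@posteo.net"),
    equals("givenName", "John"),
)
print(search_filter)
```

The resulting string is what a client would hand to whatever LDAP library it uses as the search filter; escaping matters because values typed in by a user may contain parentheses or asterisks.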

LDAP server discovery

Fortunately, discovering an LDAP server from an email address, or from the domain contained within it, is straightforward. For a particular domain name, you can check its DNS SRV records (conventionally published under the name _ldap._tcp followed by the domain) to discover where the LDAP server for the domain is located, or, if no SRV records exist, just connect to the domain name itself on the usual LDAP port, 389. Once you connect, the server can be authenticated using TLS, just as when you connect to a website over HTTPS, so the information it returns can be trusted.
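The discovery steps above can be sketched as follows. This is only an outline under stated assumptions: the function names are hypothetical, the SRV records are assumed to have been fetched already by some resolver library (the Python standard library does not look up SRV records itself), and ties between records are broken in a simplified way.

```python
from typing import List, Tuple

# (priority, weight, port, target): the four fields of an SRV record.
SrvRecord = Tuple[int, int, int, str]

DEFAULT_LDAP_PORT = 389


def srv_query_name(email_or_domain: str) -> str:
    """Return the DNS name whose SRV records advertise LDAP servers."""
    domain = email_or_domain.rsplit("@", 1)[-1].rstrip(".").lower()
    return f"_ldap._tcp.{domain}"


def ldap_candidates(domain: str, srv_records: List[SrvRecord]) -> List[Tuple[str, int]]:
    """List (host, port) pairs to try, falling back to the bare domain on 389."""
    if not srv_records:
        return [(domain, DEFAULT_LDAP_PORT)]
    # Lowest priority value first. Simplification: equal priorities are ordered
    # by descending weight rather than by a weighted random choice.
    ordered = sorted(srv_records, key=lambda record: (record[0], -record[1]))
    return [(target.rstrip("."), port) for _, _, port, target in ordered]


print(srv_query_name("jscott@posteo.net"))
print(ldap_candidates("posteo.net", []))
```

A client would query the name returned by srv_query_name, pass whatever records come back to ldap_candidates, and then try each host in order until one accepts the connection.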

Crafting an LDAP query and using the data

If you're looking for a person based on their email address, crafting an LDAP query is easy: search for entries whose email attribute equals that address. If the data is to be used by an LDAP-aware program such as a sophisticated mail client, the job is done. Otherwise, there are conventions for converting LDAP data to vCard, another standard for representing address book data. If the data is to be saved in a user's address book, it would be wise to check periodically for updates, and the solution is simple: just connect to the LDAP server again and check for new data.
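As an illustration of what converting an LDAP entry to a vCard might involve, here is a small sketch. The attribute-to-property mapping below is a hypothetical one chosen for the example, not an official convention, and the entry is assumed to arrive as a dictionary of attribute names to lists of string values. Long lines are not folded and parameters such as TYPE are not emitted.

```python
from typing import Dict, List

# Hypothetical mapping from LDAP attributes to simple vCard properties.
SIMPLE_MAPPING = {"cn": "FN", "mail": "EMAIL", "telephoneNumber": "TEL", "o": "ORG"}


def escape_text(value: str) -> str:
    """Escape backslash, comma, semicolon and newline for a vCard text value."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
                 .replace(";", "\\;").replace("\n", "\\n"))


def first(entry: Dict[str, List[str]], attribute: str) -> str:
    """Return the first value of an attribute, or an empty string if absent."""
    return (entry.get(attribute) or [""])[0]


def entry_to_vcard(entry: Dict[str, List[str]]) -> str:
    """Render an LDAP-style entry as vCard 4.0 text with CRLF line endings."""
    lines = ["BEGIN:VCARD", "VERSION:4.0"]
    for attribute, prop in SIMPLE_MAPPING.items():
        for value in entry.get(attribute, []):
            lines.append(f"{prop}:{escape_text(value)}")
    # N is structured: family;given;additional;prefixes;suffixes
    family, given = escape_text(first(entry, "sn")), escape_text(first(entry, "givenName"))
    if family or given:
        lines.append(f"N:{family};{given};;;")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


# Placeholder entry; the person and address are made up for the example.
example_entry = {"cn": ["Jane Doe"], "sn": ["Doe"], "givenName": ["Jane"],
                 "mail": ["jane@example.org"]}
print(entry_to_vcard(example_entry))
```

A real converter would need to cover many more attributes (photos, keys, postal addresses) and decide what to do with attributes that have no vCard counterpart.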

LDAP drawbacks and considerations

Setting up an LDAP server can be overly complex for a sole individual looking to publish their information. Furthermore, the majority of folks do not own their own domain name. It would be more appropriate if email providers, for example, ran LDAP servers in which their users could publish their information, and then made much of that information public. Businesses and organizations may well already need an LDAP server for internal use, so for them it would be more appropriate to open up their existing directory to the outside world and make some non-confidential attributes public.

An introduction to the vCard format

The vCard format is a simple text-based file format that can convey a lot of information about contacts. Unlike LDAP/X.500, it is not very extensible, but it is designed to suit the vast majority of use cases and store all of the attributes that one may want to know. vCard is just a file format; how vCards are supposed to be obtained is unspecified, but they can generally be sent over any channel that can carry text or files. There is an LDAP alternative known as CardDAV that stores vCards, but such servers are very rarely available to the public, if that is even possible and appropriate, and since I am not (yet) very informed on CardDAV, it shall not be discussed further. The majority of computer and phone address books used by average people store info in vCard format, so by publishing your contact information as a vCard, you make it less likely that information will be lost during conversion than if you were to rely solely on an LDAP solution.

vCard web discovery

The weak point of using vCards is how you are supposed to obtain them in the first place, and this is the primary problem this article is meant to address. One solution is to put the files on a website and have a crawler search for them, but this is inefficient and inappropriate if a large number of vCards are to be associated with a domain, say because there are many users with an associated email address. This can be partially mitigated by only traversing pages known to be associated with a person of interest, say their personal homepage. When vCards are obtained this way over HTTPS, the information is authenticated via TLS, so it can be trusted to come from that domain. Even OpenPGP keys and S/MIME certificates within the vCard data can be trusted if the domain the data was obtained from matches the user IDs and/or subject names.

vCard updates

It's relatively little-known, but using the SOURCE: field, vCards can specify where updates should be obtained from, and the REV: field can indicate whether the fetched vCard is newer or older than the one you already have. The SOURCE: field can also specify an LDAP URL, but if you are interested in publishing your information as a vCard proper, this isn't very interesting.
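A client could use these two fields roughly as sketched below. This is a simplified illustration, not a full vCard parser: it ignores line folding, property groups and quoted parameters, and it assumes REV is written in the compact UTC form such as 20240615T000000Z. The function names and the example URL are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict


def parse_properties(vcard_text: str) -> Dict[str, str]:
    """Collect the first value of each property, keyed by upper-cased name."""
    properties: Dict[str, str] = {}
    for line in vcard_text.splitlines():
        name, separator, value = line.partition(":")
        if separator:
            # Drop any parameters after ';', so "rev;x=y" is stored as "REV".
            properties.setdefault(name.split(";", 1)[0].upper(), value)
    return properties


def parse_rev(value: str) -> datetime:
    """Parse a REV timestamp in the assumed compact UTC form."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def is_newer(fetched: str, stored: str) -> bool:
    """Return True if the fetched vCard has a later REV than the stored one."""
    fetched_rev = parse_rev(parse_properties(fetched)["REV"])
    stored_rev = parse_rev(parse_properties(stored)["REV"])
    return fetched_rev > stored_rev


stored_card = ("BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Jane Doe\r\n"
               "SOURCE:https://example.org/jane.vcf\r\n"
               "REV:20230101T000000Z\r\nEND:VCARD\r\n")
fetched_card = stored_card.replace("20230101", "20240615")

print(parse_properties(stored_card)["SOURCE"])  # where to fetch updates from
print(is_newer(fetched_card, stored_card))
```

The update loop is then: read SOURCE from the stored card, fetch that URL, and replace the stored card only if is_newer says the fetched copy is more recent.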

Publishing vCards in possibly-secure DNS

Although there is no official convention, vCards can be published in DNS by putting a data: URI in a TXT record, and such information can be authenticated via DNSSEC if it is enabled for the domain. Although vCard is a text file format, its lines are separated by carriage returns and line feeds, so the data should be either percent-encoded or base64-encoded, both of which are common conventions for embedding data in a data: URI. Since clients cannot be expected to follow a URI that points at a DNS record, it would be wise to also publish your vCard at an arbitrary location on the web so that you can have a SOURCE: URI pointing to it for folks to fetch updates from.
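The two encodings mentioned above might look like this in practice. It is a sketch under assumptions: the function names are hypothetical, the media type text/vcard is used in the URI, and the card is assumed to be UTF-8 text.

```python
import base64
from urllib.parse import quote, unquote

VCARD_MEDIA_TYPE = "text/vcard"


def to_base64_data_uri(vcard_text: str) -> str:
    """Wrap a vCard in a data: URI using base64 encoding."""
    payload = base64.b64encode(vcard_text.encode("utf-8")).decode("ascii")
    return f"data:{VCARD_MEDIA_TYPE};base64,{payload}"


def to_percent_data_uri(vcard_text: str) -> str:
    """Wrap a vCard in a data: URI using percent-encoding."""
    # safe="" also encodes ':' and '/', so CRLF becomes %0D%0A and ':' becomes %3A.
    return f"data:{VCARD_MEDIA_TYPE}," + quote(vcard_text, safe="")


def from_data_uri(uri: str) -> str:
    """Recover the vCard text from either form of data: URI."""
    header, _, payload = uri.partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload).decode("utf-8")
    return unquote(payload)


example_card = "BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Jane Doe\r\nEND:VCARD\r\n"
print(to_percent_data_uri(example_card))
print(to_base64_data_uri(example_card))
```

Either string contains no raw line breaks, so it can be placed in a TXT record. Keep in mind that a single string within a TXT record is limited to 255 octets, so a card of any real size would have to be split across several strings and joined again by the client.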

The location of a DNS record associated with an email address

When there are many users associated with a domain, the TXT records containing their vCards need to be stored at different locations depending on the user. I propose a convention borrowed from the OPENPGPKEY and SMIMEA DNS records: take the SHA-256 hash of the local part of the email address, write it in base 16, and keep the first 56 digits (that is, 28 octets). Prefix those digits with an underscore to form the first label, follow it with a second label consisting of the exact string "_vcard", and then append the domain. For example, this convention means that a TXT record containing my vCard might be found at the following location: _0e5167851e125811a752cd4cc36946839c7f197bd67a87348aba92fb._vcard.posteo.net.
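The naming convention proposed above can be computed in a few lines. This sketch assumes the local part is hashed as UTF-8 bytes exactly as written, with no case folding or other normalization, which the proposal does not spell out. The function name and the example address are hypothetical.

```python
import hashlib


def vcard_record_name(email: str) -> str:
    """Return the DNS name at which the vCard TXT record would be looked up."""
    local_part, _, domain = email.rpartition("@")
    # SHA-256 of the local part, in base 16, truncated to the first 56 digits.
    digest = hashlib.sha256(local_part.encode("utf-8")).hexdigest()
    return f"_{digest[:56]}._vcard.{domain}"


print(vcard_record_name("abc@example.org"))
```

The first label is 57 characters long including the underscore, which fits within the 63-octet limit on a DNS label.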

Conclusion

There are a variety of ways to publish contact information so that machines can discover it automatically with minimal information to go on. For the sake of dogfooding, I plan to adopt some mixture of all of these conventions in the future, but obviously clients will need to implement support for these mechanisms. I foresee it being more likely that a client will implement the LDAP discovery mechanism, as it is already not uncommon to find an LDAP server based on the domain components of the distinguished name. As such, I propose that email and other service providers consider running LDAP servers in which their users can publish their public information, and that those who already run internal-only LDAP servers consider exposing them to the public Internet and making some attributes public.

As for vCard, it's already not uncommon for a vCard to be published on a web page, but discovery can be computationally painful. A DNS TXT record would partially solve this, but it's the least likely solution to be adopted by clients.