This is the 10th in a series of Privacy Tech posts focused on privacy engineering and user experience design from Lea Kissner.
Humans' names are really, really complicated and virtually impossible to handle even vaguely correctly. To start with, not everyone has a name. In some cultures, babies are not named for weeks, months or even years. In some cultures, such as the Matsigenka of the Peruvian Amazon, names are not used at all. You’re unlikely to encounter an adult on the internet who has not adopted some sort of name, given the hurdles necessary to get on the internet in the first place, but babies without names are quite common. In the U.S., they may get a placeholder name of “baby girl/boy,” but that is not universal. How will your system handle this case?
So then you need to handle people who have name(s). They may have one name (for example, people from Java are commonly mononymous), they may get additional names by adding geographical features (for example, their village name), they may get a family name, they may get a name derived from the given name of a parent (for example, literally taking one parent’s given name with a suffix like “ova”) or names derived from the family name of one or both parents. They may have one or more given names and names from religious figures. They may have other words in their name, like “Jr.” or “de” or “e”; dropping those can be significant. The point is that people have a variable number of names, and those names may or may not be consistent among family members.
You can’t assume that each person has one ultimately “correct” canonical name, even on a single document. For example, on a German passport, names with an umlaut or letter ß will be spelled two different ways (one transliterated to remove those features). The more character sets you get into, the more ways there are to write a single name. How will you handle people who write the same name in multiple ways? Also, because of restrictive naming laws, because names can be very contextual, and for other reasons, someone’s legal name may not match the name they use for an extended period of time; if your system needs both (e.g., for tax vs display purposes), how will you ensure the right name is used for the right purpose?
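One way to keep the right name tied to the right purpose is to store names as separate, optional, free-form fields and make callers say which one they want. The sketch below is only an illustration under stated assumptions: the class `PersonNames`, its field names and the purpose strings "legal" and "display" are hypothetical, not taken from any particular system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonNames:
    """Hypothetical record: every name is optional and stored as free-form text."""

    # Name shown in the UI; None for someone who has no name (yet).
    display_name: Optional[str] = None
    # Name needed for legal or tax purposes, stored exactly as provided.
    legal_name: Optional[str] = None
    # Other ways the same name is written, e.g. a transliterated spelling.
    alternate_spellings: List[str] = field(default_factory=list)

    def name_for(self, purpose: str) -> Optional[str]:
        """Return the name for one purpose; never fall back to the other one."""
        if purpose == "legal":
            return self.legal_name
        if purpose == "display":
            return self.display_name
        raise ValueError(f"unknown purpose: {purpose!r}")


person = PersonNames(
    display_name="Jürgen Groß",
    legal_name="Jürgen Groß",
    alternate_spellings=["JUERGEN GROSS"],
)
```

Because `name_for` never substitutes one name for the other, a missing display name shows up as `None` for the caller to handle, rather than a legal name quietly appearing somewhere the person did not expect it.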
Names change. People get married, people come out as transgender, people get their name upon graduating from childhood, and people change their names for all sorts of reasons that aren’t tied to major life events. In some cases, the person changing their name never wants to see that old name again. This is often true of transfolk, as well as people who are shedding the name of an abuser. How will their names change in your system? How will you handle multiple names? How will that affect the systems your system is linked to?
Speaking of linked systems, how are you handling names across different systems? It is exceedingly common for names that are entered separately across multiple systems not to match, especially if they are typed in by someone other than the person in question, like in a medical or government office. Typos happen, especially for names unfamiliar to the typer. Typos and system inconsistencies happen especially often for names with accents or punctuation. While it can change the meaning of a name to be missing an umlaut or accent or virgulilla, don’t count on it having been typed in consistently. The same goes for names with hyphens. Hyphens get dropped, hyphens get added where they shouldn’t (for example, to people with multiple family names or given “first” names).
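If you need to find records that probably refer to the same name despite dropped accents or stray hyphens, one possible approach is a lossy comparison key built with Python's standard `unicodedata` module. This is a minimal sketch; the function name `loose_match_key` is hypothetical, and the key is deliberately destructive, so it should only be used to suggest candidate matches and never to overwrite the name as the person entered it.

```python
import unicodedata


def loose_match_key(name: str) -> str:
    """Build a lossy key for finding *candidate* matches across systems.

    Assumption: this is for search and reconciliation only. Different names
    can produce the same key, so a match still needs confirmation.
    """
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks (accents, umlauts, tildes).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat hyphens as spaces, fold case, and collapse runs of whitespace.
    return " ".join(stripped.replace("-", " ").casefold().split())


same = loose_match_key("Muñoz-García") == loose_match_key("munoz  garcia")
```

The stored name stays untouched; only the derived key is compared. Since removing an accent can change the meaning of a name, treating two keys as equal is a hint for a human or a later verification step, not proof that two records belong to the same person.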
Names get even more complex when you start throwing in not just punctuation but different character sets. Most names in the world are not natively representable in the ASCII character set. Most names are representable in Unicode, but while it’s the best option, still not all names are representable (for example, certain names in Chinese and Japanese use old-style characters that aren’t represented in Unicode).
Even once you get a name into a system, your challenges are often only beginning. Once it’s there, what do you do with it?
If you want to use it to identify someone, it’s unlikely to be sufficient. Names are not unique. I used to work with four different people at Google with the exact same first and last names. Three family names cover more than half of people from South Korea. Several hundred thousand people are named 张玮 (Zhang Wei) ... and I had to think about all of this and more just this week because we wanted to address our users in a polite, friendly way.
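Since names are not unique and can change, a common design is to identify people by an opaque identifier and treat the name as an editable attribute. The following is a rough sketch under that assumption; `Account` and `AccountStore` are hypothetical names, and a real system would need persistence and access control.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Account:
    display_name: str
    # Opaque random ID: carries no information about the name.
    account_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class AccountStore:
    """In-memory store keyed by account ID, never by name."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Account] = {}

    def register(self, display_name: str) -> Account:
        account = Account(display_name=display_name)
        self._by_id[account.account_id] = account
        return account

    def rename(self, account_id: str, new_name: str) -> None:
        # The old name is overwritten here, not kept in a visible history.
        self._by_id[account_id].display_name = new_name

    def get(self, account_id: str) -> Account:
        return self._by_id[account_id]


store = AccountStore()
first = store.register("Zhang Wei")
second = store.register("Zhang Wei")  # same name, different person
store.rename(first.account_id, "Wei Zhang")
```

Two people with identical names get distinct accounts, and a rename leaves every reference to the account intact. Whether the old name should be discarded outright, as this sketch does, or retained somewhere restricted for legal reasons is a design decision the sketch does not settle.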
In the end, names are non-unique, non-canonical, changeable tags for people. Using them is fraught with issues, so design carefully. The World Wide Web Consortium has good recommendations and Patrick McKenzie has a good checklist to start thinking about your system.