With the EU General Data Protection Regulation and California Consumer Privacy Act fully entrenched and more privacy and data protection laws in the pipeline, data subject access requests are not going away. And because there is no common way of handling a DSAR, I’ve seen companies adopt some bad DSAR practices.
Two issues I want to address are (1) organizations requiring too much, or unnecessary, information to respond to a DSAR, and (2) organizations that claim to hold no data about the requestor because they don’t tie that data to common quasi-identifiers, like names. The latter is most common among companies without a first-party relationship with the data subject.
To solve these issues, I’d like to introduce the concept of a dynamic DSAR.
Let’s assume an organization has some data elements or attributes related to individuals. These could be common real-world identifiers, like names or Social Security numbers, or they could be much weaker quasi-identifiers, such as hair color or websites visited. They could also be unique internal identifiers.
Each of these attributes has two characteristics we need to consider: provability and uniqueness. Provability is how strongly the attribute proves that the DSAR requestor is the individual to whom the data relates, and what additional information is needed to give you confidence they are who they say they are.
Uniqueness measures how closely the attribute ties to a single individual rather than to many people. It should be considered both within your data set and within the population at large. You may have only one Jason Cronk, but is the requestor, Jason Cronk, the same as your data subject? (There are at least 50 Jason Cronks in the U.S.! But only one in privacy, I think!)
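Uniqueness within your own data set can be estimated by counting how many records share a given value. The sketch below is a minimal illustration, assuming a hypothetical in-memory list of records and a hypothetical `uniqueness` helper; it scores a value as 1/k, where k is the number of matching records. Uniqueness in the population at large needs outside data (such as name-frequency statistics) and is not covered here.

```python
from collections import Counter

# Hypothetical customer records; in practice these come from your own data stores.
records = [
    {"id": 1, "first_name": "John", "hair_color": "brown"},
    {"id": 2, "first_name": "John", "hair_color": "black"},
    {"id": 3, "first_name": "Dirceu", "hair_color": "brown"},
    {"id": 4, "first_name": "John", "hair_color": "brown"},
]

def uniqueness(records, attribute, value):
    """Return 1/k, where k is how many records share this value (0.0 if none do).

    1.0 means the value points to exactly one record in this data set; it says
    nothing about how many people in the wider population share the value.
    """
    counts = Counter(r.get(attribute) for r in records)
    k = counts[value]  # Counter returns 0 for values it has never seen
    return 1.0 / k if k else 0.0

print(uniqueness(records, "first_name", "Dirceu"))  # 1.0
print(uniqueness(records, "first_name", "John"))    # about 0.33
```

A score of 1.0 only tells you the value narrows your data to one record; whether the requestor is that person is a question of provability.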
Consider a common scenario: most companies secure accounts with usernames and passwords. Usernames are likely unique, whereas passwords may not be.
But how provable are they?
Knowing a username proves nothing if usernames are visible to others, such as in a discussion forum. Knowing a password provides some confidence the requestor is the account holder, but is it enough?
The California attorney general’s regulations on CCPA DSARs suggest companies need to apply a “more stringent verification process” when responding to a DSAR involving sensitive or valuable personal information. Knowledge of the password might be sufficient in some cases but not in others.
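One way to make “more stringent for more sensitive data” operational is to attach a required confidence level to each category of data a request would release, and let the most sensitive category set the bar. The category names and numbers below are purely illustrative assumptions, not values from the CCPA or its regulations, and `required_confidence` is a hypothetical helper.

```python
# Hypothetical thresholds: the category names and numbers are illustrative
# assumptions, not values taken from the CCPA or its regulations.
REQUIRED_CONFIDENCE = {
    "marketing_preferences": 0.60,
    "order_history": 0.80,
    "payment_details": 0.95,  # sensitive or valuable: stricter verification
}

def required_confidence(requested_categories):
    """Return the confidence a requestor must reach before data is released.

    The most sensitive category in the request sets the bar, because releasing
    it to the wrong person causes the most harm. Unknown categories raise KeyError.
    """
    if not requested_categories:
        raise ValueError("a DSAR must cover at least one data category")
    return max(REQUIRED_CONFIDENCE[c] for c in requested_categories)

print(required_confidence(["marketing_preferences"]))                     # 0.6
print(required_confidence(["marketing_preferences", "payment_details"]))  # 0.95
```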
If the company also has the subject’s email address, it could ask the requestor to supply it. Merely knowing which email address is associated with the account provides a little evidence of the requestor’s identity, but because an attacker may also know it, we need further proof that the requestor has access to that email account. We can send a verification email containing a code for the requestor to enter. With each additional question, we build more confidence that the requestor is the data subject.
Figure 1: DSAR for an account holder.
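The email verification step described before Figure 1 can be sketched as a one-time code. This is a minimal sketch under stated assumptions: the 15-minute expiry, the five-attempt limit and the `EmailChallenge` class are hypothetical choices, and actually sending the email is left to a hypothetical mailer. It uses Python’s `secrets` module for the code and `hmac.compare_digest` for a constant-time comparison.

```python
import hmac
import secrets
import time

CODE_TTL_SECONDS = 15 * 60  # assumption: codes expire after 15 minutes
MAX_ATTEMPTS = 5            # assumption: limits guessing of a 6-digit code

class EmailChallenge:
    """A one-time code proving the requestor can read mail sent to an address on file."""

    def __init__(self, email, now=None):
        self.email = email
        # 6-digit code drawn from a cryptographically secure random source
        self.code = f"{secrets.randbelow(10**6):06d}"
        self.issued_at = time.time() if now is None else now
        self.attempts = 0
        self.used = False

    def verify(self, submitted, now=None):
        now = time.time() if now is None else now
        expired = now - self.issued_at > CODE_TTL_SECONDS
        if self.used or expired or self.attempts >= MAX_ATTEMPTS:
            return False
        self.attempts += 1
        # constant-time comparison so response timing does not leak the code
        ok = hmac.compare_digest(self.code.encode(), submitted.strip().encode())
        if ok:
            self.used = True  # each code proves access only once
        return ok

challenge = EmailChallenge("requestor@example.com")
# send_verification_email(challenge.email, challenge.code)  # hypothetical mailer
print(challenge.verify("not-the-code"))  # False
print(challenge.verify(challenge.code))  # True
```

Knowing the address is knowledge; entering the code is proof of access, which is why the two steps add different amounts of confidence.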
You’re probably thinking this is nothing new; where does the dynamic part come in? That’s the fun part.
If the data subject is not an account holder with you, it might be harder to construct a verifiable DSAR process. Because of this, we need to return to the attributes we do have. Each of them is going to have a different level of uniqueness and provability. The uniqueness of the attribute will help narrow down related records in our database. A proof mechanism will provide some confidence, perhaps a little or a lot, that the requestor is the person to whom the records relate. Consider, as illustrated below, an e-commerce site that allows guest orders without an account.
Figure 2: DSAR for a guest (non-account holder) order on an e-commerce site.
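For the guest-order case, the uniqueness side of the process amounts to narrowing the candidate records with each attribute the requestor supplies. The sketch below assumes hypothetical order fields (`email`, `zip`, `last4`) and a hypothetical `narrow` helper, and reports how many candidates remain after each step.

```python
# Hypothetical guest orders (no account); field names are illustrative.
orders = [
    {"order_id": "A100", "email": "pat@example.com", "zip": "94107", "last4": "4242"},
    {"order_id": "A101", "email": "pat@example.com", "zip": "10001", "last4": "1111"},
    {"order_id": "A102", "email": "sam@example.com", "zip": "94107", "last4": "4242"},
]

def narrow(records, supplied):
    """Apply each supplied attribute in turn, recording how many candidates remain."""
    candidates = list(records)
    history = []
    for attribute, value in supplied.items():
        candidates = [r for r in candidates if r.get(attribute) == value]
        history.append((attribute, len(candidates)))
    return candidates, history

matches, steps = narrow(orders, {"zip": "94107", "last4": "4242", "email": "pat@example.com"})
print(steps)                              # [('zip', 2), ('last4', 2), ('email', 1)]
print([m["order_id"] for m in matches])   # ['A100']
```

Narrowing to a single order establishes which records are in play, not who the requestor is; the email address still needs a proof-of-access step before anything is released.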
The proof mechanism could be as simple as the requestor knowing the value of an attribute (like their username, password, favorite color or a recent transaction), proving access (for contact information, like email addresses or phone numbers) or some other means (such as showing an old tweet in which the data subject extolled the virtues of the color blue).
Other mechanisms might be appropriate where we collected the data automatically, such as an IP address or MAC address. While these quasi-identifiers may change over time, some are static enough that a request arriving from the same device recorded in our database gives us some confidence that the requestor is the data subject. Even a related IP address (showing the requestor connecting from the same network or service provider) provides more confidence than a completely unrelated one.
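The “related IP address” idea could be approximated with Python’s standard `ipaddress` module. In this sketch, sharing a /24 (IPv4) or /64 (IPv6) block stands in for “same network”; recognizing the same service provider would need an ASN or WHOIS lookup, which is not shown. The `ip_confidence` helper and its scores are hypothetical.

```python
import ipaddress

def ip_confidence(stored, observed, prefix_v4=24, prefix_v6=64):
    """Rough confidence that two IP addresses point to the same requestor.

    The prefix lengths and scores are illustrative assumptions; shared NAT,
    VPNs and dynamic addressing all weaken this signal.
    """
    a = ipaddress.ip_address(stored)
    b = ipaddress.ip_address(observed)
    if a.version != b.version:
        return 0.0
    if a == b:
        return 0.3  # same address: some evidence, not proof
    prefix = prefix_v4 if a.version == 4 else prefix_v6
    network = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    if b in network:
        return 0.1  # same network block, e.g. the same office or ISP pool
    return 0.0

print(ip_confidence("203.0.113.7", "203.0.113.7"))   # 0.3
print(ip_confidence("203.0.113.7", "203.0.113.99"))  # 0.1
print(ip_confidence("203.0.113.7", "198.51.100.7"))  # 0.0
```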
This covers the second issue I set out to address: being able to identify requestors through uncommon quasi-identifiers.
Now for the first: not requesting more information than is necessary to identify the requestor. Static DSAR processes almost always request a lot of information upfront, usually from a list of standardized identification documents, such as a driver’s license, government-issued ID or passport. This leads to complaints of over-collection of data unnecessary to fulfill the request.
Figure 3: Process flow for a dynamic DSAR.
A dynamic DSAR, as illustrated in the process flowchart above, only requires the data necessary to prove the requestor is the data subject. It also provides the added benefit of not demanding data the requestor doesn’t want to provide.
If they can’t prove their name, for instance, because they don’t want to scan in an ID or don’t have access to an ID or a scanner, they can prove themselves through other mechanisms, such as IP address, access to an email address and so on.
The mechanism also avoids requesting more information than necessary: once the requestor has proven themselves to the organization with enough confidence, the DSAR can be fulfilled, with no need to keep interrogating them with unnecessary verifications. Note that confidence levels need to be reevaluated continually in light of the state of the art of technology. Submitting a selfie might be sufficient to prove eye color now, but the widespread availability of an iris photo filter would erode that.
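The stop-as-soon-as-confident behavior can be sketched as a loop that asks for one proof at a time and halts once a threshold is met. Two assumptions here are mine, not the article’s: each proof yields a numeric confidence, and independent proofs combine by noisy-OR (1 minus the product of the remaining doubts), which overstates confidence when proofs are correlated, such as an IP address and a device ID from the same laptop. `run_dsar`, `combined` and the example values are hypothetical.

```python
def combined(confidences):
    """Combine independent pieces of evidence (noisy-OR): 1 - product of doubts."""
    doubt = 1.0
    for c in confidences:
        doubt *= 1.0 - c
    return 1.0 - doubt

def run_dsar(proofs, threshold):
    """Request proofs one at a time and stop as soon as `threshold` is reached.

    `proofs` is an ordered list of (name, ask) pairs; each `ask` callable obtains
    one proof from the requestor and returns a confidence in [0, 1], or None if
    the requestor declines or cannot provide it.
    """
    gathered, asked = [], []
    for name, ask in proofs:
        asked.append(name)
        result = ask()
        if result is not None:
            gathered.append(result)
        if combined(gathered) >= threshold:
            return True, asked, combined(gathered)  # enough: stop asking
    return False, asked, combined(gathered)

# Illustrative confidences; real values depend on your data and proof methods.
proofs = [
    ("government_id", lambda: None),  # requestor declines to scan an ID
    ("email_access", lambda: 0.8),
    ("ip_address", lambda: 0.3),
    ("device_id", lambda: 0.5),       # never requested in this run
]
ok, asked, confidence = run_dsar(proofs, threshold=0.85)
print(ok, asked, round(confidence, 2))
# True ['government_id', 'email_access', 'ip_address'] 0.86
```

The declined ID does not end the request, and the device check is never asked for because email access plus a matching IP address already clear the bar.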
A more sophisticated dynamic DSAR would base its branching on the actual uniqueness and provability of specific data values (John is a far less unique first name in your database than Dirceu), but a simple dynamic DSAR can be built with any branching system: if the requestor proves access to their email address, release the data; otherwise, move on to IP address proof. The branches would be ordered according to an internal assessment of how unique each data field is and how much confidence each attribute’s proof mechanism provides.
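The simple branching approach can be written as a small table that a generic walker follows. This is a minimal sketch: the step names, their order and the `walk` helper are hypothetical, and in practice the order would reflect your own assessment of each field’s uniqueness and each proof’s strength.

```python
# Hypothetical branching table. Each step names a proof to request and the
# next step on success ("pass") or on failure or refusal ("fail").
FLOW = {
    "email_access":  {"pass": "release", "fail": "ip_address"},
    "ip_address":    {"pass": "release", "fail": "government_id"},
    "government_id": {"pass": "release", "fail": "deny"},
}

def walk(flow, start, outcomes):
    """Follow the branches using each proof's outcome (True = proven).

    Any step name not in `flow` (here "release" or "deny") ends the walk.
    Returns the final decision and the proofs requested along the way.
    """
    step, path = start, []
    while step in flow:
        if step in path:
            raise ValueError(f"branching table loops back to {step!r}")
        path.append(step)
        step = flow[step]["pass" if outcomes.get(step, False) else "fail"]
    return step, path

print(walk(FLOW, "email_access", {"email_access": False, "ip_address": True}))
# ('release', ['email_access', 'ip_address'])
```

Keeping the branches in data rather than code makes it easy to reorder them as the confidence you place in each proof mechanism changes.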
Using a dynamic DSAR process will, I hope, alleviate many of the problems of over-collecting information and of organizations that lack first-party relationships with data subjects.
Photo by Filiberto Santillán on Unsplash