Apple’s heavily marketed but proprietary implementation of differential privacy is no longer secret. Researchers at the University of Southern California, Indiana University, and Tsinghua University have reverse-engineered Apple’s macOS and iOS implementations of differential privacy. An academic article describing the results was published on Sept. 8, and Wired broke the news the following week in an article titled “How One of Apple’s Key Privacy Safeguards Falls Short.”
As I described in a previous post, differential privacy measures theoretical privacy loss with a parameter called epsilon: the smaller the epsilon, the stronger the privacy. Theoreticians generally agree that an epsilon less than 0.1 is very safe, and that an epsilon less than 1.0 is probably OK. The USC, Indiana, and Tsinghua researchers reveal that Apple’s macOS implementation uses an epsilon of 6, while iOS 10 uses an epsilon of 14. So what does this tell us about how private Apple’s data collection actually is? Or, to take a concrete example, if the government were to subpoena Apple’s data, what do these epsilons tell us about how likely it is that the government could identify specific individuals in that data? The answer: the epsilons tell us nothing. Nada.
Not. A. Thing.
Why is this? Because differential privacy is only a theoretical worst-case bound on a particular mathematical definition of privacy loss. In other words, the data might be much better protected than the epsilon suggests, but it will never be less protected. A high epsilon doesn’t mean the data is unsafe, only that it might be.
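For concreteness, here is the standard definition the epsilon comes from. This is the textbook formulation from the literature, not anything specific to Apple’s implementation:

```latex
% A randomized mechanism M is \epsilon-differentially private if,
% for every pair of datasets D and D' that differ in one person's
% data, and for every set S of possible outputs:
\Pr[M(D) \in S] \;\le\; e^{\epsilon} \cdot \Pr[M(D') \in S]
```

The bound on that probability ratio is e^epsilon: roughly 2.7 at an epsilon of 1, roughly 403 at the macOS epsilon of 6, and roughly 1.2 million at the iOS epsilon of 14. But those numbers are ceilings on what a worst-case attacker could learn, not measurements of what anyone actually learns.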
My research group first encountered this limitation three years ago, when Alexey Reznichenko implemented and deployed a privacy-preserving behavioral advertising system that used differential privacy to collect usage statistics from over 13,000 opted-in users. We set our epsilon to 1.0, the edge of what might be considered private. What struck us most was that we could have gathered far more data from each user and still have come nowhere near being able to identify specific users. In other words, differential privacy was unnecessarily limiting our analytics. That is when I decided that differential privacy had a long way to go before becoming practical.
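To make the mechanics concrete, here is a minimal sketch of randomized response, the textbook local mechanism for collecting a yes/no statistic under differential privacy. To be clear, this is not the mechanism we deployed; the 20% attribute rate and the helper names are illustrative assumptions:

```python
import math
import random

def randomized_response(truth: bool, epsilon: float) -> bool:
    """Report one yes/no value under epsilon-differential privacy.

    Report the true value with probability e^eps / (e^eps + 1),
    otherwise report its opposite. The ratio of the two report
    probabilities is exactly e^eps, which is the DP guarantee.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return truth if random.random() < p_truth else not truth

def estimate_rate(reports: list[bool], epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 'yes' users."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Hypothetical population: 13,000 users, 20% of whom truly say yes.
random.seed(0)
users = [random.random() < 0.20 for _ in range(13_000)]
for eps in (0.1, 1.0, 6.0):
    reports = [randomized_response(u, eps) for u in users]
    print(f"epsilon={eps:>4}: estimated rate = {estimate_rate(reports, eps):.3f}")
```

Run it and the epsilon-0.1 estimate is noisy (a standard error of roughly ±0.09 on a true rate of 0.20 in this setup) while the epsilon-6 estimate is nearly exact: the epsilon-versus-utility tension in miniature.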
So we have a bit of a mess.
Researchers are saying that Apple’s epsilon parameters are not meaningful, Wired is obliquely suggesting that Apple has been deceptive, and Apple is vigorously defending its privacy practices. Who is at fault here? While I do think Apple made a misstep in over-hyping its use of differential privacy and in keeping its implementation proprietary, I have no reason to believe that its privacy practices are weak. I am inclined to believe that Apple is genuine in its commitment to privacy and that its practices are in fact good.
Where I find the main fault is with the academic research community. In all my 30 years as a researcher, differential privacy is the most over-hyped technology I have ever seen. To listen to differential privacy researchers, you would think we can now almost perfectly defend against even the most resourceful attacker: “guaranteed privacy,” “future proof.” The problem is that nobody has figured out how to build a differentially private system that has both a low epsilon and adequate utility. Invariably, to get a low epsilon one must simply stop gathering data, and that is just not a realistic option.
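Here is a hedged sketch of why, using the standard Laplace mechanism and basic sequential composition. The daily-count scenario, the 365-day horizon, and the numbers are my assumptions for illustration, not anything from Apple’s or any other deployment:

```python
import numpy as np

rng = np.random.default_rng(0)
true_daily_count = 500   # hypothetical: daily users of some feature

# Sequential composition: epsilons add up across reports, so each of
# 365 daily reports gets total_eps / 365. A count has sensitivity 1,
# and the Laplace mechanism adds noise with scale 1 / per_report_eps.
days = 365
for total_eps in (14.0, 6.0, 1.0, 0.1):
    per_report_eps = total_eps / days
    scale = 1.0 / per_report_eps
    noisy = true_daily_count + rng.laplace(0.0, scale)
    print(f"total eps={total_eps:>4}: per-day eps={per_report_eps:.4f}, "
          f"noise scale={scale:7.1f}, one day's count ~ {noisy:9.1f}")
```

At a total epsilon of 0.1, the per-report noise scale is about 3,650, swamping a true count of 500; at a total epsilon of 14, the scale is about 26 and the counts are quite usable. To get a genuinely low epsilon you either add overwhelming noise or collect far less, which in practice means stop gathering data.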
To quote from the Wired article, “a new study … suggests [Apple] has ratcheted that dial further toward aggressive data-mining than its public promises imply.” The problem is that if Apple ratcheted that dial to a strongly private setting, its data collection would be useless.
Rather than accusing Apple of being a metaphorical privacy mega-polluter, the academic research community should be cleaning up its own act. Until it can produce a system that is strongly private while at the same time providing good utility, the research community should stop implying that differential privacy is a workable technology. It is not. The pot of privacy at the end of the differentially private rainbow is, for now, unreachable.
photo credit: ::ErWin Südtirol 2017 via photopin (license)