What is differential privacy?
Differential privacy (DP) is a mathematical definition of privacy that guarantees an algorithm’s output is insensitive to adding, removing or changing one record in its input database. DP is considered the “gold standard” for privacy. It is a definition rather than a specific process, though in practice it is often achieved by adding calibrated random noise to data or to query results.
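The guarantee above is usually stated formally as follows. The symbols here are not from the article and are introduced only for illustration: \mathcal{M} is the randomized algorithm, D and D' are any two databases that differ in one record, S is any set of possible outputs, and \varepsilon \ge 0 is the privacy parameter (smaller values mean stronger privacy).

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all output sets } S
```

Read informally: no single record can change the probability of any outcome by more than a factor of e^{\varepsilon}.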
Two major approaches to differential privacy
- Standard differential privacy (SDP). This more common approach is typically implemented by collecting data from individuals in the clear at a trusted data collector, applying one or more differentially private algorithms, and then releasing the outputs. It works best in cases such as the US Census Bureau’s OnTheMap project, where there is a natural trusted data curator.
- Local differential privacy (LDP). In LDP, individuals perturb their records before sending them to the server, obviating the need for a trusted data curator. Since the server only sees perturbed records, there is no centralized database of sensitive information that is susceptible to attack or to government subpoenas. These enhanced compliance properties make LDP attractive for commercial applications.
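To make the local model concrete, here is a minimal sketch of randomized response, a classic LDP mechanism for a single yes/no attribute. It is one possible illustration, not the method used by any particular product. The function names, the value epsilon = 1.0, and the toy population are all assumptions made for this example.

```python
import math
import random
from typing import Sequence


def randomize_bit(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Client side: report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if rng.random() < p_truth else not true_bit


def estimate_count(reports: Sequence[bool], epsilon: float) -> float:
    """Server side: debias the noisy reports into an unbiased estimate of the true count."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    n = len(reports)
    observed = sum(reports)
    # E[observed] = p * c + (1 - p) * (n - c); solve for the true count c.
    return (observed - (1.0 - p) * n) / (2.0 * p - 1.0)


# Hypothetical population: 2,000 of 10,000 individuals hold the sensitive attribute.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_bits = [i < 2_000 for i in range(10_000)]
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]  # the server only sees these
estimate = estimate_count(reports, 1.0)
print(f"estimated count: {estimate:.0f} (true count: 2000)")
```

The server never holds the true bits, only the perturbed reports, yet it can still recover an approximate aggregate.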
Challenges with local differential privacy (LDP)
The improved security properties of LDP come at a cost in utility. Differentially private algorithms hide the presence (or absence) of an individual by adding noise. Under the SDP model, counts over the sensitive data can be released by adding noise whose magnitude is constant, that is, independent of the number of records.
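As an illustration of the constant-noise property under SDP, the following sketch releases a count using the Laplace mechanism, a standard way to satisfy differential privacy for counting queries. The helper names, the toy records, and epsilon = 1.0 are assumptions for this example only.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Trusted-curator side: compute the exact count, then add Laplace noise."""
    true_count = sum(1 for r in records if predicate(r))
    # A counting query changes by at most 1 when one record is added or removed,
    # so noise with scale 1 / epsilon suffices, regardless of how many records exist.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
records = list(range(1_000))  # hypothetical raw data held by the curator
noisy = private_count(records, lambda r: r % 2 == 0, 1.0, rng)
print(f"noisy count: {noisy:.1f} (true count: 500)")
```

The noise scale depends only on epsilon, so the absolute error stays the same whether the database holds a thousand records or a billion.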
In the LDP model, however, noise is added to each individual record, so answering the same count query incurs an error on the order of √N for the same level of privacy, where N is the number of individuals. This means that under the LDP model with a database of a billion people, one can only learn properties that are common to at least about 30,000 people (roughly √N). In contrast, under SDP, one can learn properties that are shared by as few as 100 people. Thus, the LDP model operates under more practical trust assumptions than SDP, but as a result incurs a significant loss in data utility.
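The gap can be checked with a short calculation. The sketch below compares the standard deviation of a Laplace-noised count (central model) with that of a debiased randomized-response count (one common LDP mechanism), whose error grows like √N. Both mechanisms and the choice epsilon = 1.0 are assumptions for illustration; the article does not say which mechanisms or parameters underlie its figures.

```python
import math


def central_count_error(epsilon: float) -> float:
    """Std. dev. of Laplace noise with scale 1/epsilon: sqrt(2)/epsilon, independent of N."""
    return math.sqrt(2.0) / epsilon


def local_count_error(n: int, epsilon: float) -> float:
    """Std. dev. of the debiased randomized-response count estimate over n individuals."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)  # probability of a truthful report
    # Each report is a Bernoulli draw with variance p * (1 - p); debiasing divides by (2p - 1).
    return math.sqrt(n * p * (1.0 - p)) / (2.0 * p - 1.0)


n = 10**9  # a billion individuals, as in the text
print(f"sqrt(N):             {math.sqrt(n):,.0f}")
print(f"central model error: {central_count_error(1.0):,.2f}")
print(f"local model error:   {local_count_error(n, 1.0):,.0f}")
```

Under these assumptions the local-model error for a billion people lands in the tens of thousands, the same order of magnitude as the 30,000 figure above, while the central-model error stays a small constant.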