Data Science Community Knowledge Base

What is differential privacy?

Differential privacy (DP) is a mathematical definition of privacy that guarantees an algorithm’s output is insensitive to adding, removing, or changing any one record in its input database. DP is considered the “gold standard” for privacy. Because it is a definition rather than a specific process, many different algorithms can satisfy it, though in practice it is most often achieved by adding a calibrated amount of random noise to the results computed from the data.
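As a rough illustration of how added noise can satisfy the definition, here is a minimal sketch of the Laplace mechanism applied to a counting query. The helper names (`laplace_noise`, `private_count`), the sample data, and the choice of epsilon = 1.0 are assumptions made for this example; they do not come from any particular library.

```python
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float) -> float:
    """Return a noisy count of the records that satisfy `predicate` (hypothetical helper)."""
    true_count = sum(1 for r in records if predicate(r))
    # Adding, removing, or changing one record moves a count by at most 1
    # (sensitivity 1), so Laplace noise with scale 1/epsilon is enough here.
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative data: how many people are 40 or older?
ages = [23, 35, 41, 52, 67, 29, 38]
print(private_count(ages, lambda a: a >= 40, epsilon=1.0))
```

A smaller epsilon means a larger noise scale and therefore stronger privacy at the cost of accuracy. Note that the noise scale depends only on epsilon, not on how many records are in the database.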

Differential privacy is divided into two major approaches:

  1. Standard differential privacy (SDP). This more common approach is typically implemented by collecting data from individuals in the clear at a trusted data collector, applying one or more differentially private algorithms, and then releasing the outputs. It works best where there is a natural trusted data curator, as with the US Census Bureau’s OnTheMap project.
  2. Local differential privacy (LDP). In LDP, individuals perturb their records before sending them to the server, obviating the need for a trusted data curator. Since the server only sees perturbed records, there is no centralized database of sensitive information that is susceptible to an attack or to subpoena requests from governments. These enhanced compliance properties make LDP attractive for commercial applications.
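To make the local model concrete, the following sketch uses randomized response, one classic way to perturb a single yes/no answer on the individual's side before it is sent anywhere. It is offered as an illustration only, not as the scheme any specific product uses; the function names and the simulated population are assumptions.

```python
import math
import random

def perturb_bit(bit: int, epsilon: float) -> int:
    """Runs on the individual's device: report the true bit with probability
    e^eps / (1 + e^eps), otherwise report the flipped bit."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def estimate_count(reports, epsilon: float) -> float:
    """Runs on the server: unbiased estimate of how many true bits were 1,
    computed from the perturbed reports alone."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(reports)
    # E[sum(reports)] = c * p_truth + (n - c) * (1 - p_truth); solve for c.
    return (sum(reports) - n * (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

# Simulated population: 6,000 of 20,000 people hold the sensitive attribute.
true_bits = [1] * 6000 + [0] * 14000
reports = [perturb_bit(b, epsilon=1.0) for b in true_bits]
print(round(estimate_count(reports, epsilon=1.0)))
```

The server never sees `true_bits`, only `reports`, which is what removes the need for a trusted curator. The estimate is correct on average but noticeably noisier than a centrally computed count.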

However, the improved security properties of LDP come at a cost in utility. Differentially private algorithms hide the presence (or absence) of an individual by adding noise. Under the SDP model, counts over the sensitive data can be released by adding noise whose magnitude is constant, that is, independent of the size of the database. In the LDP model, however, since noise is added to each individual record, answering the same count query incurs an error on the order of √N for a database of N records at the same level of privacy. This means that under the LDP model with a database of a billion people, one can only learn properties that are common to at least about 30,000 people (the square root of one billion is roughly 31,600). In contrast, under SDP, one can learn properties that are shared by as few as 100 people. Thus, the LDP model operates under more practical trust assumptions than SDP, but as a result incurs a significant loss in data utility.
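The gap described above can be checked with a short calculation. The sketch below compares the standard deviation of a count released with Laplace noise under SDP against the standard deviation of a count estimated under LDP, where each person reports a yes/no bit truthfully with probability e^ε/(1+e^ε) and flips it otherwise (randomized response). Both mechanisms and the value ε = 1 are assumptions chosen for illustration; the article does not say which mechanisms or privacy level its figures are based on.

```python
import math

def sdp_count_error(epsilon: float) -> float:
    """Standard deviation of Laplace noise with scale 1/epsilon (central model)."""
    return math.sqrt(2.0) / epsilon

def ldp_count_error(n: int, epsilon: float) -> float:
    """Standard deviation of the debiased randomized-response count over n people."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    # Each report is a Bernoulli variable with variance p * (1 - p), whatever
    # the true bit is; debiasing divides the sum of reports by (2p - 1).
    return math.sqrt(n * p * (1.0 - p)) / (2.0 * p - 1.0)

for n in (10**6, 10**9):
    print(n, round(sdp_count_error(1.0), 2), round(ldp_count_error(n, 1.0)))
```

Under these assumptions the central-model error stays the same at any database size, while the local-model error grows with the square root of N and lands in the low tens of thousands for a billion records.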
