Court Hears Arguments Over “Differential Privacy” Method Used to Anonymize Census Data

Among its many functions, the US census, conducted every 10 years, determines the number of seats that each state holds in the House of Representatives. A court battle is brewing as a group of states challenges the Census Bureau’s use of “differential privacy,” a new method for anonymizing demographic data that was introduced to strengthen defenses against attempts to “reverse engineer” census responses and trace them back to a particular household.

Seventeen states are challenging the Census Bureau, asking for a preliminary injunction to stop the use of differential privacy. The states are also demanding that the census data be released by July 31; it is normally released by the end of March of the year following the count.

Census data battle centers on House seats, distribution of federal funds and privacy concerns

In the past, census data has been obscured by randomly swapping certain details between respondents. In 2018, however, the Census Bureau began implementing plans to use differential privacy, out of concern that it is now theoretically possible to reconstruct the entire census database from enough collected information. With a reconstructed database in hand, census responses could then be traced back to individuals. This information could be abused in a number of ways, including identifying the addresses of wealthy and high-income individuals for scams and robbery, racial profiling, and the pursuit of undocumented immigrants. Differential privacy adds mathematical noise to make connecting these dots of information much more difficult.

The state of Alabama is challenging the use of differential privacy in federal court, joined by 16 other states. The matter will be heard by a judicial panel of three federal judges; should they decide in favor of the Census Bureau, an appeal by the states would be sent directly to the Supreme Court.

The states allege that the use of differential privacy will result in a data set that is not accurate enough for the purpose of redrawing congressional and legislative districts. The process adds “noise” to the data set in the form of intentional errors meant to throw off anyone attempting to trace an individual’s identity back from the collected census data, but the Census Bureau says that these bits of chaff do not impact the statistical validity of the data.
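For readers who want a sense of what adding “noise” means in practice, the following is a minimal Python sketch of one textbook approach, the Laplace mechanism, applied to population counts. It is an illustration only and not the Census Bureau’s actual system, which is considerably more elaborate. The block names, the counts, and the privacy parameter `epsilon` are hypothetical values chosen for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(true_counts, epsilon, seed=None):
    """Return a copy of `true_counts` with Laplace noise added to each count.

    Assumes each person contributes to exactly one count, so adding or
    removing one person changes a count by at most 1 (sensitivity 1).
    A smaller epsilon means more noise and stronger privacy.
    """
    rng = random.Random(seed)
    scale = 1.0 / epsilon  # sensitivity / epsilon, with sensitivity = 1
    return {
        block: count + laplace_noise(scale, rng)
        for block, count in true_counts.items()
    }


# Hypothetical block-level population counts, for illustration only.
blocks = {"block_a": 12, "block_b": 430, "block_c": 3}
published = noisy_counts(blocks, epsilon=0.5, seed=1)
for block, value in published.items():
    print(block, round(value, 1))
```

The noise has the same typical size for every count, so it matters far more for the three-person block than for the 430-person one. That asymmetry is the core of the accuracy worry raised about small communities.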

The suit was triggered by the Census Bureau’s late reporting of the 2020 data, which has been pushed back to an anticipated September release this year due to complications created by the pandemic. State politicians motivated by partisan politics are not the only ones concerned about differential privacy; civil rights advocates and redistricting experts have also warned that the “noise” introduced by this process could skew the count of racial or ethnic groups in communities. That could, in turn, disqualify these communities from federal funds that they would otherwise be entitled to.

Differential privacy attempts to head off anticipated data mining challenges

The states behind the lawsuit want the Census Bureau to stick with the prior methods of privacy protection, saying that the system was not broken and did not need to be fixed. The Census Bureau points to the general rise in available computing power between 2010 and 2020, arguing that it is now possible for data miners with access only to the publicly released information to reconstruct information that should be private. The Bureau tested this theory using the 2010 census data available to the public and found that it was able to identify 17% of the country’s population by combining it with information already available in commercial databases.
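The matching step the Bureau describes can be pictured with a toy example. The Python sketch below assumes a reconstructed census record carries a block, an age, and a sex, and that a commercial database lists names alongside the same three attributes. These field names and all of the records are hypothetical, and the sketch is not the Bureau’s actual test procedure, which the article does not detail.

```python
def link_records(reconstructed, commercial):
    """Attach names to reconstructed records via shared attributes.

    A reconstructed record is treated as re-identified only when exactly
    one commercial record shares its (block, age, sex) combination.
    """
    # Index the commercial records by the attributes both sources share.
    index = {}
    for person in commercial:
        key = (person["block"], person["age"], person["sex"])
        index.setdefault(key, []).append(person["name"])

    matches = []
    for rec in reconstructed:
        key = (rec["block"], rec["age"], rec["sex"])
        names = index.get(key, [])
        if len(names) == 1:  # ambiguous or missing matches are skipped
            matches.append({"name": names[0], **rec})
    return matches


# Hypothetical data, for illustration only.
reconstructed = [
    {"block": "0101", "age": 34, "sex": "F", "race": "Asian"},
    {"block": "0101", "age": 52, "sex": "M", "race": "White"},
]
commercial = [
    {"name": "Resident One", "block": "0101", "age": 34, "sex": "F"},
    {"name": "Resident Two", "block": "0101", "age": 52, "sex": "M"},
    {"name": "Resident Three", "block": "0101", "age": 52, "sex": "M"},
]

for match in link_records(reconstructed, commercial):
    print(match["name"], match["race"])
```

Here only the first record is linked to a name, because two commercial entries share the second record’s attributes. Noise in the published tables makes the reconstructed records themselves less reliable, which is the effect the Bureau is relying on.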

Though differential privacy is new as a means of securing census data, the technique was developed by Harvard cryptographers 15 years ago as a more general means of protecting personal information in research databases. Google, Facebook, Uber, Amazon, and Microsoft have all made use of it at times to secure internal databases.

A reply filed by the states characterizes differential privacy as a “statistical method” that is intentionally riddled with errors, and therefore unacceptable given the “tremendous amounts of federal funding and political power” that census data directs. The court’s decision may hinge on whether the judges find that the method counts individuals accurately, as the law requires; the same law explicitly forbids the use of “statistical inference” that produces estimates rather than exact counts.

The decennial census employs over half a million Americans, who fan out and personally visit residences so that every household is counted as accurately as possible, with most of the work usually done between May and July. The initial 2020 census data release in April showed the country’s overall population growing by 7.4% to over 331 million people, the lowest percentage growth rate recorded since the first census in 1790.

While each state has a fixed two seats in the Senate, the census count determines how many seats each state holds in the House of Representatives. The initial 2020 numbers saw Texas gain two seats in the House and five other states gain one, while seven states (including California and New York) lost one seat each.