By Tanner Bowen
Tanner Bowen is a junior at the University of Pennsylvania studying business.
With the increasing use of data mining and forecasting in the private sector, it seems inevitable that the public sector will try to leverage these technologies to become more efficient and to better allocate resources. But, as discussed in my last few blog posts, this implementation will not come without potential legal hiccups from the United States judiciary. Although concepts like non-delegation and due process can seem somewhat intangible to the average citizen, the one area where machine learning can greatly affect the lives of individuals is discrimination: whether the government’s use of these tools will lead to discriminatory practices.
The Fourteenth Amendment’s Equal Protection Clause prohibits discrimination by states, and since the Supreme Court’s ruling in Bolling v. Sharpe, a comparable claim may be brought against the federal government under the Fifth Amendment’s Due Process Clause. However, showing a disproportionate impact is not, by itself, enough to make out such a claim. Specifically, “a purpose to discriminate must be present.”
This may seem a little nebulous, but in terms of administrative law, it just means that if the federal government is challenged in court and its action is found to constitute disparate treatment, the courts will apply one of several levels of scrutiny (strict scrutiny, intermediate scrutiny, or rational basis review).
Although it is very possible that those who develop algorithms for federal agencies could build discrimination directly into them (such as by building the targeting of minorities into the algorithm), these algorithms do not appear on their surface to be intentionally discriminatory. If we recall earlier discussions of data analysis, algorithms are a black box. There are programs and plots that one can use to try to peek into the box, but we don’t necessarily see how the algorithm works through the data. There may be very complicated interactions between the predictors, but because most machine learning techniques are non-parametric (there is no algebraic equation describing the model), we don’t even get to see which variables are used and what their coefficients are.
What does this mean for plaintiffs who bring disparate treatment cases against the government? They will have to look at the effects of the algorithm in action. This means that plaintiffs might have to prove that members of one class, such as their race, were consistently subjected to different outcomes than individuals of other races.
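To make “looking at the effects” concrete, here is a minimal sketch of how one might tabulate outcomes by group from an algorithm’s decisions. The group labels and decision counts are invented purely for illustration, and a gap in rates is only descriptive evidence; it is not a legal test and does not by itself establish discriminatory purpose.

```python
from collections import defaultdict

def outcome_rates(records):
    """Return the share of favorable outcomes for each group.

    `records` is an iterable of (group, favorable) pairs, where `favorable`
    is True when the algorithm's decision benefited the individual.
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, is_favorable in records:
        totals[group] += 1
        if is_favorable:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}

# Hypothetical decisions, invented purely for illustration.
decisions = (
    [("group_a", True)] * 60 + [("group_a", False)] * 40
    + [("group_b", True)] * 30 + [("group_b", False)] * 70
)

rates = outcome_rates(decisions)
rate_gap = rates["group_a"] - rates["group_b"]
print(rates)
print(f"gap in favorable-outcome rates: {rate_gap:.2f}")
```

A persistent gap like this one, observed across many decisions, is the kind of pattern a plaintiff would point to when the inner workings of the algorithm cannot be inspected.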
This might not be an entirely difficult burden to overcome. When you develop an algorithm, you have to battle-test it by fitting it on “training” data and then checking its validity on “test” data. However, if one thinks carefully about how these data were collected, it is very possible that the algorithm itself will produce disparate treatment of certain protected classes if the historical mechanisms that shaped the outcomes in the training data were themselves discriminatory. In other words, if the historical data come from racist institutions or actions, the machine learning algorithm may simply reproduce those racist policies.
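The mechanism described above can be sketched in a few lines. The example below assumes a deliberately simple, hypothetical “model” that memorizes the most common past decision for each combination of neighborhood and qualification. The historical records are invented, and the neighborhood variable stands in for any predictor correlated with a protected class. Real machine learning methods are far more complex, but the point carries over: a model that performs well on held-out test data drawn from the same biased history will still reproduce that bias.

```python
import random
from collections import Counter, defaultdict

def make_history():
    """Build hypothetical past decisions as (neighborhood, qualified, approved)."""
    rows = []
    # Qualified applicants from neighborhood "A" were usually approved...
    rows += [("A", True, True)] * 90 + [("A", True, False)] * 10
    # ...while equally qualified applicants from "B" were usually denied.
    rows += [("B", True, True)] * 20 + [("B", True, False)] * 80
    # Unqualified applicants were denied everywhere.
    rows += [("A", False, False)] * 100 + [("B", False, False)] * 100
    return rows

def train(rows):
    """Learn the most common past decision for each (neighborhood, qualified) pair."""
    votes = defaultdict(Counter)
    for neighborhood, qualified, approved in rows:
        votes[(neighborhood, qualified)][approved] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

def approval_rate(model, rows, neighborhood):
    """Share of qualified applicants from `neighborhood` that the model approves."""
    qualified = [r for r in rows if r[0] == neighborhood and r[1]]
    approved = sum(model[(r[0], r[1])] for r in qualified)
    return approved / len(qualified)

rows = make_history()
random.Random(0).shuffle(rows)  # fixed seed so the split is repeatable
cut = int(0.7 * len(rows))
training_data, test_data = rows[:cut], rows[cut:]

model = train(training_data)
rate_a = approval_rate(model, test_data, "A")
rate_b = approval_rate(model, test_data, "B")
print(f"qualified approval rate, A: {rate_a:.2f}  B: {rate_b:.2f}")
```

Nothing in this sketch refers to race directly, yet equally qualified applicants receive different predictions depending on where they live, because that is what the historical decisions taught the model.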
Moving forward, when governments implement policies, they will have to be conscious of using race or gender classifiers in predictive analytics. Sometimes, even the development of technology to collect data to better allocate scarce resources can lead to discriminatory outcomes. The recent White House report on the usage of big data includes a case study of Boston, which developed an app for residents to report potholes. If enough residents reported potholes in a certain area, the city would send out workers to fix them. However, the project neglected to account for the fact that individuals with smartphones are most likely wealthier than those without. As a result, potholes got fixed in higher-income areas of the city rather than in the poorer parts.
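The pothole example comes down to simple arithmetic about who is able to file a report. The sketch below uses invented numbers and a hypothetical dispatch threshold (none of these figures come from the White House report) to show how two areas with identical road conditions can receive very different service.

```python
def potholes_repaired(potholes, passersby_per_pothole, smartphone_share, reports_needed):
    """Count potholes whose expected number of app reports meets the dispatch threshold.

    All parameters are hypothetical; this is a crude expected-value model,
    not a description of how any real city's system works.
    """
    expected_reports = passersby_per_pothole * smartphone_share
    return potholes if expected_reports >= reports_needed else 0

# Same number of potholes and passersby; only smartphone ownership differs.
wealthier_area = potholes_repaired(
    potholes=50, passersby_per_pothole=20, smartphone_share=0.8, reports_needed=10
)
poorer_area = potholes_repaired(
    potholes=50, passersby_per_pothole=20, smartphone_share=0.3, reports_needed=10
)
print(f"repaired in wealthier area: {wealthier_area}, in poorer area: {poorer_area}")
```

The data collection step, not any explicit rule about income, is what drives the unequal result here.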
Although individuals who suffer discrimination from government use of machine learning will face a heavy burden, the task will not be impossible. Sloppy statistical work, or an agency’s failure to consider all the costs of using these algorithms, could be presented to courts deciding whether the resulting rules are arbitrary and capricious or discriminatory. This legal issue is a reminder that even if these algorithms promise to provide sound forecasting, if you put discriminatory data and assumptions into them, you will get discrimination as an outcome.
Bolling v. Sharpe, 347 U.S. 497, 499 (1954).
Akins v. Texas, 325 U.S. 398, 403 (1945).
Berk, Richard. Statistical Learning from a Regression Perspective, Springer (2016).
U.S. Executive Office of the President, “Big Data: Seizing Opportunities, Preserving Values” (2014). http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf
The opinions and views expressed through this publication are the opinions of the designated authors and do not reflect the opinions or views of the Penn Undergraduate Law Journal, our staff, or our clients.