Discrimination by the Numbers: Lies, Damned Lies, and Statistics

Statistics Alone Cannot Prove Discrimination

Robert Levy is senior fellow in constitutional studies at the Cato Institute and adjunct professor at Georgetown University Law Center, where he teaches “Statistics for Lawyers.”

More than a century ago, Justice Oliver Wendell Holmes observed: “For the rational study of the law . . . the man of the future is the man of statistics.” Lamentably, our lawmakers learned their lesson only too well. Over the past three decades they have co-opted statistics, stretching its already malleable bounds to create an incoherent legal doctrine under the rubric of “disparate impact.”

Under the doctrine as codified in the 1991 Civil Rights Act, an employee can make out a prima facie case of race or gender discrimination based on no more than a statistical disparity between the composition of a company’s work force and the pool of available workers. To defend itself, the company must then show that the disparity is job related or consistent with business necessity, whatever that may mean. And even if the company can make such a showing, the worker wins if he can demonstrate that the company rejected an alternative practice that would have resulted in a less disparate impact. Nowhere along the way does the employee have to prove that he personally was subject to discriminatory treatment.

To criticize the use of statistics in discrimination law is not to acknowledge that Title VII of the 1964 Civil Rights Act and its progeny are constitutional. Indeed, insofar as those laws are enforced against private-sector employers, they violate freedom of association as guaranteed by our Bill of Rights. Nevertheless, Title VII is the law, so let’s examine a few of the problems with “disparate impact” theory.

First, which pool of workers is relevant for comparison? Is it the general population, or job applicants, or only those applicants deemed “qualified”? Should a nationwide pool be considered, or should it be limited to a more narrowly defined geographic area? What about potential applicants who were dissuaded by what they perceive to be a discriminatory hiring practice? The possibilities are nearly limitless—and adjustable to suit the desired outcome.

Conflicting Doctrines

Second, the logic of disparate impact is fundamentally incompatible with the logic of affirmative action; yet the chief use of the former is to justify the latter. Disparate-impact theory tells us that under-representation of a protected group in the work force is presumptive evidence of discrimination. But proponents of affirmative action make exactly the opposite point. They contend that females and minorities will be under-represented even if employers are scrupulously fair. Because of past discrimination, disadvantaged groups are handicapped at the starting line. Accordingly, so the argument goes, work force composition will be unbalanced even without ongoing discrimination; only preferential programs can level the playing field.

In other words, the upshot of historical prejudice is that women and minorities do not now have a fair shot at job opportunities—regardless of how successful we have been in rooting out recurrent intolerance and bigotry. So a statistical disparity in the employment mix could occur despite nondiscriminatory practices. Of course, that doesn’t square with disparate-impact case law, which permits an inference of discrimination whenever there is disproportionate representation in the work force. The two propositions cannot coexist. Either an imbalance in the work force signifies current discrimination (disparate-impact theory) or it reflects vestiges of past discrimination (affirmative-action theory).

That’s not all. For statistics to prove discrimination, an apparent association between A (say, race) and B (say, percentage of applicants hired) must be interpreted to mean that A caused B. But it is conceivable, perhaps likely, that the relationship between A and B is not causal. Instead, both A and B may be correlated with one or more other factors, which statisticians call confounding variables.

For example, there seems to be a close association between math scores and shoe size. Yet nobody would suggest that big feet enhance mathematical ability, or that math skills cause one’s feet to grow. The obvious confounding variable is age. As people grow older, they learn more about math and they wear larger shoes.

Similarly, in assessing the correlation between race and hiring rates, it is essential to control for a long list of factors: education, experience, age, and test scores, just to name some obvious candidates. If those variables are omitted, a disparity in hiring rates may be mistakenly attributed to discrimination in the workplace when the real problem is elsewhere. For example, minorities may be more likely to come from broken homes or attend poorer schools, both of which contribute to lower scores on standardized tests.

Illusory Discrimination

To illustrate, here’s how the omission of a confounding variable, job preference, can lead to an inference of discrimination when there is none. The Equal Employment Opportunity Commission (EEOC) applies this so-called 80 percent rule: if the group of applicants least successful in obtaining a job has a hire rate less than 80 percent of the rate for the most successful group—a measure called the selection ratio—EEOC infers discrimination. Let’s assume that 200 of 600 women applicants (33 percent) are hired for two types of job openings; and 350 of 800 men applicants (44 percent) are hired for the same two types. The resultant selection ratio is 33 / 44 or 75 percent, thus suggesting discrimination. But suppose that most of the women applied for the more difficult of the two job types, for which only a quarter of the applicants were hired; while most of the men applied for the easier type, for which half the applicants were hired. The numbers look like this:

                    Difficult Job       Easy Job          Total
                      M       F        M       F        M       F
Applied             200     400      600     200      800     600
Hired                50     100      300     100      350     200
% Hired              25      25       50      50       44      33
Selection Ratio        100%             100%              75%

If each type of job is considered separately, precisely the same percentage of men and women was hired. But if the data for the two job types are combined, the female-to-male selection ratio is only 75 percent. Explanation: the particular job sought is a confounding variable; it helps explain the disparate selection rate, but it also correlates with the variable at issue (gender).
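The arithmetic behind the table can be checked with a short sketch. The counts are copied from the table; the function names and the dictionary layout are illustrative assumptions, not anything the EEOC prescribes, and the selection ratio is computed the way the text describes it (lower hire rate divided by higher hire rate).

```python
from fractions import Fraction

# (applied, hired) counts copied from the table above.
counts = {
    "difficult": {"M": (200, 50), "F": (400, 100)},
    "easy": {"M": (600, 300), "F": (200, 100)},
}

def hire_rate(applied, hired):
    """Fraction of applicants who were hired."""
    return Fraction(hired, applied)

def selection_ratio(rate_a, rate_b):
    """Lower hire rate divided by the higher hire rate."""
    low, high = sorted([rate_a, rate_b])
    return low / high

def pooled(group):
    """Total (applied, hired) for one group across both job types."""
    applied = sum(counts[job][group][0] for job in counts)
    hired = sum(counts[job][group][1] for job in counts)
    return applied, hired

# Job-specific ratios, then the ratio after combining the two job types.
per_job = {
    job: selection_ratio(hire_rate(*g["M"]), hire_rate(*g["F"]))
    for job, g in counts.items()
}
overall = selection_ratio(hire_rate(*pooled("M")), hire_rate(*pooled("F")))

for job, ratio in per_job.items():
    print(f"{job}: selection ratio = {float(ratio):.2f}")
print(f"pooled: selection ratio = {float(overall):.2f}")
print("pooled ratio below the 80 percent threshold:", overall < Fraction(4, 5))
```

Computed from the raw counts rather than the rounded percentages, the pooled ratio is 16/21, or about 76 percent. That is slightly above the 75 percent obtained from 33 / 44 but still under the 80 percent threshold, while each job-specific ratio is exactly 100 percent.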

Unless the EEOC controls for job preference—say by calculating job-specific selection ratios—it may, in this instance, infer discrimination that is illusory. Of course, if women applied primarily for the easier jobs and men for the more difficult, then failure to control for the confounding variable could lead to an opposite but equally erroneous result—an inference of nondiscrimination despite a discriminatory hiring process. The symmetry of potential mistakes only further buttresses the case against statistical measures.

A related problem is that multiple explanatory variables may be correlated with one another, in which case the separate effect of the variable at issue—usually gender or race—will not be measurable with precision. Let’s say, hypothetically, that the probability of being hired for a particular job is a function of both gender and experience. If those two variables are themselves correlated (say, men tend to have more years on the job), then we will not be able to determine reliably the impact of gender alone.

That leaves us with a dilemma. On one hand we must include in our statistical study all of the variables that could affect the result. Otherwise we could mistakenly attribute disparate impact to discrimination when it was actually due to an omitted variable like experience. But on the other hand, if we include multiple explanatory variables, they will almost always be correlated with one another. And that makes it very difficult to determine whether and to what extent discrimination is the culprit.
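One common way to put a number on that loss of precision is the variance inflation factor. The sketch below assumes the simplest case, a linear model with exactly two explanatory variables (say, gender and experience) whose correlation is r; in that case the sampling variance of each estimated effect is multiplied by 1 / (1 - r^2) relative to the uncorrelated case. The function name and the sample correlations are illustrative assumptions, not figures from any actual study.

```python
def variance_inflation(r):
    """Variance inflation factor for a model with two explanatory
    variables whose correlation is r (assumed -1 < r < 1)."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)

# Hypothetical correlations between gender and years of experience.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"correlation {r:.2f}: variance inflated by {variance_inflation(r):.1f}x")
```

The closer the correlation is to 1, the larger the factor, which is the sense in which the separate effect of gender becomes hard to measure.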

Two Different Questions

There’s more. Statistical tests are designed to address a question that is totally different from the one a court faces. Given an observed disparity in, say, hiring rates between a sample of men and a sample of women, a court must determine the probability that discrimination is a cause of the disparity. But statistical tests reason in the opposite direction: they assume a nondiscriminatory environment, then assess the probability that a disparity as large as the one observed between men and women could have arisen just because a sample and not the entire population was studied.

While those two issues are related, they are not equivalent. By analogy, presupposing a fair coin, statistics tell us that the chance of two flips in a row coming up heads is one in four. That is not the same as saying that, given two heads in a row, the odds are only one in four that the coin is fair. If we know that quality control is rigorous and counterfeits are systematically removed from circulation, then we have an independent, nonstatistical basis for believing that the coin is fair.
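The difference between the two questions can be made concrete with Bayes' rule. The sketch below is only an illustration under stated assumptions: the only alternative to a fair coin is taken to be a two-headed coin (so two heads are certain if the coin is not fair), and the prior probabilities are hypothetical values chosen to show how much the answer depends on what is known independently of the flips.

```python
from fractions import Fraction

def posterior_fair(prior_fair, p_hh_given_unfair=Fraction(1)):
    """Probability the coin is fair, given two heads in a row.

    prior_fair: belief that the coin is fair before any flips.
    p_hh_given_unfair: chance of two heads from an unfair coin
    (assumed to be 1, i.e. a two-headed coin, unless overridden).
    """
    p_hh_given_fair = Fraction(1, 4)  # the one-in-four figure from the text
    numerator = prior_fair * p_hh_given_fair
    evidence = numerator + (1 - prior_fair) * p_hh_given_unfair
    return numerator / evidence

# Hypothetical priors: no information versus rigorous quality control.
for prior in (Fraction(1, 2), Fraction(999, 1000)):
    post = posterior_fair(prior)
    print(f"prior {float(prior):.3f} -> posterior {float(post):.3f}")
```

In neither case is the answer simply one in four. With an even prior the probability that the coin is fair comes out at one in five, and with a strong independent reason to trust the coin it stays above 99 percent.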

That logical error, says Professor Kingsley R. Browne, pervades the way in which courts think about disparate impact. Browne’s insightful 1993 article in the Washington Law Review lays out the many objections to statistical proof of discrimination. He describes how courts now reason: “This disparity is very unlikely to have occurred by chance; the result is suspicious and the employer must explain it.” More properly, Browne suggests, the court should be reasoning in these terms: “The plaintiff has described statistics that would be true for thousands of nondiscriminating employers; if the plaintiff wants me to suspect discrimination, he better give me a lot more than that.”

Put another way, statistics without individualized corroboration are at best prejudicial and very possibly unjust. Consider this example from criminal law: What if the blood type of an accused rapist matches a specimen recovered from the victim? The prosecution establishes that the likelihood of a random match is one in 1,000. Is that strong evidence? It depends on how the accused was selected and whether corroborating evidence established a prior probability of guilt. If the accused met the victim’s description, was arrested nearby, then picked out of a lineup, the statistical evidence would be strong confirmation. But if the police had a data bank of blood characteristics and selected the defendant merely because of the forensic match, the evidence would be quite weak. Indeed, 1,000 people in a city of one million would have qualified.
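The blood-type example follows the same logic, and a rough calculation shows how much the prior matters. The sketch assumes that the true source always matches the specimen and that everyone else matches at the stated random rate of one in 1,000; the two priors are hypothetical stand-ins for "picked from a data bank of one million people" and "already implicated by other evidence."

```python
from fractions import Fraction

MATCH_RATE = Fraction(1, 1000)   # random-match likelihood from the text
POPULATION = 1_000_000           # city size from the text

def posterior_source(prior):
    """Probability the accused is the source, given a match.

    Assumes the true source matches with certainty and anyone else
    matches with probability MATCH_RATE.
    """
    return prior / (prior + (1 - prior) * MATCH_RATE)

# Number of residents expected to match purely by chance.
expected_matches = POPULATION * MATCH_RATE
print("expected matching residents:", expected_matches)

# Hypothetical priors: a data-bank trawl versus strong corroboration.
trawl = posterior_source(Fraction(1, POPULATION))
corroborated = posterior_source(Fraction(1, 2))
print(f"selected by match alone: {float(trawl):.4f}")
print(f"with corroborating evidence: {float(corroborated):.4f}")
```

Under these assumptions the same forensic match leaves roughly a one-in-1,000 chance of guilt in the first case and a better than 99.9 percent chance in the second.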

In a nutshell, statistics are not enough. Their purpose is to show whether an observed work-force disparity might simply be due to chance, which statisticians call “sampling error.” Conventionally, when that likelihood is 5 percent or less, courts will infer that discrimination, not chance, is the reason. But among employers who do not discriminate, that inference will by construction be wrong about 5 percent of the time: sampling error alone will produce a disparity that large in roughly five cases out of every 100. Thus, many innocent employers will be held accountable for discrimination that does not exist. And the defense available to those employers under the 1991 Civil Rights Act (they may show that the disparity is job related) is utterly fruitless. There can be no job-related explanation when the disparity is a statistical artifact.

In the real world, far more than 5 percent of employers are at risk. Statistical disparities can exist for each job, each department, each factory, each geographic region, each time period, and each grouping by race, color, religion, sex, or national origin. What’s even worse, the court doesn’t deal with a randomly selected profile of employers but with companies that were specifically selected because of an observed work-force imbalance. So the probability that a court will mistakenly infer discrimination is much higher than 5 percent. When plaintiffs and their attorneys are permitted to introduce statistical evidence based on 20/20 hindsight, we should not be surprised that companies are forced to implement quota systems and adopt other class-based practices—the same practices that the anti-discrimination laws were allegedly designed to eradicate.
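A back-of-the-envelope calculation shows how quickly the risk grows when many comparisons are available. The sketch assumes a nondiscriminating employer, a 5 percent threshold for each comparison, and comparisons that are statistically independent of one another; real comparisons across jobs, departments, and time periods overlap, so the figures are illustrative only.

```python
def chance_of_false_flag(num_comparisons, alpha=0.05):
    """Probability that at least one of several independent comparisons
    shows a 'significant' disparity purely by chance."""
    return 1.0 - (1.0 - alpha) ** num_comparisons

# Hypothetical numbers of job/department/period/group comparisons.
for k in (1, 5, 14, 50):
    print(f"{k:>2} comparisons: {chance_of_false_flag(k):.0%} chance of a spurious disparity")
```

Under these assumptions, an innocent employer facing 14 or more independent comparisons is more likely than not to show at least one disparity that crosses the conventional threshold.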

Whether the problem is using hindsight, mischaracterizing the relevant population, omitting confounding variables, inferring erroneous causation, applying backward logic, excluding corroborating evidence, or disentangling the separate effect of multiple factors, the conclusion is unavoidable. Dependence on statistics alone to prove discrimination is wrong as a matter of justice and ought to be prohibited as a matter of law. To rewrite Justice Holmes: For the rational study of the law, the man of the future is the man who understands how statistics can be misused.


“Only by statistics, can the federal government make even a fitful attempt to plan, regulate, control, or reform various industries—or impose central planning and socialization on the entire economic system. If the government received no railroad statistics, for example, how in the world could it even start to regulate railroad rates, finances, and other affairs? How could the government impose price controls if it didn’t even know what goods have been sold on the market, and what prices were prevailing? Statistics . . . are the eyes and ears of the interventionists: of the intellectual reformer, the politicians, and the government bureaucrat. Cut off those eyes and ears, destroy those crucial guidelines to knowledge, and the whole threat of government intervention is almost completely eliminated.”
—Murray N. Rothbard