The Diversity in College Admission: a study of Asian American applicants in Harvard Judicial Case

Student: Jin Sun
Table: BEHAV1103
Experimentation location: School, Home
Regulated Research (Form 1c): No
Project continuation (Form 7): No


Arcidiacono, Peter, Josh Kinsler, and Tyler Ransom. “Asian American Discrimination in Harvard Admissions.” SSRN Electronic Journal, 2020.


Espenshade, Thomas J., Chang Y. Chung, and Joan L. Walling. “Admission Preferences for Minority Students, Athletes, and Legacies at Elite Universities.” Social Science Quarterly 85, no. 5 (December 2004): 1422–46.


“Plaintiff Expert Witness Opening Report.” In Students for Fair Admissions, Inc. v. President and Fellows of Harvard College et al., Civil Action No. 14-14176-ADB (D. Mass), Document 415-8, 2017.


“Plaintiff Expert Witness Rebuttal Report.” In Students for Fair Admissions, Inc. v. President and Fellows of Harvard College et al., Civil Action No. 14-14176-ADB (D. Mass), Document 415-9, 2018.


“Trial Exhibit DX 042. Demographic Breakdown of Applicants, Admits, and Matriculants.” In Students for Fair Admissions, Inc. v. President and Fellows of Harvard College et al., Civil Action No. 14-14176-ADB (D. Mass), 2018.


“Trial Exhibit P009. Office of Institutional Research report, ‘Admissions Part II: Subtitle’.” In Students for Fair Admissions, Inc. v. President and Fellows of Harvard College et al., Civil Action No. 14-14176-ADB (D. Mass), 2018.


“Trial Exhibit P028. Office of Institutional Research report, ‘Demographics of Harvard College Applicants’.” In Students for Fair Admissions, Inc. v. President and Fellows of Harvard College et al., Civil Action No. 14-14176-ADB (D. Mass), 2018.


Long, Mark C. “Race and College Admissions: An Alternative to Affirmative Action?” Review of Economics and Statistics 86, no. 4 (November 2004): 1020–33.

Additional Project Information


Research Plan:


Past research suggests that a degree from a selective college is associated with later success, including higher graduation rates and earnings. Admission to these universities is therefore a sensitive subject, because the futures of many students are at stake. A recent lawsuit against Harvard alleging discrimination against Asian American applicants made detailed information about its admissions process public. Using that data, we investigate the extent of discrimination against Asian American applicants by analyzing the data and using machine learning to predict how many Asian American applicants would have been admitted had they been white. The goal is to determine whether there is clear evidence of a biased admissions process or whether the situation has simply been exaggerated by the media.


Project Purpose:

To investigate the presence and extent of discrimination against Asian American applicants in the college admissions process, employing machine learning techniques to generate counterfactual scenarios and assess the causal impact of race on admissions outcomes.



  • Data Acquisition: Collect open data from judicial documents related to the Harvard legal case. This data will include applicants' demographic information, academic achievements, extracurricular activities, and admissions outcomes.
  • Preparation and preprocessing:
    • Generate samples from the open data through bootstrap sampling so that they are consistent with the published statistics.
    • Handle missing values and ensure the dataset's integrity.
    • Split the dataset into training and testing sets.
  • First-stage auxiliary model development:
    • Estimate a model to assess student performance as an endogenous variable, utilizing academic and extracurricular data.
    • Use predictions from this model as covariates in the main admissions outcome model.
  • Primary model development:
    • Incorporate the auxiliary model's predictions, along with applicants' racial backgrounds, into a comprehensive machine learning model designed to predict admissions outcomes.
    • Adjust the model to account for potential biases, particularly focusing on the auxiliary model's predictions.
  • Counterfactual Generation:
    • Manipulate the racial background variable to create hypothetical scenarios where applicants' races are altered.
    • Assess the impact of these changes on admissions outcomes to isolate the effect of being Asian American.
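
The auxiliary-model, primary-model, and counterfactual steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual implementation: the data are synthetic, every coefficient is made up rather than taken from the case record, and the names `academic`, `group`, `fit_linear`, and `fit_logistic` are hypothetical. A one-predictor least-squares fit stands in for the first-stage model and a plain gradient-descent logistic regression stands in for the machine learning model.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic applicants. All coefficients are invented for illustration.
N = 2000
academic = [random.gauss(0, 1) for _ in range(N)]
group = [1 if random.random() < 0.3 else 0 for _ in range(N)]  # 1 = group of interest
# Subjective rating: depends on academics, on group membership, and on noise.
rating = [0.8 * a - 0.3 * g + random.gauss(0, 0.5) for a, g in zip(academic, group)]
# Simulated decisions with a built-in direct penalty of -1.0 for group == 1.
admit = [1 if random.random() < sigmoid(-1.0 + 1.5 * r - 1.0 * g) else 0
         for r, g in zip(rating, group)]

def fit_linear(xs, ys):
    """Least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_logistic(rows, labels, lr=1.0, epochs=300):
    """Logistic regression by batch gradient descent; w[0] is the intercept."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

# Stage 1: predict the rating from academic data only (no group variable).
a0, a1 = fit_linear(academic, rating)
rating_hat = [a0 + a1 * a for a in academic]

# Stage 2: admission model on the predicted rating plus the group indicator.
w = fit_logistic([[rh, g] for rh, g in zip(rating_hat, group)], admit)

def predict(rh, g):
    return sigmoid(w[0] + w[1] * rh + w[2] * g)

# Counterfactual: keep each group member's predicted rating, flip the indicator.
members = [rh for rh, g in zip(rating_hat, group) if g == 1]
factual = sum(predict(rh, 1) for rh in members) / len(members)
counterfactual = sum(predict(rh, 0) for rh in members) / len(members)
print(f"group coefficient: {w[2]:.2f}")
print(f"mean predicted admit probability, factual: {factual:.3f}")
print(f"mean predicted admit probability, counterfactual: {counterfactual:.3f}")
```

Because the first stage excludes the group indicator, the second-stage group coefficient in this sketch absorbs both the direct penalty and the part that operates through the rating. The sketch also does not adjust the second-stage uncertainty for the fact that `rating_hat` is itself an estimate.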

Data Analysis:

  • Results Evaluation:
    • Use statistical methods to evaluate the differences in admissions outcomes across different racial scenarios.
    • Conduct robustness checks to validate the findings.
  • Experiments
    • Experiment 1: Validate the auxiliary model by comparing its predictions with known performance.
    • Experiment 2: Evaluate the main model's predictive accuracy on the testing set.
    • Experiment 3: Conduct sensitivity analysis to understand the impact of various model parameters.
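
One way the results evaluation could be carried out is with a paired percentile bootstrap on the per-applicant difference between counterfactual and factual predicted probabilities. The sketch below is an assumption about what such a check might look like, not the project's documented method; the function name `bootstrap_ci` and the eight probability values are hypothetical placeholders.

```python
import random

def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `diffs`."""
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        # Resample applicants with replacement and record the mean difference.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Placeholder predicted admit probabilities for the same eight applicants
# under the factual and the counterfactual scenario (invented numbers).
factual = [0.04, 0.10, 0.07, 0.22, 0.15, 0.03, 0.09, 0.12]
counterfactual = [0.06, 0.13, 0.08, 0.27, 0.19, 0.05, 0.12, 0.15]

diffs = [c - f for c, f in zip(counterfactual, factual)]
point_estimate = sum(diffs) / len(diffs)
lower, upper = bootstrap_ci(diffs)
print(f"mean difference: {point_estimate:.4f}")
print(f"95% bootstrap interval: [{lower:.4f}, {upper:.4f}]")
```

An interval that excludes zero would suggest that the gap between scenarios is not explained by resampling noise alone. Rerunning with different seeds or values of `n_boot` is a simple robustness check.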

Risk and Safety:

Because the study relies mainly on publicly available information from legal documents, no safety measures are applicable.

Questions and Answers

1. What was the major objective of your project and what was your plan to achieve it?

      a. Was that goal the result of any specific situation, experience, or problem you encountered?  

The goal of this project was the result of the broader societal issue of racial discrimination in college admissions processes, as highlighted by legal challenges and public debates, including the Harvard legal case. The specific focus on Asian American applicants was motivated by allegations and evidence presented in legal documents that suggested potential biases against this group in admissions decisions.

      b. Were you trying to solve a problem, answer a question, or test a hypothesis?

I was trying to answer the question of whether there is empirical evidence to support the claim of discrimination against Asian American applicants in the college admission process. To do so, I quantitatively estimated the effect of race on admission outcomes by creating counterfactual scenarios with machine learning methods.

2. What were the major tasks you had to perform in order to complete your project?

The project involved a series of structured tasks aimed at investigating discrimination against Asian American applicants in Harvard University's admissions process, utilizing machine learning techniques and data from judicial documents related to the Harvard legal case. Here is the breakdown of the major tasks undertaken: 1) data acquisition: collection of open data from judicial documents; 2) data preparation and preprocessing: sample generation from the open data; 3) model development through two-stage predictor substitution to address the endogeneity issue of student evaluation; 4) counterfactual generation to create hypothetical scenarios where applicants’ races are altered; 5) result evaluation to estimate the differences in admission outcomes.

3. What is new or novel about your project?

      a. Is there some aspect of your project's objective, or how you achieved it that you haven't done before?

Our project introduces a novel approach to analyzing discrimination in college admissions through the use of machine learning models to predict admission outcomes under counterfactual scenarios.

      b. Is your project's objective, or the way you implemented it, different from anything you have seen?

In contrast to studies that concentrate on analyses or broad statistical comparisons, our method enables a detailed investigation into how personal ratings and subjective evaluations affect admission decisions for Asian American candidates. It uses machine learning to predict what admission outcomes would be without racial bias against Asian American applicants.

      c. If you believe your work to be unique in some way, what research have you done to confirm that it is?

We show that the Harvard admission process involves multiple dimensions of ratings beyond academic performance, such as personal ratings, and that these ratings depend on race. We adopt a counterfactual generation approach to predict the admission outcomes for Asian American applicants if they were treated as applicants of other races. This approach allows us to examine the extent of racial discrimination in college admissions, especially at elite colleges.

4. What was the most challenging part of completing your project?

     a. What problems did you encounter, and how did you overcome them?

For anonymity reasons, the judicial case documents reveal only statistical information about the Harvard admission process, so we cannot obtain individual-level application details and admission outcomes. Instead, we use a bootstrap sampling technique to generate samples that follow the reported statistical distributions and regression coefficients, so that they closely resemble the actual data.
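
As a rough illustration of generating individual-level records from aggregate statistics, the sketch below draws simulated applicants from a table of cell shares and admit rates. It is a simplified stand-in for the sampling procedure described above: it matches only cell proportions and rates, not regression coefficients. The table `CELLS`, its group and tier labels, and all of its numbers are hypothetical and do not come from the court documents.

```python
import random

# Hypothetical aggregate table:
# (group label, academic tier) -> (share of applicants, admit rate)
CELLS = {
    ("A", "top"): (0.10, 0.12),
    ("A", "rest"): (0.20, 0.03),
    ("B", "top"): (0.15, 0.15),
    ("B", "rest"): (0.55, 0.04),
}

def simulate_applicants(cells, n, seed=0):
    """Draw n synthetic applicants whose cell shares and admit rates
    follow the aggregate table in expectation."""
    rng = random.Random(seed)
    keys = list(cells)
    weights = [cells[k][0] for k in keys]
    rows = []
    for group, tier in rng.choices(keys, weights=weights, k=n):
        admit_rate = cells[(group, tier)][1]
        rows.append({
            "group": group,
            "tier": tier,
            "admit": int(rng.random() < admit_rate),
        })
    return rows

sample = simulate_applicants(CELLS, 10000)
share_a_top = sum(r["group"] == "A" and r["tier"] == "top" for r in sample) / len(sample)
print(f"simulated share of cell (A, top): {share_a_top:.3f} (target 0.10)")
```

Comparing simulated shares and rates against the published table, as in the last two lines, is a quick check that the synthetic sample reproduces the aggregates it was built from.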

     b. What did you learn from overcoming these problems?

Data privacy is always a high priority in societal problems, and solving such problems requires sophisticated statistical techniques.

5. If you were going to do this project again, are there any things you would do differently the next time?

The study mainly involves quantitative analysis using a data-driven approach. However, many other factors may not be reflected in the data and yet still bear on a societal problem such as this one. I could also try a qualitative approach, e.g., conducting surveys of stakeholders in college admissions, including applicants, school counselors, and admission officers. Such a mixed approach may give a more holistic picture of the problem.

6. Did working on this project give you any ideas for other projects?

One idea is to use natural language processing to analyze the language and sentiment of applicants' recommendation letters for patterns that might indicate bias based on race, gender, or socioeconomic status. This could offer insights into another subjective aspect of the admissions process.

7. How did COVID-19 affect the completion of your project?

Since this project was mostly online, it was not severely affected.