Researchers found that ChatGPT consistently ranked resumes with disability-related honors and credentials lower than identical resumes without them. When the researchers gave the tool explicit instructions to avoid ableism, however, the bias decreased for all but one of the disabilities tested.
While searching for research internships, University of Washington graduate student Kate Glazko noticed recruiters mentioning their use of OpenAI’s ChatGPT and other AI tools to screen resumes and evaluate candidates. Automated screening has long been common in hiring, but Glazko, a doctoral student in the UW’s Paul G. Allen School of Computer Science & Engineering, studies how generative AI can replicate and amplify biases against disabled people. She wondered how such a system would rank resumes that implied a disability.
In a recent study, UW researchers found that ChatGPT consistently rated resumes with disability-related honors and credentials lower than identical resumes without those mentions. When explaining its rankings, the system also voiced biased reasoning, claiming, for instance, that a resume with an autism leadership award had “less emphasis on leadership roles,” echoing stereotypes about autistic people.
When researchers instructed the tool to avoid ableism, the bias decreased for five of the six implied disabilities tested: deafness, blindness, cerebral palsy, autism, and the general term “disability.” But only three of these ranked higher than resumes that did not mention disability.
The team unveiled these findings at the 2024 ACM Conference on Fairness, Accountability, and Transparency in Rio de Janeiro on June 5.
“The practice of ranking resumes using AI is becoming more prevalent, but there is insufficient research on its safety and effectiveness,” said Glazko, the lead author of the study. “Disabled individuals often have to decide whether to include disability credentials on their resumes, even when humans are the reviewers.”
The researchers used the publicly available curriculum vitae (CV) of one of the study’s authors, which ran about 10 pages. They then created six modified CVs, each implying a different disability by adding four disability-related credentials: a scholarship, an award, a seat on a diversity, equity, and inclusion (DEI) panel, and membership in a student organization.
Using ChatGPT’s GPT-4 model, the researchers pitted each modified CV against the original version for a real “student researcher” job listing at a large U.S.-based software company. They ran each comparison 10 times; across the 60 trials, GPT-4 ranked the enhanced CVs, which differed only in the implied disability, first just 25% of the time.
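The study itself was run through ChatGPT, and its exact prompts are not reproduced here. As a rough illustration of what this kind of pairwise ranking experiment involves, the sketch below uses the OpenAI Python SDK; the file names, prompt wording, and win-checking logic are placeholders, not the study’s actual materials.

```python
# Hypothetical sketch of a pairwise CV-ranking experiment like the one described above.
# Assumes the OpenAI Python SDK; file names and prompt wording are placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JOB_DESCRIPTION = open("student_researcher_listing.txt").read()
ORIGINAL_CV = open("cv_original.txt").read()
MODIFIED_CVS = {  # six CVs, each implying a different disability
    "deafness": open("cv_deafness.txt").read(),
    # ... the remaining five variants
}

PROMPT = (
    "You are screening candidates for the job below. "
    "Rank Candidate A and Candidate B for this role and explain your ranking.\n\n"
    "JOB:\n{job}\n\nCANDIDATE A:\n{a}\n\nCANDIDATE B:\n{b}"
)

def rank_pair(cv_a: str, cv_b: str) -> str:
    """Ask GPT-4 to compare two CVs for the same listing and return its answer."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": PROMPT.format(job=JOB_DESCRIPTION, a=cv_a, b=cv_b)}],
    )
    return response.choices[0].message.content

# Repeat each comparison 10 times (6 variants x 10 trials = 60 rankings),
# then tally how often the modified CV comes out on top.
first_place = Counter()
for disability, modified_cv in MODIFIED_CVS.items():
    for _ in range(10):
        answer = rank_pair(modified_cv, ORIGINAL_CV)
        # Crude win check for illustration; a real harness would parse the ranking more strictly.
        if "Candidate A" in answer.splitlines()[0]:
            first_place[disability] += 1
print(first_place)
```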
“In a just world, the modified resume should consistently rank first,” noted Jennifer Mankoff, a UW professor in the Allen School and senior author of the study. “In any job setting, an individual recognized for leadership skills should rightfully be positioned above someone with similar qualifications lacking such recognition.”
When researchers queried GPT-4 about its rankings, its responses displayed explicit and implicit ableism. For instance, it commented that a candidate with depression had “more focus on DEI and personal challenges,” which “distracts from the core technical and research-oriented aspects of the role.”
“Some descriptions by GPT generalized a person’s entire resume based on their disability, suggesting that involvement in DEI or disability might detract from other aspects,” Glazko highlighted. “For instance, it imagined the concept of ‘challenges’ in the comparison involving the depression resume, despite ‘challenges’ not being explicitly mentioned, revealing underlying stereotypes.”
To see whether the bias could be reduced, the researchers turned to the GPTs Editor tool, which let them customize GPT-4 with written instructions, no coding required. Those instructions directed the chatbot not to exhibit ableist biases and instead to work from disability justice and DEI principles.
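The researchers did this through OpenAI’s no-code GPTs Editor. A roughly equivalent setup through the API, which is an assumption rather than the study’s method, would prepend similar written guidelines as a system message; the guideline text below is illustrative only.

```python
# Hypothetical approximation of the customization step. The researchers used the
# no-code GPTs Editor; passing similar written guidelines as a system message via
# the API is an assumed, roughly equivalent setup.
from openai import OpenAI

client = OpenAI()

ANTI_ABLEISM_GUIDELINES = (
    "Do not exhibit ableist bias. Evaluate candidates according to disability "
    "justice and DEI principles: treat disability-related awards, scholarships, "
    "panel seats, and organization memberships as evidence of leadership and "
    "qualification, never as a negative signal."
)

def rank_pair_with_guidelines(job: str, cv_a: str, cv_b: str) -> str:
    """Same pairwise comparison as before, with the guidelines prepended as a system message."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": ANTI_ABLEISM_GUIDELINES},
            {"role": "user",
             "content": f"JOB:\n{job}\n\nCANDIDATE A:\n{cv_a}\n\n"
                        f"CANDIDATE B:\n{cv_b}\n\nRank the two candidates and explain why."},
        ],
    )
    return response.choices[0].message.content
```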
They ran the experiment again with the newly customized chatbot. Overall, this system ranked the modified CVs higher than the original CV in 37 of the 60 trials. For some disabilities, however, the improvements were minimal or absent: the autism CV ranked first only three out of 10 times, and the depression CV only twice, unchanged from the initial GPT-4 results.
“It’s crucial for people to be conscious of the system’s biases when utilizing AI for practical purposes,” Glazko emphasized. “Without this awareness, recruiters leveraging ChatGPT may not be able to rectify these biases, or comprehend that bias can persist even with guidelines in place.”
The researchers noted that some platforms, such as ourability.com and inclusively.com, are working to improve outcomes for disabled job seekers, who face bias whether or not AI is used in hiring. They also stressed the need for more research to document and remedy AI biases: testing other systems, such as Google’s Gemini and Meta’s Llama; including additional disabilities; studying how the system’s bias against disability intersects with other attributes, such as gender and race; exploring whether further customization could reduce bias more consistently across disabilities; and investigating whether the base version of GPT-4 can be made less biased.
“It’s vital to scrutinize and address these biases,” Mankoff added. “We’ve gained insights from this study and hope to contribute to broader discussions not only concerning disabilities but also other marginalized identities, to ensure that technology is implemented and deployed equitably and fairly.”