Artificial intelligence isn't always the best at picking the best job candidates: The Indicator from Planet Money Artificial intelligence has been portrayed as a solution to human bias. But, when it comes to finding top job talent, AI can get it just as wrong. So how can that be fixed?

For sponsor-free episodes of The Indicator from Planet Money, subscribe to Planet Money+ via Apple Podcasts or at

When AI works in HR


And I'm Darian Woods. If you've been on the job market over the last few years and filled out an online application, you might have been judged by some kind of artificial intelligence. Your resume could have been filtered through an automated system before a human hiring manager even looked at it.

WONG: Or maybe you made it to the interview stage and instead of having a phone conversation or Zoom call with another person, you had to use a video platform and record yourself answering questions.

WOODS: Yeah, like, just speaking into the void, hoping that somebody gets the message.

WONG: Hello? Can I have a job? Hello?

WOODS: Please?

WONG: This kind of technology is supposed to help businesses with hiring - make the process more efficient or help employers make better decisions. But it's also come under scrutiny for making the hiring process more unfair when it comes to some candidates.

WOODS: In July, New York City will begin enforcing a law that says employers can't use automated tools in hiring and promotion decisions unless that technology has been audited for bias.

WONG: But what does that mean? On today's show, we talk to an AI expert about what can go wrong during the hiring process, and we explore whether this technology corrects or duplicates the flaws in human judgment when it comes to these decisions.


WOODS: Liz O'Sullivan has spent over a decade in the world of AI. Their first job was at a company that used this technology to match people with jobs.

LIZ O'SULLIVAN: It was a time when AI was sort of really just getting started and worked equally badly on everybody. We weren't really thinking about issues of explainability or bias at the time. We were just trying to make it work.

WONG: Are you able to give an example of the way that that kind of AI worked, that maybe by modern standards would be, like, oh, that's really clunky, but that was kind of the best that you had?

O'SULLIVAN: So in the olden days, we used to use AI simply to look for particular words, and on a resume, you might be looking for a college name or a particular degree or a particular skill.

WOODS: Right. And at first blush, this sounds pretty straightforward. So think about a hiring manager who has to sift through hundreds of job applications. The idea of software helping with some basic screening could really reduce that person's burden.

WONG: But there were problems with this simple approach of scanning for keywords. For example, there are qualified candidates out there who don't have a four-year degree, and they could get screened out despite having relevant experience.

WOODS: Or maybe the hiring software looked for a very specific job title, like administrative assistant. Someone whose resume says office manager or executive assistant might get overlooked just because their keywords weren't an exact match.
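The exact-match screening described above can be sketched in a few lines. This is an illustrative toy, not any real vendor's system, and the keywords are made up for the example:

```python
# Hypothetical required keywords -- a resume passes only if every one
# appears verbatim in the text.
REQUIRED = {"bachelor", "administrative assistant"}

def passes_screen(resume: str) -> bool:
    """Pass only resumes containing every required keyword exactly."""
    text = resume.lower()
    return all(kw in text for kw in REQUIRED)

# An equivalent job title fails the exact-match screen:
print(passes_screen("Bachelor of Science; administrative assistant, 5 years"))  # True
print(passes_screen("Bachelor of Science; office manager, 5 years"))            # False
```

The second candidate may be just as qualified, but "office manager" isn't an exact match for "administrative assistant", so the screen rejects them, which is precisely the clunkiness Liz describes.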

WONG: Liz says that as this AI technology developed, the models became more about figuring out patterns rather than simply scanning for words.

O'SULLIVAN: In HR, you might have a vast database of resumes that were hired and resumes that were not hired. And the kinds of qualities that you're looking for is not something you can actively specify to the model, but it just says - yes, this one was a winner; no, this one was not. The trouble is that these models are very good at divining or sort of inferring, you know, shortcuts, and they cheat a little bit.

WONG: In other words, an AI might look at which resumes belong to the candidates that got hired. It would then try to figure out what those resumes had in common, so it could get smarter at picking successful candidates.

WOODS: But this has led to problems. Several years ago, Amazon got rid of an AI recruiting tool that the company had been building in-house. The AI had looked for patterns in 10 years' worth of resumes and had decided that male candidates were better than female ones.

O'SULLIVAN: There are studies that men and women in particular and people of different ethnic backgrounds, potentially even with the same set of qualifications, describe themselves in very different ways.

WONG: For example, men tend to use more assertive language like leader and competitive. In the case of Amazon, the company's recruiting tool favored job applicants that used that kind of confident language. That's what led to the AI recommending male candidates over female ones.

O'SULLIVAN: The AI may be picking up on the bravado from a certain type of person's resume and the humility of another one, suggesting the more sort of lofty self-description as a benefit to the company as opposed to the qualifications themselves.

WOODS: The risks of using automated tools in the hiring process extend beyond gender discrimination. Take video interview software that analyzes a person's speech patterns or facial expressions. The Equal Employment Opportunity Commission says that that kind of technology could be unfairly screening out candidates with speech impediments.

WONG: Now, of course, a human hiring manager who's conducting a job interview could also look unfavorably on someone with a speech impediment, whether that manager is conscious of the reaction or not. But Liz says it's dangerous to believe that an AI will always be more impartial or objective than a human.

O'SULLIVAN: You see companies that are maybe trying to do something with AI that AI really shouldn't be let to do. In particular, we see a lot of attempts to infer things that are subjective about people - like their interests or their personalities or even potentially their emotional response to a particular job. Will they thrive from a cultural perspective?

WOODS: Liz wanted to address these problems in their work, and today they run a tech company called Vera that analyzes AI-related code for different dangers in areas like discrimination and privacy.

O'SULLIVAN: So we built a conversational assistant that helps translate code, and specifically AI-related code, into the kinds of risks that a nontechnical user might care about.

WONG: Here's how this conversational assistant - which, yes, is itself an AI - does this translating work. A customer who wants to assess some hiring software could ask - are there any privacy risks here? And the AI translator might say something like, we found some data that looks like it could be photos of faces. That might be a privacy risk.

WOODS: Liz says their company also runs tests on AI tools. Like, say I'm the head of an HR department, and I really want to avoid an Amazon situation where my hiring software is kind of just favoring men over women.

O'SULLIVAN: So one of the tests that we would apply out of the box would be one that asks - are you making more errors on women? - as opposed to - are you making fewer errors on men?

WONG: An error in this case would be if the AI guessed that someone would be a bad candidate, and it turns out that person was a great candidate. In other words, the AI made a bad recommendation.

WOODS: Liz says that this particular gender discrimination test would come up with an error rate. A high error rate on women candidates might be a sign that the AI needs tweaking.

WONG: And the error test would be just one measure of the AI's quality. Other tests might try to determine what percentage of all women applicants got recommended versus the percentage of men.
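Both of those checks, the per-group error rate and the per-group recommendation rate, are simple to state in code. This is an illustrative sketch with made-up applicant data, not Vera's actual test suite:

```python
def false_negative_rate(predictions, actuals):
    """Share of genuinely good candidates the model rejected."""
    good = [p for p, a in zip(predictions, actuals) if a == "good"]
    if not good:
        return 0.0
    return sum(1 for p in good if p == "reject") / len(good)

def selection_rate(predictions):
    """Share of all applicants the model recommended."""
    return sum(1 for p in predictions if p == "recommend") / len(predictions)

# Hypothetical data: (model's call, candidate's true quality) per applicant.
women = [("reject", "good"), ("recommend", "good"),
         ("reject", "good"), ("recommend", "bad")]
men   = [("recommend", "good"), ("recommend", "good"),
         ("reject", "bad"), ("recommend", "bad")]

w_pred, w_true = zip(*women)
m_pred, m_true = zip(*men)

print(false_negative_rate(w_pred, w_true))  # ~0.67: 2 of 3 good women rejected
print(false_negative_rate(m_pred, m_true))  # 0.0: no good men rejected
print(selection_rate(w_pred) / selection_rate(m_pred))  # ~0.67 disparity ratio
```

In U.S. employment-law practice, a selection-rate ratio below 0.8, the EEOC's "four-fifths" rule of thumb, is often treated as a red flag for adverse impact, though as Liz notes, deciding which group comparisons matter most isn't something math alone can settle.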

O'SULLIVAN: You can see how it gets really complicated. If you care about different groups and even intersectional groups, there can be easily, you know, dozens if not more combinations of comparison. And figuring out which ones are the most important to the organization, it's not a question you can easily solve with mathematical interventions.

WOODS: And that's why there's a human side to Liz's work, like interviewing people from sales reps to hiring managers about how AI technology actually gets used. Liz's company compiles AI audit reports for its clients, and Liz says those reports need to be updated regularly.

WONG: This is to keep up with both changes in technology and in government regulation. Like, with the law in New York that requires audits of AI hiring tools, Liz's company can perform those audits.

O'SULLIVAN: The work of responsible AI is never done, and especially when we're talking about models with near-infinite complexity, like something as large as ChatGPT - that the work of demystifying how it's doing what it's doing and identifying those problem cases, it's never-ending.

WOODS: And the never-ending aspect of that goes for both companies and government regulators that are trying to stay on top of this technology. In the case of New York City and its new law on using automated hiring tools, officials will start enforcing that law in July.

WONG: Get your robots in shape.


WONG: This episode was produced by Noah Glick and Audrey Dilling with engineering from Maggie Luthar. Sierra Juarez checked the facts. Viet Le is our senior producer. Kate Concannon edits the show. And THE INDICATOR is a production of NPR.

Copyright © 2023 NPR. All rights reserved. Visit our website terms of use and permissions pages at for further information.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.