
Removing bias from hiring and opening up job opportunities to people who might not traditionally have had access to them are critical parts of resetting how organizations operate.

I’ve written before about some of the practices that make a difference, such as simple tweaks to job postings so that more women apply, and apprenticeships that provide skills and credentials that lead to good jobs.

Another promising avenue is to overhaul how we hire people, and what we’re looking for, starting with basics such as the resume. Technology can help. To understand better, I spoke with Frida Polli, CEO and co-founder of pymetrics, which uses behavioral science and artificial intelligence to reduce bias in hiring. (And yes, the company spells its name with a lowercase “p.”) Here is a transcript of our recent conversation, edited for clarity:

The most ubiquitous form of hiring is humans reviewing resumes. What's wrong with that?

There's the human part and there's the resume part; we can start with the resume part. Obviously, knowing something about someone's experience is not a bad thing in and of itself. However, there are several problems—one, it tells you nothing about what someone could do, unless it's linearly related to what they have done. So it's very limiting. It doesn't tell you anything about someone's future potential.

Secondly, and potentially more problematic, the resume is unfortunately rife with proxy variables. That's why Amazon got into trouble: they had trained off of resumes that came mostly from men, and as a result they had flagged certain things that men do—like play baseball, not softball, and go to co-ed colleges, not all-women's colleges—as being predictive. Whereas if you go to Barnard and play softball, you could be equally qualified. But the resume parser has learned that those things are not associated with job success, because the company hasn't hired that many women. So when your data is full of proxy variables—and the resume is full of them—and historically you haven't hired many people of a particular gender, race, or socioeconomic status, it becomes very hard to make predictions or decisions that don't bias against that group.

Amazon was experimenting with computers screening resumes. What are the issues with humans reviewing resumes?

The problem with humans reviewing is this: we always say algorithms can remove unconscious bias, but as we just mentioned, they can have other types of biases because of the training data they're fed. The issue with humans is that it is impossible to remove unconscious bias from the human brain. In Daniel Kahneman's Thinking, Fast and Slow, he talks about System One and System Two thinking. System One is fast, unconscious processing; System Two is deliberate processing. System One has a lot of evolutionary advantages—if you see a tiger, you want to make an immediate snap decision to run, not deliberate for 30 minutes. However, it's also what underpins much of unconscious bias.

There was a meta-analysis in the Proceedings of the National Academy of Sciences covering 30 years of human decision-making—meaning humans reading resumes. The way researchers audit this decision-making is to present the exact same resume but change the name: from John Williams to Jamal Washington for race, or to Emily Williams for gender. What they found is that in those 30 years, ending in 2017, there has been no change in discrimination against people of color—for every 10 interviews John Williams gets, with the exact same resume and just the name changed, Jamal gets only seven.

The problem is simply that unconscious bias is alive and well. It's not going anywhere, unfortunately. And so we cannot assume that if we remove algorithmic decision-making, we're left with some utopian alternative; far from it. The findings are similar for gender. But in this time of increased interest in racial equity, it's important to highlight the stark racial differences that occur just from thinking a person is African-American rather than white, even with the exact same qualifications.

This would seem to be a significant contributor to some of the patterns of discrimination in hiring.

Absolutely. At the end of the day, all of the biases we see now are not due to algorithmic bias; they're due to human bias. Of course, the concern is that algorithms will further encode and scale this bias. But people often make the mistake of thinking that algorithmic bias is more scalable than human bias. That's absolutely false, because all brains work the same way; all brains have unconscious bias. Yes, any one particular person has less impact than an algorithm, but every person's brain shares the same tendencies, so collectively it's a computational system in which everyone is applying that same unconscious bias. Human unconscious bias is just as scalable as algorithmic bias, if not more so.

If humans reviewing resumes is not a fair hiring mechanism, what is a fair alternative?

One last thing I should say is that there's now a decade's worth of studies on the effectiveness of unconscious bias training, which people had hoped would be very effective, and the net result is that it's completely ineffective. No unconscious bias training program has been shown to work. So what we like to say is that we should be trying to de-bias systems, not brains. What we—meaning people interested in how to improve diversity more broadly—think is that there are methods that have repeatedly been shown to be effective.

First is having metrics; second is transparency. Those two are critical: if you don't know what you're aiming toward, and you have no transparency into how different processes are leading to different outcomes, that's a problem. The third is accountability. These are super basic things, but we don't have transparency in a lot of our hiring processes, so it's hard to know where the problems lie. A lot of the time there aren't metrics that people are driving toward, and nobody knows who's accountable. So while basic, they're super important—three things you can do immediately, essentially.

The three things are metrics, accountability, and transparency?

Yes.

The last thing that we wrote about—specifically pymetrics, and another technology platform called Applied—is how technology can be used in this way to improve diversity. What technology can do—and again, this is fairness-optimized technology—is start by looking at data sets that have far fewer proxy variables in them. That's part of what pymetrics does. By using soft skills, which are far more equally distributed, we don't have to worry about proxy variables. Why that's important is that you can then train on a homogeneous set of people—i.e., white men—but because those variables are equally distributed, the model will be just as likely to predict that a Black woman would be successful in the role, even though your training set is homogeneous. That would never happen with resumes, because resumes are full of proxy variables.

So the first is to have fewer proxy variables in your input data; that's critical. The second is to audit algorithms for bias and essentially only release those algorithms that meet certain fairness criteria. What we do is define group fairness in hiring using the four-fifths rule. You can come up with different definitions, but that's the legal one we use, and we will not release an algorithm unless it passes the four-fifths rule—meaning that for every 10 people it selects from one group, it selects no fewer than eight from another group. If we refer back to the audit studies of humans, humans fail the four-fifths rule: if you're selecting 10 people named John but only seven named Jamal, you're failing. So even just by doing that, we're doing better than human decision-making.

And that's just the minimum bar—our algorithms perform closer to 9.5 people of color selected for every 10 Caucasians. Granted, the ideal would be one to one. We're not quite there yet, but we're at about 95%, which we think is a vast improvement over what we're seeing with human decision-making. The point is that you can fairness-optimize your algorithms and simply not release them unless they meet those standards. You can't really do that with human decision-making. You can't audit Kevin's decision-making; you can't tell him he can't make a decision unless his decisions meet certain standards. It's just impossible to do that type of auditing.
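To make the arithmetic concrete: the four-fifths check described above compares the selection rates of two groups. The sketch below is an illustrative implementation only—not pymetrics' actual code, and the counts are the hypothetical ones from the interview:

```python
def adverse_impact_ratio(selected_a, total_a, selected_b, total_b):
    """Ratio of the lower group's selection rate to the higher group's.

    A ratio below 0.8 fails the four-fifths rule. Assumes both groups
    had at least one applicant and at least one selection between them.
    """
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    low, high = sorted([rate_a, rate_b])
    return low / high

# The audit-study example: 10 interviews for "John" vs. 7 for "Jamal"
# out of the same number of applications gives a ratio of 0.7 -> fails.
print(adverse_impact_ratio(10, 100, 7, 100) >= 0.8)   # fails the rule

# The 9.5-per-10 figure quoted above gives a ratio of 0.95 -> passes.
print(adverse_impact_ratio(10, 100, 9.5, 100) >= 0.8)  # passes the rule
```

Note that the rule compares rates, not raw counts, so it only works as stated when both groups have comparable applicant pools.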

Some organizations ask everyone the same questions based on the competencies required. And then they have panels of interviewers who represent different groups. Do those things help?

Yes, absolutely. Metrics can be thought of in a bunch of different ways, but when you systematize your processes—I would put that in the metrics and transparency category, although it's slightly different—you're basically saying: I'm going to have standardized metrics by which I evaluate everyone, I'm going to be transparent about them, and I'm going to systematize them. That's what the company I mentioned, Applied, does. Instead of randomly interviewing Kevin, then Frida, then other people, I'm going to pose the same exact questions to everyone, look at the first question across everybody, and have it rated in a very structured way.

That structure, and putting a very good rubric around it, falls into the same bucket of metrics that makes things far more likely to be unbiased, because it essentially removes Kevin's or Frida's gender, race, and socioeconomic status from their answers. You could also just blind reviewers to the demographic variables of the person. It depends on how you're eliciting the answer—if it's an interview, you obviously can't do that as much. But extracting the information from its demographic context is very effective in reducing bias, as you can imagine. People who don't know the demographic variables associated with the person providing the answers are far less likely to judge them on those variables.

The other thing that organizations do increasingly is they try to neutralize the language around gender and race in job descriptions or in resumes.

That's a good step. The challenge is that there are so, so many variables that have that encoded. What I've heard from folks is that once you do that, you may be left with very little information, because you've removed, I don't know, half the resume—I'm joking, but you know what I'm saying. That is absolutely the right way to think about dealing with resume data, for sure; it's just not without its challenges. When we started talking to organizations about the fact that we could fairness-optimize our algorithms and produce all these unbiased outcomes, people were highly skeptical, and we couldn't quite understand why, because the methodology we were describing is fairly well understood. What it came down to was that they said: sure, you can do that, but if you're removing all these variables, you must be left with no signal. You can remove proxy variables from a resume; the question is how much signal you're removing along with them. You don't have that problem if you start with a group of variables that doesn't have that issue to begin with.

And you could wind up penalizing people who have those identifying affiliations removed, because it seems like they're doing less.

Exactly. I just don't know what the impact is going to be, that's all. So it's definitely a step in the right direction. I'm not trying to say it's a bad idea; it's a great idea. I just don't know enough about it, because I don't do that for a living. I'm just telling you what I've heard from other folks.

What are examples of soft skills and how do they relate to fairness and hiring the right people for jobs?

So, a little bit of backstory: I was a cognitive scientist for 10 years at Harvard and MIT, and had been essentially measuring what we now call soft skills in the lab for research purposes—looking at people's cognitive aptitudes, like memory, planning, and sequencing, and at their socio-emotional aptitudes, like whether you are risk-averse or risk-taking, or whether you perform better under conditions of intrinsic versus extrinsic reward, and so on. Basically, the fundamental things that make people human, rather than what's on your resume, which is just your experience. You're trying to understand something more fundamental about someone, something more likely to give you a holistic picture. Cognitive scientists have developed structured, objective, and well-validated tools for doing that over the last couple of decades, and pymetrics has simply patented their use in HR.

The light bulb went off when I was watching recruiting at Harvard Business School, because it became clear to me that what people were trying to understand about someone was not what was on their resume—that was obvious—it was what was not on their resume. Is this person attentive to detail or not? Is this person a team player or not? What is their work ethic? They were trying to glean those soft skills from a resume—saying things like 'he was in the chess club, so he must be attentive to detail'—and trying to infer soft skills from hard skills, when it's easier and more accurate to just measure them directly.

That was the idea for pymetrics, and that's what we built. There are other ways to do it; I'm just telling you about one. What we look at are basically cognitive, social, and emotional aptitudes. The other cool thing about soft skills, the way we measure them, is that there is no right or wrong. Take attentive versus inattentive: attentive people are good at certain jobs, while inattentive people tend to be more novelty-seeking and creative, and they'll be good at other jobs. Same with planning: you can be a thoughtful planner or a spontaneous planner. Neither is right or wrong; it just tells you which jobs you will thrive in. So the value of soft skills is that they're a completely different way of looking at someone than a resume. You could have the same person with the same soft skills raised in a privileged background versus an impoverished one, and you'd see very different resumes, but the soft-skill profiles would be identical or very similar.

They tell you something more holistic about someone. They're far more equally distributed than hard skills, for all the reasons we just mentioned, because they take you out of the context of your demographic self. And they provide a lot of information that's missing when you look at someone as just a compilation of hard skills. The way we measure them is essentially through computer activities where we look at people's behavior. We didn't invent any of these activities; they have been created over the last several decades by cognitive scientists who look at things like memory, planning and sequencing, altruism, and reward preference. All we did was repurpose them for use in HR. There are more traditional ways of looking at soft skills, like personality tests, but those have limitations.

How is this different from personality tests?

Personality theory is probably 80 to 100 years old. There's one primary personality theory, called the Big Five, and all personality tests are essentially variants of it. It's taught in psychology programs and fairly well understood, but it's a pretty dated way of looking at someone. It's not that it lacks utility—it has utility—but the way it's used is problematic. I don't know if you had a chance to see Persona, the HBO documentary, but what it explains is that these personality tests have been used to say that if you're agreeable, conscientious, non-neurotic, and emotionally stable, you will basically always be good at jobs.

And if you are not those things, you will not be. You can see that's basically one class of people that will always be preferred for employment versus a class that's not. That's very different from what I described to you: of all the things we measure, there's not one that's always good or always bad. That multi-directionality and multifaceted-ness reflects a newer way of looking at soft skills—it's what cognitive science offers as a newer scientific discipline. Psychology has been around since the 20th century, whereas cognitive science is more of a 21st-century science. So it makes sense that we've evolved in our way of thinking about people. It doesn't mean that what we did in the last century was horrible; it just needs a little updating.

The fact that it's multi-faceted and multi-directional—cognitive science in general is that way; it's not that pymetrics is that way and cognitive science isn't. Cognitive science essentially celebrates diversity and says all people are different; it's all about understanding individual differences. We just took that approach and applied it to work. That having been said, there are more traditional ways of looking at soft skills, though they are not without problems. For personality tests, I explained what the problems are. Traditional cognitive testing would be another way of looking at soft skills, but it has a lot of racial problems: for every 10 Caucasians who pass traditional cognitive tests, only three African-Americans and five Latinos do. If you're looking for racial equity, it's hard to get it using some of these methods.

A traditional cognitive test is an IQ test?

It's much more similar to the standard IQ test, as opposed to what we do. What cognitive science does is break down this unitary concept of IQ to say: hey, there's memory involved, there's planning, there's sequencing, there's task switching. There are 13 different sub-components of IQ. When you measure IQ in the lab, you give somebody a four-hour test covering all these sub-components, but what's happened with the older science is that people have scrapped that and said, 'We just need the final number; we don't care about all those things along the way.' And what we're saying is, 'No, you actually do care a lot about the individual sub-components.' And by the way, each direction of a sub-component could be good or bad depending on the job. It's a different way of measuring people—much more holistic, modular, and comprehensive. It's also multi-directional, which I think is the key part.

How do you know what soft skills are required for succeeding?

I would say they are probabilistically beneficial; I wouldn't say they're absolutely required. I don't think anything is absolutely required—everything is probabilistically beneficial, and there's no one thing without which you can't do the job. What we've done is this: a couple of million people have gone through pymetrics at this point, and within that, we can map people who have been in a job to their soft-skill profiles. It's essentially a mapping exercise. O*NET has something like 900 occupation codes, and we've mapped our soft-skill platform to those codes using individuals who've been successful in those jobs, if that makes sense.

Essentially, it's a data-driven exercise. It took us years to get enough data to feel good about going to market, because you do need a lot of data. You need to make sure it is comprehensively built and has scientific validation, job relevance, and so on. We've also done additional things, such as job analysis, which is a more traditional industrial-organizational technique. Not only does the platform link to O*NET codes, but we've also done subject-matter-expert interviews asking what a person in a given job needs—what experts believe is relevant for the job or not.

Our expectation today is that people will have different jobs over their lifetime, as the nature of work evolves. Does soft skills analysis inform what other roles people could excel in and allow them to be retrained more easily?

That's absolutely right. Hard skills are fine, but they just tell you what Kevin and Frida have done in the past. If you need to do something totally different, they're really not going to provide much of a guide as to whether you're well suited for it. Soft-skills evaluation can point you toward careers or paths that you have certainly never done and potentially never considered. And if the tool is built with group fairness in mind, it can actually equalize a lot of the gaps we see in society.

It could recommend women for jobs in which they're underrepresented. It can recommend people of color for jobs in which they're underrepresented, and the same with socioeconomic status. If the tool is built with group fairness in mind—fairness and lack of adverse impact are equivalent terms—you can do a tremendous amount toward equalizing those gaps. At the same time, from an individual perspective, it can provide a lot of benefit by suggesting things that people may never have done before but are well suited for. We have a program with the state of Ohio that has done exactly that: taken people who are unemployed, and sometimes homeless because of the pandemic, and retrained them into occupations that have a lot more longevity, to great success. That would be much harder to do focusing only on hard skills.

What is an example of a soft skill versus a hard skill?

A hard skill is: can you use Excel, can you program in Java—how proficient are you in some technology. It's an experience-based skill; you're not born with the ability to use Excel, at least not yet. A soft skill is something more innate—your memory, your planning, your sequencing. Those are all things cognitive science has developed tools to look at, maybe not in babies, but in young children all the way up to octogenarians. They're not experience-based; they're thought to be more innate. Now granted, experience can change these things—it's the famous nature-versus-nurture debate. But take attention to detail: people are generally predisposed to being more or less attentive to detail, and there's nothing right or wrong about either end of the spectrum; it's just something people are more inclined to be or not be. That would be an example of a soft skill.

Just to underscore what you said earlier, soft skills are distributed equally across diverse populations?

The way that cognitive science measures them, I would say that's true. I don't know that I can say that about the more traditional types; I don't think it's actually true of them. All I'm trying to say is that employment science has traditionally been the remit of more traditional ways of looking at people. I'm not trying to be offensive or demeaning; it's just a fact. Cognitive science has historically not really been applied to employment—those two fields haven't come together. However, cognitive science has been around for a couple of decades now. It is a more novel and in some ways better way of evaluating people, just as science in general tends to improve. As a result, it has provided tools that can evaluate people in a way that produces more equitable results—including the fact that, at least in the way we designed pymetrics, we selected soft skills that we knew would be fairly evenly distributed across genders, races, and socioeconomic groups.

Are there examples of organizations you've worked with where hiring became more fair?

Every organization we've worked with; it's one of our selling points. We've worked with a lot of organizations that put diversity first. There are some that, quite frankly, are doing a pretty darn good job and don't need a ton of our help—they just want to maintain it. They're few and far between, but otherwise there are a lot of organizations that use us partly to find people who are better suited for the role than manual human processes would, and that are also very keen to improve diversity. To a one, we've seen hiring become more fair at the clients we work with.

Today the process for most organizations is they post a job. People apply, they send their resumes, they might write a letter, a human reviews it, although increasingly machines will do a first pass...

Almost no human is doing that first-pass review, at least not at large organizations. At small ones, maybe, but at large organizations, for the last several decades it's been keyword searches of resumes—some sort of basic resume parser. There is very little human intervention in that first pass. Unfortunately, these machines are already biased, and they have been for a long time; this is not a new thing that AI has brought to the problem. Joe Fuller at HBS worked with Accenture on this—they looked at the more traditional technology tools used at the top of the funnel and showed that they were definitely not promoting equity. If you're not optimizing that part of the process, it's less helpful to optimize other parts, because you've already made a big cut. That's very suboptimal both for hiring the right people and for hiring fairly.

What does a more fair hiring process look like?

I believe there needs to be much, much more transparency in every part of the process. Fairness cannot coexist with an opaque system: if you don't know how different parts of the process are affecting gender, race, socioeconomic status, and other attributes, I don't know how we're ever going to say something is equitable. It's like saying I want to lose 20 pounds while having zero idea what's in the food I'm eating. A fair process is by definition a transparent one, with fairness as a design principle. I don't think every part of the process has been optimized for group fairness; that's something we need to really focus on. There are other types of fairness that people optimize for, but not group fairness.

To circle back around to where we started: in that fair process, what is the role of the resume?

I don't think the resume is going to be completely thrown away, and I don't think it needs to be. It just needs to be viewed with some caution, because it could be introducing more bias into the process. The other thing is that there are no great studies I know of showing that it actually predicts performance in the long term. So it's introducing a tremendous amount of bias—either because it has proxy variables or because the people reading it are biased—while its predictive value is highly questionable. We just have to view it with a certain amount of skepticism. We're so trained to read it, but what value is it actually providing?

One of the next frontiers we're thoughtful about is disability—we know how people with disabilities are impacted by a lot of these systems. We've taken a first pass at this in terms of providing accommodations; about 7% of folks who go through pymetrics select accommodations. We ensure that the pass rates for people who select accommodations are the same as for those who don't, which I think is a good step. Disability is the next frontier, and we have to start thinking about how these platforms impact the most common types of disabilities. It's a very heterogeneous population.

We have to be mindful of how these platforms impact folks with disabilities right now. The way we deal with it is to say: if we can't accommodate your disability, we mandate that the employer have an alternate path for you that doesn't involve pymetrics. In the future, we have to make sure that more and more types of disabilities can be accommodated by platforms in general, not just pymetrics. We need to ensure that all technology platforms are built with disabilities in mind.
