Does Stress Impact Technical Interview Performance?

Technical interviews have many problems (see: https://blog.usejournal.com/hiring-is-broken-what-do-developers-say-about-technical-interviews-21821141ca71). One recurring theme is the anxiety and stress associated with coding under the surveillance of an interviewer.

In a recently accepted paper, “Does Stress Impact Technical Interview Performance?” by Mahnaz Behroozi, Shivani Shirolkar, Titus Barik, and Chris Parnin, we found that rather than assessing problem-solving ability, technical interviews may simply be a procedure for identifying candidates who best handle and mitigate stress caused solely by being examined by an interviewer (performance anxiety).

We also found a simple fix: the private interview. Read on to understand our study and our recommendations on how to fix technical interviews.

Technical Interviews are Stress Tests

Most companies in the software industry administer a technical interview as a procedure for hiring a software developer [Aziz, McDowell]. Companies believe technical interviews offer a “reasonably consistent evaluation of problem-solving ability, communication skills, and preparedness” [Wheeler]. Technical interviews can also give visibility into the personality of the candidate, how they interact with their future colleagues, how much attention they pay to details (such as checking all possible test cases), and their knowledge of programming languages. Companies also find it important to reduce unwanted stress [Behroozi] during a technical interview, as “not everyone does their best work in fast-paced, high-pressure situations” [Montgomery]. In principle, companies should expect the technical interview process to be a sound and fair assessment of a candidate’s ability, and thus yield a steady stream of qualified candidates for positions that open up in the company.

Technical interviews can also introduce other effects on candidates, who report unexpectedly “bombing”, “freezing”, or “choking” during this critical hiring procedure. Through a happy accident, the software industry has seemingly reinvented a crude yet effective instrument for reliably inducing stress in subjects, which typically manifests as performance anxiety [Wilson]. A technical interview has an uncanny resemblance to the Trier Social Stress Test, a procedure that psychologists have used for decades and that is the best-known “gold standard” procedure [Allen] for the sole purpose of reliably inducing stress. The Trier Social Stress Test involves having a subject prepare and then deliver an interview-style presentation and perform mental arithmetic, all in front of an audience. Alone, none of these actions consistently induces stress in a subject; however, the unique combination of cognitively demanding tasks with a social-evaluative threat (essentially being watched) induces stress consistently and powerfully. If a technical interview is essentially a de facto version of the Trier Social Stress Test, then the implications can be profound. Rather than identifying the few who answer correctly in a timely manner, companies are most likely identifying the few who perform well under stress. Rather than measuring explanation skills, companies are most likely measuring the ability of candidates to handle or mitigate stress (e.g., through practice). Finally, rather than avoiding unwanted stress, technical interviews may be inadvertently designed with the sole purpose of inducing it.

To understand how to maintain the desirable goals of a technical interview (e.g. measure time and correctness), while mitigating the undesirable effects of stress, we created a design probe which removed the social-evaluative threat component of a technical interview. To this end, we designed an interview format where participants privately solved a technical problem on a whiteboard, without any monitoring or interaction with an interviewer. We evaluated this format in a randomized controlled trial to understand the impact of stress during a technical interview and whether we could isolate and dampen its influence. We then compared this to a typical interview format, where a candidate used talk-aloud to explain their problem-solving in front of a proctor. Participants wore specialized eye-tracking glasses, which allowed us to obtain measurements associated with high cognitive load and stress. We then compared the correctness, self-reported experiences, and cognitive load and stress levels of participants across the interview settings.

We found that participants reported high levels of stress and often had difficulty coping with someone else being present while they attempted to solve a cognitively demanding task. The impact on performance was drastic: nearly twice as many participants failed to solve the problem correctly, and the median correctness score was cut by more than half, when simply being watched. In contrast, participants in the private setting reported feeling at ease, having time to understand the problem and reflect on their solution. Finally, measures of cognitive load and stress were significantly lower in the private setting, and the majority of participants solved the problem correctly. In a post-hoc analysis, we also observed that no women successfully solved the problem in the public setting, whereas all women solved it correctly in the private setting. Measures from the standard NASA-TLX procedure and various cognitive metrics from the eye tracker provided evidence that higher rates of stress could causally explain these differences across groups.

We suggest that private interview settings have considerable advantages for candidates: they both reduce stress and allow more accurate assessment of problem-solving abilities. The implications of this work include several guidelines for more effective administration of coding interviews. Offering candidates the ability to solve a problem in private, even if only for an initial few minutes, could substantially increase the number of qualified candidates, particularly from traditionally underrepresented groups, entering the workforce. Furthermore, we can expect these changes to minimize or mitigate other problematic effects experienced by candidates, such as stereotype threat, that impact interview performance but are orthogonal to assessing candidates’ actual problem-solving abilities.

Findings

Our findings demonstrate that stress impacts technical interview performance; indeed, in our study, participants in a traditional technical interview format:

  • obtained significantly lower scores,
  • experienced significantly higher extraneous cognitive load, and
  • experienced significantly higher stress levels

Furthermore, participants reported feeling “very nervous”, “rushed”, “stressed”, “monitored”, and “unable to concentrate” as a result of being watched. Participants also reported extraneous cognitive load associated with having to “think and talk and do code at the same time” (P39). When not being monitored, participants were more likely to perform mental execution, allowing them an opportunity to evaluate and build confidence in their solution before having someone examine it.

For the full details of the methodology and results, we invite you to read through the full pre-print pdf.

The case for the private technical interview

In the remainder of this section, we present interim guidance for future directions on technical interviews — consider the recommendations as hypotheses that need further evaluation rather than outright policy.

Guidance I — Use retrospective think-aloud for assessing explanation skills

Although companies want to accurately assess candidates based on their actual skills, they can inadvertently favor a candidate’s ability to handle or mitigate stress. In most technical interview formats, candidates are asked to think aloud. Think-aloud protocols are methods that add visibility into the cognitive processes of a candidate while they perform a set of specified tasks, rather than only evaluating their final product. There are two main types of think-aloud methods: concurrent think-aloud and retrospective think-aloud.

In concurrent think-aloud, candidates explain their thoughts in tandem with doing the task. Just like subjects given the Trier Social Stress Test, candidates who must vocalize their thought process in real time risk exposure to a social-evaluative threat that can hinder performance in both explanation and problem-solving [Dawson]. Van Den Haak et al. also showed that verbalizing thoughts while performing tasks generally increases errors and impedes task completion. In retrospective think-aloud, candidates first finish the task and then walk the interviewers through their thought process. That is, candidates can perform the task in their own manner with limited impact on performance. To implement retrospective think-aloud in technical interviews, interviewers can first provide the candidate with a problem description and allow them to privately solve the problem, followed by an explanation of their thought process and a discussion about their solution with the interviewer. This interview format mitigates many of the issues our participants experienced from feeling monitored or supervised, while still giving companies an opportunity to evaluate explanation skills.

Another way to effectively assess explanation skills is to use interview formats that simultaneously reduce the social-evaluative threat while increasing focus on the explanation component of the assessment. For interviewers who want to assess how the candidate would communicate with other members of the team, the technical problem itself is primarily a shared vehicle through which the interviewer and candidate engage in a conversational dialogue. For example, Microsoft has attempted to reframe technical interviews as more of a conversation in which both the candidate and the interviewer work together to solve the problem. We recommend starting with straightforward problems that are no more difficult than first-year computer science exercises, and then using these exercises as a way to progressively elicit explanations about the candidate’s knowledge and experience on different topics. For example, consider the Rainfall Problem, a programming task that has been used in a number of studies of programming ability. If the candidate does well in their explanation, the interviewer can probe more complex scenarios, such as scaling this problem to a distributed algorithm, or building a user interface for such a system, depending on the expectations of the position. In certain scenarios, it may not even be necessary to fabricate a problem to drive the discussion. At Netflix, some interviewers ask the candidate to “teach something that they know,” and the candidate can choose any topic of their interest, job-related or otherwise.
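
For readers unfamiliar with the Rainfall Problem, the sketch below shows one commonly cited variant in Python: average the non-negative readings that appear before a sentinel value. This post does not spell out the exact wording of the task, so the sentinel value 99999, the function name average_rainfall, and the choice to return None for empty input are all assumptions made for illustration.

```python
from typing import Iterable, Optional

SENTINEL = 99999  # assumed end-of-data marker; other variants use other values


def average_rainfall(readings: Iterable[float]) -> Optional[float]:
    """Average the non-negative readings that come before the sentinel.

    Returns None when there are no valid readings, which avoids a
    division by zero.
    """
    total = 0.0
    count = 0
    for value in readings:
        if value == SENTINEL:
            break  # ignore everything after the sentinel
        if value < 0:
            continue  # negative readings are treated as invalid and skipped
        total += value
        count += 1
    return total / count if count else None
```

Even a task this small leaves room for follow-up conversation, for example about how to treat invalid readings or what to return when no valid data exists.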

Guidance II — Evaluate the kinds of stress necessary for the position

Although companies have been mindful to eliminate the influence of stress from their assessment procedures, some view handling stress as an important characteristic of the job and believe they should not hire candidates who cannot manage it: “the real problem is in your head: your anxiety about job interviews is sabotaging something you’re otherwise perfectly good at” [Behroozi].

In this case, it is important to delineate the kinds of stress a developer would typically encounter day to day. Given important tasks that must be completed in a timely manner, the ability to tolerate stress from time pressure is a reasonable consideration. However, time pressure is distinct from other sources of stress [Orfus] that might manifest during an evaluation but not during day-to-day tasks, such as performance anxiety (commonly called “stage fright”) or the closely related “test anxiety”. Therefore, with the private interview format, it is still possible to assess the ability of a candidate to perform under time pressure, without evaluating other sources of stress.

If stress is an important consideration for the job, companies should consider ways to help candidates mitigate its effects, such as stress inoculation training. Along these lines, interview candidates have been advised to practice (or “grind”) on various problems and solutions in order to become immune to the effects of stress. Mekka Okereke, a senior manager at Google, recommends doing at least 40 practice sessions. While stress inoculation training can be effective, developers also note the immense time commitment [Behroozi] required to train for technical interviews and the disparity caused by some candidates not having the same available time or resources to train as others [Ogbonnaya-Ogburu].

Honeycomb, a women-led DevOps company, tries to eliminate stress by giving details in advance: “No surprises… unknowns cause anxiety, and people don’t perform well when they’re anxious. We willingly offer up as much detail in advance about the process as we can, even down to the questions.” Yet there may be some companies that still need to assess candidates on their stress tolerance. For them, handling stress may be even more important than candidates’ technical skills. In that case, companies should be clear about that requirement. This lets candidates decide whether those companies are suitable for them and whether that is a culture they want to fit into.

Guidance III — Provide accessible alternatives

We observed several instances where the nature of the whiteboard interview interfered with the candidate’s ability to perform their task. Several changes to the interview procedure can reduce the effects of stress and cognitive load.

Warmup interview. The technical interview format is substantially different from how developers do their day-to-day programming activities. To reduce stress, provide the candidate with a warmup exercise that gives them an opportunity to familiarize themselves with the interview setting, experience the interview format, and ask questions about the technical interview process. Ideally, the warmup interview should be conducted by an interviewer who does not score the candidate.

Free reset. As we found in our study, in stressful situations some participants may forget or “blank” on a particular solution. Interviewers should acknowledge that interviews are stressful situations, and allow the candidate a “free reset” to request another problem or start again on the existing problem without incurring any penalty. Having such a safety net can reduce the candidate’s stress and improve their performance in the technical interview. Similarly, dropping the lowest-scoring interview can reduce noise associated with problematic questions or unfavorable interviewer–candidate interactions.

Partial program sketches. If you’ve ever had to write a paper, you know that starting from an entirely blank page can be both difficult and intimidating. Participants in our study also reported additional stress from having to write a program from scratch, for example, from having to recall the syntax of the language without having any cues. Instead of asking participants to write a program from scratch, interviewers can provide participants with an initial skeleton that contains a partial solution, such as the method signature, a sample invocation, a few input and output examples, and even some sample code snippets for the programming language.
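
As a rough illustration of what such a skeleton might look like, here is a sketch in Python for a hypothetical warm-up task (finding the largest number in a list). The task itself and the names largest, EXAMPLES, and check are assumptions chosen for illustration; they are not materials from our study.

```python
from typing import Callable, List, Tuple

# Input/output examples the candidate can use to check their solution.
EXAMPLES: List[Tuple[List[int], int]] = [
    ([3, 1, 2], 3),
    ([-5, -2], -2),
    ([7], 7),
]


def largest(values: List[int]) -> int:
    """Return the largest number in a non-empty list.

    Sample invocation: largest([3, 1, 2]) should return 3.
    """
    # Language reminders (not part of the solution):
    #   for v in values: ...   iterates over the list
    #   len(values)            gives the number of elements
    # TODO(candidate): replace the line below with your solution.
    raise NotImplementedError


def check(solution: Callable[[List[int]], int]) -> bool:
    """Return True if the solution matches every example above."""
    return all(solution(list(inp)) == expected for inp, expected in EXAMPLES)


if __name__ == "__main__":
    try:
        print(check(largest))
    except NotImplementedError:
        print("largest() is not implemented yet")
```

The candidate only fills in the body of largest; the signature, examples, and syntax cues are already on the page, so recalling them from memory is no longer part of the assessment.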

Familiar affordances. Conducting technical interviews on the whiteboard can also unnecessarily increase the cognitive load for the candidate, especially if the candidate does not routinely write on a whiteboard. To mitigate this, offer the candidate the option to use a laptop, or allow them to solve the technical problems using pencil-and-paper. By doing so, candidates do not have to write code on an unfamiliar medium just for purposes of the interview. Although not well-advertised, interviewers at Google can elect to use pencil-and-paper to conduct their interviews instead of a whiteboard.

Guidance IV — Consider impacts on talent and diversity

As companies embrace diversity and strive for inclusive hiring practices, they should evaluate how their technical interviews support or detract from that goal. Hiring procedures that inadvertently exclude large segments of the population can contribute to “leaky pipelines” [DebuggingHiring], leading to increased hiring costs and a disproportionate reduction in the hiring of minorities and other underrepresented groups. For example, a large population of people are impacted by performance anxiety (an estimated 40 percent of all adults in the U.S.). Collectively, otherwise qualified candidates who happen to perform poorly due to performance anxiety could be excluded from fair consideration for hiring. Furthermore, scientific evidence finds that women experience disproportionately more negative effects from test and performance anxiety [Else-Quest], which could explain our observations.

As we observed in our study, candidates who perform in a traditional interview setting are more likely to fail, but not necessarily for reasons related to problem-solving ability. As a result, how an individual responds to stress and extraneous cognitive load can drive hiring decisions instead of ability. For example, if two candidates performed equally well, the job may still go to the candidate who better projects confidence [Ford]. Beyond gender, these procedures could further impact the performance, and thus lead to the exclusion, of other demographics, such as high-anxiety individuals and neurodiverse (e.g., dyslexic or autistic) job seekers. Even wider bands of demographics, such as disadvantaged and low-resource job seekers, can be impacted by unwanted stress in hiring procedures. For example, Mekka Okereke, a Google senior manager speaking at a recent “Is the Technical Interview Broken?” panel, notes that students at Stanford can take courses for passing technical interviews, such as CS9: Problem-Solving for the CS Technical Interview. Okereke finds that most students typically lack these resources, and specifically runs workshops at HBCUs to provide interview training. Companies must decide what cost they are willing to pay for verifying explanation skills in tandem with problem-solving ability, and what impact that has on their ability to hire diverse and talented candidates.

Conclusion

Our study raises key questions about the validity and equity of a core procedure used for making hiring decisions across the software industry. We suggest that private interview settings have considerable advantages for candidates: they both reduce stress and allow more accurate assessment of problem-solving abilities, and they can be easily extended to assess communication skills through retrospective think-aloud. Although this study is one of the first to provide insights into the impact of stress on technical interviews, we have only examined this effect in the context of one coding challenge, with participants from one university. A larger collection of studies, including active participation by industry to pilot alternative hiring procedures, would be valuable for informing how to create a valid and inclusive hiring process for all.

That’s where you come in. Help us change this experience for everyone.

Assistant Professor in Computer Science