Towards a Fuller Understanding of the Validity of Selection Tools

Sackett et al.'s 2022 review examined personnel selection methodologies. Their 2023 follow-up explores the implications in more detail, challenging long-held norms and redefining the validity of common selection tools. This commentary discusses those implications, aiming to deepen our understanding of these tools in today's workforce.

In their landmark 2022 review article, Sackett et al. conducted a rigorous examination of the established methodologies in personnel selection, casting light on the operational validity of prevalent selection tools within the domain of Industrial/Organizational (I/O) psychology, reviewed in a previous insights article.

In their 2023 continuation article "Revisiting the Design of Selection Systems in Light of New Findings Regarding the Validity of Widely Used Predictors", Sackett et al. focus on the broader implications of their findings, which reveal intricate nuances in the application and interpretation of conventional selection instruments. By questioning long-held norms and introducing a redefined perspective on the validity of these selection tools, their study addresses a wide spectrum of professionals within the recruitment, selection, and workforce development fields.

This commentary aims to dissect and discuss some of the insights unearthed by Sackett et al. (2023), contemplating their implications for the field of I/O psychology, for HR professionals, and for us as test developers. It is an examination dedicated to fostering a richer, more intricate understanding of the selection tools at our disposal and their nuanced application within the ever-evolving landscape of today’s workforce.


The Evolution of Selection Tools’ Validity and Application

At the core of Sackett et al.'s revelations are the revised operational validity estimates of widely employed selection tools, as illustrated in the figure below. The recalibration of these estimations uncovers that job-specific predictors, such as structured interviews and work sample tests, exhibit higher validity compared to general psychological measures.

However, we echo Sackett et al.'s (2023) caution that their updated validity estimates “…should not be viewed as a mandate for preferring one predictor over another regardless of circumstances. Integration of multiple outcomes, such as validity, cost, time constraints, testing volume, subgroup differences, and applicant reactions, is needed for an informed decision” (p. 288).

Many top predictors of job performance are not suitable for entry-level positions where skills are acquired through training or on-the-job experience. For instance, work samples and job knowledge tests are typically not applicable. The effectiveness of empirical biodata and structured interviews varies; while they can be adapted for both new and experienced applicants, the content must be adjusted to be job-non-specific for inexperienced candidates. Current research does not clearly indicate whether biodata and structured interviews maintain similar validity across inexperienced and experienced candidates. Also, several top predictors, like work samples and job knowledge tests, are more amenable to improvement through study and practice compared to more stable traits like cognitive ability and personality.

With these caveats from Sackett et al. in mind, the figure below compares mean validity estimates across three reviews, as reported in Sackett et al. (2023): the original meta-analysis by Schmidt & Hunter (1998), the revised estimates by Sackett et al. (2022), and the latest updates by Sackett et al. (2023).


Notice the difference in mean validity estimates for GMA tests across the studies? Sackett et al. clarify that differences in mean validity estimates within predictor domains (e.g., structured employment interviews) are mainly due to differing methodologies for estimating validity. Sackett et al. (2022) revealed issues with how corrections for range restriction had been applied in past meta-analyses of selection tools, leading to updated operational validity estimates. Their conclusion: validity has previously been overstated for quite a number of traditional predictors.

But some of the difference in mean validity estimates, for GMA tests especially, may be explained by the evolution of job roles and the changing definition of effective job performance over time. Griebie et al. (2022) point out that Schmidt and Hunter's (1998) validity estimate for general cognitive ability relied on data from studies more than 50 years old. Their own meta-analysis of 113 studies from the 21st century shows an even lower average validity (mean = .23) for cognitive ability's impact on job performance, dropping its rank among predictors from 5th to 12th. Griebie et al. (2022) hypothesize that the decline is attributable to the shift from a manufacturing to a more service-oriented and team-based economy. Modern jobs require a broader set of skills, including interpersonal aspects like teamwork and citizenship behaviors, which rely less on cognitive ability, explaining its reduced predictive power in the current work landscape.

This finding resonates with the intuitive understanding that a comprehensive job analysis, leading to role-specific criteria, dramatically enhances the effectiveness of selection tools. It underscores the need for a more refined, job-specific approach to candidate evaluation: meticulous job analysis, precise criteria formulation, and the strategic integration of cognitive ability assessments (e.g., how much weight an assessment is given and where cut-off scores are set) all contribute to a more targeted and effective evaluation process.

Navigating Between-Study Variability in Validity

Sackett et al. (2023) emphasize the need to look beyond average values and address between-study variability when interpreting meta-analyses. While the mean validity estimate often takes center stage, the standard deviation is frequently disregarded, leading to potential misconceptions and unrealistic expectations when the estimated validity of a predictor is applied in practice. Large standard deviations indicate substantial variation, with individual studies yielding validity values that can deviate markedly from the mean.

To address this, Sackett et al. (2022) introduced the presentation of standard deviations of operational validity estimates for each predictor, indicating the amount of between-study variability around the mean estimates. So, what do they show?

Firstly, it is worth noting that the standard deviations vary markedly across predictors of job performance. Contextualized personality measures (i.e., measures of personality at work, as opposed to general personality measures) exhibit negligible standard deviations. Other predictors, however, show substantial ones: interests have a standard deviation of .25, integrity .20, structured interviews .19, personality-based emotional intelligence .17, unstructured interviews .16, and non-contextualized conscientiousness .15. In practical terms, this means the effectiveness of a predictor can vary considerably depending on the context.

In addition to the standard deviation, Sackett et al. (2022) report the lower 80% credibility value, i.e., the 10th percentile of the distribution of operational validity values (the 80% credibility interval spans the 10th to 90th percentiles). For structured interviews, which have the highest mean validity of .42, the 80% credibility interval ranges from .18 to .66. It is therefore more appropriate to describe the validity of structured interviews as “around .42, plus or minus .24” rather than simply stating it as .42. In other words, while the mean validity is .42, there is a range of values within which the true validity is likely to fall with 80% confidence. In practice, this acknowledges the uncertainty and gives a more nuanced understanding of the estimated operational validity.
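The arithmetic behind such a credibility interval is simple: if the true validities are assumed to be normally distributed across studies, the 10th and 90th percentiles lie about 1.28 standard deviations from the mean. A minimal sketch (the function name is ours):

```python
from statistics import NormalDist

def credibility_interval(mean_rho, sd_rho, level=0.80):
    """Credibility interval for a meta-analytic validity distribution,
    assuming true validities are normally distributed across studies."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.28 for an 80% interval
    return (round(mean_rho - z * sd_rho, 2), round(mean_rho + z * sd_rho, 2))

# Structured interviews: mean validity .42, SD .19 (Sackett et al., 2022)
print(credibility_interval(0.42, 0.19))  # -> (0.18, 0.66)
```

This reproduces the .18 to .66 interval reported for structured interviews from the published mean and standard deviation alone.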

This second figure, based on Sackett et al. (2022), illustrates the between-study variability in validity of various predictors of job performance. Error bars show the variability (±1 SD), while red triangles indicate the lower 80% credibility values, offering insight into how consistently each predictor's validity holds up across studies.

What can explain the variability? The between-study variability in validity within job performance predictors can be attributed to various factors, including additional distinctions needed in predictor measures (e.g., contextualized versus non-contextualized), mixed adherence to quality standards in assessment center ratings, significant variations in research design (e.g., predictive versus concurrent study designs), and differences between job types and thus what constitutes high job performance. For example, intelligence tests may have higher validity for cognitive-intensive roles like engineering or IT positions compared to manual labor jobs that place less emphasis on cognitive abilities.

To navigate this challenge effectively, practitioners are advised to refrain from relying solely on mean validity estimates obtained from meta-analyses. Thorough examination of the original meta-analyses, consideration of lower credibility values, and supplementing with individual validity studies from similar contexts can be valuable, though demanding. A practical recommendation is to be aware of the variability in validity and to investigate the validity evidence associated with the specific measure one intends to use. It is crucial to recognize that while meta-analyses offer a useful overview, local validity studies carry distinctive evidentiary value for specific predictors and should not be overlooked.
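One way to act on that advice is to treat the meta-analytic distribution as a prior and update it with a local validity study. The sketch below uses a simple precision-weighted (normal-normal) combination; this is our illustration, not a method Sackett et al. prescribe, and the local study figures are hypothetical:

```python
from math import sqrt

def update_with_local_study(prior_mean, prior_sd, local_r, local_n):
    """Precision-weighted (normal-normal) combination of a meta-analytic
    prior with a single local validity coefficient."""
    prior_var = prior_sd ** 2
    # Approximate sampling variance of a correlation coefficient
    local_var = (1 - local_r ** 2) ** 2 / (local_n - 1)
    post_var = 1 / (1 / prior_var + 1 / local_var)
    post_mean = post_var * (prior_mean / prior_var + local_r / local_var)
    return post_mean, sqrt(post_var)

# Structured interviews (mean .42, SD .19) combined with a hypothetical
# local study of n = 150 applicants yielding r = .30
mean, sd = update_with_local_study(0.42, 0.19, 0.30, 150)
print(round(mean, 2), round(sd, 2))  # -> 0.32 0.07
```

Because the meta-analytic prior is wide (SD .19), a modest local sample already pulls the estimate substantially toward the local result and shrinks the uncertainty, which is exactly why local evidence is worth collecting.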

Understanding the nuanced variability in validity emphasizes the need to move away from a one-size-fits-all approach. Expertise in measure selection, establishment of job criteria, and interpretation of outcomes becomes vital. This insight not only deepens our knowledge of validity but also paves the way for the development of more inclusive and effective selection systems.

Cognitive Ability in the Spotlight

Despite revisited estimates of cognitive ability tests painting a picture of reduced overall dominance, their relevance remains uncontested, especially in contexts with high learning and training demands. However, Sackett et al.'s (2022, 2023) findings advocate for a more nuanced deployment of these tests, as noted above. Test providers have a golden opportunity to lead by integrating this knowledge. At Master International A/S we focus on integrating cognitive ability measures within a balanced selection system that appreciates compensatory traits and holistic role-fit evaluation, reinforcing the idea that cognitive ability, while crucial, is but one piece of the puzzle.

Towards a More Informed Approach in Test Provision

The implications of Sackett et al.'s (2022, 2023) work extend far beyond the mere ranking of selection tool validity. Their findings beckon test providers and HR professionals to assume a proactive role in transforming and tailoring selection practices. This entails not just selecting the tools appropriate to the job requirements, but also using a variety of tools and empowering users with deeper insight into their application, the importance of rigorous job analysis, and the art of criteria setting. Furthermore, the call to reexamine and appropriately apply cognitive ability measures, paired with comprehensive training, signals a move towards more sophisticated, fair, and effective selection systems.

In conclusion, the extensive insights provided by Sackett et al. (2023) touch upon various crucial aspects, a subset of which we have synthesized in this commentary. What resonates strongly is an imperative to deliberate, enhance, and reorient towards a more tailored application of selection tools in the contemporary workplace. At Master International A/S, we earnestly welcome these insights and embrace the responsibility. Our commitment is not only to ensure that personnel selection methods are scientifically robust but also to support tailored and sophisticated selection processes, meticulously crafted to address the diverse needs of today's workforce.




Griebie, A., Bazian, I., Demeke, S., Priest, R., Sackett, P. R., & Kuncel, N. R. (2022). A contemporary look at the relationship between cognitive ability and job performance [Poster]. Society for Industrial and Organizational Psychology Annual Conference, Seattle, WA.

Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2023). Revisiting the design of selection systems in light of new findings regarding the validity of widely used predictors. Industrial and Organizational Psychology, 16(3), 283-300. https://doi.org/10.1017/iop.2023.24

Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2022). Revisiting meta-analytic estimates of validity in personnel selection: Addressing systematic overcorrection for restriction of range. Journal of Applied Psychology, 107(11), 2040–2068. https://doi.org/10.1037/apl0000994

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Date: 26.02.2024

Solja Klargaard

Content Writer