Companies are incorporating artificial intelligence to perform a wide variety of functions, but contrary to common misconceptions, they do not do so in a legal no man's land. As U.S. agencies recently banded together to remind us, "existing legal authorities apply to the use of automated systems and innovative new technologies just as they apply to other practices." However, it is also true that some issues raised by AI systems may not be adequately addressed with existing tools in the regulatory toolbox. In particular, AI systems used in employment contexts may reflect and reinforce bias introduced through model design decisions and training data, increasing the risk of disparate impact on protected demographic groups. Thus, while antidiscrimination laws and U.S. Equal Employment Opportunity Commission requirements still apply, evaluating compliance has become more challenging.
With that understanding, legislators are looking for ways to ensure automated decision-making systems are held accountable for their role in the hiring process. First to do so was New York City, which passed Local Law 144 in 2021; its Department of Consumer and Worker Protection provided final regulatory guidance in April 2023. The law applies to "automated employment decision tools" and requires employers and employment agencies to perform a bias audit of the AEDT within one year prior to using it to screen candidates or employees for employment or promotion. Specifying the definitions, standards and outputs for an audit is a herculean task and not without controversy, but the work done by New York City to create the first such mandated audit process is certain to yield some valuable lessons as it goes into effect on 5 July. A summary of the results of the bias audit is required to be made publicly available online.
BNH.AI and others in the AI audit space have been busy conducting the first round of bias audits under Local Law 144. While much has been written about the legal requirements under the law, this article focuses on some of the practical considerations we have come across in our experience performing these audits and working with employers to operationalize Local Law 144.
The law includes a few other requirements, such as public notice and disclosure; however, this article focuses on the requirement for an "impartial evaluation by an independent auditor," which must include testing with respect to EEOC component 1 categories, namely race, ethnicity and sex. Local Law 144 goes one step further than EEOC practices by requiring intersectional analysis.
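To make the intersectional testing requirement concrete, here is a minimal sketch of how selection rates and impact ratios can be computed for combined sex and race/ethnicity categories. The data, column names and category labels are purely illustrative assumptions; a real audit must follow the category definitions and calculations set out in the DCWP rules.

```python
import pandas as pd

# Hypothetical applicant-level records; all column names and values are illustrative.
applicants = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "race_ethnicity": ["White", "White", "Black or African American",
                       "Hispanic or Latino", "Asian", "Black or African American"],
    "selected": [1, 1, 0, 1, 1, 0],  # 1 = advanced by the AEDT, 0 = not advanced
})

# Intersectional categories combine sex with race/ethnicity.
applicants["category"] = applicants["sex"] + " / " + applicants["race_ethnicity"]

# Selection rate: share of applicants in each category the tool selected.
selection_rates = applicants.groupby("category")["selected"].mean()

# Impact ratio: each category's selection rate divided by the highest selection rate.
impact_ratios = selection_rates / selection_rates.max()

print(pd.DataFrame({"selection_rate": selection_rates, "impact_ratio": impact_ratios}))
```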
Lack of clarity, consistency for test data
A number of issues around the law will require experimentation or regulator input to truly resolve. One of the most critical is that the DCWP rules lack specificity about the data set on which the audit must be conducted. For example, the rules permit bias audits to use aggregate historical data in certain circumstances, but do not state what minimum sample percentage, if any, of the aggregate historical data set must come from an employer's or employment agency's own historical data in order for the organization to rely on it. On its face, it would seem almost any percentage could satisfy the contribution requirement, even if the data set is not broadly representative of a given organization. Another option is to conduct the audit using test data, but the rules simply define test data as data that is not historical data and do not provide further specifics.
In practice, this lack of specificity affords some flexibility in preparing the data set for the audit, and organizations have at least two options. Those with sufficient historical data of their own may either rely exclusively on that data or coordinate with the vendor of their AI service to use an aggregated historical data set, which also contains their own historical data, if one is available. Those without sufficient historical data of their own may rely on a reasonable interpretation of test data or on an appropriately aggregated historical data set that incorporates whatever historical data they do have.
Additional ambiguity arises from the vagueness of the geographic scope and time frame required for historical data sets. For instance, while companies may justifiably limit the pool to employment positions located in New York, they may need to include applicants for those positions from throughout the greater metro area, from nearby states or from even further afield. Similarly, the law does not specify the time period the data set should cover, and reasonable choices might be one, two, three years or more, depending on the number of positions, the number of applicants and the desire to test a sufficiently large sample.
Importance of vendor cooperation
Many companies use an external provider for their automated decision system for applicant screening and other human resources functions. While the employing company is on the hook for compliance with the new law, the audit will often require cooperation from the vendor of the AEDT. However, Local Law 144 does not contain obligations directed at vendors of AEDTs to mandate such cooperation. Therefore, in the absence of clear legal or contractual requirements to do so, some vendors may be reluctant to provide aggregated data sets or otherwise actively participate in the audit process. This may be due to a conflict of interest, perceived potential liability, or simple unwillingness or inability to allocate the necessary resources.
Complicating this scenario further, we have found that, in some cases, vendors may not have access to certain required data, particularly race, ethnicity and sex, due to contractual, data-sharing or privacy constraints. We have encountered situations where building the necessary data set for testing requires contributions from multiple organizations: the AEDT model outputs may be stored by the vendor, while demographic information may reside only with the employer. These categories of sensitive data are also commonly incomplete, as they are frequently collected on an opt-in basis.
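As a simple illustration of what assembling that data set can look like, the sketch below joins hypothetical vendor-held AEDT outputs to employer-held, opt-in demographic records on a shared candidate identifier. Every name and field here is an assumption for illustration, and any real exchange would need to respect the contractual and privacy constraints discussed above.

```python
import pandas as pd

# Hypothetical vendor export: AEDT outputs keyed by a shared candidate identifier.
vendor_outputs = pd.DataFrame({
    "candidate_id": [101, 102, 103, 104],
    "aedt_score": [88, 64, 75, 91],
})

# Hypothetical employer records: demographics collected on an opt-in basis,
# so some candidates are missing entirely.
employer_demographics = pd.DataFrame({
    "candidate_id": [101, 103, 104],
    "sex": ["Female", "Male", "Female"],
    "race_ethnicity": ["Asian", "White", "Black or African American"],
})

# A left join keeps every scored candidate; missing demographics surface as NaN,
# making the extent of the gap visible before the audit calculations begin.
audit_data = vendor_outputs.merge(employer_demographics, on="candidate_id", how="left")
print(audit_data)
```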
Additional questions around implementation
Although the expected format and contents of the published results are fairly clear, the process of implementing the bias audit to assess the data and generate that content is less straightforward. We have come across several open-ended questions which require making reasonable decisions on a case-by-case basis, including the following:
- How should the auditor address AEDTs that do not squarely fit within the "selection rate" or the "scoring rate" approaches provided by the rules? For example, certain models may generate outputs that classify applicants into more than two categories without providing an underlying numerical score. Should the audit retain the granular categories, create a binary classification by grouping categories in order to apply the selection rate formula, or reasonably transform the data in order to apply the scoring rate formula? These approaches may or may not be appropriate depending on how the model outputs are used in practice. (The sketch following this list illustrates the scoring rate formula.)
- How should the audit "count" candidates who applied for multiple positions? In extreme cases, a candidate may have applied for dozens of positions in the same organization during a one- to three-year period and been screened by the same AEDT each time. The candidate may be assessed differently in different cases, as their skills are evaluated against a variety of position requirements. Including all instances may skew the data set but, on the other hand, manually filtering instances from the data set may be seen as cherry-picking the data.
- How does the organization handle uneven distributions of missing demographic data? Because disclosing race/ethnicity and sex information is commonly voluntary, and because of when this information is collected in the recruitment process, many organizations are more likely to have demographic information on file for candidates who proceed to be interviewed or hired than for unsuccessful candidates. If the success of a candidate is generally aligned with the output of the AEDT, this may translate to more missing demographic values in rows with lower scores or unsuccessful classifications. This, in turn, means a disproportionately large portion of records with lower scores or unsuccessful classifications cannot be used for testing during the audit process.
- How should organizations expect the published results of the bias audit to be evaluated by viewers? Local Law 144 is a disclosure law and does not prescribe standards for acceptable or unacceptable results. In addition, the model's recommendation outputs do not always directly correlate to job offers or hires, and many employers are already accountable under EEOC guidance and Title VII for their ultimate employment decisions. However, since the audit tables are to be made public, organizations may expect scrutiny from third parties, including regulators, if the results they publish are noticeably subpar by traditional standards, such as the four-fifths rule under the federal Uniform Guidelines on Employee Selection Procedures or other traditional disparate impact measures. In certain circumstances, in consultation with legal counsel, an employer may decide to voluntarily include an explanation or justification for subpar results when publishing the summary results.
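For AEDTs that output a numerical score rather than a binary selection, the calculation turns on a scoring rate tied to the median score. The sketch below is a simplified illustration, under assumed column names and with unknown demographics simply excluded, of computing scoring rates and impact ratios and flagging categories that fall below the traditional four-fifths benchmark; it reflects one reasonable reading of the rules, not an authoritative implementation.

```python
import pandas as pd

# Hypothetical scored candidates; demographic categories are opt-in and may be missing.
candidates = pd.DataFrame({
    "category": ["Female / White", "Male / White", "Female / Asian",
                 None, "Male / Hispanic or Latino", "Female / White"],
    "score": [82, 74, 91, 67, 55, 88],
})

# One possible treatment: records with unknown demographics are excluded from the rates.
known = candidates.dropna(subset=["category"])

# Scoring rate: share of each category scoring above the sample median.
median_score = known["score"].median()
above_median = known["score"] > median_score
scoring_rates = above_median.groupby(known["category"]).mean()

# Impact ratio relative to the highest-rated category, compared against the
# traditional four-fifths (0.8) benchmark used in disparate impact analysis.
impact_ratios = scoring_rates / scoring_rates.max()

summary = pd.DataFrame({
    "scoring_rate": scoring_rates,
    "impact_ratio": impact_ratios,
    "below_four_fifths": impact_ratios < 0.8,
})
print(summary)
```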
Pending further guidance from the DCWP or legal rulings when the law is enforced and potentially challenged, these issues will remain open to differing interpretations or recommendations. Employers should consider, in partnership with relevant vendors and internal or external legal counsel, the best course of action for their own situation and context.