Olivia Solon recently wrote an excellent article in the Guardian covering the use of “pseudo-AI” by tech startups. Pseudo-AI, in which human workers perform work eventually intended for an artificial intelligence or supplement an AI still under development, is a common prototyping practice, necessitated by the inherent difficulty of building an AI and the large datasets required to train one.
Ideally, a system that relies on a pseudo-AI is never exposed to the public. As agile software development approaches become more popular, however, this is less often the case. As Solon points out, this has increasingly led to cases where workers at a startup read the receipts and emails of unaware people while performing data transfer tasks, or speak with them by telephone while impersonating a robot, sometimes in direct violation of stated privacy policies.
As this story suggests, established privacy professionals should very carefully vet the current development status of the AI startups they contract with. As research from Osaka University has shown, people are more likely to disclose information, especially sensitive information, to an AI than to a human being. The revelation that human beings are regularly performing work customers are led to believe is automated can have major trust and public-image ramifications, even if the company providing the primary service is unaware.
Additionally, there are numerous legal ramifications.
Civil penalties may result under HIPAA if the data being accessed is health data and proper safeguards against disclosure are not in place, which is likely in a pseudo-AI arrangement where the contracting company is unaware that the work isn’t fully automated.
Similarly, if the data is financial data covered by the Gramm-Leach-Bliley Act, the company may face action from a federal regulator for violating that act's safeguards rule. The same can be said for educational records under FERPA. There are also ramifications in the European Union under several provisions of the General Data Protection Regulation. Although the GDPR, like U.S. law, is agnostic about how data is processed and does not necessarily bar processing by hand, violations of professional codes of ethics around sensitive data, poor security procedures and vetting, or violations of privacy statements can still lead to enforcement actions.
The use of humans to do work that should be automated violates international standards for payment card processing and may also invite a tort suit for violation of privacy if the data is sufficiently sensitive and users were deceived.
There are several techniques companies can use when contracting with an AI startup to limit their liability.
The first, and most important, is the traditional set of contractual liability-shifting methods, including indemnification for privacy violations and warranties regarding compliance with the company’s privacy standards and the use and scope of AI in the startup’s products. A warranty that the startup is actually using AI as described, rather than human workers, is strongly advised. Additionally, due diligence on a prospective AI scheme, including a review by some of the company’s subject matter experts, should help to distinguish between AIs and pseudo-AIs.
In some situations, using a pseudo-AI is desirable for a business. One such example is machine learning (a field closely related to data mining), especially when developing a model to be used for prediction. Predictive machine-learning models are typically developed using a process called supervised learning, where the model is trained and tested against a set of sample data that has already been processed and labeled with the correct answers. To gather sample data for refining the model, and to check and correct mistakes made by the automated system while it is still in development, employing human workers in a pseudo-AI scheme is essential.
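As a rough illustration of how such a scheme fits together, the sketch below routes each input either to the automated model or to a human worker, and keeps the human's answer as labeled sample data for later retraining. It is a minimal sketch and not any particular startup's design: the class name `PseudoAI`, the 0.9 confidence threshold, and the toy model and reviewer functions are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical threshold: predictions below this confidence go to a person.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class PseudoAI:
    """Routes each input to the model or to a human reviewer (illustrative only)."""
    model: Callable[[str], Tuple[str, float]]  # returns (label, confidence)
    human: Callable[[str], str]                # returns the correct label
    training_data: List[Tuple[str, str]] = field(default_factory=list)
    human_handled: int = 0

    def process(self, item: str) -> str:
        label, confidence = self.model(item)
        if confidence >= CONFIDENCE_THRESHOLD:
            return label
        # Low confidence: a human sees the raw input. This is the step that
        # needs disclosure, consent, and proper safeguards.
        corrected = self.human(item)
        self.human_handled += 1
        # Keep the human's answer as labeled sample data for retraining.
        self.training_data.append((item, corrected))
        return corrected


# Toy stand-ins for a real model and a real reviewer (assumptions).
def toy_model(text: str) -> Tuple[str, float]:
    if "total" in text.lower():
        return "receipt", 0.95
    return "other", 0.40


def toy_reviewer(text: str) -> str:
    return "receipt" if "$" in text else "other"


pipeline = PseudoAI(model=toy_model, human=toy_reviewer)
inputs = ["Total: $12.50", "Paid $8 at cafe", "Lunch on Friday?"]
results = [pipeline.process(text) for text in inputs]
```

Counting how often the human branch runs (`human_handled` here) gives a concrete figure for how much of the "AI" is still people, which is the kind of number a contracting company could ask for during due diligence.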
There are methods of preserving customer expectations and privacy rights while still employing a pseudo-AI.
The first and most common is waiver, in which users are fully informed of the pseudo-AI nature of the system. This approach is commonly used in a “closed beta” test of the software product, in which the system is offered to a select number of early users, either employees or public volunteers, who are charged with using it while offering feedback. After the model has reached an adequate level of accuracy, it is opened to the public, usually with the pseudo-AI elements stripped.
The second is to adapt publicly available data or artificially generated sample data to create synthetic input for the system under development. Because this method does not involve outside users, it is more desirable from a privacy standpoint, but it raises significant accuracy concerns, since the generated data may not accurately represent actual user input. Depending on the nature of the system in development and the types of public data available, this approach may not always be feasible.
One of the major trends in software development over the last decade has been the rise of the agile model, where products are released before they are fully finished and then refined through user feedback. As agile becomes more popular, the use of pseudo-AIs to fill in gaps will only grow, and privacy professionals, along with the privacy statements they write, will have to keep up.
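To make the second approach concrete, here is a minimal sketch of generating artificial sample data, in this case fake receipts, so that no real user's records are needed during development. The field names, merchant list, and value ranges are invented for illustration, and data this simple shows exactly the accuracy risk described above, since real receipts are far messier.

```python
import random
from typing import Dict, List

# Hypothetical merchant names; nothing here comes from real user data.
MERCHANTS = ["Corner Cafe", "City Books", "Metro Grocer"]


def make_fake_receipt(rng: random.Random) -> Dict[str, object]:
    """Build one synthetic receipt record from random values."""
    return {
        "merchant": rng.choice(MERCHANTS),
        "total_cents": rng.randint(100, 20000),
        "card_last4": f"{rng.randint(0, 9999):04d}",
    }


def make_dataset(n: int, seed: int = 0) -> List[Dict[str, object]]:
    """Generate n synthetic receipts; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [make_fake_receipt(rng) for _ in range(n)]


sample = make_dataset(5, seed=42)
```

Seeding the generator keeps test runs repeatable, which helps when comparing model versions against the same synthetic input.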