Reflecting on 2023, it would be challenging to find any topic that generated more interest and concern than artificial intelligence. We started the year with a flood of ChatGPT content, and the steady drumbeat around generative AI continued, culminating in the recent wave of news surrounding OpenAI's leadership challenges.
For the most part, themes of "faster," "easier" and "more efficient" dominated AI-related headlines. But if you listened closely, there was an occasional voice of warning reminding users, especially enterprise users, to be cautious and mindful of the risks of AI implementation, and that things that seem too good to be true often are. The call for mindfulness around responsible, safe and trustworthy AI grew stronger last month with bold statements and directives from globally influential groups, including the G7 leaders, the White House, and a wide cross section of representatives from the 28 countries that attended the U.K.'s AI Safety Summit.
While these actions, and the commentary they generate, drive increased consideration by organizations looking to leverage AI for business purposes, we still face a very real chasm of understanding. The AI sphere is complicated, untested and unclear, from both a technical and a policy perspective. Devising sustainable, long-term strategies around secure AI requires bringing a wide range of stakeholders to the table, and the best foundation for that discussion is shared understanding. To help facilitate that engagement, of which we need much more, we need to get back to basics: breaking down what secure AI means, the technology behind it, and where it is successfully applied today.
While the terms AI and machine learning describe numerous use cases across a broad range of industries, the majority of those use cases center on using machines and data to improve on the status quo. To deliver the best outcomes, AI and machine learning capabilities need to be trained, enriched and leveraged over a broad, diverse range of data sources. At its core, secure AI comes down to trust, risk and security relating to that data.
When a machine learning model is trained on new and disparate datasets, it becomes smarter over time, producing increasingly accurate and valuable insights that were previously inaccessible. However, these models encode the data on which they were trained. When the data sources contain sensitive information, using these models raises significant concerns because adversarial machine learning attacks designed to glean that sensitive information, such as membership inference and model inversion, become low-hanging fruit. Any technical or security vulnerability affecting the model itself, whether it is being trained or being used, is therefore a significant liability. The AI functionality that promised to deliver business-enhancing, actionable insights now substantially increases the organization's risk profile. AI users must acknowledge this risk and act to mitigate its impact.
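To make that risk concrete, consider the signal a membership inference attack exploits: an overfit model is measurably more confident about the records it was trained on than about records it has never seen. The sketch below is purely illustrative, using scikit-learn and synthetic data rather than any particular production system.

```python
# Illustrative only: shows the confidence gap a membership inference
# attack exploits. The synthetic data and model choice are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A model that overfits its training data effectively memorizes part of it.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Mean confidence (max predicted probability) on members vs. non-members.
conf_members = model.predict_proba(X_train).max(axis=1).mean()
conf_non_members = model.predict_proba(X_test).max(axis=1).mean()

print(f"mean confidence on training records: {conf_members:.3f}")
print(f"mean confidence on unseen records:   {conf_non_members:.3f}")
# The gap between these two numbers is the signal an attacker can threshold
# to infer whether a specific, possibly sensitive, record was in the training set.
```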
One of the best ways to counter the increased risk associated with AI applications is to incorporate privacy enhancing technologies (PETs). As the name suggests, this family of technologies enhances, enables and preserves the privacy of data throughout its life cycle. PETs uniquely protect data while it is being used or processed, thereby securing the usage of data. This is critical for organizations looking to capitalize on the power of AI, because machine learning capabilities are data-driven and data hungry.
By allowing data to be securely and privately leveraged in ways that were not previously possible, PETs expand the scope of possible data inputs and, in turn, enrich a model's effectiveness and broader impact: the more datasets a model can be trained over, the richer the insights it can produce. With PETs, organizations are no longer limited to the data assets they own or control. Instead, they can securely utilize third-party, open source or cross-silo data sources without exposing their own interest, intent or sensitive inputs.
To help highlight the impact of PETs, imagine a global bank that wants to evaluate a machine learning model relating to customer risk over datasets in another operating jurisdiction. For entities in regulated industries, it is critical that such capabilities comply with regulatory guidance on data localization and personal data protection while also ensuring the security of the data on which the model was originally trained. By using homomorphic encryption, a pillar of the PETs category, to encrypt the model, the bank can safely evaluate it over data in multiple jurisdictions, delivering enriched outcomes and more efficient decision making.
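As a rough sketch of how such an evaluation can work, the snippet below uses the open source python-paillier (phe) library, which provides additively homomorphic encryption. The bank encrypts the weights of a simple linear risk model; the data holder in the other jurisdiction scores its local records against the encrypted weights and returns only encrypted scores, which the bank alone can decrypt. The model, the data and all names are illustrative assumptions, not a description of any specific deployment.

```python
# Hypothetical sketch: evaluating an encrypted linear risk model over data
# held in another jurisdiction, using additively homomorphic (Paillier)
# encryption from the python-paillier (phe) library.
from phe import paillier

# --- Bank (model owner) ---
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
weights = [0.8, -1.2, 0.5]                        # assumed trained model weights
encrypted_weights = [public_key.encrypt(w) for w in weights]
# Only public_key and encrypted_weights are shared with the other jurisdiction.

# --- Remote jurisdiction (data holder) ---
def score_record(enc_weights, features):
    """Compute an encrypted risk score sum(w_i * x_i) without seeing the weights."""
    # Paillier supports ciphertext + ciphertext and ciphertext * plaintext scalar.
    total = enc_weights[0] * features[0]
    for ew, x in zip(enc_weights[1:], features[1:]):
        total = total + ew * x
    return total

local_record = [1.0, 0.3, 2.5]                    # plaintext data never leaves the site
encrypted_score = score_record(encrypted_weights, local_record)

# --- Bank decrypts the returned score; raw customer data was never moved ---
print("risk score:", private_key.decrypt(encrypted_score))
```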
Another notable member of the PETs family, secure multiparty computation, can be used to enable cross-jurisdictional model training. In this scenario, a bank wants to train its machine learning risk model over protected datasets in another country but needs to protect the model during the training process to ensure the privacy and security of both the data on which the model was originally trained and the training data located in the other jurisdiction. If the model were visible during training, it could be reverse engineered to extract that information, putting the organization at risk of violating a potentially wide range of privacy and related requirements. Any exposure of the model itself is therefore a direct liability for the bank.
By leveraging a PETs-powered encrypted training solution, however, the bank can safely train its encrypted machine learning model across those datasets without moving or pooling data in a single location, thereby improving its risk model and enabling more informed decision making during customer onboarding.
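One simplified way to see the mechanics behind such training is additive secret sharing, a core building block of secure multiparty computation. In the illustrative sketch below, written in plain Python with toy integers standing in for per-jurisdiction model updates, each party splits its local update into random shares; individually the shares reveal nothing, yet together they reconstruct only the aggregate update used to improve the shared model.

```python
# Toy sketch of additive secret sharing, a core SMPC building block.
# The values stand in for per-jurisdiction model updates (e.g., gradients);
# real protocols add fixed-point encoding, integrity checks and much more.
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Each jurisdiction's (integer-encoded) local update stays private...
local_updates = {"jurisdiction_A": 42, "jurisdiction_B": 17, "jurisdiction_C": 8}

n = len(local_updates)
all_shares = {name: share(update, n) for name, update in local_updates.items()}

# ...each compute party receives one share from every jurisdiction and sums
# them locally, so no single party ever sees a raw update.
partial_sums = [
    sum(all_shares[name][i] for name in local_updates) % PRIME for i in range(n)
]

# Combining the partial sums reveals only the aggregate update.
print("aggregate update:", reconstruct(partial_sums))   # 67, without exposing 42, 17 or 8
```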
In the health care space, PETs can improve outcomes by enabling stakeholders to collaborate securely without compromising sensitive patient data. Researchers frequently need to utilize multiple, disparate datasets to effectively train AI models that monitor health data relating to drug trials. Such reports are often collated and shared asynchronously, which means the data goes stale almost immediately and creates onerous data handling and storage problems for those responsible for reassembling and analyzing it to inform decisions.
When PETs are used to encrypt a model, researchers can securely analyze data sources where they reside, eliminating the need to pool or centralize data and enabling data scientists to utilize the most current data available. Data owners also maintain control over their data throughout the evaluation life cycle, which minimizes the risk of data exposure or loss. Health care organizations can then generate insights and recommend actions on a business-relevant timeline without exposing sensitive models or the underlying training data.
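The pattern can be boiled down to a simple idea: the analysis travels to the data, and only aggregate results travel back. The hypothetical sketch below illustrates that flow for a multi-site drug trial; the site names, record schema and adverse-event check are assumptions, and a production deployment would also protect the model and the returned statistics with the PETs described above.

```python
# Hypothetical sketch of in-situ evaluation: the analysis runs where the data
# resides, and only aggregate counts leave each trial site. Encryption of the
# model and results via PETs is omitted here for brevity.
from dataclasses import dataclass

@dataclass
class PatientRecord:              # assumed minimal schema for illustration
    on_trial_drug: bool
    adverse_event: bool

def local_summary(records):
    """Runs at the data owner's site; returns only aggregate statistics."""
    treated = [r for r in records if r.on_trial_drug]
    return {
        "n_treated": len(treated),
        "n_adverse": sum(r.adverse_event for r in treated),
    }

# Each site holds its own current records; they are never pooled or moved.
sites = {
    "site_A": [PatientRecord(True, False), PatientRecord(True, True)],
    "site_B": [PatientRecord(True, False), PatientRecord(False, False)],
    "site_C": [PatientRecord(True, True), PatientRecord(True, False)],
}

summaries = {name: local_summary(records) for name, records in sites.items()}

# The researcher sees only cross-site aggregates, computed on fresh data.
n_treated = sum(s["n_treated"] for s in summaries.values())
n_adverse = sum(s["n_adverse"] for s in summaries.values())
print(f"adverse event rate: {n_adverse}/{n_treated}")
```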
When determining how AI will be utilized, it is critical that organizations consider the trust, risk and security of both the capabilities and the associated data. Clear policies and guidelines will be key to ensuring these considerations fall into the realm of "must have" rather than "nice to have" approaches to leveraging data. Action-oriented directives encouraging the exploration and adoption of PETs, such as those included in the White House's "Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence" and the guidance on PETs published by the U.K. Information Commissioner's Office, are steps in the right direction. Organizations also need to lead by example by implementing AI strategies built around the principles central to secure AI.
Unfortunately, experience tells us that the push toward such practices is frequently driven by requirements or reactions rather than proactive prioritization. But the opportunity for leadership in the AI space at this moment is immense. Let's take advantage of the current focus to drive for secure AI.