Editor's note: The IAPP is policy neutral. We publish contributed opinion and analysis pieces to enable our members to hear a broad spectrum of views in our domains.

There is a deep tension in the design of generative AI chatbots and agents. The siren song of hyper-personalization pushes app developers to sacrifice privacy at the altar of utility. Whether or not these dimensions are, in fact, inversely correlated, the prevailing wisdom seems to be that the more privacy preserving a chatbot is, the less it will be able to do.

The ironies here are many. At a technical level, the spread of chatbots with memories is a bit of a surprise because large language models themselves are not built to remember interactions. LLMs are "stateless." They encode relationships between words, of course, but only the words used to train them. For an LLM, each output is its own independent universe. Once an LLM processes a user's prompt and generates an output, any "memory" of the interaction evaporates.

But as it turns out, verbose and apparently intelligent word generators are not as interesting when they also suffer from amnesia. So, chatbot apps are increasingly designed to remember things. Again, it is important to draw a distinction here between the core LLM itself and the app built around it with which the user interacts. 

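The distinction matters in practice. As a minimal sketch, assuming a generic chat-style interface (the call_model function below is a hypothetical stand-in, not any vendor's actual API): the model call itself retains nothing, so the app layer must re-send prior turns on every request to create the illusion of memory.

```python
# Minimal sketch: the model is stateless; the app holds the "memory"
# by re-sending the conversation history with every call.
# `call_model` is a hypothetical stand-in for any chat-completion API.

from typing import Dict, List

def call_model(messages: List[Dict[str, str]]) -> str:
    # Stand-in: a real system would call an LLM here. It sees only what is
    # passed in this one request; nothing persists from earlier calls.
    return f"(model reply based on {len(messages)} messages)"

conversation: List[Dict[str, str]] = []  # state lives in the app, not the model

def chat_turn(user_text: str) -> str:
    conversation.append({"role": "user", "content": user_text})
    reply = call_model(conversation)  # full history re-sent on every turn
    conversation.append({"role": "assistant", "content": reply})
    return reply
```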

Generally, the architectural choices behind this trend are no different than they would be in any other context. Privacy and security best practices still apply. But there are also important differences. One is the sheer volume of unstructured data involved. Storing prompts and outputs as vector embeddings — essentially shorthand "fun facts" from past interactions that a chatbot can test for relevance mathematically — means personal information can become embedded deep in a system in a way that does not lend itself well to deletion. The system can become a funhouse of user inputs, inferences and conversations reflected and re-reflected in unpredictable ways.
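To make that concrete, here is a hedged sketch of such a memory store, in which past snippets are saved as vectors and a new prompt is scored against them with cosine similarity. The embed function is a stand-in for a real embedding model, and the stored facts, vector dimension and relevance threshold are invented for illustration.

```python
# Illustrative sketch of chatbot "memory": store snippets from past chats as
# vectors, then score a new prompt against them with cosine similarity.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model; this random projection is not
    # semantically meaningful, it only demonstrates the mechanics.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

# Shorthand "fun facts" distilled from earlier conversations, keyed by text.
facts = [
    "User mentioned a chronic back injury",
    "User is planning a move to Denver",
]
memory = {fact: embed(fact) for fact in facts}

def recall(prompt: str, threshold: float = 0.3) -> list[str]:
    q = embed(prompt)
    # Cosine similarity reduces to a dot product for unit-length vectors.
    return [fact for fact, vec in memory.items() if float(q @ vec) >= threshold]
```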

The intimacy with which users approach their bot companions expands the breadth of personal information collected. We are using generalized chat systems as therapists, medical advisors and confidants. And, as the recent launch of ChatGPT Health shows, some chatbot companies are encouraging us to share even more information in order to make these interactions as relevant as possible.

It is difficult to answer privacy questions about these systems at a general level, because much depends on how they are designed and implemented. For example, often only raw user inputs and final model outputs are stored in a vector database for the chatbot to access later. But there may also be logs of the entire interaction somewhere, including both the parts that are visible to a user and the parts that are not. The hidden aspects of a chat include the nested layers of thinking, or talking to itself, that the chatbot does in the background to inject context and clarity into a user's prompt. During this multistep process, the chatbot will query the vector database of user-relevant data, including potentially sensitive context, and all matching information is re-injected into the interaction. Depending on the developer's retention strategy, all of this is likely stored in some form.
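
A hedged sketch of that hidden, multistep turn is below. Every name in it (rewrite_prompt, vector_search, call_model, append_log) is an illustrative stub rather than any particular vendor's API, but it shows how much more than the visible input and output can end up being retained.

```python
import json

def rewrite_prompt(prompt: str) -> str:
    # Stand-in for the chatbot "talking to itself" to add context and clarity.
    return f"Clarified: {prompt}"

def vector_search(query: str, top_k: int = 5) -> list[str]:
    # Stand-in for querying the vector database of user-relevant facts,
    # which may surface sensitive context from earlier conversations.
    return ["User mentioned a chronic back injury"][:top_k]

def call_model(prompt: str) -> str:
    # Stand-in for the stateless LLM call.
    return f"(model reply to a {len(prompt)}-character prompt)"

def append_log(record: dict) -> None:
    # Stand-in for whatever retention strategy the developer has chosen.
    print(json.dumps(record, indent=2))

def handle_turn(user_prompt: str) -> str:
    expanded = rewrite_prompt(user_prompt)   # hidden reasoning step
    context = vector_search(expanded)        # retrieval of user-relevant data
    final_prompt = "\n".join(context) + "\n\n" + expanded
    output = call_model(final_prompt)
    # Depending on retention, the whole turn, visible and hidden, may be
    # stored, not just the raw input and final output.
    append_log({
        "user_prompt": user_prompt,
        "expanded_prompt": expanded,
        "retrieved_context": context,
        "model_output": output,
    })
    return output
```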

As in any digital system, stored data is subject to privacy and security breaches. If that data is accessed, whether at the account level or deeper in the system, the potential privacy risks are extensive.

Using chat interaction data for purposes beyond the delivery of the chatbot itself, such as targeted advertising, also seems to be on the table for some developers. This is the privacy risk that has recently gotten the most attention from civil society groups, including in the documentation for the model chatbot legislation released this week by the Consumer Federation of America.

The model bill, branded as the People-First Chatbot Bill and endorsed by dozens of civil society groups, proposes legislative text to address a wide range of chatbot privacy and safety concerns, particularly for children. Its proposals for privacy are notable, however, as they serve as an opening volley in the long-delayed conversation about the appropriate privacy practices of LLM-based chatbots.

As such, it highlights open policy questions related to:

  • The processing of personal data for purposes beyond the chatbot interaction.
  • The rights of a user to access their own chat logs.
  • The use of chatbot inferences for advertising purposes.
  • The "classification or designation of a user's personality or behavioral characteristics" when not expressly requested by the user.
  • Selling or sharing chat logs, including government access to chat logs.

Reasonable privacy experts will certainly come to different recommendations on these questions, but there is a need to engage on this issue while best practices are still in the early stages of development. There's another basic open question: How do we properly define the scope of data covered in any limits on chatbot data use? Given the Matryoshka doll nature of chatbot architecture, focusing simply on "inputs" and "outputs" may miss some of the risks.

Although safety issues are already on the table in dozens of chatbot-specific bills at the state level, privacy-specific language is less common. The trend is similar among the handful of bipartisan proposals at the federal level, including the Safeguarding Adolescents from Exploitative Bots Act in the House and the GUARD Act from Sens. Josh Hawley, R-Mo., and Richard Blumenthal, D-Conn., which would go so far as to ban chatbot companions for minors.

Perhaps privacy will return to the policy conversation on these transformative AI systems in the coming months. Either way, we are bound to feel the impact eventually.

Please send feedback, updates and chat transcripts to cobun@iapp.org

Cobun Zweifel-Keegan, CIPP/US, CIPM, is the managing director, Washington, D.C., for the IAPP.

This article originally appeared in The Daily Dashboard and U.S. Privacy Digest, free weekly IAPP newsletters. Subscriptions to this and other IAPP newsletters can be found here.