When asked to explain the privacy concerns of generative artificial intelligence apps, ChatGPT listed five areas — data collection, data storage and sharing, lack of transparency, bias and discrimination, and security risk — each with a brief description.

"Overall, it’s important for companies to be transparent about the data they collect and how they use it, and for users to be aware of the potential privacy risks associated with AI chatbots," ChatGPT said.

And it was not wrong.

These are among the concerns being discussed and watched closely by privacy professionals and advocates as generative AI tools, like ChatGPT, have skyrocketed in popularity in recent weeks.

Launched by parent company OpenAI last fall, ChatGPT is one of several generative AI systems, including Dall-E and Stable Diffusion, as well as Google’s recently released Bard and Microsoft’s Bing chatbot. While some generate text responses, others produce audio or visual content. ChatGPT and Bing, in particular, have captured widespread attention for their ability to interact with users and answer questions in conversational, detailed ways.

"Where we are right now is a new frontier, a new breakthrough, in AI," said Chloe Autio, director of policy at the Cantellus Group, a boutique advisory firm focused on AI policy and governance. "The new frontier is this generative capability, coupled with this usability concept, which is that anyone can now access the power of these tools by typing into these easily accessible interfaces and, because of the mass and scope of the data these algorithms have been trained on, the possibilities for their use and access have just really exploded."

Data protection, privacy rights 'front of mind'

The quick, human-like responses and interactions of generative AI apps give off a sort of "magical" quality, Hintze Law Associate Jevan Hutson, CIPP/US, said. In reality, massive amounts of data are what make that magic happen.

Plus, developers of generative AI have not been forthcoming about what the data entails or exactly how it is being maintained and used, said Jennifer King, Privacy and Data Policy fellow at the Stanford Institute for Human-Centered Artificial Intelligence.

"It’s challenging to know if your personal data has been sucked into their training models," she said. "Certainly, they are crawling a lot of data and when I talk to our computer scientists working on machine learning I hear a kind of Wild West. There’s a lot of data being crawled and there’s not necessarily practices, policies, procedures, documentation in place to know where things are coming from. I think a lot of the data protection and privacy rights questions are front of mind."

Generative AI models are trained by scraping, analyzing and processing publicly available data from the internet. Autio said OpenAI reportedly extracted 300 billion words from the web — think open social media profiles, blog posts, product reviews — to train ChatGPT to determine how questions are asked and how words would likely be phrased together in a response.
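As a rough illustration of that "phrased together" idea (an illustration only, not OpenAI’s actual training pipeline), the toy Python sketch below counts which words tend to follow which in a handful of invented snippets standing in for scraped posts and reviews. That counting is the most basic form of the statistical intuition behind predicting how a response would likely be worded.

```python
# Toy illustration of the idea described above: count which words tend to
# follow which in "scraped" text, then guess a likely continuation.
# The snippets are invented stand-ins for public posts and reviews, and this
# simple bigram counting is far cruder than how production models are trained.
from collections import Counter, defaultdict

scraped_text = [
    "this phone has great battery life",
    "great battery life and a great camera",
    "the camera has great low light performance",
]

# Count how often each word follows the previous one.
following = defaultdict(Counter)
for snippet in scraped_text:
    words = snippet.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1

# The most likely word to follow "great", based only on this tiny corpus.
print(following["great"].most_common(1))  # [('battery', 2)]
```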

"They generate text, image and audio outputs based on large amounts of data they are trained on, using machine learning, natural language processing and algorithmic processing," Or-Hof Law founder Dan Or-Hof, CIPP/E, CIPP/US, CIPM, FIP, said of generative AI technologies.

"A generator neural network creates outputs based on a request, while another network type, the discriminator, evaluates the data by trying to distinguish between real-world data and data generated by the model. Interplay between the two types results in the generator improving its outputs based on feedback received from the discriminator without human intervention. Arguably, the output of generative AI machines is new and original content, rather than a derivative of the data sets on which the machines train on."

An 'open road of data'

When Or-Hof asked ChatGPT to reveal everything it knew about him, the chatbot responded with information about his law practice and credentials — no personal, sensitive or intimate data. But the technology requires "masses of data to train on," which could include identifying information like headshots or personal details revealed in online text, he said.

He raised questions about transparency, consent and lawful grounds for data processing, and noted a growing debate over whether AI data processing requires prior permission.

"How will individuals receive a genuine and accessible notice that data related to them is used for machine learning? This is not a trivial question to answer," he said.

"Must collecting personal information to train machines be based on individuals’ consent, or can it be based, for example, on legitimate interest as a lawful ground?" he asked. "Can the generative AI processing of special categories of data governed by the EU General Data Protection Regulation be acknowledged as necessary for scientific or statistical purposes? Would AI-processing be regarded under the California Privacy Rights Act as a noncompatible purpose which requires a separate consent?"

Relatedly, Or-Hof said, another challenge is whether individuals can exercise their privacy rights and whether generative AI applications can comply with those requests.

"Presumably, the rights of access and deletion could be manageable. Conversely, including a human in the loop to effectuate the GDPR right not to be subject to automated decision making, or limiting the use of sensitive personal information as required under the CPRA, are harder to implement," he said.

Describing it as an "open road of data" that pre-dated ChatGPT, Autio said the concept of using publicly available information to create data sets without consent is not new. With generative AI, what has changed, she said, is "the analysis, inference and processing of data is being used for content generation, to automate a much broader range of more tangible tasks," and eventually, commercialization.

Earlier this month, OpenAI announced a paid subscription version of ChatGPT for $20 a month.

"These generative AI tools are being used for commercial purposes and the data subjects, whose data has been used to train these systems, may be completely unaware. That’s a big privacy and transparency issue," Autio said.

"But these dynamics are not new with generative AI. It’s not just an AI or a ChatGPT problem, but it’s sort of a general privacy and consumer awareness problem as well. So many people are not aware what they put on the internet or into these systems — like questions to or outputs generated by ChatGPT — can be a window into the pattern of people’s lives. The outputs or results people get from these tools first reflect, and then can become part of, patterns that inform someone else's future results. This is not to say their personally identifiable information is directly revealed to others, but the patterns and data might be used with only basic consents — if any." 

Processing of data sets containing personal information may also create inaccurate, biased, fake or abusive outputs, Or-Hof said, each of which "can result in serious damage to the privacy of the respective individuals."

As users feed data to generative AI systems through queries and interactions, the information is uploaded, processed and stored, Or-Hof said. Users could inadvertently or knowingly upload sensitive information that then becomes part of the app’s data set. ChatGPT’s policy, he said, clearly indicates data is used to "provide and improve the service" and many of the apps include disclaimers urging users not to share sensitive information.

"Given the extent of sensitive information that individuals share across social networks and other online services, it’s very likely they will share the same with generative AI services, even if warned not to," he said, also noting data used by generative AI systems could be "highly lucrative for hackers and therefore susceptible to cyber-attacks."

AI a 'prominent point' for regulators

Hutson said generative AI is likely to be a "prominent point" for regulators looking to "illustrate how the law applies."

"And given the risks, it’s likely not to be pretty," he said. "Regulators are thinking about this. They are interested in it. They are meaningfully and reasonably concerned about the possible harms both to individual’s civil rights, consumer protection, general forms of civic integrity."

In the EU, lawmakers are crafting the AI Act, which they hope to finalize by the end of 2023. A new category was recently added to the draft bill to cover generative AI systems and to classify AI-generated text that could be mistaken for human-made content as high risk.

In the U.S., the White House Office of Science and Technology Policy published a “Blueprint for an AI Bill of Rights,” providing design, development and deployment guidelines for AI technologies. 

The U.S. National Institute of Standards and Technology released a voluntary AI Risk Management Framework to help organizations deploying AI systems enhance their trustworthiness and reduce biases, while protecting individuals’ privacy.

The California Privacy Protection Agency also announced upcoming rulemaking on automated decision making under the CPRA.

Hutson urged caution for privacy practitioners, reminding them that while generative AI is "flashy and new," it does not change existing obligations or the landscape of obligations to come.

"Generative AI is just the newest thing regulators are going to focus their eyes on that is going to further reveal the need for responsible and prudent data governance early on in the process, not after the fact," he said.

That shows the need for privacy professionals — from data privacy attorneys to data engineers to information security professionals — is not going away.

"Everything we have been studying and preparing for as data privacy professionals applies here and applies times 100," Hutson said.

"This is an area that is going to continue to change daily as we see new technologies that are able to do new and cool things people want to use. Fortunately, and unfortunately, they are going to use mountains of data, requiring folks like us to pay attention and to understand how these technologies work, but also to strategically advise on how to minimize risks and meaningfully protect consumers — both in terms of privacy and security but also in terms of ethics — in response to these systems that have the opportunity to create lots of novel things, but also things that are genuinely kind of scary."