In recent months, the focus of artificial intelligence has shifted to generative pre-trained transformers, which rely on large language models, and to tools built on them, such as OpenAI's ChatGPT or Google's Bard, as they became widely available to the public.
Generative pre-trained transformers, or GPTs, are AI models specifically trained to understand and generate human-like text by processing vast amounts of textual data. With the recent development of LLMs came the phenomenon of "emergent abilities."
Emergent abilities are unintended capabilities of LLMs that were not deliberately trained into the AI model. According to ChatGPT, "emergent abilities in generative AI models refer to the unexpected or novel capabilities that arise from the training and operation of these models. While generative AI models are initially trained to learn patterns and generate content based on existing data, they can sometimes exhibit behaviors or produce outputs that were not explicitly programmed or anticipated by their creators. These emergent abilities can be both beneficial and potentially concerning."
Emergent abilities may include creativity, deepfakes, style transfer, improvements on existing content and work, and language understanding. For instance, an LLM may start understanding languages other than those it was initially trained on. Such emergent abilities may surface during the training phase of the AI model, and possibly also during the operational production phase if the model is set up for continuous improvement, such as learning from interactions with human beings and other IT systems.
While the development and use of AI have many legal and ethical aspects and implications, securing the privacy rights and freedoms of data subjects while creating, training and using GPTs is a key consideration. AI technologies are developing rapidly, and the current legal framework typically consists of intellectual property (copyright, know-how and other proprietary rights), data protection and information security regulations.
The EU General Data Protection Regulation sets out the currently applicable data protection requirements. The question then arises: if emergent abilities are inherently vested in LLMs, does their emergence qualify as further use under the current provisions of applicable EU data protection law?
While the GDPR does not define the term "further use," related provisions are set out in Article 6(4) and applied in EU data protection practice. Further use means "processing for a purpose other than that for which the personal data have been collected."
The GDPR allows the further use of personal data only if the original purpose and the further-use purpose are compatible. Such compatibility must be assessed on a case-by-case basis, which could be difficult in practice in the case of LLMs, as even developers may be surprised by their models' actual abilities. To determine the compatibility of purposes, data controllers must consider any link between the different purposes, the context in which the personal data were collected, the nature of the personal data, the potential consequences of the further use for data subjects, and the existence of appropriate safeguards, such as encryption or pseudonymization.
In our view, a link may be easily established, as some emergent abilities may not require a processing purpose different from the original. The context of personal data collection is also similar, while the nature of the personal data is preset by the controller as it collects training and validation data, although in real-life situations this may not always be the case. Further, assessing the potential consequences may be difficult due to the opaque, "black box" nature of AI models. The ultimate solution may still be obtaining consent from affected data subjects, but in practice this may prove cumbersome and largely ineffective.
The EU AI Act will potentially affect the development and use of GPTs in real-life scenarios. One of its main goals is to create a technology-neutral regulatory framework for the development, use and provision of high-risk AI systems within the EU. Due to recent developments and publicity around GPTs, especially ChatGPT, the AI Act now also refers to generative AI models.
According to the European Parliament, "Generative foundation models, like GPT, would have to comply with additional transparency requirements, like disclosing that the content was generated by AI, designing the model to prevent it from generating illegal content and publishing summaries of copyrighted data used for training."
While the draft regulation specifically addresses generative AI technologies, it is silent on the emergent abilities of LLMs and on how any potential risks relating to them should be treated.
However, the AI Act would require the implementation and operation of certain risk-management procedures and measures for high-risk AI systems. Under the draft, high-risk AI systems are those operated in the areas of biometric identification and categorization of natural persons; management and operation of critical infrastructure; education and vocational training; employment, workers management and access to self-employment; access to and enjoyment of essential private services and public services and benefits; law enforcement; migration, asylum and border control management; and administration of justice and democratic processes.
GPTs are not likely to qualify as high-risk AI systems under the proposed AI Act; instead, the obligations on generative AI mainly concern the transparency requirements quoted above, such as disclosing that content was generated by AI, designing the model to prevent it from generating illegal content and publishing summaries of copyrighted data used for training.
The provisions of the GDPR will likely remain applicable, and data controllers must assess, on a case-by-case basis, whether any emergent ability within an LLM qualifies as further use and would violate the purpose-limitation principle. If so, the emergent ability may entail a data processing purpose incompatible with the original purpose and, therefore, require further compliance-related actions from the controller, e.g., obtaining consent from affected data subjects.
Conclusion
AI's emergent abilities are a double-edged sword, offering both incredible potential and substantial challenges. Embracing these abilities while addressing associated risks and uncertainties requires a multidisciplinary approach that combines technological innovation, ethical considerations and responsive regulation. It is essential to carefully navigate this evolving landscape to harness the benefits of AI's emergent abilities while safeguarding privacy, ethics and societal well-being.