Technology and business research companies expect the artificial intelligence market to continue expanding significantly in the coming years. AI is not a single technology. The ICO defines AI as "an umbrella term for a range of technologies and approaches that often attempt to mimic human thought to solve complex tasks."
In our everyday lives, we may encounter AI in the form of machine learning algorithms in personal assistants, personalized advertisements, fraud detection services, facial recognition technologies and more. Machine learning usually produces either a prediction, such as "am I in my fertile period?" or a classification, like "this email is spam" or "that is a human face in the picture." Data is the fuel for ML: the algorithm must be "fed" vast amounts of properly cleansed data to reach the desired level of precision in its output. ML may rely on supervised, semi-supervised, unsupervised or reinforcement learning.
Essentially, supervised learning requires a labeled dataset to feed the ML so it learns to produce the expected outcomes, such as a prediction or classification at the desired precision level, as the sketch below illustrates. Unsupervised learning works on raw, unlabeled data to find new correlations. Semi-supervised learning sits between the two, and reinforcement learning relies on the ML's interaction with its environment. Each of these learning methods requires a different compliance approach when implementing the ML.
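To make the distinction concrete, here is a minimal supervised-learning sketch in Python using scikit-learn. The emails, labels and split parameters are invented for illustration only.

```python
# Minimal supervised-learning sketch: human-provided labels "teach" the model
# to classify emails as spam or not. The dataset is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

emails = ["win a free prize now", "meeting agenda attached",
          "claim your cash reward", "quarterly report draft",
          "free money, click here", "are you free for lunch tomorrow?"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = not spam: labels supervise learning

X_train, X_test, y_train, y_test = train_test_split(
    emails, labels, test_size=0.33, random_state=0, stratify=labels)

vectorizer = CountVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(X_train), y_train)  # the "teaching" step

# Validation: check whether the model reaches the desired level of precision.
predictions = model.predict(vectorizer.transform(X_test))
print("accuracy:", accuracy_score(y_test, predictions))
```

In unsupervised learning, by contrast, the `labels` list would not exist; the algorithm would look for structure in the emails alone.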
The ML life cycle
The ML life cycle may start with the definition and design of the solution. While the ML may operate in a cloud back-end, it is also important to design the solution's user interface and user experience upfront to manage data protection and ePrivacy compliance-related controls, such as potential consent management tools. In the design phase, the organization may identify the applicable legal bases (see the sketch after this list) for:
- Initial teaching.
- Validating.
- Operating in production.
- Further teaching if applicable.
- Re-calibrating if applicable.
- Recovering.
- Sun-setting the ML solution.
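One way to operationalize this is to document the identified legal basis for each life-cycle phase up front. The following is a hypothetical sketch; the phase names and the GDPR Article 6 bases shown are illustrative assumptions rather than recommendations, since the appropriate basis depends on the concrete processing.

```python
# Hypothetical sketch: documenting a legal basis per life-cycle phase up front.
# The phases and the Article 6 bases shown are illustrative assumptions only.
LEGAL_BASIS_REGISTER = {
    "initial_teaching": "legitimate interests (Art. 6(1)(f))",
    "validation": "legitimate interests (Art. 6(1)(f))",
    "production": "performance of a contract (Art. 6(1)(b))",
    "further_teaching": "consent (Art. 6(1)(a))",
    "recalibration": "legitimate interests (Art. 6(1)(f))",
    "recovery": "legitimate interests (Art. 6(1)(f))",
    "sunsetting": "legal obligation (Art. 6(1)(c))",
}

def legal_basis_for(phase: str) -> str:
    """Fail loudly if a phase was never assigned a documented legal basis."""
    if phase not in LEGAL_BASIS_REGISTER:
        raise KeyError(f"No documented legal basis for phase: {phase}")
    return LEGAL_BASIS_REGISTER[phase]
```

Failing loudly when a phase has no documented basis surfaces the accountability gap during development rather than in production.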
Organizations may operationalize privacy-by-design and privacy-by-default obligations by ensuring the initial dataset is cleansed, accurate and free of bias, in line with storage limitation and data minimization requirements, and that the organization is transparent about the use of such data. The controller must ensure compliance with these requirements at the outset of the ML design process and continue to do so at the time of processing. Organizations may also implement sound data governance, data quality and bias control processes, regularly review the operation of the ML and re-calibrate when necessary. While sun-setting may require storing the ML solution itself, the organization should securely delete all related data, including the initial teaching dataset.
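As an illustration of what such controls can look like in code, here is a minimal data-minimization sketch, assuming a hypothetical pandas DataFrame with a `collected_at` timestamp column; the feature set and retention window are invented for the example.

```python
import pandas as pd

REQUIRED_FEATURES = ["age_band", "country", "label"]  # assumed minimal feature set
RETENTION_DAYS = 365  # assumed retention window

def prepare_teaching_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Cleanse and minimize a raw dataset before it is used to teach the ML."""
    cutoff = pd.Timestamp.now() - pd.Timedelta(days=RETENTION_DAYS)
    cleansed = (
        raw[raw["collected_at"] >= cutoff]  # storage limitation: drop stale rows
        .drop_duplicates()                  # basic data quality: deduplicate
        [REQUIRED_FEATURES]                 # data minimization: drop extra columns
        .dropna()                           # remove incomplete records
    )
    return cleansed
```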
Controllers, processors and joint controllers
One of the most difficult compliance challenges in implementing an ML solution is defining the data protection roles of ML operators and users, which require different compliance approaches. Many ML service providers define themselves as data processors; however, they also intend to collect data from their customers and use it to further fine-tune their own services. Considering the Court of Justice of the European Union's case law, this may constitute joint controllership among the parties. Federated learning, in which many customers benefit from a common ML taught with the data collected from each customer, may also constitute joint controllership between the service provider and the customers that rely on the common ML.
ML-related service providers may qualify as data processors if they only provide separate, segregated MLs to each customer and do not rely on centralized processing of data, as sketched below. While teaching such segregated MLs is technically possible, it results in a different level of "maturity" and precision for each customer.
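Here is a sketch of that segregated set-up, with illustrative class and method names: each customer (tenant) gets its own model instance, taught only on that customer's data, so no centralized processing takes place.

```python
from sklearn.linear_model import LogisticRegression

class SegregatedModelService:
    """One independent model per customer; no data is pooled across tenants."""

    def __init__(self):
        self._models = {}  # customer_id -> that customer's own model

    def teach(self, customer_id: str, X, y):
        model = self._models.setdefault(customer_id, LogisticRegression())
        model.fit(X, y)  # trained only on this customer's data

    def predict(self, customer_id: str, X):
        # Each tenant's precision depends on its own data volume ("maturity").
        return self._models[customer_id].predict(X)
```

Because each tenant's model sees only its own data, a customer with little data ends up with a less "mature," less precise model, which is exactly the trade-off noted above.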
Lawful legal basis and purpose
Organizations deploying ML solutions must secure a lawful legal basis and a defined processing purpose for each phase of the ML life cycle. They must also secure data subjects' rights – which may vary depending on the applicable legal basis – and compliance with data protection principles, especially the accountability and privacy-by-design requirements, in each phase of the life cycle.
In the case of online services relying on ML technology, organizations must also be aware of the applicable ePrivacy requirements. The EU's ePrivacy Directive requires user consent to store or gain access to information for any purpose other than executing the user's request. Accordingly, using data collected during the ML's production phase to teach or fine-tune the ML may require user consent: in most cases, it will be difficult to argue that teaching or fine-tuning is strictly necessary to execute the user's request.
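In code, such a control could be as simple as gating the reuse of production records on a recorded consent signal. The consent store, record shape and purpose key below are assumptions for illustration.

```python
# Hedged sketch of an ePrivacy-style control: production records are only
# added to the fine-tuning dataset when the user has consented to that
# separate purpose. The consent store and purpose key are assumptions.
fine_tuning_dataset = []

def maybe_collect_for_teaching(record: dict, consent_store: dict) -> None:
    user_id = record["user_id"]
    # Serving the user's request needs no consent; reuse for teaching does.
    if consent_store.get(user_id, {}).get("ml_fine_tuning") is True:
        fine_tuning_dataset.append(record)
```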
The initial set-up of the ML generally requires real-life production data rather than dummy test data. Organizations may rely on their legitimate interests to set up the ML and validate its precision rate, while the legal basis for providing the service itself will likely be necessity for the performance of a contract. Using production data also means organizations should secure the development and testing environments with the same level of controls as the ML's production environment. In practice, this may also require fully separating the ML solution and its datasets within the organization's IT environment. Because the technology relies heavily on pre-made AI frameworks, the organization should also implement controls to secure ML development and operation, such as white-box testing and source code reviews.
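As a sketch of what such a control might look like, the following acceptance check, assuming a held-out validation set and an illustrative precision target, blocks promotion of a candidate model that misses the threshold agreed in the design phase.

```python
from sklearn.metrics import precision_score

MIN_PRECISION = 0.90  # assumed target agreed in the design phase

def validate_before_promotion(model, X_val, y_val) -> bool:
    """Promote a candidate model only if it meets the agreed precision target."""
    predictions = model.predict(X_val)
    precision = precision_score(y_val, predictions)
    if precision < MIN_PRECISION:
        raise ValueError(f"Precision {precision:.2f} below target {MIN_PRECISION}")
    return True
```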
It is important to assess whether further teaching of the ML is compatible with the purposes for which the initial dataset was originally collected. Organizations may also decide to secure the legal basis for the initial dataset upfront, in case it becomes necessary to reuse it for re-calibration or disaster recovery. This may require performing and documenting compatibility tests to verify the further use, as set out by Article 6(4) of the EU General Data Protection Regulation.
Risk assessments
The use of ML may have a significant impact on data subjects' rights. Therefore, it is paramount that organizations thoroughly assess the privacy risks before implementing ML. This is especially the case for automated decision-making and profiling based on ML technology if the decision produces legal effects or similarly significantly affects the data subject, or if sensitive personal data is fed to the ML.
In the design phase of an ML solution, organizations may elect or be required to conduct a data protection impact assessment to identify and manage data subjects' privacy risks, as well as an internal risk assessment to manage the organization's own compliance and information security risks.
Compliance is a must
Many companies build their business models on ML, and a number of software vendors offer "ready-made" ML infrastructures. Some come in the form of cloud services, but there are also several open-source frameworks and pre-built solutions to start from. The widespread use of the technology promises to better our everyday lives.
However, service providers building their products around ML, and organizations using such services, must also be aware of the data protection and possible ePrivacy-related compliance risks they need to tackle to achieve their business goals. Disregarding these requirements in the early phases of designing and implementing an ML solution may result in increased operational and compliance costs; in any case, they are worth paying attention to.