Who would have predicted a pandemic? And yet, it seems that somehow, we have learned to live with COVID-19. But have we also learned or reinforced "hidden" data-processing techniques? 

Always relevant, but in the last year more than ever, is the discussion on data and privacy protection. EU data protection authorities agree personal data protection must be ensured even in these exceptional times, especially when it comes to the generally prohibited processing of health data (Article 9 of the EU General Data Protection Regulation) when data is essential for combating the virus. In the absence of an appropriate legal basis for data processing and the absence of knowledge and awareness of what constitutes data processing, inferences might become very influential in shaping the private sector's business activities during and after COVID-19. 

Inferences are information relating to an identified or identifiable natural person that is created through deduction or reasoning rather than mere observation or collection from a data subject. Inferences are not personal data voluntarily provided by data subjects but are created by data controllers or third parties from the (voluntarily) provided data. The concept is well-known in the age of artificial intelligence, machine learning, the Internet of Things and similar technologies, but inferences are equally important in processing activities carried out without the use of sophisticated technologies. 

Inferences present a high risk for privacy. Their dangerous aspect is low verifiability: even if inaccurate, they might factor into important decisions and expose individuals to threats to their fundamental rights and freedoms. The GDPR does not properly define, regulate or refer to inferences, which is a significant loophole, and they have become tremendously dangerous in the excessive health data processing prompted by COVID-19. Although it should be indisputable that an inference, if fulfilling the necessary conditions as explained below, is personal data as defined under the GDPR, it is inappropriate that not even a single article in the GDPR is dedicated to inferences specifically. 

An example of an inference developed without sophisticated technology tools and "inspired" by COVID-19 is the assumption that someone has the virus based on their origin of travel. This is a subjective assessment (a type of inference), as it involves inferring a non-observed characteristic (i.e., health data) of the subject from data already held (i.e., the origin of travel). A data controller could then use the origin data to "indicate" the likelihood of COVID-19 infection. Health data is not easily spotted in this probability assessment because the data controller is, in fact, processing the origin of travel. Nevertheless, the data controller effectively makes assumptions and thus draws its own conclusions about a data subject's health status. If such a conclusion is used to make decisions about the data subject, then this conclusion, or even a suspicion of illness, is a health data inference. 

The idea that an inference constitutes personal data under the GDPR is discussed in the U.K. Information Commissioner's Office's guidelines on sensitive data processing but, overall, has not been sufficiently discussed, regulated or supervised in the EU. As exemplified above, when conclusions are created about someone's health, they should be treated as health data regardless of their reliability and accuracy. Not only would this type of processing be unlawful under the GDPR, but the created inferences might also be used for making decisions about data subjects. Several risks are associated with such inferences. 

Primarily, inferences might be completely inaccurate and still significantly impact data subjects and result in their unjustified differential treatment. Inferences drawn about data subjects might determine how they are viewed and evaluated. For instance, inferences of virus positivity or negativity could be used to make business decisions: Would, e.g., hotel guests (inaccurately) marked as COVID-19 positive based on their origin of travel be separated from other guests? Would they be allowed to enter the hotel restaurant only at specific times or forbidden to enter the gym? Adjusting daily business operations, to the pandemic or in general, is reasonable, but making these decisions based on an "inferred" health status infringes data subjects' right to be reasonably assessed. Moreover, such processing would undermine the key GDPR principles and, being "hidden," would be hard for authorities to trace. 

Parallels can be drawn in the employment context, where an employer makes assumptions about their employees' health status. For instance, an employer suspects that an employee who traveled outside the country is ill, decides to have them work from home and forbids them from entering the office premises. Keeping in mind that an inference does not have to be accurate to be personal data, the suspicion of illness is health data irrespective of whether the employee is indeed ill. Thus, such processing should comply with the GDPR rules.

Health data is not the only sensitive data category under threat from inferences, which can arise from many other data-processing activities. For example, during the employee-recruiting process, AI or human recruiters can make assumptions about applicants' religion, ethnicity, etc., based on applicant names and group them by these criteria. 

If subsequent decisions are made based on these assumptions (e.g., a recruiting decision), this might constitute the processing of sensitive personal data regardless of whether the assumption was correct. Another example of how an inference of someone's religion can be used to make decisions in the employment context is that human resources could, based on the employee's name, make a note to wish the employee well on a specific religious holiday. Apart from the "basic" legal basis from Article 6 of the GDPR, those inferences must have a legal basis from Article 9 of the GDPR and comply with all other data protection principles.

The main challenge before regulators, and where their focus should be, is how to stop data controllers from making inaccurate assumptions about data subjects, starting with their health during and after COVID-19, regardless of the data category in question. 
