Some great artists make masterful use of negative space. A noticeable void can add depth and definition to a well-composed work, drawing the viewer's eye to the subject and intensifying the emotional impact of the piece.
In much the same way, policy papers are sometimes notable for what they leave out. The absence of some solutions, just as much as those that are proposed, can reveal a lot about the regulatory landscape.
Twelve data protection authorities, members of the Global Privacy Assembly International Enforcement Cooperation Working Group, released a joint statement 24 Aug. on data scraping and the protection of privacy. The statement starts by describing the privacy harms from widespread scraping, made more acute by the rise of generative artificial intelligence and other unexpected secondary uses.
According to the DPAs, scraping reduces individuals' control over their personal data. It erodes trust in the digital economy. And DPAs claim there's a rise in these specific privacy threats:
- Targeted cyberattacks.
- Identity fraud.
- Monitoring, profiling and surveilling individuals.
- Unauthorized political or intelligence-gathering activity.
- Unwanted direct marketing or spam.
The DPAs are clear that the target audience for this joint statement and enforcement warning is not those who scrape websites for disreputable purposes, but rather the websites themselves, if they do not do enough to stop bad actors. The omission of an explicit enforcement warning against those who engage in unauthorized scraping showcases the limits of data protection law as a direct solution to this behavior.
To be fair, the DPAs do briefly mention the responsibility that data scrapers have to do the right thing: "In most jurisdictions, personal information that is 'publicly available,' 'publicly accessible' or 'of a public nature' on the internet, is subject to data protection and privacy laws. Individuals and companies that scrape such personal information are therefore responsible for ensuring that they comply with these and other applicable laws."
Left unsaid is the fact that, though publicly available personal information is subject to data protection restrictions in some jurisdictions, it is almost always subject to significantly fewer restrictions than personal data that has not been made publicly available. Exceptions and exclusions for public data make it difficult to bring enforcement actions against those who make later use of this data, even though the entities that make it publicly accessible explicitly carry some of the responsibility.
This may explain why the rest of the letter turns to scrutiny of social media companies, entreating them to make use of enhanced technical and operational tools to combat web scraping. The examples given — rate limiting, identifying patterns of bot activity, blocking malicious IP addresses, implementing CAPTCHAs, etc. — have already become best practices among dominant social media platforms. In general, it is in these companies' own interest to invest in scraping mitigation. But despite massive investment, scraping has continued, and legal tools have remained unavailable.
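For readers curious what a mitigation like rate limiting actually looks like in practice, here is a minimal sketch of a sliding-window rate limiter keyed by client IP address. This is purely illustrative and not drawn from the joint statement or any platform's actual implementation; the class name, thresholds, and example IP addresses are my own assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client IP within `window_seconds`.

    Hypothetical sketch: real platforms layer this with bot-pattern
    detection, IP reputation lists and CAPTCHAs, per the joint statement.
    """

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP to a deque of timestamps for its recent requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self._hits[ip]
        # Evict timestamps that have aged out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # budget exhausted; reject or throttle this client
        window.append(now)
        return True


# A burst from one address trips the limit; another address is unaffected.
limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=60.0)
print([limiter.allow("203.0.113.7", now=float(i)) for i in range(5)])
# → [True, True, True, False, False]
print(limiter.allow("198.51.100.9", now=5.0))  # → True
```

As the article notes, controls like this raise the cost of bulk scraping but cannot eliminate it; determined scrapers rotate IP addresses and mimic human request patterns, which is why platforms have also pursued contractual and legal avenues.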
I have written previously about the limits that the public-private distinction in privacy law places on solutions for scraping in the U.S. legal context. After millions of dollars of litigation — spent by the very same social media companies who received the joint letter — the paths to legal remedies against scrapers have only narrowed. Companies have turned to self-help by working to build a standardized game plan for combating unauthorized scraping. Others, like professor Tim Edgar, describe how legislative reforms to the Computer Fraud and Abuse Act, if properly tailored, could provide direct legal remedies for the most malicious kinds of scraping.
Still, others are focusing on building a more standardized contractual environment between commercial scrapers and the sites they scrape, an effort that dovetails with the discussion about licensing for generative AI.
Professor Lee Tiedrich has been at the vanguard of building this type of solution, producing a report and convening multistakeholder discussions supported by the Global Partnership on AI. In the absence of legislative and enforcement tools, aligning incentives to get both parties into a clear contractual arrangement may be the most expedient option to protect personal data — and other categories of data.
Privacy regulators are showing some results in deploying their existing enforcement tools against allegedly harmful scrapers. Australia's version of the joint enforcement letter highlights the joint effort of the Office of the Australian Information Commissioner and the U.K. Information Commissioner's Office to rein in Clearview AI, resulting in an Australian determination letter and a British fine and deletion order. Later, the Italian DPA's attempt to block OpenAI from operating in the country resulted in a negotiated agreement for the company to enhance its privacy controls.
But these first steps show only glimmers of promise for major enforcement actions against those who make unexpected use of scraped data. Data protection laws, as currently written, are not tools honed for this purpose.
As privacy professionals, we must continue to look to the blank parts of the regulatory canvas to find solutions that will protect user data. This includes continuing to invest heavily in defensive technical and legal controls, but it also means reminding those who ingest data that privacy interests remain attached to public data. One day, regulators will find ways to insist on the documented provenance of personal data.
Here's what else I'm thinking about:
- Want a full roundup of the current scrutiny over data brokers? Jessica Rich has you covered in Kelley Drye's Ad Law Access blog, covering regulatory and legislative proposals, including the closely watched California S.B. 362 and the CFPB's recently announced scrutiny of consumer reports, which I wrote about last week.
- Biometric payment systems are spreading. Bloomberg Law explores the promises and perils of such technologies, as some U.S. states consider allowing them for age verification of liquor purchases.
- Fedscoop reports that companies are confused by the Biden administration's approach to AI. Is it a rights-based approach or a risk-based approach?
Please send feedback, updates and negative space to cobun@iapp.org.