As consumers become more aware of their privacy rights, many assume that once they ask an organization to delete their data, it is gone for good. The reality is more complex.
Even when organizations comply with data deletion requests, a persistent and overlooked issue looms in the background: shadow data.
This hidden trove of data not only undermines compliance with laws like the California Consumer Privacy Act and the EU General Data Protection Regulation but also fuels a thriving secondary market in which data brokers monetize this forgotten data for advertising, analytics and more.
What is shadow data?
Shadow data refers to unmanaged or forgotten copies of personal information that reside in backups, archives or third-party systems. These fragments create a range of privacy risks, from regulatory noncompliance to significant security vulnerabilities, and they feed the secondary market for personal data.
Shadow data emerges from the complexities of modern data management. Even when organizations strive for transparency and compliance, technical and operational realities often get in the way. For example, backups made for disaster recovery might retain data long after a deletion request has been processed, and legacy systems with outdated architectures can make it difficult to trace and fully erase data.
This problem is compounded when organizations work with third-party vendors or partners to process data. Once personal information is shared outside the organization, ensuring its proper deletion becomes significantly more challenging. Without strict oversight, these third parties may inadvertently or intentionally retain data in systems beyond the reach of the original controller.
Shadow data is not only a product of technical oversights; human error plays a significant role as well. Employees might save files to unauthorized cloud drives or personal devices for convenience, creating unmonitored duplicates of sensitive information. These unmanaged copies can persist for years, unnoticed and unaddressed, until they become a problem.
The consequences of shadow data
The risks associated with shadow data are not merely theoretical; they have real-world implications.
In one notable case, an automotive company faced a significant data breach affecting thousands of customers. The breach occurred because customer data, originally shared with a third-party vendor for testing purposes, was inadvertently stored in a public cloud environment without proper security measures. Months passed before the error was discovered, during which sensitive information remained exposed to potential exploitation.
An IBM report found that 35% of data breaches in 2024 involved data stored in unmanaged data sources. Notably, 25% of breaches involving shadow data occurred exclusively on premises, underscoring significant unmanaged risks, such as data governance gaps, data privacy concerns and regulatory impacts.
Breaches involving shadow data also took 26.2% longer to detect and 20.2% longer to contain, lasting an average of 291 days. These delays contributed to higher breach-related costs, which averaged USD5.27 million when shadow data was involved. Yet these figures capture only part of the fallout, which can also include spillover effects on partners, potential contractual issues and litigation costs that persist for years after the initial breach.
Beyond breaches, shadow data feeds into a broader ecosystem of secondary data markets. Data brokers often collect and aggregate information from residual sources, building detailed profiles of individuals for sale to advertisers, political campaigns and other entities. Even when users believe their data has been deleted, fragments may persist in third-party systems or forgotten backups, ultimately making their way into this marketplace.
The regulatory and compliance challenge
Recent regulatory actions have highlighted the risks associated with shadow data.
In the U.S., the Federal Trade Commission took enforcement action against data brokers for mishandling sensitive information, including location data and health records. In one case, brokers were found to have improperly tracked individuals visiting sensitive locations, such as medical clinics or religious institutions, using residual data. The Consumer Financial Protection Bureau also proposed rules to limit the sale of personal data, aiming to curb the misuse of information in secondary markets.
Despite these developments, existing privacy laws often struggle to address the complexities of shadow data. While the GDPR and CCPA mandate data deletion, they provide limited guidance on managing residual data in backups or third-party systems. Technical limitations, such as the need to preserve backups for operational continuity, further complicate compliance efforts.
Tackling shadow data: A call to action
Addressing shadow data requires a comprehensive, multifaceted approach that mitigates privacy risks and meets compliance requirements.
Conduct comprehensive data mapping. Organizations must create a detailed inventory of all the systems where data resides. This includes active databases, backups, archives, legacy systems and third-party environments. A thorough understanding of data flows — both internal and external — is essential to identify where shadow data might exist.
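To make the idea concrete, the Python sketch below models a data inventory as a list of stores, each tagged with its type, its owner, whether it holds personal data, whether deletion requests reach it, and which stores feed it. It then flags likely shadow data locations and traces data flows downstream from a source system. The `DataStore` fields and all store names are hypothetical illustrations; a real inventory would typically be assembled from discovery tools, interviews and vendor records.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """One entry in a hypothetical data inventory."""
    name: str
    kind: str                   # e.g. "active", "backup", "archive", "legacy", "third_party"
    owner: str                  # internal team or vendor responsible for the store
    holds_personal_data: bool
    in_deletion_workflow: bool  # is this store reached when a deletion request is processed?
    receives_from: list[str] = field(default_factory=list)  # upstream stores in the data flow


def shadow_data_candidates(inventory: list[DataStore]) -> list[str]:
    """Stores that hold personal data but are not covered by the deletion workflow."""
    return [s.name for s in inventory
            if s.holds_personal_data and not s.in_deletion_workflow]


def downstream_of(inventory: list[DataStore], source: str) -> list[str]:
    """Every store that receives data from `source`, directly or indirectly."""
    found: set[str] = set()
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for store in inventory:
            if current in store.receives_from and store.name not in found:
                found.add(store.name)
                frontier.append(store.name)
    return sorted(found)


# Illustrative inventory; every name here is made up.
inventory = [
    DataStore("crm_db", "active", "sales", True, True),
    DataStore("nightly_backup", "backup", "it_ops", True, False, ["crm_db"]),
    DataStore("vendor_test_env", "third_party", "Vendor A", True, False, ["crm_db"]),
    DataStore("cold_archive", "archive", "it_ops", True, False, ["nightly_backup"]),
]

print(shadow_data_candidates(inventory))  # ['nightly_backup', 'vendor_test_env', 'cold_archive']
print(downstream_of(inventory, "crm_db"))  # ['cold_archive', 'nightly_backup', 'vendor_test_env']
```

Tracing flows matters because, in this example, the archive never receives data from the CRM directly; it inherits that data through the backup, which is exactly the kind of indirect copy a deletion workflow can miss.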
Strengthen vendor accountability. Many shadow data issues arise from third-party relationships. To mitigate these risks, organizations should establish explicit contractual obligations for data handling and deletion, regularly audit vendor practices, hold vendors to the same compliance standards they apply internally, and terminate partnerships with noncompliant third parties.
Implement advanced deletion protocols. Automated discovery tools can help locate and eliminate duplicate or residual data stored across systems. Backup systems should be configured to align with data retention policies and regulatory requirements.
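One way to make backups honor deletion requests is sketched below in Python, purely as an illustration. It computes when the last backup holding a deleted record should expire, flags backup sets kept longer than policy allows, and re-applies completed deletions when data is restored. The retention figures, the `subject_id` field and the suppression log are assumptions, not a prescribed design.

```python
from datetime import date, timedelta


def backup_purge_date(processed_on: date, retention_days: int) -> date:
    """Date by which every backup that may still hold a deleted record has expired.

    Assumes the newest backup containing the record was taken no later than
    `processed_on` and that backups are destroyed `retention_days` after creation.
    """
    return processed_on + timedelta(days=retention_days)


def retention_violations(backups: dict[str, int], policy_days: int) -> list[str]:
    """Names of backup sets whose configured retention exceeds the retention policy."""
    return sorted(name for name, days in backups.items() if days > policy_days)


def restore_with_suppression(rows: list[dict], suppressed_ids: set[str]) -> list[dict]:
    """Re-apply completed deletions when data is restored from a backup.

    `suppressed_ids` is a hypothetical log of data subjects whose deletion
    requests were completed; their records are dropped instead of revived.
    """
    return [row for row in rows if row["subject_id"] not in suppressed_ids]


print(backup_purge_date(date(2024, 3, 1), 90))  # 2024-05-30
print(retention_violations({"nightly": 35, "yearly": 2555}, policy_days=365))  # ['yearly']
restored = [{"subject_id": "u1"}, {"subject_id": "u2"}]
print(restore_with_suppression(restored, {"u2"}))  # [{'subject_id': 'u1'}]
```

The suppression step reflects a common practice for backups that cannot be edited record by record: rather than rewriting the backup itself, completed deletions are replayed whenever its contents return to a live system.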
Foster a culture of data stewardship. Shadow data often results from human error, such as employees saving files to unauthorized locations. Training employees in proper data management practices and reinforcing the importance of compliance can significantly reduce these risks.
Advocate for stronger regulations. Privacy professionals and organizations can play an active role in shaping future legislation. Clearer guidelines on handling backups and residual data, coupled with stronger enforcement mechanisms, are essential to address the shadow data challenge comprehensively.
Conclusion
Shadow data represents a significant blind spot in the quest for robust privacy protections. It undermines consumer trust, exposes organizations to regulatory penalties and fuels a secondary data market that thrives on the misuse of personal information.
As regulators increasingly focus on data brokers and residual data practices, addressing shadow data is no longer optional; it is an essential component of modern privacy compliance.
For privacy pros, the challenge lies in balancing operational realities with the need for transparency and accountability. By adopting comprehensive data management practices, strengthening vendor oversight and advocating for clearer regulatory guidelines, organizations can begin to tackle the shadow data issue.
In doing so, they not only protect their own interests but also help build a more trustworthy and privacy-resilient digital ecosystem.
Jennifer Dickey, CIPP/E, CIPP/US, CIPM, CIPT, FIP, is an associate with Mullen Coughlin's Advisory Compliance practice group.