As privacy concerns mount, driven by both cyber threats and legal requirements, a clear, formal, standard model of data components and their history has become necessary. Here, we introduce the concept of the data bill of materials (DBoM), or personal data bill of materials, a comprehensive inventory of the personal data used in software systems.

The DBoM records the ownership, sharing history, storage and collection purpose of a unit of data. Its purpose is to identify personal data as an asset and an essential component of the software and system inventory, just as integral as programs, servers and other components. Its goals are to maintain the integrity of personal or sensitive data; ensure the confidentiality of data throughout its life cycle; and provide transparency about the collection, use, storage, sharing and destruction of personal data. This will improve data security, privacy and user confidence in data systems, and will make compliance with international privacy laws simpler and more effective.

Bill of materials for software systems

Used frequently in supply chain organization, a bill of materials (BoM) is a “comprehensive inventory of the raw materials, assemblies, subassemblies, parts and components … needed to manufacture a product.” Essentially, the BoM consists of all the goods and resources involved in assembling a final product. Having a detailed BoM can help companies estimate material costs, plan purchases, control their inventory and maintain accurate records.

After the SolarWinds supply chain attack, in which hackers inserted a backdoor into commercial software to access users' computers, U.S. President Joe Biden issued an executive order to improve cybersecurity in the United States. As part of these improvements, the National Telecommunications and Information Administration published guidance on the software bill of materials (SBoM), a record of the individual components in a piece of software. In a software development world built on open-source libraries and a highly interdependent ecosystem, SBoMs can help with software transparency, integrity and identity, allowing users to better inventory the components of their software to find and resolve vulnerabilities.

Introducing the data bill of materials

While the SBoM is a good step toward transparency and integrity, it does not provide these benefits to every part of a system. The SBoM covers software components, but not the data assets stored and processed within the system. Data, whether structured (e.g., databases) or unstructured (e.g., file shares), is an essential part of the software ecosystem and a critical asset that needs to be protected. While all kinds of data exist, personal data has special importance for security and privacy. There is a need for a comprehensive inventory of the personal data collected, used, processed and destroyed over the system life cycle, which we call the data bill of materials or personal data bill of materials.

The essential components of the DBoM are proposed below. This list is a detailed starting point for what should be recorded in the DBoM, but can be developed and made more comprehensive by adding more factors in the future.

[Figure: personal DBoM chart (Where-Is-Personal-DBoM-Chart_p2.png)]

Overall, a DBoM should answer the following questions:

  • Data use
    • What personal data, sensitive data, and nonpersonal data are you using?
    • When was the data collected?
    • How do you use the collected data?
    • Who is responsible for the data set?
    • Who is responsible for administering the data?
    • What other first and third parties have access to the data?
    • What applications use the data?
    • What mechanisms are used to protect the data?
  • Data collection
    • Who are you collecting data from?
    • What categories of personal and nonpersonal data are you collecting?
    • How do you collect data?
  • Data location
    • Do your data use, collection, and processing comply with applicable laws?
    • Where is your data stored?
    • Who are you sharing data with?
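
To make the questions above concrete, here is a minimal sketch of how a single DBoM entry might be represented in machine-readable form. No DBoM schema has been standardized, so the class name `DBoMEntry`, every field name, and all sample values are hypothetical and chosen only to mirror the questions in the list.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class DBoMEntry:
    """Hypothetical DBoM record for one unit of data (all field names are assumptions)."""

    # Data use
    data_element: str        # what data is being used
    sensitivity: str         # "personal", "sensitive" or "nonpersonal"
    collected_on: date       # when the data was collected
    purpose: str             # how the collected data is used
    owner: str               # who is responsible for the data set
    administrator: str       # who administers the data
    # Data collection
    data_subject: str        # who the data was collected from
    collection_method: str   # how the data was collected
    # Data use (continued): access, applications and protections
    parties_with_access: List[str] = field(default_factory=list)
    applications: List[str] = field(default_factory=list)
    protections: List[str] = field(default_factory=list)
    # Data location
    storage_locations: List[str] = field(default_factory=list)
    shared_with: List[str] = field(default_factory=list)
    applicable_laws: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the entry to JSON, writing the date in ISO 8601 form."""
        record = asdict(self)
        record["collected_on"] = self.collected_on.isoformat()
        return json.dumps(record, indent=2)


# Illustrative values only; none of these come from a real system.
entry = DBoMEntry(
    data_element="email_address",
    sensitivity="personal",
    collected_on=date(2023, 5, 1),
    purpose="account login and service notifications",
    owner="Head of Customer Data",
    administrator="Platform operations team",
    data_subject="registered customers",
    collection_method="web signup form",
    parties_with_access=["support vendor"],
    applications=["billing-service"],
    protections=["encryption at rest"],
    storage_locations=["eu-customer-db"],
    shared_with=["email delivery provider"],
    applicable_laws=["GDPR"],
)
print(entry.to_json())
```

A flat record like this is only one possible design. A fuller DBoM would likely track each sharing or transfer event separately so that the history of a unit of data can be reconstructed over time.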

Benefits of a DBoM

Just as an SBoM captures the digital components of a piece of software in a machine-readable format, a DBoM will help stakeholders and data collectors find sensitive data as well as vital information about that data, such as when, why and how it was collected. Having such a record built into the data collection process will increase the transparency of data collection and cataloging. This helps stakeholders take inventory of the data they possess more effectively, which can otherwise be difficult given the vast quantities of data companies might acquire.

Compliance with privacy laws

The DBoM, as a standardized record of data collection and processing history, will simplify compliance with privacy laws and regulations. As one example, Article 30 of the EU General Data Protection Regulation requires data controllers to maintain a record of data processing activities for data under their responsibility, which could be satisfied by a rigorous DBoM. A detailed record of how the data was collected and processed will also make it easier to find data to be deleted, in order to comply with “right to be forgotten” laws like GDPR Article 17. Gaps or deficiencies in a DBoM could also be used to determine if data was unlawfully acquired or processed. As more countries create comprehensive data laws and regulations, simplifying compliance will only become more important, and the DBoM will reduce the time and labor costs of compliance.

Consumer confidence and trust

Even where a record of data collection and processing is not required by law, maintaining such a record is a good way for data vendors and collectors to inform data subjects about how their personal information is being used. Meticulous cataloging, along with the ease of requesting one’s data from a collector, would increase data subjects’ confidence in the system and trust in the companies responsible for their data. It is easier to identify who has stewardship over a given piece of data if there is a detailed, trustworthy log of the personal data being collected and used by systems. Liability in the event of a data breach will be simpler to determine with a DBoM, since the record of data collection and transfers will show who had stewardship of the data at the time of the breach.

Trust between data processors

A standardized DBoM would also be useful for building trust and confidence among data processors, since conforming to an industry standard is part of due diligence. In the words of Scania Open Source Officer Jonas Öberg, “If a supplier conforms to (Scania’s preferred SBoM) specification, we feel confident that they have a professional management program for Open Source,” and the same principle applies to the DBoM. The DBoM will also build trust between data managers by ensuring the data itself is legitimate — the DBoM’s record can be used to determine if the data was obtained legally and how it has been processed or modified.

Use case: HIPAA

The health care industry has significant privacy needs. Health care providers and insurance companies collect a large amount of very sensitive data on patients and must comply with very strict laws like the Health Insurance Portability and Accountability Act to safeguard their privacy. A DBoM would streamline data communication between health care entities and simplify compliance with legal requirements.

Under HIPAA, a patient has the right to request a copy of their medical record or that their medical record be corrected. Without a DBoM, this may be difficult to accomplish. If a patient requests a copy of their record from their insurance provider, the insurance provider will have to provide a full account of all the patient’s data. Patient data can be spread across a wide variety of storage locations, and the insurance provider is required by law to find all of it — with a detailed DBoM, the insurance provider has a complete record of where that data has been stored and transferred. 

The DBoM also aids compliance with HIPAA’s Privacy Rule by tracking when, how, why and with whom data has been shared. The DBoM’s concrete record of sharing would reveal if the data has been shared illegally and with whom. Conversely, it can also absolve the data holder of liability by proving that it did not release the HIPAA-protected data to a forbidden party.
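
As a rough illustration of how a sharing record could support this kind of review, the sketch below scans a hypothetical sharing log and flags disclosures to recipients outside a permitted set. The `SharingEvent` type, the `disclosures_outside` function, and all sample values are assumptions made for illustration. Deciding who is actually permitted to receive data under HIPAA is a legal determination that the `permitted` set merely stands in for.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass(frozen=True)
class SharingEvent:
    """Hypothetical DBoM sharing-log row: when, how, why and with whom."""

    record_id: str
    shared_on: date   # when the data was shared
    recipient: str    # with whom it was shared
    method: str       # how it was shared
    reason: str       # why it was shared


def disclosures_outside(log: List[SharingEvent], permitted: Set[str]) -> List[SharingEvent]:
    """Return the events whose recipient is not in the permitted set."""
    return [event for event in log if event.recipient not in permitted]


# Illustrative data only.
log = [
    SharingEvent("patient-001", date(2024, 1, 10), "Treating Hospital", "secure transfer", "treatment"),
    SharingEvent("patient-001", date(2024, 2, 3), "Marketing Partner", "file export", "advertising"),
]
# Assumption: this set is supplied by a compliance review, not derived in code.
permitted = {"Treating Hospital", "Health Insurer"}

flagged = disclosures_outside(log, permitted)
for event in flagged:
    print(event.shared_on.isoformat(), event.recipient, event.reason)
```

An empty result from the same check is the other half of the argument: it is the kind of evidence a data holder could point to when showing that no disclosure was made to a forbidden party.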

Use case: Enabling data subject access requests

A data subject access request is a request submitted by a user to gain access to their personal data controlled by a data processor. Many data privacy laws, such as the GDPR and the California Consumer Privacy Act, give data subjects the right to request a copy of their data. Implementing a DBoM would simplify and improve the DSAR process for both processor and subject.

Different data protection laws have different requirements for the right to deletion. For example, the CCPA only grants a right to deletion for data the processor obtained directly from the consumer, while the GDPR also extends that right to data the processor obtained from third-party vendors. A detailed DBoM would make it easier for the data processor to track the history of each unit of data, to confirm what it received from the consumer and what it received from other sources. This allows data processors to ensure they are providing what is required by law.
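
The source-tracking idea in the preceding paragraph can be sketched in a few lines. This example encodes only the simplified distinction described above (deletion limited to consumer-provided data under the CCPA, and extended to third-party-sourced data under the GDPR). It is not a statement of either law's full requirements or exceptions. The `DataUnit` type, the `deletion_scope` function, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataUnit:
    """Hypothetical DBoM row recording where a unit of data came from."""

    name: str
    source: str  # "consumer" if collected directly, otherwise the third party's name


def deletion_scope(units: List[DataUnit], regime: str) -> List[str]:
    """Return the names of data units covered by a deletion request.

    Simplified rule taken from the surrounding text, not from the statutes:
    CCPA covers data obtained directly from the consumer; GDPR also covers
    data obtained from third parties.
    """
    if regime == "CCPA":
        return [unit.name for unit in units if unit.source == "consumer"]
    if regime == "GDPR":
        return [unit.name for unit in units]
    raise ValueError(f"unsupported regime: {regime}")


# Illustrative data only.
units = [
    DataUnit("email_address", "consumer"),
    DataUnit("purchase_history", "consumer"),
    DataUnit("credit_segment", "Data Broker Inc."),
]

print(deletion_scope(units, "CCPA"))
print(deletion_scope(units, "GDPR"))
```

The point of the sketch is that the decision becomes a simple filter once the DBoM reliably records the source of each unit of data. Without that field, the same question requires reconstructing provenance by hand.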

Handling a DSAR differs depending on what law applies to the processor, but the response to the consumer always has certain requirements that can be more easily fulfilled with a DBoM. The response to a DSAR usually requires a record of all the requester’s personal information, the duration the data was stored for, how the data was obtained, and the identities of third parties that the processor shared the data with. Each of these is a proposed component of the DBoM, meaning the required information will be easier to find. 
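
To show how those response elements map onto DBoM fields, here is a minimal sketch that assembles a DSAR response for one requester. It assumes the DBoM is available as a list of records keyed by a subject identifier. The `DBoMRecord` type, the `build_dsar_response` function, and the field names are hypothetical, and a real response would need whatever additional content the applicable law requires.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class DBoMRecord:
    """Hypothetical DBoM row for one unit of a data subject's personal data."""

    subject_id: str
    data_element: str
    collected_on: date
    obtained_via: str                                     # how the data was obtained
    shared_with: List[str] = field(default_factory=list)  # third-party recipients


def build_dsar_response(
    records: List[DBoMRecord], subject_id: str, as_of: date
) -> List[Dict[str, object]]:
    """Collect, for one requester, the items a DSAR response usually needs."""
    response = []
    for record in records:
        if record.subject_id != subject_id:
            continue  # skip data belonging to other subjects
        response.append(
            {
                "data_element": record.data_element,
                "days_stored": (as_of - record.collected_on).days,
                "obtained_via": record.obtained_via,
                "shared_with": sorted(record.shared_with),
            }
        )
    return response


# Illustrative data only.
records = [
    DBoMRecord("subject-42", "email_address", date(2024, 1, 1), "web signup form", ["Mailer Co."]),
    DBoMRecord("subject-42", "postal_address", date(2024, 1, 21), "checkout form"),
    DBoMRecord("subject-77", "email_address", date(2023, 6, 1), "web signup form"),
]

for item in build_dsar_response(records, "subject-42", as_of=date(2024, 1, 31)):
    print(item)
```

Because every field in the response is read directly from the record, the cost of answering a request depends on the quality of the DBoM rather than on a search across storage systems.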

Fulfilling a DSAR has a very high compliance cost. DSARs have generally had to be fulfilled through a manual search of a processor’s data systems. Given the sheer amount of data that processors accumulate, and the fact that records are stored across widely dispersed systems, DSAR fulfillment can take a significant amount of time and labor. AI-driven data management technology has lowered these barriers to compliance, but such tools still need pointers to the data sources where personal information exists. With a DBoM acting as a standardized record of the location and relevant characteristics of each unit of data, even manual fulfillment of DSARs becomes easier: less time spent searching means faster and more cost-effective DSAR fulfillment.

Conclusion

Data is a vital asset to all organizations with a digital presence, with as much value as the software that processes it and the physical assets that store it. While software and physical assets have their own bills of materials that record where each system component came from and what it is used for, there is no standard industry practice to do the same for personal data. Instituting the DBoM as a standard practice will dramatically improve the responsible use of personal data within software ecosystems, as well as transparency for consumers and stakeholders with regard to how personal data is used. The record of transfers, storage locations and uses in a DBoM will allow customers to more easily see how and why their data was processed, and will allow data processors to share this information with consumers and fellow processors much more quickly and efficiently.

The SBoM is becoming a standard practice in the tech industry, recognized and reinforced with government guidelines and recommendations. It is improving security, transparency and integrity by making users more aware of the individual components of the software they use and thus better able to respond to vulnerabilities. Implementing the DBoM would result in similar improvements in the collection, usage, storage, sharing and destruction of personal data. As organizations better understand their data as an asset and data breaches become a more pressing concern, we predict that the DBoM will become a standard industry practice.