Privacy Perspectives | How machine learning can help small businesses deal with data privacy compliance




Data privacy is one of the leading concerns for businesses that want to protect confidentiality and preserve customer trust. Over the last few decades, our society's digital footprint has grown exceptionally, but this digital revolution has come at a real cost to individual privacy.

According to Pew Research Center, 81% of Americans say the potential risks of companies collecting data about them outweigh the benefits they receive from those businesses.

Challenges in executing privacy compliance for small companies

Data privacy is not a concern only for big companies. In today's globalized economy, companies of every size need to be privacy compliant. However, it is not easy for small or mid-sized companies to meet all their compliance obligations within budget and on schedule, and the challenges they face on the compliance journey are not what most people expect.

Their first struggle is understanding which regulations apply to their business. In this borderless economy, companies often aim to operate across several countries, but every state and country has different data privacy laws and regulations. Being unaware of an applicable law can shut a business down, or hefty fines can eat away its capital. Keeping track of these laws is arduous, especially for small and mid-sized businesses.

Every small business is unique, which makes understanding complex privacy regulations, and how they apply to that particular business, even more difficult. Generic Google searches, conferences, and forums typically do not answer business-specific questions, and hiring or retaining a privacy consultant or partner is usually beyond the budget of most of these businesses.

The second challenge is finding the right balance between improving the customer experience and collecting too much information. Small businesses are often recognized for personalizing their services for their clients. At the same time, they worry that they might be collecting personally identifiable information (PII) or sensitive personal information that could become a hurdle for them later.

Role of automation in simplifying the compliance

Automation tools may help address these challenges. However, selecting the right tool for the business is extremely difficult, and completing exercises such as a privacy impact assessment or a questionnaire with these tools can itself be very hard.

We can try to simplify the questions or recommend training to raise awareness across the organization, but in most scenarios that is easier said than done.

How machine learning has helped industries with automation

Machine learning (ML) is a subset of artificial intelligence: a family of algorithms that learn from past data using statistical models. The performance of an ML model depends heavily on its training data and on choosing the right model. ML models have helped many industries automate repetitive processes and tasks that involve large amounts of data.

Case study: A mid-size company's privacy impact assessment

On the surface, it appears that machine learning can also help with data privacy compliance. On this premise, let's review a case study in which a mid-size organization had to complete a privacy impact assessment in one month. The tight deadline arose because the company had received a notice from its regulator to complete the assessment within one month or face a heavy penalty. Its budget for the exercise was also limited, so it had no option but to try something new that automated the repetitive activities.

In these exercises, the first roadblock typically comes when meeting with user departments, whose heads do not set aside enough time to complete the questionnaire. Getting the right questionnaire to the right person can also be challenging, and in some cases people may not even know they are responsible for completing it. This is not surprising; it is normal in most organizations.

The second big challenge is explaining the NIST and ISO 27001 framework questions and finding the right person to answer them. Here we can leverage machine learning models, specifically natural language processing (NLP), to identify keywords in each question and map them to the user department most likely to answer it. This can save companies a significant amount of time on repetitive work.
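The article does not describe its NLP model, so here is a minimal sketch of the underlying idea under our own assumptions: a simple keyword-overlap score stands in for a trained NLP model, and the department keyword sets, `tokenize`, and `route_question` are hypothetical names for illustration. Each questionnaire question is routed to the department whose keywords it mentions most often, and questions with no match are left for manual triage.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword sets per department. A real deployment would derive
# these from the organization's own structure and past questionnaires.
DEPARTMENT_KEYWORDS = {
    "IT": {"encryption", "access", "backup", "firewall", "password", "server"},
    "HR": {"employee", "payroll", "recruitment", "training", "contract"},
    "Legal": {"consent", "regulator", "retention", "breach", "contract"},
    "Marketing": {"newsletter", "campaign", "cookies", "customer", "profiling"},
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def route_question(question: str) -> Optional[str]:
    """Return the department whose keywords appear most often in the question,
    or None if no keyword matches (the question then needs manual triage)."""
    counts = Counter(tokenize(question))
    scores = {
        dept: sum(counts[word] for word in keywords)
        for dept, keywords in DEPARTMENT_KEYWORDS.items()
    }
    best_dept = max(scores, key=scores.get)  # ties go to the first department listed
    return best_dept if scores[best_dept] > 0 else None


questions = [
    "Is encryption applied to every server backup?",
    "How long is employee payroll data kept?",
    "How are breach notifications sent to the regulator?",
]
for q in questions:
    print(route_question(q), "<-", q)
```

A trained model would replace the hand-written keyword sets, but the routing step (score each department, pick the best, fall back to a human) would look much the same.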

The next big challenge is understanding and building the data flow diagram. For this part, companies can use an in-house tool built on pre-built ML libraries that makes it easy for everyone to fill in their information quickly and visualize the data flow diagram.

Another challenge is teaching people the difference between personal and non-personal data, and here machine learning can be of great help. We can build a simple ML model that separates personal from non-personal data, which reduces the need for repetitive training on what counts as personal data.

This model can help in assessment in two ways:

  1. It reduces the repetitive work of identifying PII and non-PII information.
  2. It also gives the flexibility to protect PII at discretion; in our case, we hashed it with SHA-256, a widely adopted one-way cryptographic hash function (unlike encryption, a hash cannot be reversed with a key).
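The article does not say which model it used, so the following is a minimal sketch under our own assumptions: a multinomial naive Bayes classifier that labels columns as PII or non-PII from the words in their names, trained on a small hypothetical set of labeled column names, followed by hashing the values of fields predicted as PII. `field_tokens`, `FieldClassifier`, `pseudonymize`, and `TRAINING` are illustrative names. The case study used SHA-256; this sketch swaps in HMAC-SHA256, a keyed variant, because a plain SHA-256 digest of a low-entropy value such as a phone number can be reversed by hashing likely candidates.

```python
import hashlib
import hmac
import math
import re
from collections import Counter, defaultdict


def field_tokens(field_name: str) -> list:
    """Split a column name such as 'customerEmail' or 'order_total' into words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", field_name)
    return re.findall(r"[a-z]+", spaced.lower())


class FieldClassifier:
    """Multinomial naive Bayes over column-name words, with Laplace smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # examples seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def fit(self, examples):
        """examples: iterable of (field_name, label) pairs."""
        for name, label in examples:
            self.class_counts[label] += 1
            for word in field_tokens(name):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, field_name: str) -> str:
        """Return the label with the highest log-posterior for this field name."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in field_tokens(field_name):
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed SHA-256 (HMAC-SHA256) hex digest of a PII value. The key must stay
    secret; without it, nobody can hash guesses to match the stored digests."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical labeled column names standing in for a real training set.
TRAINING = [
    ("customer_email", "PII"), ("phone_number", "PII"), ("full_name", "PII"),
    ("home_address", "PII"), ("date_of_birth", "PII"),
    ("order_total", "non-PII"), ("product_id", "non-PII"), ("page_views", "non-PII"),
    ("warehouse_code", "non-PII"), ("order_date", "non-PII"),
]

classifier = FieldClassifier().fit(TRAINING)
record = {"customerEmail": "jane@example.com", "order_id": "A-1001"}
protected = {
    field: pseudonymize(value, b"replace-with-a-secret-key")
    if classifier.predict(field) == "PII" else value
    for field, value in record.items()
}
print(protected)
```

Here `customerEmail` is classified as PII and replaced by a 64-character digest, while `order_id` passes through unchanged. A classifier working only on column names is a first pass; its predictions would still benefit from human review before the assessment is signed off.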

Ultimately, these tools can add flexibility to completing the assessment steps. While there is still a long way to go, machine learning can help organizations optimize resources and reduce the costs of their privacy impact assessments.

  • Alla Nabatova • Nov 25, 2022
    Thank you for the article. It makes total sense to leverage ML/AI for privacy assessments given the large amount of information that needs to be analyzed. I have some questions and would appreciate it if you could help with them:
    1. How exactly was this use case resolved: "we can leverage machine learning — natural language processing — models to identify keywords and their mapping with the possible user department."? I assume that the questionnaire content was compared to the organization's structure, and the departments/people relevant to specific questions were identified. Is that correct?
    2. What security and organizational measures can be incorporated into the ML solution itself? As we have to train the ML model, we need to care about the safety of the data set that is used for training. 
    Thank you!
  • Pramod Misra • Dec 11, 2022
    Alla Nabatova, thanks for your comments. Yes, you are right. 1. The first step is to capture the organisation's department structure and its people. 2. The ML solution runs on the organisation's own instance (on-premise or cloud), ensuring no additional exposure. We trained our models on existing questionnaires and datasets, so the solution comes as a pre-trained model. We do not store anything from the company's dataset, to ensure data security at every stage. I hope this helps. Feel free to DM me if you need more information.