Let’s talk about Data anonymization!
August 2, 2024
Appaye Olivier
Database Reliability Engineer

Music data is at the heart of Believe’s business, with millions of soundtracks and albums released and consumed on streaming services daily. Data is collected, stored, and used from various sources, including multiple music streaming platforms (YouTube, Spotify, Apple), which is why data privacy is increasingly important for the music industry. In an industry reliant on digital products and information merchandising, cybersecurity and privacy are paramount. Under the European Union’s General Data Protection Regulation (GDPR), one of the challenges for our industry is therefore to protect artists’ sensitive data.

Anonymize data, but how?

Data anonymization is the process of protecting sensitive personal information by deleting, masking, or encrypting the data points that can identify individuals. However, the process must be robust, because original data can sometimes be recovered through reverse engineering, especially when anonymized information is combined with public data sources.
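To make the re-identification risk concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the records, the field names (`birth_year`, `postcode`, `revenue`), and the `link` helper are illustrations only, not Believe data or tooling. It shows how a dataset with names removed can still be matched against a public source when a combination of remaining fields is unique.

```python
# Hypothetical "anonymized" release: names removed, other fields kept.
released = [
    {"birth_year": 1990, "postcode": "75011", "revenue": 1200.0},
    {"birth_year": 1985, "postcode": "69003", "revenue": 560.0},
    {"birth_year": 1990, "postcode": "13001", "revenue": 87.5},
]

# Hypothetical public source that still carries names.
public = [
    {"name": "Artist A", "birth_year": 1990, "postcode": "75011"},
    {"name": "Artist B", "birth_year": 1985, "postcode": "69003"},
    {"name": "Artist C", "birth_year": 1972, "postcode": "31000"},
]


def link(released_rows, public_rows, keys):
    """Pair each released row with a name when its key values match
    exactly one public record."""
    index = {}
    for person in public_rows:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for row in released_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # unique match: the row is re-identified
            matches.append((names[0], row))
    return matches


reidentified = link(released, public, ("birth_year", "postcode"))
for name, row in reidentified:
    print(name, row["revenue"])
```

In this toy example two of the three released rows are linked back to a name, which is why removing the obvious identifiers alone is not enough.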

 

Data anonymization principles

Like any other industry, the music industry handles critical data that can compromise artists’ privacy, such as financial data and detailed music activity on streaming platforms. This information must be separated from the artists’ names, codes, and other identifiers.

What are the advantages of the anonymization process? The main ones are improved data security in non-production environments and GDPR compliance for the company. The trade-offs are that anonymized data is less accurate for analysis and testing and that, unfortunately, anonymization cannot fully rule out re-identification.

First, sensitive data must be identified and classified according to UCSF’s Data Classification Standard (Data Protection Levels vs. Availability Levels). For example, we can identify P2 data (internal data) like artists’ full names, emails, addresses, and personal account information …

Then, we can use multiple methods to anonymize data:

  • Data masking: this method hides data values, for example through character substitution or randomization.
  • Data swapping: this method uses permutations to shuffle values into a different order from the original.
  • Random noise: commonly used in data science, this method alters values by injecting random perturbations.
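The three methods above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a production implementation: the helper names (`mask_email`, `swap_column`, `add_noise`), the sample records, and the 5% noise spread are all hypothetical.

```python
import random


def mask_email(email: str, keep: int = 1) -> str:
    """Data masking: replace all but the first `keep` characters
    of the local part with '*'."""
    local, _, domain = email.partition("@")
    return local[:keep] + "*" * (len(local) - keep) + "@" + domain


def swap_column(rows: list, column: str, rng: random.Random) -> list:
    """Data swapping: permute one column's values across the rows."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def add_noise(value: float, rng: random.Random, spread: float = 0.05) -> float:
    """Random noise: perturb a number by a relative factor within +/- spread."""
    return value * (1 + rng.uniform(-spread, spread))


# Hypothetical sample records, for illustration only.
artists = [
    {"email": "alice@example.com", "city": "Paris", "revenue": 1200.0},
    {"email": "bob@example.com", "city": "Lyon", "revenue": 560.0},
    {"email": "carol@example.com", "city": "Nice", "revenue": 87.5},
]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
anonymized = [
    {
        **row,
        "email": mask_email(row["email"]),
        "revenue": add_noise(row["revenue"], rng),
    }
    for row in swap_column(artists, "city", rng)
]

for row in anonymized:
    print(row)
```

Each helper trades accuracy for privacy in a different way: masking destroys the value, swapping keeps the overall distribution of a column but breaks its link to the row, and noise keeps values approximately right for aggregate analysis.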

Other anonymization methods exist, but we only need to consider the ones best suited to our context and our data types.

So, what’s next?

The most important thing in anonymization is to understand how the data is used and to get the classification right before writing large amounts of code. Focus on sensitive data, prioritize it, and establish a robust data policy that complies with legal requirements (GDPR).

By applying these anonymization principles, we strengthen our processes and data security, which means Believe artists can place more trust in our digital services.
