RedMask – Mask Sensitive Data in Redshift

Sanchay Gupta

16 Jun 2020

RedMask is a command-line tool for masking sensitive data in a relational database. It currently supports Redshift and Postgres. The tool helps organizations share data faster, build data warehouse solutions, keep data secure, and reduce compliance costs.

 

Why Data Masking?

 

Masking is essential when large volumes of data are managed. It is a critical step when dealing with personal, commercially sensitive, or identifiable information. Such data must be protected when it is shared with internal or external people or agencies for consumption. Here are a few examples where masking would be necessary:

 

  1. FinTech/BFSI organization – Sharing customer, product, and transaction details. 
  2. Healthcare industry – Sharing patient data, diagnosis, and health-related information. 
  3. E-commerce organization – Sharing the details of the consumer and product database.
  4. Transportation and urban logistics companies – Sharing user and location data.

 

How RedMask Works

 

Administrators can mask data using a variety of techniques. RedMask uses native database queries to mask data and manage permissions, and therefore has minimal impact on performance. RedMask supports both static and dynamic masking.

 

Static Masking: 

In static masking, a new table is created with the selected columns masked. Because the masked data is stored as a separate copy, this increases storage costs. This technique is suitable for data sets that do not change often.
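As a rough illustration of the idea, the sketch below uses Python's built-in sqlite3 module as a stand-in for Redshift. The table name customer_masked, the sample rows, and the masking expression are assumptions made for this example and are not RedMask's actual output. It shows that the masked table is a separate copy, which is why later changes to the source table are not reflected in it.

```python
import sqlite3

# SQLite stands in for Redshift here; sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Alice", 34), ("Bob", 51)],
)

# Static masking: materialise a masked copy of the table.
# The masking expression (keep first letter, append asterisks) is illustrative only.
conn.execute(
    """
    CREATE TABLE customer_masked AS
    SELECT substr(name, 1, 1) || '****' AS name, age
    FROM customer
    """
)
masked_rows = conn.execute(
    "SELECT name, age FROM customer_masked ORDER BY age"
).fetchall()

# A later change to the source table does not reach the masked copy.
conn.execute("UPDATE customer SET name = 'Carol' WHERE age = 34")
stale_rows = conn.execute(
    "SELECT name, age FROM customer_masked ORDER BY age"
).fetchall()
```

The copy has to be rebuilt whenever the source changes, which is the reason static masking fits data sets that change rarely.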

 

Dynamic Masking:

In dynamic masking, RedMask creates a view that masks the desired columns. The view has the same name and columns as the underlying table but is in a different schema. When a query is executed, the data warehouse picks either the table or the view, depending on the search path/default schema.

RedMask creates a masked view for data consumers with fewer privileges, based on the settings. These consumers see the masked data instead of the real data. In addition, RedMask supports a dry-run mode, in which it only generates a SQL file with the required queries and makes no changes to the database. This allows database administrators to verify the underlying queries.
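The following sketch illustrates the view-based approach and the dry-run idea, again with Python's sqlite3 module standing in for Redshift. SQLite has no search path, so the example relies on the fact that SQLite resolves unqualified names in its temp schema before main. This only approximates the schema lookup described above. The helper names build_masked_view_sql and apply_mask, the sample rows, and the constant mask 'xxxx' are hypothetical and do not reflect the SQL that RedMask generates.

```python
import sqlite3


def build_masked_view_sql(table, columns, masked):
    """Build a CREATE VIEW statement that masks the chosen columns.

    `masked` maps a column name to the SQL expression that replaces it.
    """
    select_list = ", ".join(
        f"{masked[c]} AS {c}" if c in masked else c for c in columns
    )
    # The view lives in the temp schema and shares the table's name.
    return f"CREATE TEMP VIEW {table} AS SELECT {select_list} FROM main.{table}"


def apply_mask(conn, table, columns, masked, dry_run):
    """Return the generated SQL; execute it only when dry_run is False."""
    sql = build_masked_view_sql(table, columns, masked)
    if not dry_run:
        conn.execute(sql)
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)", [("Alice", 34), ("Bob", 51)]
)
rule = {"name": "'xxxx'"}

# Dry run: the SQL is produced for review, the database is untouched.
sql = apply_mask(conn, "customer", ["name", "age"], rule, dry_run=True)
before = conn.execute("SELECT name FROM customer ORDER BY age").fetchall()

# Real run: the same unqualified query now resolves to the masked view.
apply_mask(conn, "customer", ["name", "age"], rule, dry_run=False)
after = conn.execute("SELECT name FROM customer ORDER BY age").fetchall()

# The underlying table is unchanged and still reachable by its qualified name.
real = conn.execute("SELECT name FROM main.customer ORDER BY age").fetchall()
```

Because the view stores no data, it always reflects the current contents of the underlying table, at the cost of evaluating the masking expressions on every query.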

 

What Makes RedMask Different?

 

RedMask addresses the challenges faced while using other masking tools in the market.

 

 

Building the RedMask Application

 

Step 1) Clone the RedMask repository using git clone https://github.com/hashedin/redmask.git

 

Step 2) Install Gradle for your respective operating system.

 

Step 3) Run the following commands

 

Step 4) The application is built at build/graal/redmask.

 

Masking Data on Redshift

 

Let us take the following customer table for masking data:

 

 

 

Now we will mask the Name and the Age field with the STRING_MASKING and RANDOM_INTEGER_WITHIN_RANGE masking rules respectively.
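To give a feel for what these two rules might do, here is a small Python sketch. The function names mirror the rule names, but the parameters (show_first, mask_char, low, high) and the exact behaviour are assumptions made for illustration. RedMask applies its rules through native database queries, and its actual rule options are not shown here.

```python
import random


def string_masking(value, show_first=1, mask_char="*"):
    """Keep the first `show_first` characters and mask the rest (assumed behaviour)."""
    kept = value[:show_first]
    return kept + mask_char * (len(value) - len(kept))


def random_integer_within_range(low, high, rng):
    """Replace a value with a random integer in [low, high] (assumed behaviour)."""
    return rng.randint(low, high)


rng = random.Random(0)  # seeded only to make the sketch repeatable
masked_name = string_masking("Alice")
masked_age = random_integer_within_range(18, 90, rng)
```

Note that the random rule discards the original value entirely, while the string rule preserves the length and first character, so the two offer different trade-offs between privacy and usefulness.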

 

Step 1) Create a JSON file named config.json.

 

 

 


DB_SUPER_USER = <Administrator_username>  

DB_SUPER_USER_PASSWORD = <Administrator_user_password>  

DB_USER = <user_name>  

DB_USER_PASSWORD = <user_password>

 

Step 2) Run the RedMask CLI command

 


./redmask -f=/<path_to_json_file>/config.json -r=false -m=static

 

where:

 

With just that, you get your masked table:

 

 

To run RedMask in dynamic mode, change the mode option in the CLI command to dynamic (-m=dynamic):

 

./redmask -f=/<path_to_json_file>/config.json -r=false -m=dynamic

 

 

Note that the masked view is created in a schema named after the username entered in the config file.

When that user queries this table, they will only be able to access the masked view.

 
