RedMask is a command-line tool for masking sensitive data in a relational database. Currently, it supports Redshift and Postgres. It helps organizations reduce the time needed to share data and build data warehouse solutions, while maintaining security and lowering compliance costs.
Why Data Masking?
Masking is essential when large volumes of data are managed, and it is a critical step when dealing with personal or commercially sensitive, identifiable information. The data must be protected when it is shared with internal or external people or agencies for consumption. Here are a few examples where masking is necessary:
- FinTech/BFSI organization – Sharing customer, product, and transaction details.
- Healthcare industry – Sharing patient data, diagnosis, and health-related information.
- E-commerce organization – Sharing the details of the consumer and product database.
- Transportation and urban logistics companies – Sharing user and location data.
How Does RedMask Work?
Administrators can mask data using a variety of techniques. RedMask uses native database queries to mask data and manage permissions, and therefore has minimal impact on performance. RedMask supports both dynamic and static masking.
In static masking, a new table is created with the selected columns masked. Because the masked data is stored as a separate copy, this increases storage costs. The technique is suitable for data sets that do not change often.
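As a rough illustration of static masking, the sketch below uses Python's built-in sqlite3 module rather than Redshift or Postgres. The table name, the columns, and the masking expression (first letter followed by asterisks) are assumptions made for the example and are not the SQL that RedMask generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Alice", 34), (2, "Bob", 51)],
)

# Static masking: materialise a second table whose sensitive column is masked.
# The masking expression is an illustrative assumption, not a RedMask rule.
conn.execute(
    "CREATE TABLE customer_masked AS "
    "SELECT id, substr(name, 1, 1) || '****' AS name, age FROM customer"
)
masked_rows = conn.execute(
    "SELECT id, name, age FROM customer_masked ORDER BY id"
).fetchall()

# The masked copy is a snapshot: rows added to the source later do not appear in it.
conn.execute("INSERT INTO customer VALUES (3, 'Carol', 28)")
source_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
masked_count = conn.execute("SELECT COUNT(*) FROM customer_masked").fetchone()[0]

print(masked_rows)
print(source_count, masked_count)
```

The two counts differ after the extra insert, which is why a static copy fits data that rarely changes: the masked table has to be rebuilt to pick up new rows.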
In dynamic masking, RedMask creates a view that masks the desired columns. The view has the same name and columns as the underlying table but is in a different schema. When a query is executed, the data warehouse picks either the table or the view, depending on the search path/default schema.
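The name-resolution idea can be sketched with Python's built-in sqlite3 module. This is only an analogy: SQLite has no search path, but it resolves an unqualified name in its temp schema before main, which stands in here for a consumer whose default schema holds the masked view. The table, columns, and masking expression are assumptions for the example, not RedMask's generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main.customer (id INTEGER, name TEXT, age INTEGER)")
conn.execute("INSERT INTO main.customer VALUES (1, 'Alice', 34)")

# Dynamic masking: a view with the same name as the table, living in another schema.
# SQLite's temp schema plays the role of the consumer's schema in this sketch.
conn.execute(
    "CREATE TEMP VIEW customer AS "
    "SELECT id, substr(name, 1, 1) || '****' AS name, age FROM main.customer"
)

# The unqualified name resolves to the view; the qualified name reaches the real table.
masked_row = conn.execute("SELECT name FROM customer WHERE id = 1").fetchone()
real_row = conn.execute("SELECT name FROM main.customer WHERE id = 1").fetchone()

# Nothing is copied, so new rows in the table are visible through the view at once.
conn.execute("INSERT INTO main.customer VALUES (2, 'Bob', 51)")
view_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(masked_row, real_row, view_count)
```

Unlike a static copy, the view needs no extra storage and never goes stale, at the cost of evaluating the masking expression on every query.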
Based on the settings, RedMask creates a masked view for data consumers with fewer privileges, so that they see masked data instead of the real data. In addition, RedMask supports a dry-run mode, in which it only generates a SQL file with the required queries and makes no changes to the database. This allows database administrators to verify the underlying queries.
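The dry-run idea can be sketched in a few lines of Python. The function name run_masking, the output file name, and the sample statement are all hypothetical; the sketch only shows the general pattern of writing the queries to a file for review instead of executing them, and is not RedMask's implementation.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_masking(conn, statements, dry_run, sql_path):
    """Hypothetical helper: write statements to a file on a dry run, else execute them."""
    if dry_run:
        Path(sql_path).write_text(";\n".join(statements) + ";\n")
        return
    for statement in statements:
        conn.execute(statement)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")

# Illustrative masking statement; not the SQL that RedMask produces.
statements = [
    "CREATE TABLE customer_masked AS "
    "SELECT id, substr(name, 1, 1) || '****' AS name FROM customer"
]

sql_file = Path(tempfile.mkdtemp()) / "masking_dry_run.sql"
run_masking(conn, statements, dry_run=True, sql_path=sql_file)

# After a dry run the database is unchanged and the queries are available for review.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
script = sql_file.read_text()
print(tables)
print(script)
```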
What Makes RedMask Different?
RedMask addresses the challenges faced while using other masking tools in the market.
- RedMask is a proxy-less application, so it requires minimal setup time.
- RedMask uses underlying database queries, so it has a negligible impact on performance and keeps the data within the database.
- It has an additional dry-run mode for administrators who want to verify the script before it is executed against the database.
- Since RedMask doesn’t interact directly with the underlying data, it needs almost zero setup infrastructure and has a negligible communication overhead.
Building the RedMask Application
Step 1) Clone the RedMask repository using git clone https://github.com/hashedin/redmask.git
Step 2) Install Gradle for your respective operating system.
Step 3) Run the following commands:
- gradle clean build
- gradle clean nativeImage
Step 4) Your application will be built in the following folder: build/graal/redmask
Masking Data on Redshift
Let us take the following customer table for masking data:
Now we will mask the Name and Age fields with the STRING_MASKING and RANDOM_INTEGER_WITHIN_RANGE masking rules, respectively.
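To give a feel for what rules with these names might do, here is a minimal Python sketch. The function signatures, the mask character, the show_first parameter, and the 18 to 60 range are assumptions made for illustration; RedMask's actual rule parameters and behaviour are defined by its configuration and may differ.

```python
import random


def string_masking(value: str, mask_char: str = "*", show_first: int = 0) -> str:
    """Hypothetical rule: keep the first show_first characters, mask the rest."""
    visible = value[:show_first]
    return visible + mask_char * (len(value) - len(visible))


def random_integer_within_range(low: int, high: int, rng: random.Random) -> int:
    """Hypothetical rule: replace the value with a random integer in [low, high]."""
    return rng.randint(low, high)


rng = random.Random(0)  # seeded only to make the sketch repeatable
customers = [{"name": "Alice", "age": 34}, {"name": "Bob", "age": 51}]

masked = [
    {
        "name": string_masking(c["name"], show_first=1),
        "age": random_integer_within_range(18, 60, rng),
    }
    for c in customers
]
print(masked)
```

Note that the random rule ignores the original age entirely, so the masked column keeps a plausible range of values without revealing any real one.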
Step 1) Create a JSON file named config.json.
DB_SUPER_USER = <Administrator_username>
DB_SUPER_USER_PASSWORD = <Administrator_user_password>
DB_USER = <user_name>
DB_USER_PASSWORD = <user_password>
Step 2) Run the RedMask CLI command:
./redmask -f=/<path_to_json_file>/config.json -r=false -m=static
- -f or --configFilePath: the complete path of the JSON file containing the masking configuration.
- -r or --dryRun: when true, RedMask runs in dry-run mode. It only generates a SQL file with the required queries and makes no changes to the database.
- -m or --mode: whether to use static or dynamic masking.
With just that, you get your masked table:
To run RedMask in dynamic mode, just set --mode to dynamic in the CLI command:
./redmask -f=/<path_to_json_file>/config.json -r=false -m=dynamic
Note that the masked view is created in a schema named after the username entered in the config file.
When this user queries the table, they will only be able to access this view.
- Does it only support Redshift?
  No. RedMask also supports PostgreSQL and Snowflake, and support for other databases is in development.
- Does this involve any additional infrastructure cost?
  The RedMask application itself has negligible setup and operating overhead. Additional storage is required to store the masked data in static masking mode.
- Will the new tables and views that are created be secure?
  Yes. The new tables and views allow access only to the users or roles specified at the time of masking. Database administrators can grant additional permissions at their own discretion.