Dynamic Data Masking (MAGEN)

Data masking is the process of replacing sensitive or confidential data, possibly in a reversible manner, with data that is unintelligible to the recipient. It can be applied in a variety of scenarios, including:

  1. Static data masking (data at rest)
  2. DB response masking
  3. Application layer masking (via proxy)
  4. Log file masking
  5. Export of logs for analysis in the cloud


Micha Moffie, IBM Research - Haifa

The process of masking seems straightforward: replace an item with ‘***’, which can be achieved with a simple regular expression. However, building a generic, flexible, and powerful masking engine is considerably more complex. The example below shows a composite payload in which the name and the ID, both highlighted in red, are masked.

This example illustrates some of the difficulties involved in data masking. How do we find a name in free text? Once found, how should we mask it (e.g., should the masking be reversible)? And lastly, how do we let the user specify this information for different payloads?
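To make these difficulties concrete, here is a minimal Python sketch of the "simple regular expression" approach. It is not MAGEN code: the 9-digit ID pattern, the `TOK_` token format, and the in-memory vault are illustrative assumptions. It contrasts irreversible redaction with reversible tokenization, and it also shows the first problem above, since a regex for IDs does nothing to find the customer's name.

```python
import re
import secrets

# Hypothetical pattern: assume an ID is exactly 9 digits (real formats vary).
ID_PATTERN = re.compile(r"\b\d{9}\b")


def redact(text: str) -> str:
    """Irreversible masking: every matched ID is replaced with '***'."""
    return ID_PATTERN.sub("***", text)


class Tokenizer:
    """Reversible masking: each distinct ID maps to a random token kept in an
    in-memory vault (a stand-in for a secured token store)."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}
        self._reverse: dict[str, str] = {}

    def _token_for(self, value: str) -> str:
        if value not in self._forward:
            token = "TOK_" + secrets.token_hex(4)  # 8 lowercase hex digits
            while token in self._reverse:          # avoid (unlikely) collisions
                token = "TOK_" + secrets.token_hex(4)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def mask(self, text: str) -> str:
        return ID_PATTERN.sub(lambda m: self._token_for(m.group(0)), text)

    def unmask(self, text: str) -> str:
        # Tokens not found in the vault are left unchanged.
        return re.sub(r"TOK_[0-9a-f]{8}",
                      lambda m: self._reverse.get(m.group(0), m.group(0)), text)


msg = "Customer John Smith, id 123456789, called support."
print(redact(msg))  # Customer John Smith, id ***, called support.

vault = Tokenizer()
masked = vault.mask(msg)     # the ID becomes a random TOK_ token
print(vault.unmask(masked))  # Customer John Smith, id 123456789, called support.
```

Note that the name passes through untouched in both cases: finding names, choosing the right operation, and letting users configure this per payload is exactly what a simple regex does not solve.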

The MAGEN Data Masking library and service were developed to address these challenges in a generic and flexible manner. MAGEN (“shield” in Hebrew) stands for MAsking Gateway for ENterprises. MAGEN does the following:

  1. Supports a wide array of mechanisms to identify and select data elements within structured, unstructured, and composite documents.
  2. Provides a wide array of masking/unmasking operations (e.g., redact, tokenize, encrypt), including format-preserving encryption and tokenization based on our own format-preserving (FP) library, Metal.
  3. Enables the modification (rewrite) of structured, unstructured, and composite documents while maintaining their structure and format.
  4. Allows the user to specify which data elements to select and what operation to perform on each of those elements.
  5. Supports conditional processing for greater flexibility.
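The capabilities listed above (selecting elements, applying an operation to each, rewriting while preserving structure, and conditional processing) can be illustrated with a toy rule-based masker. This is a hypothetical sketch, not MAGEN's API: `Rule`, `mask_document`, the key-path syntax (with `*` meaning every list item), and `fp_tokenize` are assumptions made for illustration. `fp_tokenize` only preserves character classes through random substitution; it is neither encryption nor reversible, and it is far weaker than the format-preserving encryption and tokenization that MAGEN builds on Metal.

```python
import copy
import random
import re
import string
from dataclasses import dataclass
from typing import Any, Callable, Optional


def redact(value: str) -> str:
    """Replace every character with '*', preserving length."""
    return "*" * len(value)


def fp_tokenize(value: str, rng: random.Random) -> str:
    """Toy format-preserving substitution: ASCII digits stay digits, ASCII
    letters stay letters of the same case, all other characters are kept.
    One-way: no mapping is stored, so it cannot be unmasked."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


@dataclass
class Rule:
    path: tuple[str, ...]                          # keys to follow; "*" = every list item
    operation: Callable[[str], str]                # applied to each selected string
    pattern: Optional[str] = None                  # optional regex: mask only matching parts
    condition: Optional[Callable[[dict], bool]] = None  # document-level guard


def _apply(node: Any, path: tuple[str, ...], rule: Rule) -> Any:
    if not path:
        if not isinstance(node, str):
            return node  # only string leaves are masked in this sketch
        if rule.pattern is None:
            return rule.operation(node)
        # Composite content: mask only the matching fragments of free text.
        return re.sub(rule.pattern, lambda m: rule.operation(m.group(0)), node)
    head, rest = path[0], path[1:]
    if head == "*" and isinstance(node, list):
        return [_apply(item, rest, rule) for item in node]
    if isinstance(node, dict) and head in node:
        node[head] = _apply(node[head], rest, rule)
    return node


def mask_document(doc: dict, rules: list[Rule]) -> dict:
    """Return a masked copy of doc with the same keys and nesting."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for rule in rules:
        if rule.condition is None or rule.condition(doc):
            result = _apply(result, rule.path, rule)
    return result


rng = random.Random(0)
rules = [
    Rule(("customer", "id"), lambda v: fp_tokenize(v, rng)),
    Rule(("notes", "*"), redact, pattern=r"\b\d{3}-\d{4}\b"),
    Rule(("customer", "name"), redact, condition=lambda d: d.get("region") == "EU"),
]
doc = {
    "region": "EU",
    "customer": {"name": "Dana Levi", "id": "AB-1234567"},
    "notes": ["Call back at 555-0199.", "No issues."],
}
masked = mask_document(doc, rules)
print(masked)
```

Here the name is redacted only because the document's region is "EU", the ID keeps its shape (two letters, a dash, seven digits) with new random characters, and only the phone number inside the free-text note is masked, while the JSON structure is unchanged.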