What Is Exact Data Match (EDM)?

As the name suggests, Exact Data Match (EDM) compares the values of corresponding fields in separate records, character by character. It is often called deterministic linkage because it gives a definite yes-or-no outcome: either the records match or they do not.
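
To make the idea concrete, here is a minimal Python sketch of deterministic, field-by-field comparison. The function name `exact_match` and the sample records are hypothetical and not taken from any particular EDM product.

```python
def exact_match(record_a: dict, record_b: dict, fields: list) -> bool:
    """Deterministic linkage: True only if every listed field is present in
    both records and the two values are identical, character by character."""
    return all(
        field in record_a and field in record_b and record_a[field] == record_b[field]
        for field in fields
    )


# Hypothetical sample records.
a = {"ssn": "123-45-6789", "name": "Jane Doe"}
b = {"ssn": "123-45-6789", "name": "Jane Doe"}
c = {"ssn": "123-45-6789", "name": "Jane  Doe"}  # extra space in the name

print(exact_match(a, b, ["ssn", "name"]))  # True
print(exact_match(a, c, ["ssn", "name"]))  # False: one character differs
```

Because the comparison is strict equality, a single extra space is enough to produce a non-match; several of the limitations discussed later follow directly from this property.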

An EDM tool can be used only when your dataset contains uniquely identifying attributes. A unique attribute is a data characteristic that no two entities can share, such as a social security number.
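
Before choosing a column as the basis for EDM, it can help to confirm that it really is unique. The following is a minimal sketch, assuming the records are loaded as a list of Python dictionaries; the helper name `is_unique_attribute` and the data are hypothetical, and empty values are simply skipped.

```python
from collections import Counter


def is_unique_attribute(rows: list, column: str) -> bool:
    """Return True if no two rows share the same non-empty value in `column`."""
    values = [row[column] for row in rows if row.get(column)]
    return all(count == 1 for count in Counter(values).values())


# Hypothetical customer records.
rows = [
    {"ssn": "123-45-6789", "city": "Pune"},
    {"ssn": "987-65-4321", "city": "Pune"},
    {"ssn": "555-12-3456", "city": "Delhi"},
]

print(is_unique_attribute(rows, "ssn"))   # True: every SSN appears once
print(is_unique_attribute(rows, "city"))  # False: two customers share "Pune"
```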

What Is EDM in Cybersecurity?

Exact Data Match is a tool used during the data discovery phase. It helps uncover business information and other specific sensitive data in both structured and unstructured data repositories.

Simply put, EDM is a data classification and matching technique. It detects data loss involving sensitive records stored in a structured format, and it uses data fingerprinting rather than pattern matching to protect that data.

In cybersecurity, EDM aims to identify and safeguard sensitive consumer information, such as medical record numbers (MRNs), bank account numbers, and social security numbers, by matching the actual data values rather than relying on pattern-matching methods.

This approach allows EDM to detect sensitive data accurately while minimising false positives, making it an efficient tool for discovering sensitive data.
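
The difference from pattern matching can be sketched in a few lines of Python. The regular expression flags anything shaped like a US social security number, while the EDM-style check flags only tokens whose hash appears in a set built from the protected dataset. The sample values, the generic tokeniser, and the use of unsalted SHA-256 are simplifying assumptions, not how any specific DLP product works.

```python
import hashlib
import re

# Pattern matching: anything shaped like a US SSN is flagged.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# EDM-style matching: hashes of the SSNs actually held in the protected dataset.
PROTECTED_HASHES = {sha256_hex("123-45-6789"), sha256_hex("987-65-4321")}


def regex_hits(text: str) -> list:
    return SSN_PATTERN.findall(text)


def edm_hits(text: str) -> list:
    # Split the text into generic word-like tokens (no SSN-specific pattern),
    # then keep only tokens whose hash is in the protected set.
    tokens = re.findall(r"[\w-]+", text)
    return [t for t in tokens if sha256_hex(t) in PROTECTED_HASHES]


message = "Ticket 555-12-3456 refers to order 123-45-6789."
print(regex_hits(message))  # ['555-12-3456', '123-45-6789']
print(edm_hits(message))    # ['123-45-6789']
```

The ticket number only looks like an SSN, so the regular expression raises a false positive; the exact-match check ignores it because that value is not in the protected data.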

How Does Exact Data Match Work?

EDM analyses large amounts of structured data organised in rows and columns. To use EDM, a Data Loss Prevention (DLP) fingerprinting tool is installed on the customer's local server, where it fingerprints data stores that can contain billions of cells.

The DLP tool relies on cryptographic hashes of your sensitive data, which are uploaded in place of the raw values. The service indexes these hashes to create a dataset, and the DLP tool's security policy uses this indexed hash data to match and block the transmission of sensitive data.
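
A minimal sketch of this hash-and-index step, assuming SHA-256 as the hash function and trimming plus lower-casing as normalisation. Real DLP tools use their own formats and typically add a secret salt, which this sketch omits; `fingerprint`, `build_index`, and `lookup` are hypothetical names.

```python
import hashlib


def fingerprint(value: str) -> str:
    """SHA-256 digest of a normalised cell value (trimmed, lower-cased)."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def build_index(rows: list) -> dict:
    """Map each cell digest to the set of (row number, column) pairs it came from.
    Only this digest index, not the raw values, would leave the customer's server."""
    index = {}
    for row_no, row in enumerate(rows):
        for column, value in row.items():
            index.setdefault(fingerprint(value), set()).add((row_no, column))
    return index


# Hypothetical records exported from a customer database.
rows = [
    {"name": "Jane Doe", "account": "GB29NWBK60161331926819"},
    {"name": "Raj Patel", "account": "DE89370400440532013000"},
]
index = build_index(rows)


def lookup(token: str) -> set:
    """Policy-side check: hash outbound content the same way and look it up."""
    return index.get(fingerprint(token), set())


print(lookup("  JANE DOE "))             # {(0, 'name')}
print(lookup("DE89370400440532013000"))  # {(1, 'account')}
print(lookup("nobody"))                  # set()
```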

The trained fingerprints are then uploaded securely to the DLP service. Administrators can quickly deploy data protection rules that use these large fingerprint datasets to safeguard the most sensitive data in its original form.

These EDM fingerprints, also called structured fingerprints, can be used as match criteria in data classifications. When content matches a fingerprint exactly, the DLP policy can block the transfer and so prevent data exfiltration.

Because its matching is precise, EDM relieves administrators of the manual effort of crafting complex regular expressions and helps enterprises meet their compliance requirements.

Explaining the EDM Workflow

  1. Export user records from a database to a .csv or .tsv file.

You can also use the pipe operator (|) to send user records from the database directly to the EDMTrain tool (a fingerprinting tool) for processing.

  2. Generate the EDM-enhanced fingerprint file and create an index.

This step creates a unique fingerprint for each record in the user records file and builds an index so that fingerprints can be searched and matched efficiently later.

  3. Define the criteria for content classification using the exact data-matching approach.

This means specifying the data elements that must match exactly for the content to be classified correctly.

  4. Create a rule set that incorporates the EDM (Enhanced) classification criteria.

This rule set defines the actions to take when content matches the classification criteria. Apply the rule set to a Data Loss Prevention (DLP) policy, which governs how sensitive data is protected and handled.
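
The four steps can be sketched end to end in Python. Everything here is hypothetical: the in-memory records stand in for the exported .csv file, SHA-256 stands in for the product's fingerprinting, and the criterion (an SSN plus at least one other column from the same record) is only an example of what an administrator might configure. It does not reproduce the EDMTrain tool or any vendor's rule syntax.

```python
import hashlib
from collections import defaultdict


def fp(value: str) -> str:
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


# Steps 1-2: exported records are fingerprinted and indexed (digest -> {(row, column)}).
records = [
    {"ssn": "123-45-6789", "email": "jane@example.com", "account": "40001234"},
    {"ssn": "987-65-4321", "email": "raj@example.com", "account": "40005678"},
]
index = defaultdict(set)
for row_no, record in enumerate(records):
    for column, value in record.items():
        index[fp(value)].add((row_no, column))

# Step 3: classification criteria (hypothetical values).
REQUIRED_COLUMNS = {"ssn"}
MIN_MATCHED_COLUMNS = 2


def classify(text: str) -> bool:
    """True if one record contributes the required column plus enough others."""
    matched = defaultdict(set)  # row number -> columns found in the text
    for token in text.split():
        token = token.strip(".,;:()\"'")
        for row_no, column in index.get(fp(token), ()):
            matched[row_no].add(column)
    return any(
        REQUIRED_COLUMNS <= columns and len(columns) >= MIN_MATCHED_COLUMNS
        for columns in matched.values()
    )


# Step 4: a rule set that maps the classification result to an action.
def apply_rule(text: str) -> str:
    return "block" if classify(text) else "allow"


print(apply_rule("Update SSN 123-45-6789 for jane@example.com."))  # block
print(apply_rule("SSN 123-45-6789 only"))                          # allow
print(apply_rule("123-45-6789 and raj@example.com"))               # allow
```

Requiring several columns from the same record is one way such criteria can reduce false positives: a lone SSN, or values that belong to different people, do not trigger the rule.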

Why Is EDM Important?

Suppose a company wants to monitor its employees' use of social media during work hours to ensure productivity. It deploys a monitoring system that scans network traffic for any activity related to social media sites. This approach can generate numerous false-positive alerts.

For instance, if an employee uses a messaging platform such as Facebook Messenger for work-related communication, or accesses a professional networking platform such as LinkedIn, the monitoring system might flag the activity as a violation and trigger an unnecessary alert.

This can lead to frustration for both the employee and the security administrator who receives the alert.

The organisation can refine the monitoring process with Exact Data Match (EDM). Instead of monitoring all social media activity, the system can focus on the specific platforms that are explicitly prohibited or pose a higher security risk.

This way, only the designated platforms will trigger alerts, while legitimate use of other platforms will not be flagged.

By reducing false positives, EDM lets administrators prioritise genuine policy violations and potential security threats. Fewer unnecessary alerts also mean that resources can be allocated more efficiently to real concerns.

Benefits of EDM

Inline Inspection and Enforcement

A DLP tool swiftly blocks sensitive data from leaving the organisation without degrading the user experience. EDM offers inline inspection of all network traffic, whether or not users are connected to the corporate network.

This inline inspection improves the accuracy of identifying data loss incidents and significantly reduces false-positive alerts. EDM also handles application and user traffic securely with native SSL inspection, improving overall security and visibility.

Accurate and Detailed Data Classification

The DLP solution deploys advanced techniques, including predefined policies and machine learning algorithms, to precisely classify sensitive data.

This level of granular data classification empowers organisations to apply appropriate security measures and effectively prevent unauthorised access or leakage of sensitive information.

Compatible with Cloud Scalability

By leveraging the scalability of the cloud, users can fingerprint and match up to a billion data cells at a time.

Deploying an EDM solution on-premises may give rise to performance limitations due to the resource-intensive nature of the technology. The cloud provides the capacity and resources to handle large-scale data matching efficiently.

Limitations of EDM

Inability to Handle Variations

EDM relies on exact matches between data fields, which means it may struggle to handle variations in data entry. For example, if a name is misspelt or abbreviated differently in different datasets, EDM may fail to identify the match.
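
A short Python sketch shows where the line falls. Even after light normalisation (ignoring case and collapsing whitespace, an assumption added here), an exact comparison accepts only the first candidate; the abbreviation, the misspelling, and the initial are all missed. The names are made up.

```python
def normalise(name: str) -> str:
    """Collapse runs of whitespace and ignore case; nothing else."""
    return " ".join(name.split()).casefold()


def edm_match(a: str, b: str) -> bool:
    return normalise(a) == normalise(b)


reference = "Jonathan Smith"
candidates = ["jonathan  SMITH", "Jon Smith", "Jonathon Smith", "J. Smith"]

for candidate in candidates:
    print(f"{candidate!r}: {edm_match(reference, candidate)}")
# 'jonathan  SMITH': True
# 'Jon Smith': False
# 'Jonathon Smith': False
# 'J. Smith': False
```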

Sensitivity to Data Quality

EDM's effectiveness depends on the quality and consistency of the data being matched. Inaccurate or incomplete data can lead to false matches or missed matches. Data cleaning and preprocessing are necessary to improve the accuracy of EDM results.
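
One common preprocessing step is to canonicalise identifiers before fingerprinting them. The sketch below, assuming US-style nine-digit SSNs, strips formatting so that equivalent values compare equal, and rejects incomplete values instead of indexing them. `clean_ssn` is a hypothetical helper.

```python
import re
from typing import Optional


def clean_ssn(raw: Optional[str]) -> Optional[str]:
    """Keep digits only; return None unless exactly nine digits remain."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 9 else None


print("123-45-6789" == "123 45 6789")                        # False: raw values differ
print(clean_ssn("123-45-6789") == clean_ssn("123 45 6789"))  # True after cleaning
print(clean_ssn("12-345-678"))                               # None: only eight digits
print(clean_ssn(""))                                         # None: missing value
```

Rejecting malformed values keeps incomplete data out of the index, where it could otherwise produce false or missed matches.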

Lack of Contextual Understanding

EDM considers only exact matches and does not take contextual information into account. It cannot understand the meaning or context behind the data, which can lead to false positives or missed matches.

For example, EDM may match records for two individuals with the same name and the same age, even though they are different people.
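
The risk comes from matching on attributes that are not unique. In this hypothetical sketch, two different customers share a name and an age, so matching on those fields alone links them; adding a unique identifier (here a made-up passport number) keeps them apart.

```python
def matches(a: dict, b: dict, keys: list) -> bool:
    return all(a[key] == b[key] for key in keys)


# Two different (fictional) people who happen to share a name and an age.
person_1 = {"name": "Priya Sharma", "age": 34, "passport": "K1234567"}
person_2 = {"name": "Priya Sharma", "age": 34, "passport": "M7654321"}

print(matches(person_1, person_2, ["name", "age"]))              # True: false positive
print(matches(person_1, person_2, ["name", "age", "passport"]))  # False
```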

Limited Scalability

As datasets grow in size and complexity, EDM's scalability becomes a concern. Matching large volumes of data can be time-consuming and resource-intensive, especially with complex data structures or multiple data sources.
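
The cost depends heavily on how matching is implemented. This Python sketch (synthetic data, hypothetical function names) counts the work done by a naive pairwise comparison against a hash-based index. The index reduces the work to roughly one insert per reference value plus one lookup per candidate, but it must hold every fingerprint in memory, which is one reason very large datasets still strain resources.

```python
def nested_loop_matches(left: list, right: list):
    """Compare every value in `left` with every value in `right`."""
    comparisons = 0
    found = []
    for a in left:
        for b in right:
            comparisons += 1
            if a == b:
                found.append(a)
    return found, comparisons


def hash_index_matches(left: list, right: list):
    """Build a set from `right` once, then do one membership test per `left` value."""
    index = set(right)
    found = [a for a in left if a in index]
    return found, len(right) + len(left)  # inserts + lookups


left = [f"id{i}" for i in range(1_000)]
right = [f"id{i}" for i in range(500, 1_500)]

naive = nested_loop_matches(left, right)
hashed = hash_index_matches(left, right)
print(len(naive[0]), naive[1])    # 500 1000000
print(len(hashed[0]), hashed[1])  # 500 2000
```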

Lack of Flexibility in Matching Criteria

EDM operates on predefined matching criteria, such as exact matches on specific data fields. It may not be able to adapt easily to different matching requirements or accommodate fuzzy matching approaches.
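
For contrast, here is fuzzy matching with Python's standard-library `difflib.SequenceMatcher`. This is a different technique, swapped in only to illustrate what EDM does not do; the 0.85 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    return a == b


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Similarity ratio in [0, 1]; values at or above `threshold` count as a match."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio() >= threshold


print(exact_match("Jonathan Smith", "Jonathon Smith"))  # False
print(fuzzy_match("Jonathan Smith", "Jonathon Smith"))  # True (ratio is about 0.93)
print(fuzzy_match("Jonathan Smith", "Maria Garcia"))    # False
```

Fuzzy matching tolerates the misspelling, but at the price of possible false matches between genuinely different values, an ambiguity that exact matching avoids.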

Susceptibility to Data Privacy Risks

The use of EDM involves sharing sensitive data across different systems or organisations, which can introduce privacy risks if adequate security measures are not in place.
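
One such measure is to exchange keyed hashes instead of raw values. In this sketch, two systems that share a secret key (assumed to be agreed out of band) compute HMAC-SHA-256 digests and compare only the digests. A secret key matters for low-entropy identifiers: with only about a billion possible nine-digit SSNs, plain unkeyed hashes could be reversed by brute force. The key handling here is illustrative, not a complete protocol.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret; in practice it would be managed in a key vault.
SHARED_KEY = secrets.token_bytes(32)


def blind(value: str, key: bytes = SHARED_KEY) -> str:
    """Keyed digest: without `key`, the digest cannot be recomputed from a guess."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


system_a = {blind(v) for v in ["123-45-6789", "987-65-4321"]}
system_b = {blind(v) for v in ["987-65-4321", "555-12-3456"]}

overlap = system_a & system_b  # only digests cross the boundary
print(len(overlap))                     # 1
print(blind("987-65-4321") in overlap)  # True
```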

How Can InstaSafe Help?

As seen above, there is a growing concern about the potential risks associated with data matching and privacy breaches. Care must be taken to ensure data protection, encryption, and compliance with relevant privacy regulations.

InstaSafe's Zero Trust Application Access solution provides granular control and strict access restrictions for every user, preventing unauthorised access to sensitive data.

Our solution offers context-aware access controls, going beyond mere exact matches.

Visit our website or contact us now to learn more and schedule a demo.

Frequently Asked Questions (FAQs)

What languages does EDM support?

EDM can support multiple languages because it relies on exact matches between data fields rather than language-specific processing. However, its effectiveness may vary with the quality and consistency of the data in each language.
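
One language-related pitfall is Unicode: the same visible text can be stored as different code-point sequences, and a character-by-character comparison treats them as different. A minimal Python sketch, assuming NFC normalisation plus case folding is acceptable for the data in question:

```python
import unicodedata


def canonical(value: str) -> str:
    """Normalise to Unicode NFC and apply full case folding."""
    return unicodedata.normalize("NFC", value).casefold()


composed = "Jos\u00e9"      # 'é' stored as a single code point
decomposed = "Jose\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)                        # False, although both display as "José"
print(canonical(composed) == canonical(decomposed))  # True
print(canonical("STRASSE") == canonical("Straße"))   # True: casefold maps "ß" to "ss"
```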

What is EDM classification?

EDM classification refers to categorising data based on specific criteria or attributes. In the context of Exact Data Match (EDM), it involves matching and classifying data based on exact matches of personally identifiable information (PII) across different datasets.

What are IDM and EDM in DLP?

IDM (Indexed Document Matching) and EDM (Exact Data Match) are both detection techniques used in Data Loss Prevention (DLP) systems.

IDM fingerprints unstructured documents, such as contracts or source code, so that the DLP system can detect full or partial copies of them. In contrast, EDM fingerprints structured data, such as personally identifiable information (PII) held in database records, and detects exact matches of those values to flag potential data breaches or unauthorised access.

Can you give an example of EDM?

An example of EDM could be matching customer data from two databases to identify duplicate entries.

For instance, if a company has two separate customer databases, EDM can compare customer names, addresses, and other relevant information to identify individuals who appear in both. This helps consolidate and de-duplicate customer records.
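
A minimal Python sketch of that de-duplication, assuming both databases can be read into lists of dictionaries and that a normalised (name, email) pair identifies a customer. The records and the normalisation rules are hypothetical; strict EDM would compare the raw values instead.

```python
def customer_key(record: dict) -> tuple:
    """Normalised (name, email) pair used as the exact-match key."""
    name = " ".join(record["name"].split()).casefold()
    email = record["email"].strip().casefold()
    return name, email


crm = [
    {"name": "Jane Doe", "email": "Jane.Doe@example.com", "phone": "555-0100"},
    {"name": "Raj Patel", "email": "raj@example.com", "phone": "555-0101"},
]
billing = [
    {"name": "jane doe", "email": "jane.doe@example.com", "address": "1 Main St"},
    {"name": "Li Wei", "email": "li@example.com", "address": "2 High St"},
]

billing_by_key = {customer_key(r): r for r in billing}

# Customers present in both databases, merged into one record (CRM values win).
merged = [
    {**billing_by_key[customer_key(r)], **r}
    for r in crm
    if customer_key(r) in billing_by_key
]
unique_customers = {customer_key(r) for r in crm + billing}

print(len(merged), merged[0]["address"])  # 1 1 Main St
print(len(unique_customers))              # 3
```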