Entity Resolution with Graph Database

Resolve Data Ambiguity with TigerGraph

Data Ambiguity Is an Obstacle to Revenue Growth

The proliferation of enterprise systems and tools has created multiple versions of customer data. Every client-facing and back-office system, including CMS, CRM, ERP, and marketing automation, contains a slice of that data. This includes transaction data such as orders, policies, payments, and claims, as well as channel interaction data such as customer service calls, website visits, and physical store visits.

Many of these repositories also hold master data such as name, address, and phone number. All of this data contains many duplicates and inconsistencies, leading over 41% of B2B marketers to cite it as their biggest challenge.

It’s better and cheaper to find and fix problems early in the process, as illustrated by the 1-10-100 rule: it costs $1 to verify a record as it is entered, $10 to cleanse it later, and $100 if nothing is done. The impact of poor data is evident in the outcome: marketing leaders estimate that 26% of their budget is spent on ineffective channels and tactics due to a lack of clean, consistent, and connected customer data.

Legacy Approaches Are Badly Suited to Resolving Data Ambiguity

Traditional master data management systems are built upon relational databases, which store information such as account, contact, lead, campaign, and opportunity in separate tables, one for each type of business entity.

These relational databases are good tools for indexing and searching data, as well as for supporting transactions and performing basic analysis. They are, however, poorly equipped to deal with a deluge of data that produces multiple entries for a single real-world entity.

In this environment of ambiguous data, analysts need to join a number of large tables to run their queries and collect the data for analysis. Such queries can take hours or even days to run, making any meaningful analysis of the patterns practically impossible.

TigerGraph for Entity Resolution

Graph Databases Are Ideal for Entity Resolution

Merging customer data from multiple data sources is not always easy. One challenge is entity resolution: deciding when multiple entities from different data sources actually represent the same real-world entity, and then merging them into one entity.

Consider an example where there are three data sources containing the following types of customer information:

  • Source1 (SSN, Email, Address)
  • Source2 (SSN, Phone, Name, Age)
  • Source3 (Email, Phone, Gender)

Let’s assume that SSN, Email, and Phone are each sufficient to uniquely identify an individual (that is, they constitute PII, personally identifiable information). The problem is that the different sources use different identifiers, and individual records might be missing some information. PII that is missing for a customer in one source may show up later in another. The goal is to use whatever PII we have about a customer to find all of that customer’s information (attributes) across all data sources and build a unified record with the following attributes: Customer (SSN, Email, Phone, Name, Age, Gender, Address).

Graph databases are purpose-built to connect data across multiple sources and create a single record. In this case, TigerGraph creates a customer vertex for each customer record and connects it to PII vertices such as SSN, Email, and Phone. Next, customer vertices that share an SSN, email, or phone number are merged, with business rules applied to reconcile differing values of attributes such as age and address.
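The linking and merging steps can be illustrated with a minimal sketch in plain Python. This is not TigerGraph code or its API; it only mimics the idea of connecting customer records through shared PII values and collapsing each connected group into one record. The sample records, the field names, and the resolve function are hypothetical, and the merge rule (keep the first value seen for each attribute) is a placeholder for real business rules.

```python
from collections import defaultdict

# Assumption: any one of these fields is enough to identify an individual.
PII_FIELDS = ("ssn", "email", "phone")

# Hypothetical sample records shaped like Source1, Source2, and Source3.
records = [
    {"source": "Source1", "ssn": "111-11-1111", "email": "ann@example.com",
     "address": "1 Main St"},
    {"source": "Source2", "ssn": "111-11-1111", "phone": "555-0100",
     "name": "Ann", "age": 34},
    {"source": "Source3", "email": "ann@example.com", "phone": "555-0100",
     "gender": "F"},
    {"source": "Source3", "email": "bob@example.com", "phone": "555-0199",
     "gender": "M"},
]


def resolve(records):
    """Group records that share any PII value and merge each group."""
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as the root so output order is stable.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # A shared (field, value) pair plays the role of a shared PII vertex.
    first_seen = {}
    for idx, rec in enumerate(records):
        for field in PII_FIELDS:
            value = rec.get(field)
            if value is None:
                continue
            key = (field, value)
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[find(idx)].append(rec)

    unified = []
    for root in sorted(groups):
        merged = {}
        for rec in groups[root]:
            for field, value in rec.items():
                if field != "source":
                    merged.setdefault(field, value)  # placeholder merge rule
        merged["sources"] = sorted({rec["source"] for rec in groups[root]})
        unified.append(merged)
    return unified


customers = resolve(records)
for customer in customers:
    print(customer)
```

In this sample, the first three records end up in one unified customer because the Source1 and Source2 records share an SSN and the Source3 record shares an email with one and a phone number with the other. The fourth record shares nothing and stays separate.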

TigerGraph can use last-updated dates for addresses, or other rules, to populate the address of the consolidated record for the customer vertex U1. It can also maintain a list of known addresses along with their source information, both for regulatory compliance, such as the European Union’s General Data Protection Regulation (GDPR), and for corporate information governance.
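A last-updated rule of this kind can be sketched in a few lines of plain Python. Again, this is an illustration rather than TigerGraph functionality: the observation records, the source names SourceA, SourceB, and SourceC, and the reconcile_address function are all hypothetical.

```python
from datetime import date

# Hypothetical address observations for one consolidated customer.
observations = [
    {"address": "1 Main St", "source": "SourceA", "updated": date(2019, 3, 1)},
    {"address": "9 Oak Ave", "source": "SourceB", "updated": date(2021, 7, 15)},
    {"address": "1 Main St", "source": "SourceC", "updated": date(2020, 1, 10)},
]


def reconcile_address(observations):
    """Pick the most recently updated address and keep every known address
    together with the sources that reported it."""
    current = max(observations, key=lambda obs: obs["updated"])

    known = {}
    # Walk observations oldest to newest so last_updated ends at the latest date.
    for obs in sorted(observations, key=lambda obs: obs["updated"]):
        entry = known.setdefault(obs["address"], {"sources": [], "last_updated": None})
        entry["sources"].append(obs["source"])
        entry["last_updated"] = obs["updated"]

    return {"address": current["address"], "known_addresses": known}


result = reconcile_address(observations)
print(result["address"])
for address, info in result["known_addresses"].items():
    print(address, info["sources"], info["last_updated"])
```

Keeping the source and date next to each known address is what makes it possible to answer later where a given value came from, which is the kind of traceability that compliance and governance reviews tend to ask for.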