Let’s assume that SSN, Email, and Phone are each sufficient to uniquely identify an individual (that is, each constitutes PII, personally identifiable information). The problem is that different sources use different identifiers, and individual records may be missing some of them. PII that is missing from one customer record may show up later in another data source. The goal is to use whatever PII we have about a customer to find all of that customer’s information (attributes) across all data sources and build a unified record with the following attributes: Customer (SSN, Email, Phone, Name, Age, Gender, Address).
Graph databases are purpose-built to link records across multiple sources into a single record. In this case, TigerGraph creates a customer vertex for each customer record and connects it to PII vertices such as SSN, Email, and Phone. Next, customer vertices that share an SSN, email, or phone number are merged, with business rules applied to reconcile differing values of attributes such as age and address.
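The merge step can be illustrated outside of any database. The sketch below is plain Python, not GSQL or a TigerGraph API: it treats each source record as a customer vertex and each PII value as a shared vertex, then groups records that are connected through any common PII value using a union-find structure. The field names (`ssn`, `email`, `phone`, `source`), the function name `resolve_customers`, and the sample records are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical PII field names; each is assumed to identify one individual.
PII_FIELDS = ("ssn", "email", "phone")


def resolve_customers(records):
    """Group record indices that are linked by any shared PII value."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as the root so results are deterministic.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, value) -> index of the first record carrying it
    for idx, rec in enumerate(records):
        for field in PII_FIELDS:
            value = rec.get(field)
            if not value:
                continue  # missing PII creates no link
            key = (field, value)
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Made-up sample data from three hypothetical sources.
records = [
    {"source": "crm", "ssn": "111", "email": "a@example.com", "phone": None},
    {"source": "web", "ssn": None, "email": "a@example.com", "phone": "555-0100"},
    {"source": "billing", "ssn": None, "email": None, "phone": "555-0100"},
    {"source": "crm", "ssn": "222", "email": "b@example.com", "phone": None},
]
clusters = resolve_customers(records)
print(clusters)
```

Note that the first and third records share no PII directly; they end up in the same group only because the second record links them through its email and phone. This transitive linking is what lets PII that appears later in another source connect records that were previously separate.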
TigerGraph can use last-updated dates for addresses, or other rules, to populate the address on the consolidated customer vertex U1. It can also maintain a list of known addresses along with their source information, which supports regulatory compliance (for example, the European Union’s General Data Protection Regulation, or GDPR) and corporate information governance.
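As a minimal sketch of one such reconciliation rule, the following plain Python (again, not TigerGraph code) picks the most recently updated address as the current one and keeps every known address with its source and date. The record layout, the `updated` field, and the function name `consolidate_address` are assumptions made for this example; a real deployment would apply whatever business rules it has defined.

```python
from datetime import date


def consolidate_address(records):
    """Return (current_address, history) for the records of one customer.

    The rule assumed here is "latest update wins"; the history keeps every
    known address with its source so it can be audited later.
    """
    with_address = [r for r in records if r.get("address")]
    newest_first = sorted(with_address, key=lambda r: r["updated"], reverse=True)
    history = [
        {"address": r["address"], "source": r["source"], "updated": r["updated"]}
        for r in newest_first
    ]
    current = history[0]["address"] if history else None
    return current, history


# Made-up records already resolved to the same customer (for example, U1).
u1_records = [
    {"source": "crm", "address": "12 Oak St", "updated": date(2019, 3, 1)},
    {"source": "web", "address": "98 Pine Ave", "updated": date(2021, 7, 15)},
    {"source": "billing", "address": None, "updated": date(2022, 1, 5)},
]
current, history = consolidate_address(u1_records)
print(current)
for entry in history:
    print(entry["source"], entry["address"], entry["updated"])
```

Records without an address are skipped rather than allowed to overwrite a known value, even when they are the most recent, so a sparse record from one source does not erase information supplied by another.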