Entity Resolution
In a world where the same individual might appear as “Alex J.” in a CRM, “A. Johnson” in a transaction log, and “AJ-459” in a support system, Entity Resolution (ER) is what connects the dots. Entity resolution is the process of identifying when different data records—spread across systems, departments, or even geographies—refer to the same real-world person, organization, device, or product, and then merging or linking the records together.
Modern ER isn’t simply cleaning duplicates. It’s stitching together identity through patterns, behavior, and relationships. It’s what allows an airline to unify a customer’s loyalty history, a hospital to consolidate patient care records, and a bank to detect fraud hidden across disconnected accounts.
When powered by graph technology, ER becomes far more dynamic and accurate, uncovering connections that are missed by linear or rules-based systems. And when implemented in TigerGraph, ER scales to billions of data points with the ability to adapt as entities evolve in real time.
Common Misconceptions
Many still view entity resolution as a technical housekeeping task—just another pass at deduplication. In reality, this view misses both the complexity and the strategic potential of ER.
Where traditional methods might compare names or IDs in isolation, real ER leverages context, relationships, and time. It isn’t just about matching records; it’s about understanding how entities relate and behave within a network.
Another misconception is thinking of ER as a “set-it-and-forget-it” operation. In practice, entity resolution is continuous and recursive—as new data comes in, identities evolve, associations shift, and new insights emerge.
Graph-native systems like TigerGraph are built to support this kind of dynamic reasoning across large-scale, constantly changing data.
Definition
Entity Resolution is the process of identifying and linking records that refer to the same real-world entity, despite inconsistencies, missing data, or intentional obfuscation. The need for this capability spans industries, from unifying customer profiles to catching fraud rings to consolidating medical records for safe treatment.
Unlike traditional matching, graph-based ER analyzes both attributes (like name, address, email) and multi-hop relationships (such as shared devices, social links, or purchase history) to build context-rich profiles that improve with every new data point.
In TigerGraph, Entity Resolution is graph-native, modeling identities through connected attributes, behaviors, and shared relationships across time. Unlike traditional systems that match flat records, TigerGraph builds dynamic identity graphs, capturing signals such as shared addresses, devices, or transactions in real time. This allows organizations to resolve identities contextually and at scale, integrating directly with fraud detection, personalization, and compliance workflows.
Basic Concepts
At its heart, entity resolution answers the question: “Who—or what—are we really dealing with?” That’s a harder question than it seems, especially when data is fragmented across siloed systems or intentionally manipulated to avoid detection.
In practice, ER involves:
- Data Normalization: Input fields such as dates, phone numbers, and addresses are standardized across systems and formats so that attributes are consistent and comparable.
- Attribute Matching: Structured field comparisons detect near-matches even when entries are inconsistent or noisy. Names, emails, or addresses are compared using similarity techniques like Levenshtein distance (edit distance), Jaccard index (set overlap), or cosine similarity (vector angle).
- Blocking and Indexing (Record Linkage): Before matching, candidate record sets are narrowed using heuristics or pre-indexed keys to reduce unnecessary comparisons. For instance, records might be grouped by ZIP code or hashed email prefix. Typos are not uncommon in such fields, and resolving these discrepancies is precisely what ER is meant to do.
- Scoring Logic or ML Models: Matches are evaluated using rule-based thresholds or predictive models. Scoring functions assign confidence levels to candidate pairs, while machine learning models can incorporate historical match outcomes to refine accuracy.
- Clustering: Once matches are scored, clustering techniques group together all the records believed to represent the same entity. These groups are treated as resolved identities. Clustering may be rule-based or algorithm-driven (e.g., connected components, hierarchical clustering).
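The steps above can be sketched end to end in a few dozen lines. This is a minimal illustration in plain Python, not a production design: the sample records, field names, score weights (0.6 and 0.4), and the 0.8 threshold are all assumptions made for the example. It uses Levenshtein distance for name similarity, ZIP code as the blocking key, a simple rule-based score, and union-find (equivalent to connected components) for clustering.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records; field names are illustrative only.
RECORDS = [
    {"id": 1, "name": "Alex Johnson",  "phone": "(555) 010-4590", "zip": "94105"},
    {"id": 2, "name": "Alex Jonson",   "phone": "555.010.4590",   "zip": "94105"},
    {"id": 3, "name": "alex  johnson", "phone": "5550104590",     "zip": "94105"},
    {"id": 4, "name": "Maria Gomez",   "phone": "555-010-7721",   "zip": "94105"},
    {"id": 5, "name": "Alex Johnson",  "phone": "555-010-9999",   "zip": "10001"},
]

def normalize(rec):
    """Data normalization: lowercase, collapse whitespace, keep phone digits only."""
    return {
        "id": rec["id"],
        "name": " ".join(rec["name"].lower().split()),
        "phone": re.sub(r"\D", "", rec["phone"]),
        "zip": rec["zip"],
    }

def levenshtein(a, b):
    """Edit distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def name_similarity(a, b):
    """Scale edit distance into [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest

def score(r1, r2):
    """Illustrative rule-based score: name similarity plus exact phone agreement."""
    return 0.6 * name_similarity(r1["name"], r2["name"]) + 0.4 * (r1["phone"] == r2["phone"])

def resolve(records, threshold=0.8):
    recs = [normalize(r) for r in records]
    # Blocking: only compare records that share a ZIP code.
    blocks = defaultdict(list)
    for r in recs:
        blocks[r["zip"]].append(r)
    # Clustering: union-find over every pair that clears the threshold.
    parent = {r["id"]: r["id"] for r in recs}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            if score(r1, r2) >= threshold:
                parent[find(r1["id"])] = find(r2["id"])
    clusters = defaultdict(list)
    for r in recs:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(c) for c in clusters.values())

print(resolve(RECORDS))  # [[1, 2, 3], [4], [5]]
```

Records 1, 2, and 3 resolve to one identity despite the misspelling and the three phone formats. Record 5 shares a name with them but sits in a different ZIP block and has a different phone number, so it stays separate. That is the trade-off of blocking on a static key: fewer comparisons, at the risk of never comparing two records that belong together.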
How TigerGraph Enhances Entity Resolution
TigerGraph makes ER more scalable and context-aware by natively incorporating relationship intelligence into the resolution process. While traditional ER systems rely heavily on attribute similarity, TigerGraph enables entity resolution that reasons across multi-hop connections and evolving networks of data.
Key graph-native capabilities include:
- Relationship-Based Reasoning: In addition to direct field comparison, TigerGraph uses relationship patterns—such as overlapping addresses, common device IDs, or shared transactions—to surface indirect links between entities. These connection patterns provide contextual evidence that attribute-only approaches miss.
- Graph Clustering Algorithms: TigerGraph supports built-in graph clustering techniques like Louvain, label propagation, and connected components, which group entities based on degree and pattern of connectedness. This is critical when identity fragments are distributed across systems or time periods.
- Graph-Based Blocking: TigerGraph enables more intelligent blocking strategies based on structural similarity, such as shared neighbors or label proximity, rather than static keys alone.
- Graph-Enhanced ML: Graph structural features can be included in the data used to train an ML model for predicting entity matches, improving accuracy over non-graph models. Graph neural networks (GNNs) allow structural information to be integrated directly and holistically into the model.
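To make the relationship-based ideas concrete, here is a small sketch in plain Python. It is not TigerGraph code or GSQL, and the record names, identifier nodes, and helper functions are hypothetical. Records are connected to the identifiers they use (devices, addresses). Candidate pairs come from shared neighbors, which is one simple form of graph-based blocking, and connected components group records linked through any chain of shared identifiers.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edges: (record, identifier node). Identifier nodes stand in for
# devices or addresses that a record is connected to.
EDGES = [
    ("crm:alex_j",     "device:d-17"),
    ("txn:a_johnson",  "device:d-17"),
    ("txn:a_johnson",  "addr:12-main-st"),
    ("support:aj-459", "addr:12-main-st"),
    ("crm:maria_g",    "device:d-42"),
]

def build_graph(edges):
    """Undirected bipartite graph stored as an adjacency map."""
    adj = defaultdict(set)
    for record, identifier in edges:
        adj[record].add(identifier)
        adj[identifier].add(record)
    return adj

def candidate_pairs(adj, records):
    """Graph-based blocking: pair up records that share at least one neighbor."""
    pairs = set()
    for node, neighbors in adj.items():
        if node in records:
            continue  # only identifier nodes generate candidate pairs
        for a, b in combinations(sorted(neighbors), 2):
            pairs.add((a, b))
    return pairs

def connected_components(adj, records):
    """Group records reachable from one another through shared identifiers."""
    seen, components = set(), []
    for start in sorted(records):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            if node in records:
                members.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(members))
    return components

records = {record for record, _ in EDGES}
graph = build_graph(EDGES)
print(sorted(candidate_pairs(graph, records)))
print(connected_components(graph, records))
```

Here `crm:alex_j` and `support:aj-459` share no identifier directly, yet they land in the same component because each is linked to `txn:a_johnson`. Attribute-only comparison of "Alex J." and "AJ-459" would be unlikely to find that match. Plain connected components can also over-merge when an identifier is legitimately shared, such as a household address, so in practice the candidate pairs would normally be scored before clustering.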
Distinctions from Related Concepts
- Entity Resolution vs. Deduplication: Deduplication identifies and removes exact or near-duplicate records. ER goes further, resolving partially overlapping records that represent the same entity, even when they differ significantly in format, source, or timestamp.
- Entity Resolution vs. Identity Matching: Identity matching typically involves one-to-one comparisons (e.g., “Is this record a match for this known profile?”). ER performs many-to-many reasoning across full graphs of data to assemble a holistic view of an entity from multiple pieces.
Real-World Applications
Anti-Money Laundering (AML) and Fraud Detection
ER helps uncover hidden relationships between entities, such as multiple shell companies using the same address, or accounts linked by shared devices and behavioral patterns. In fraud cases, graph-based ER detects collusion, identity fraud, and synthetic identities earlier than rules-based systems.
Customer 360 and Personalization
Whether in banking, retail, or telecom, customer data is often fragmented. ER connects marketing interactions, purchases, support tickets, and behavioral data into a unified view, enabling personalized service, smarter cross-sell, and better churn prediction.
Healthcare and Life Sciences
Patient records may vary across providers, insurers, labs, and apps. ER ensures that “Robert Q. Li” in the EHR and “R. Li” in a pharmacy database are treated as one patient, improving care coordination and regulatory compliance. Although this sounds easily avoidable, misidentification is a persistent concern in healthcare.
Insurance and Claims Management
Multiple claims, agents, or policies can obscure who’s who. ER identifies duplicate claimants, shared risk profiles, and coordinated fraud networks across regions or insurers.
Marketplace and Platform Security
Gig platforms, online marketplaces, and social networks use ER to detect fake accounts, identity fraud, and duplicate profiles, especially in high-volume, loosely verified environments.
Related Terms
- Knowledge Graph: Provides a semantic structure to support ER through contextual understanding of entities and relationships.
- Similarity Search: Often vector-based, used in hybrid ER systems to enhance recall and accuracy.
- Graph Clustering: Groups related nodes to uncover unified identities or behavior patterns.
- Data Matching: Foundational logic for comparing records; often the first step in ER workflows.
- Anomaly Detection: Uses entity-level graph patterns to flag suspicious behavior or structural deviations.