Evaluating the Leading Graph Databases – TigerGraph and Neo4j for Scalability

As the decade of Graph roars on, it’s important to evaluate each available offering against objective criteria. In my last blog post, I mentioned that architecture is an important consideration when evaluating graph technology from leading vendors. Today, I will focus on the key criteria for scaling a graph database, based on universally accepted enterprise standards, and compare the two leading graph databases, TigerGraph and Neo4j. Here are the criteria to consider for enterprise scalability:

  • Unified enterprise schema – As graph databases are used for connecting multiple datasets and pipelines, having a unified enterprise schema is an essential requirement for enterprises. 
  • Automatic data partitioning – As data grows beyond 100 GB, enterprises add physical or virtual machines (nodes) and partition the graph data across them. The ability to partition data automatically across multiple machines, on-premises or in the cloud, is a key requirement, especially for database administrators.
  • Distributed querying – Once data is partitioned across multiple machines, the ability to query across them with minimal overhead for engineers is a key requirement. An enterprise-scale graph database must support distributed querying with a single query, without requiring an individual query for each data partition.
  • ACID transactions across the cluster – An enterprise-scale graph database must support ACID transactions across the cluster to be usable operationally. This is not a requirement for analytical use cases, but it is a must-have for operational deployments such as real-time fraud detection for financial transactions.
  • Graph algorithm execution across the cluster – Graph algorithms such as PageRank and community detection identify relationships across business entities and are used in most enterprise-scale deployments of graph databases. Because relationships can span multiple data partitions, the ability to run graph algorithms across the entire cluster is a must-have for an enterprise-scale graph database.

Here’s a comparison of TigerGraph and the Neo4j 4.0 Fabric architecture against the scalability criteria above. To keep this objective, we have included references to the Neo4j documentation that describes the specific aspects of the architecture involved in this comparison.

Unified Enterprise Schema

Impact for enterprise deployments (Why does this matter?)

Neo4j Fabric brings massive overhead and unwieldy deployments, especially as the number of machines in a cluster grows (hundreds of schema shards for 1,000 machines).

Neo4j Source

Automatic Data Partitioning

Impact for enterprise deployments (Why does this matter?)

With TigerGraph, loading to 1,000 machines is as simple as loading to a single machine: a single loading job is used regardless of the number of machines in the cluster.

With Neo4j Fabric, the DB administrator must design 100 loading jobs for 100 machines and 1,000 loading jobs for 1,000 machines.
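
To make the idea of automatic partitioning concrete, here is a minimal Python sketch that assumes simple hash-based placement of vertices. It is an illustration of the general technique only, not a description of how TigerGraph or Neo4j actually places data; the names `partition_for` and `load_edges` are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(vertex_id: str, num_machines: int) -> int:
    """Map a vertex ID to a machine index using a stable hash (illustrative)."""
    digest = hashlib.sha256(vertex_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


def load_edges(edges, num_machines):
    """One loading job: route each edge to the machine that owns its source vertex."""
    shards = defaultdict(list)
    for src, dst in edges:
        shards[partition_for(src, num_machines)].append((src, dst))
    return dict(shards)


# Hypothetical sample data; the same call works for 3 machines or 1,000.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("alice", "carol")]
shards = load_edges(edges, num_machines=3)
```

The point of the sketch is that the caller never names a machine: changing `num_machines` changes the placement, while the loading code stays the same.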

Distributed Querying

Impact for enterprise deployments (Why does this matter?)

With TigerGraph, the same query works for a single machine, 10, 100, or 1,000 machines. There is no need to know which machine contains a specific schema shard or a particular segment of the database.

With Neo4j Fabric, the developer must write a separate query for each machine (as many as 1,000 queries for 1,000 machines) and another query to stitch the results together. This means as many as 1,001 queries for a 1,000-machine cluster (one query per machine × 1,000 machines, plus one query to join the results). This defeats the entire purpose of using a graph database, as “Query Joins” are required across fragmented schema and data shards.

Neo4j Source
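
The query arithmetic above, and the alternative of a single logical query, can be sketched in Python. This is a generic scatter-gather pattern under simplified assumptions (each shard is an in-memory adjacency dictionary), not either vendor’s query API; `manual_query_count` and `scatter_gather` are hypothetical names.

```python
def manual_query_count(num_machines: int) -> int:
    """Queries needed when each machine is queried by hand, plus one to join results."""
    return num_machines + 1


def scatter_gather(shards, vertex):
    """Single logical query: fan out to every shard, then merge the partial results."""
    partials = [shard.get(vertex, []) for shard in shards]  # scatter
    return sorted({n for part in partials for n in part})   # gather and de-duplicate


# Hypothetical shards: the neighbors of "alice" are split across two machines.
shards = [
    {"alice": ["bob"]},
    {"alice": ["carol"], "bob": ["carol"]},
    {"carol": ["alice"]},
]

print(manual_query_count(1000))         # 1001
print(scatter_gather(shards, "alice"))  # ['bob', 'carol']
```

In a distributed design, the fan-out and merge live inside the database, so the developer writes only the one logical query.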

ACID Transactions Across the Cluster

Impact for enterprise deployments (Why does this matter?)

Enterprise graph deployments require ACID transactions across the distributed graph. TigerGraph supports ACID transactions across the cluster of machines for all operational deployments. 

Because Neo4j Fabric lacks ACID compliance across the cluster, it can’t be deployed as an operational graph database for an enterprise.

Neo4j Source

Graph Algorithm Execution Across the Cluster

Impact for enterprise deployments (Why does this matter?)

Graph algorithms such as community detection and PageRank analyze data across the entire dataset for optimal results. 

TigerGraph supports the execution of graph algorithms across the entire cluster of the distributed graph.

Neo4j Fabric can’t run graph algorithms across shards, which severely limits its analytical capabilities, especially for algorithms that span the entire dataset, such as community detection and PageRank.

Neo4j Source
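
To illustrate why whole-dataset algorithms are sensitive to partition boundaries, here is a small, self-contained PageRank sketch in plain Python. It is a textbook power iteration on a made-up five-vertex graph, not the implementation used by either product; the vertex names, the shard split, and the `pagerank` helper are assumptions for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            targets = out_links[src] or nodes  # dangling vertex: spread evenly
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical layout: shard 1 holds a, b, c; shard 2 holds d, e.
shard_1_edges = [("a", "b"), ("b", "c"), ("c", "a")]
shard_2_edges = [("d", "e"), ("e", "d")]
cross_shard_edges = [("d", "a"), ("e", "a")]

local = pagerank(shard_1_edges)  # sees only shard 1
full = pagerank(shard_1_edges + shard_2_edges + cross_shard_edges)

# Within shard 1 alone, a, b, and c look equally important.
# On the full graph, a ranks highest because d and e also point to it.
print(local["a"], local["b"], local["c"])
print(full["a"], full["b"], full["c"])
```

Running the algorithm on one shard in isolation misses the two cross-shard edges, so the ranking it produces differs from the ranking on the complete graph.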

Summary

TigerGraph is the only native distributed graph database with automatic partitioning, delivering high performance without the hassle. Neo4j 4.0 Fabric, by contrast, is a federation of separate databases.

We welcome your feedback and invite you to download the “Buyer’s Guide for Graph Databases” with more comprehensive information including customer feedback comparing the leading graph databases. 

A vigorous debate is the core of all honest communication and we are looking forward to hearing from you as we learn and grow together in the decade of Graph.

Download the TigerGraph and Neo4j for Scalability Infographic
