What is a Graph Database and Why Should You Care?
What is a Graph Database?
The world is becoming more connected every day and these connections are more valuable than ever. We’re finding ways to explore connections and relationships to see what they can tell us. And they can tell us a lot: about how individuals are connected to each other forming groups, about the products we are likely to purchase, about how changes in one part of the organization or infrastructure can affect other parts.
There is a technology designed to manage these relationships, and it is called graph database and analytics. The graph database is the only data model in which business entities and their relationships are stored pre-connected. Graph analytics offers a simplified way to analyze relationships among entities such as people, products, accounts, and locations, using SQL-like queries that do not require programming experts.
Gartner estimates that the graph database and analytics market will grow 100% annually from 2019 to 2022, making it one of the fastest growing markets in the data and analytics landscape.
You Use Graphs Every Day
Every day when you search using Google, you are using Google’s Knowledge Graph. Google search returns the web pages that contain the information you are looking for, ranked using a graph algorithm called PageRank.
Every week when you log in, search for and view business contacts on LinkedIn, it shows each contact's degree of separation from you, such as a 1st, 2nd, or 3rd-degree connection. This is the result of a graph database search on LinkedIn’s Professional Network Graph, which counts the number of hops from you to the contact being viewed. Every time you see common connections or common groups with a second-degree contact, or LinkedIn recommends that you connect with a professional contact, you are querying the professional network graph at LinkedIn.
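To make the ranking idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is a simplified illustration and not Google's production ranking; the `pagerank` function, the tiny `web` link graph, and the damping factor of 0.85 are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: most pages link back to "home".
web = {
    "home": ["docs", "blog"],
    "docs": ["home"],
    "blog": ["home", "docs"],
    "orphan": ["home"],
}
ranks = pagerank(web)
top_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links ends up with the highest score, while a page nobody links to keeps only the baseline share.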
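The degree-of-separation lookup can be sketched as a breadth-first search that counts hops. This is only an illustration of the idea, not LinkedIn's implementation; the `network` data and the function names are made up for the example.

```python
from collections import deque


def degrees_of_separation(network, start, target):
    """Return the fewest hops from start to target, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for contact in network.get(person, ()):
            if contact == target:
                return hops + 1
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, hops + 1))
    return None


def common_connections(network, a, b):
    """People directly connected to both a and b."""
    return network.get(a, set()) & network.get(b, set())


# Hypothetical undirected network: each person maps to direct contacts.
network = {
    "you": {"ana", "ben"},
    "ana": {"you", "cara"},
    "ben": {"you"},
    "cara": {"ana", "dev"},
    "dev": {"cara"},
    "eli": set(),
}

second_degree = degrees_of_separation(network, "you", "cara")
third_degree = degrees_of_separation(network, "you", "dev")
shared = common_connections(network, "you", "cara")
```

Here `cara` is a 2nd-degree connection reached through `ana`, which is also the one connection you have in common.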
Every time you use Amazon or wish.com to shop, you see product recommendations such as “people who bought this item also bought” or “these items are often bought together”. That comes from a graph analytics query.
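A "people who bought this item also bought" query is a two-hop traversal: from the item to the customers who bought it, then from those customers to their other items. The sketch below assumes a small in-memory purchase graph; the `purchases` data and the `also_bought` helper are hypothetical and do not reflect how any particular retailer computes recommendations.

```python
from collections import Counter


def also_bought(purchases, item, top_n=3):
    """Rank other items by how many buyers of `item` also bought them."""
    counts = Counter()
    for basket in purchases.values():
        if item in basket:                  # hop 1: item -> customer
            counts.update(basket - {item})  # hop 2: customer -> other items
    # Sort by count (descending), then name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [product for product, _ in ranked[:top_n]]


# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "c1": {"tent", "stove", "lantern"},
    "c2": {"tent", "stove"},
    "c3": {"tent", "lantern", "boots"},
    "c4": {"boots", "socks"},
    "c5": {"tent", "stove", "boots"},
}

recommendations = also_bought(purchases, "tent", top_n=2)
```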
Every time you use Amazon, Twitter, Facebook or Instagram, you are using graph database and analytics. Why aren’t these industry leaders using relational or NoSQL databases for storing and analyzing the data regarding relationships?
Challenges Using Relational or NoSQL Databases for Storing and Analyzing Relationships
Relational databases store the data for each business entity, such as customer, order, product and payment data, in separate database tables. To understand and analyze relationships across these entities, relational databases require table joins, which are computationally expensive and can take hours as the size of the data grows.
Many NoSQL databases, such as key-value and wide-column stores, keep all of the data in a single large table. This means that relationship analysis requires scanning a huge table with millions or billions of rows, making it very difficult to analyze relationships more than two or three levels deep.
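As a rough illustration of why multi-level relationship queries get expensive in a relational model, the sketch below builds a query that needs one additional self-join for every extra hop. It uses Python's built-in `sqlite3` module; the `follows` table and the `reachable_in_hops` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")],
)


def reachable_in_hops(conn, start, hops):
    """Return who is exactly `hops` relationships away from `start`.

    Every hop beyond the first adds another self-join to the query.
    """
    if not isinstance(hops, int) or hops < 1:
        raise ValueError("hops must be a positive integer")
    joins = "".join(
        f" JOIN follows AS f{i} ON f{i}.src = f{i - 1}.dst"
        for i in range(2, hops + 1)
    )
    sql = (
        f"SELECT DISTINCT f{hops}.dst AS person "
        f"FROM follows AS f1{joins} WHERE f1.src = ? ORDER BY person"
    )
    return [row[0] for row in conn.execute(sql, (start,))]


one_hop = reachable_in_hops(conn, "a", 1)
three_hops = reachable_in_hops(conn, "a", 3)
```

With four rows this is instant, but each added join multiplies the work the database may have to do on a table with millions or billions of rows.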
Graph databases are purpose-built for storing and analyzing relationships among the data, as the data entities as well as the relationships among them are pre-connected and do not require time-consuming table joins or multiple scans across a large table.
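One way to picture "pre-connected" data is a structure in which every entity holds direct references to its related entities, so following a relationship is a lookup on the entity itself and not a join. The sketch below is a conceptual model only and does not describe the storage engine of any specific product; the `Vertex` class, the `follow` helper, and the sample data are hypothetical.

```python
class Vertex:
    """An entity that stores direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.edges = []  # list of (relationship label, neighbor Vertex)

    def connect(self, label, other):
        self.edges.append((label, other))


def follow(vertex, *labels):
    """Follow a chain of relationship labels and return the names reached."""
    frontier = [vertex]
    for label in labels:
        frontier = [
            neighbor
            for current in frontier
            for edge_label, neighbor in current.edges
            if edge_label == label
        ]
    return sorted({v.name for v in frontier})


# Hypothetical customer -> order -> product data.
alice = Vertex("alice")
order1, order2 = Vertex("order-1"), Vertex("order-2")
tent, stove = Vertex("tent"), Vertex("stove")

alice.connect("placed", order1)
alice.connect("placed", order2)
order1.connect("contains", tent)
order2.connect("contains", stove)
order2.connect("contains", tent)

products = follow(alice, "placed", "contains")
```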
Given the inherent advantages of graph databases for managing relationship data, this raises the next question: why have enterprises not adopted graphs faster?
Enterprise Adoption for Graph Databases
First generation graph databases were built with native graph storage. However, they were not built to handle large data or query volumes, or to perform beyond three levels of connections inside the graph. They are excellent for visualizing relationships among business entities but fail to go beyond proof of concept or academic research projects and scale up to real-world requirements.
Second generation graph databases were built on top of NoSQL storage, which allowed them to load large amounts of data. However, they still do not scale for queries involving three or more connections or hops, and can’t support complex graph analytics for analyzing the relationships. They also typically do not support database sharding, which means a large graph with terabytes of data can’t be distributed across multiple servers, each holding a few hundred gigabytes of data.
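To illustrate what sharding a graph involves, the sketch below assigns each vertex to a server by hashing its ID and counts how many edges end up crossing servers. This is generic hash partitioning, shown under assumed names (`shard_for`, `partition`) and sample data; it is not a description of how any particular graph database distributes its data.

```python
import zlib


def shard_for(vertex_id, num_shards):
    """Pick a shard with a stable hash (Python's built-in hash() varies per run)."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_shards


def partition(edges, num_shards):
    """Store each edge on its source vertex's shard.

    Returns the per-shard adjacency lists and the number of edges whose
    endpoints live on different shards.
    """
    shards = [dict() for _ in range(num_shards)]
    cross_shard_edges = 0
    for src, dst in edges:
        home = shard_for(src, num_shards)
        shards[home].setdefault(src, []).append(dst)
        if shard_for(dst, num_shards) != home:
            cross_shard_edges += 1
    return shards, cross_shard_edges


# Hypothetical account-to-account transfer edges.
edges = [
    ("acct-1", "acct-2"),
    ("acct-2", "acct-3"),
    ("acct-3", "acct-1"),
    ("acct-4", "acct-1"),
    ("acct-5", "acct-4"),
]
shards, cross_shard_edges = partition(edges, num_shards=3)
```

Edges that cross shards are the hard part: a multi-hop query has to move between servers each time it follows one, which is why sharding and parallel query processing need to be designed together.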
First and second generation graph databases do not meet enterprise requirements:
- Can’t scale to multiple machines for storing big data (database sharding) and parallel query processing
- Can’t support deep link analytics (going beyond three hops), which is essential for next-generation fraud detection, recommendation engines, machine learning and AI, and other use cases
- Can’t meet real-time requirements for updates and sub-second query performance on big data
A blog post from a customer and a customer interview both cover the challenges associated with first and second generation graph databases.
TigerGraph is a new kind of graph database: a native parallel graph database purpose-built for loading massive amounts of data (terabytes) in hours and analyzing relationships 10 or more hops deep in real time. TigerGraph supports transactional as well as analytical workloads, is ACID compliant, and scales up and out with database sharding.
TigerGraph’s proven technology supports applications such as fraud detection, customer 360, IoT, AI, and machine learning to make sense of ever-changing big data, and is used by customers including Intuit, China Mobile, Wish and Zillow.