TigerGraph is delivering the next stage in the evolution of the graph database: the first system capable of real-time analytics on web-scale data. Our Native Parallel Graph™ (NPG) design focuses on both storage and computation, supporting real-time graph updates and offering built-in parallel computation. Our SQL-like graph query language (GSQL) supports ad hoc exploration and interactive analysis of Big Data. With GSQL’s expressive capabilities and NPG speed, you’ll be able to perform Deep Link Analytics: uncovering connections that were previously impractical to reach or too cumbersome to express.
The NPG’s core system was developed from scratch using C++ and system programming concepts to provide an integrated data technology stack. A native graph storage engine (GSE) was developed to co-locate with the graph processing engine (GPE) for fast and efficient processing of data and algorithms.
The GPE is designed to provide built-in parallelism for a MapReduce-based computing model available via APIs. The graph is optimally stored both on disk and in memory, allowing the system to take advantage of data locality on disk, in memory, and in the CPU cache.
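As a rough illustration of what a MapReduce-style computation over a graph can look like, the Python sketch below maps over edges in parallel and then reduces the emitted values per vertex. It is a generic sketch, not the GPE API: the names `edge_map`, `vertex_reduce`, and `run_map_reduce`, and the toy edge list, are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: (source, target, weight) edges.
EDGES = [("a", "b", 1.0), ("a", "c", 2.0), ("b", "c", 3.0), ("c", "a", 4.0)]

def edge_map(edge):
    """Map step: emit (target vertex, weight) for one edge."""
    _source, target, weight = edge
    return target, weight

def vertex_reduce(values):
    """Reduce step: combine all values that arrived at one vertex."""
    return sum(values)

def run_map_reduce(edges, workers=4):
    # Map phase: edges are independent, so they can be processed in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        emitted = list(pool.map(edge_map, edges))
    # Shuffle: group emitted values by destination vertex.
    grouped = defaultdict(list)
    for vertex, value in emitted:
        grouped[vertex].append(value)
    # Reduce phase: one reduction per vertex.
    return {vertex: vertex_reduce(values) for vertex, values in grouped.items()}

# Total incoming edge weight per vertex.
incoming_weight = run_map_reduce(EDGES)
```

Here the map step runs per edge and the reduce step runs per vertex, which is one common way to phrase graph aggregation in MapReduce terms.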
The TigerGraph system performs efficient data compression to take further advantage of memory and the CPU cache. The compression ratio (input data size to output graph size) varies with the data and its structure; however, a 10x compression ratio is very common. For example, 1 TB of input data, once transformed and loaded into the graph, requires only about 100 GB of system memory.
Such compression reduces not only the memory footprint, but also cache misses, speeding up overall query performance.
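The sizing arithmetic behind that example can be written out as a small helper. This is only a back-of-the-envelope sketch: the function name `estimated_graph_size_gb` is hypothetical, 1 TB is taken as 1000 GB, and the actual ratio depends on the data.

```python
def estimated_graph_size_gb(input_size_gb: float, compression_ratio: float) -> float:
    """Estimate in-memory graph size from raw input size and a compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return input_size_gb / compression_ratio

# 1 TB of input (taken as 1000 GB) at the 10x ratio quoted above.
size_gb = estimated_graph_size_gb(1000, 10)
```

With these inputs the estimate comes to 100 GB, matching the figure in the text.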
Each vertex and edge in the graph acts as a parallel unit of storage and computation simultaneously.
With this approach, the graph is no longer a static data storage collection; it is a massively parallel computation engine. Vertices can send messages to, and receive messages from, each other via edges.
A vertex or an edge can store any amount of arbitrary information. The TigerGraph system executes compute functions in parallel on every vertex/edge, taking advantage of multi-core CPU machines and in-memory computing.
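To make the vertex-as-compute-unit idea concrete, here is a minimal Python sketch of level-synchronous message passing that computes hop distances from a source vertex. It illustrates the general vertex-centric pattern only and is not TigerGraph code: `send`, `hop_distances`, and the toy graph are hypothetical, and a thread pool stands in for real multi-core execution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph as adjacency lists (vertex -> out-neighbors).
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}

def send(vertex, distance, graph):
    """Compute function for one active vertex: message each out-neighbor."""
    return [(neighbor, distance + 1) for neighbor in graph[vertex]]

def hop_distances(graph, source, workers=4):
    distances = {source: 0}
    active = [source]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while active:
            # Every active vertex runs its compute function independently.
            outboxes = list(
                pool.map(lambda v: send(v, distances[v], graph), active)
            )
            next_active = []
            for outbox in outboxes:
                for target, distance in outbox:
                    # Rounds proceed level by level, so the first distance
                    # a vertex receives is also its shortest.
                    if target not in distances:
                        distances[target] = distance
                        next_active.append(target)
            active = next_active
    return distances

result = hop_distances(GRAPH, "a")
```

Vertices that never receive a message (such as `e` here, which has no incoming path from `a`) simply stay out of the result.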
The TigerGraph system supports a variety of graph partitioning algorithms. In most cases, automatic partitioning performed on the input data delivers good results without requiring optimization or tuning. However, the system is flexible enough to allow application-specific and other mixed partitioning strategies to achieve even greater application performance.
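The text does not name the partitioning algorithms involved, so the following Python sketch shows just one generic strategy, hash partitioning by source vertex, as an assumed example. The functions `hash_partition` and `partition_edges_by_source` are hypothetical and are not part of any TigerGraph interface.

```python
import zlib

def hash_partition(vertex_id: str, num_partitions: int) -> int:
    """Assign a vertex to a partition using a stable hash of its ID."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions

def partition_edges_by_source(edges, num_partitions):
    """Place each edge with its source vertex so out-edges are co-located."""
    partitions = {p: [] for p in range(num_partitions)}
    for source, target in edges:
        partitions[hash_partition(source, num_partitions)].append((source, target))
    return partitions

# Hypothetical toy edge list.
EDGES = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u1")]
parts = partition_edges_by_source(EDGES, num_partitions=2)
```

An application-specific strategy would replace `hash_partition` with a function that uses domain knowledge, for example grouping vertices that are usually queried together.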
The TigerGraph system can also run multiple graph engines as an active-active network. Each graph engine can host an identical graph with a different partitioning algorithm tailored to a different type of application query. The front-end server (generally a REST server) can route application queries to different graph engines based on query type.
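The routing idea can be sketched as a simple lookup from query type to engine. Everything in this Python sketch is an assumption made for illustration: the `GraphEngine` class, the engine names, the query types, and the fallback behavior are hypothetical and do not describe the actual front-end server.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GraphEngine:
    name: str
    partitioning: str

# Hypothetical engines hosting the same graph with different partitioning.
ENGINES = {
    "lookup": GraphEngine("engine-a", "hash-by-vertex"),
    "analytics": GraphEngine("engine-b", "application-specific"),
}
DEFAULT_QUERY_TYPE = "lookup"

def route(query_type: str) -> GraphEngine:
    """Pick the engine registered for this query type, else the default."""
    return ENGINES.get(query_type, ENGINES[DEFAULT_QUERY_TYPE])

chosen = route("analytics")
```

Because every engine holds the same graph, any engine can answer any query; routing by type only affects which partitioning layout serves it.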
The TigerGraph NPG offers a transformational technology, with clear and significant advantages over the most well-known graph database solutions on the market.
Despite its comprehensive and well-documented graph database functionality, the current leading solution is considerably slower. In benchmark tests, the TigerGraph NPG can load a batch of data in one hour, while the other solution requires 24 hours.
Further, by offering parallelism for large-scale graph analytics, the NPG supports graph-parallel algorithms for Very Large Graphs (VLGs), a technological advantage that grows as graphs inevitably grow larger. The NPG handles fast, limited queries that touch anywhere from a small portion of the graph to millions of vertices and edges, as well as more complex analysis that must touch every single vertex in the graph. Additionally, the NPG’s real-time incremental graph updates make it suitable for real-time graph analytics, unlike other solutions.
A further advantage of the NPG lies in the fact that it represents graphs as a computational model. As discussed previously, compute functions can be associated with each vertex and edge in the graph, transforming them into active parallel compute-storage elements, in a manner analogous to neurons in the human brain.
Vertices in the graph can exchange messages via edges, facilitating massively parallel and fast computation. The NPG offers a computation paradigm that was absent from previous models, positioning it to become a truly transformational technology.