Skip to content
START FOR FREE
START FOR FREE
  • SUPPORT
  • COMMUNITY
Menu
  • SUPPORT
  • COMMUNITY
MENUMENU
  • Products
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      Watch a TigerGraph Demo

      TIGERGRAPH CLOUD

      • Overview
      • TigerGraph Cloud Suite
      • FAQ
      • Pricing

      USER TOOLS

      • GraphStudio
      • Insights
      • Application Workbenches
      • Connectors and Drivers
      • Starter Kits
      • openCypher Support

      TIGERGRAPH DB

      • Overview
      • GSQL Query Language
      • Compare Editions

      GRAPH DATA SCIENCE

      • Graph Data Science Library
      • Machine Learning Workbench
  • Solutions
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      Watch a TigerGraph Demo

      Solutions

      • Solutions Overview

      INCREASE REVENUE

      • Customer Journey/360
      • Product Marketing
      • Entity Resolution
      • Recommendation Engine

      MANAGE RISK

      • Fraud Detection
      • Anti-Money Laundering
      • Threat Detection
      • Risk Monitoring

      IMPROVE OPERATIONS

      • Supply Chain Analysis
      • Energy Management
      • Network Optimization

      By Industry

      • Advertising, Media & Entertainment
      • Financial Services
      • Healthcare & Life Sciences

      FOUNDATIONAL

      • AI & Machine Learning
      • Time Series Analysis
      • Geospatial Analysis
  • Customers
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      CUSTOMER SUCCESS STORIES

      • Ford
      • Intuit
      • JPMorgan Chase
      • READ MORE SUCCESS STORIES
      • Jaguar Land Rover
      • United Health Group
      • Xbox
  • Partners
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      PARTNER PROGRAM

      • Partner Benefits
      • TigerGraph Partners
      • Sign Up
      TigerGraph partners with organizations that offer complementary technology solutions and services.​
  • Resources
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      BLOG

      • TigerGraph Blog

      RESOURCES

      • Resource Library
      • Benchmarks
      • Demos
      • O'Reilly Graph + ML Book

      EVENTS & WEBINARS

      • Graph+AI Summit
      • Graph for All - Million Dollar Challenge
      • Events &Trade Shows
      • Webinars

      DEVELOPERS

      • Documentation
      • Ecosystem
      • Developers Hub
      • Community Forum

      SUPPORT

      • Contact Support
      • Production Guidelines

      EDUCATION

      • Training & Certifications
  • Company
    • Join the World’s Fastest and Most Scalable Graph Platform

      WE ARE HIRING

      COMPANY

      • Company Overview
      • Leadership
      • Legal Terms
      • Patents
      • Security and Compliance

      CAREERS

      • Join Us
      • Open Positions

      AWARDS

      • Awards and Recognition
      • Leader in Forrester Wave
      • Gartner Research

      PRESS RELEASE

      • Read All Press Releases
      TigerGraph Recognized in 2022 Gartner® Critical Capabilities for Cloud Database Management Systems for Analytical Use Cases
      January 12, 2023
      Read More »

      NEWS

      • Read All News

      A Shock to the System: ShockNet Predicts How Economic Shocks Could Affect the World Economy

      TigerGraph Recognized for the First Time in the 2022 Gartner® Magic Quadrant™ for Cloud Database Management Systems

  • START FREE
    • The World’s Fastest and Most Scalable Graph Platform

      GET STARTED

      • Request a Demo
      • CONTACT US
      • Try TigerGraph
      • START FREE
      • TRY AN ONLINE DEMO

Crossing the Chasm – Eight Prerequisites For A Graph Query Language

  • Mingxi Wu
  • May 22, 2018
  • About TigerGraph, blog, Graph Databases, GSQL
  • Blog >
  • Crossing the Chasm – Eight Prerequisites For A Graph Query Language

Prelude

In December, I wrote a Quora post on the pros and cons of graph databases. I shared two cons pervasive in the market today: the difficulty of finding proficient graph developers, and how non-standardization on a graph query language is slowing down enterprise adoption, especially where it is needed most.

A solution to these two problems is availability of a simple, yet powerful standard graph query language. There are many graph languages on the market, including Neo4j’s Cypher, Apache TinkerPop Gremlin and TigerGraph’s GSQL, to name a few. Before discussing which graph language is the best, or fusing the best aspects of each graph language into a new, unified option, let us take a step back to ask a more fundamental question: What are the prerequisites for a graph query language in the first place?

Prerequisites For A Graph Language

At first sight, it is a hard question to answer! Based on the input from many users, and on my more than 18-year career journey in data management, I will attempt to answer this using a series of connected questions.

First, Why Does The World Pay Attention To The Graph Model?

Intel introduced the 8008 processor on April 1, 1972. This was followed by the availability of the first personal computers, introduced in 1975. During this era, more and more enterprise information became digitized. A pressing requirement of the market was an easy to use data management application to help with bookkeeping, and to generate ad hoc reports for business communication. Due to the scarcity of memory, the disk-based relational database was born, riding on the normalization theory to reduce data redundancy and improve data integrity. The relational model (or tabular model) met the hardware constraint and market requirements, and thus flourished over the next three decades. At that time, QUEL and SQL were two competing relational query languages. Both languages are expressive and easy to use, and they satisfied the market requirements. As the market evolved, SQL eventually won and became widely adopted by the industry.

Time flew by. Hardware continued to evolve while data volume increased at a rapid pace. The market demanded scalable software to manage this ever-growing data. Since 2000, many Massively Parallel Processing (MPP) database vendors have gained ground by addressing this requirement — these include Teradata, Greenplum, Netezza, Vertica, AsterData, Redshift, etc. These MPP databases still adhere to the relational model but use MPP to solve the scalability problem. The success of MPP databases can be attributed to SQL.Thanks to SQL’s declarative nature, applications written in SQL are data-independent, and thus can be ported to single-node or multiple-node RDBMS systems with ease.

The modern era is witnessing the data deluge and the interconnected data. The uniqueness and power of interconnected data is proven by the emergence of giant internet companies, including Google (search engine), Facebook (social network), LinkedIn (professional network), and Twitter (online news and social graph). Nowadays, mobile messengers such as Instagram, WhatsApp and WeChat, and mobile payment providers such as Paypal, WeChat Pay and Alipay generate enormous volumes of graph data every second. As people, machines, cars, mobile devices, etc. generate colossal amounts of interconnected data, we are truly in the midst of the network age! A pressing demand for managing graph data has emerged.

For the first time in the history of data management, the relational model no longer works, because relational databases cannot efficiently join multiple tables in order to traverse multiple links in the interconnected data. Typically, the relational model blindly scans all of or large portions of the participating tables, and uses the join predicates to find data record connections. One can argue that an index (an auxiliary data structure to speed up ad hoc record lookup) can help. However, a savvy database administrator will be happy to tell you that an index is a redundant copy of a small portion of the relational table, and if the data are changing frequently, it is a big challenge, if not infeasible, to keep the updated tables and indices in sync without sacrificing query performance.

More importantly, it is not even possible to write SQL queries to discover and predict relationships hidden in many tables– we cannot “connect the dots” by joining the tables when we don’t know for sure in advance what types of relationships may be involved.

The graph model comes to the rescue by offering these beneficial features:

  • Natural and economic storage model for interconnected data.
  • Natural model for managing ever-growing object types.
  • Natural analytical computational model for knowledge/inference/learning – chaining and combining observations.

Why does the graph model need a new query language? 

As discussed so far, the graph model is a perfect fit for today’s interconnected data. However, because it is new, and graph-oriented thinking is different, it takes some effort for developers to learn and use it with ease. Most computer science textbooks today will dedicate a special section on graph algorithms, with education on the graph model and graph algorithms oriented around appreciating the theoretical complexity and elegance of graph problems. Coding graph algorithms is another matter, with textbooks using general purpose programming language such as Java or C++. Furthermore, most graph algorithms in textbooks typically assume all graph data can fit in memory and don’t consider distributed computing. Consequently, there is a shortage of practical training and tools for real-world graph problems especially for big graph datasets.

For any enterprise adopting a graph model, a top concern is where do they find qualified developers to efficiently write graph applications?

The solution? Lowering the barrier to learning and implementing graph applications with – A Declarative Graph Query Language! Just as SQL and QUEL significantly lowered the barrier for managing relational data, a well-designed, user-friendly, and highly expressive graph query language can significantly bridge the gap between asking high-level real-life questions and easily crafting a graph-based solution.

What Are The Prerequisites Of A Graph Query Language?

That being said, based on years of experience working on real-world graph management problems from our forward thinking customers, I have summarized the key requirements for solving their problems, all of which serve as an empirical answer to our question What Are The Prerequisites Of A Graph Query Language?

  1. Schema-Based with Capability of Dynamic Schema Change.

As explained before, the success of SQL for relational database programming is largely attributed to data independence, where the user works with a logic level data schema that is independent of the underlying storage mechanism, and changes to the logic level data schema will be transparent to existing applications (so they won’t be broken). With data independence, all the lower level changes are transparent to the upper level. Applications written on a pre-postulated schema (data model) can be ported to any data management software and hardware that understand the schema. By the same token, a graph model and graph query language should embrace data independence. To be more precise, a graph query language should support the definition of logical graph schemas. Because data models change over time in the real world, the language and system should also support dynamic schema changes.

  1. High-Level Control of Graph Traversal

The SQL language is purely declarative, meaning users ask questions without worrying how the database software processes the queries. This relieves the user from coding a high-performance algorithm to perform their task. The same fashion can be applied to designing an elegant graph query language. For simple queries, it can be declarative, such as a pattern match query. E.g., this sample query below finds all customers whose age is greater than 23 and who has bought a product produced by Apple and also purchased by Peter.

  1. Fine Control of Graph Traversal

In addition to high-level queries, we see large customer demand for coding sophisticated iterative graph mining algorithms and traversal algorithms, where they want to traverse the graph data, add tags and side effects (runtime attributes), with iterative updates until a final query result is computed. For example, consider algorithms for collaborative filtering recommendation, PageRank, community detection, label propagation, similarity-based ranking, etc. See https://doc.tigergraph.com/GSQL-Tutorial-and-Demo-Examples.html. Based on our experience serving some of the largest companies in the world, customers come to graph management to design and execute complex algorithms that SQL cannot solve. Therefore, fine control of both the the graph query and the graph model’s elements – vertex and edge instances – is a must for a standard graph query language. For example, consider a PageRank query.

  1. Built-in Parallel Semantics to Guarantee High Performance

Graph algorithms are expensive, as each hop could potentially increase the data complexity exponentially. Let’s consider how: as we start from one vertex, the first hop can yield n neighbors, the next hop on the graph from the n immediate neighbors can yield n² more neighbors, and so on. As a result, each traversal step in the graph should have built-in parallel processing semantics to ensure high performance.

  1. A Highly Expressive Loading Language

The world is a graph. Ingesting data silos into a central graph schema needs an expressive and flexible schema-mapping loading language. Otherwise, even integrating a couple of data sources into the logical graph schema will be a daunting task. A comprehensive graph language needs to include an easy-to-use loading functionality to help onboard heterogeneous data quickly. This is one of the highest priority requirements for a graph query language and associated system.

  1. Data Security and Privacy

Enterprise customers are keen on facilitating their own collaboration with graph data. On one hand, they want a graph model that can share selected parts of the data among multiple departments and even among partners. At the same time, they also want to keep certain sensitive data private and accessible only by specific roles, departments and partners, based on their business need. A MultiGraph with a role-based security model is required for a successful graph query language in the real world for large enterprises.

  1. Support For Queries Calling Queries (recursively)

This is a less obvious requirement in terms of implementing a graph-based solution that generates business value. Quite often, our customers want to ask the same query for all vertices in the graph, the so-called batch processing. For example, for a product recommendation problem, they want to find recommendations for each and every customer. To write a batch query like this in one graph query is much harder than writing a single customer recommendation query because the data structures and algorithmic flow are much more complex. To solve this problem, a query-calling-query feature comes to rescue. In the main query, for each vertex we call the single customer recommendation query. The same pattern can be applied for other complex graph batch processing, where divide-and-conquer makes the graph query logic simpler and resource-wise more scalable. This also enables users to write and organize queries in more modularized way.

  1. SQL User-Friendly

We found most of our customers have SQL developers. The graph query language, therefore should stay close to SQL syntax and concepts as much as possible so that relational SQL developers can learn and implement graph applications with minimal ramp-up time.

How Does TigerGraph GSQL Address These Eight Prerequisites?

Based on customer feedback over five years, TigerGraph’s world-class team has co-invented with our customers, over several iterations, a Turing-complete graph query language called GSQL. It includes DDL (Data Definition Language) commands for defining a graph schema, a loading language for high-performance loading of structured and semi-structured data, and a SQL-like graph traverse and compute language with implicit parallelism.

  1. Schema-Based with Capability of Dynamic Schema Change.

GSQL uses Vertex type and Edge type as schema building blocks. A Graph is defined as a collection of Vertex types and Edge types. One TigerGraph database may contain multiple (possibly overlapping) graphs, each with its own set of user privileges and queries. E.g.,

Also, GSQL’s DDL supports dynamic schema change with SQL-like syntax. This is a must-have requirement commonplace in managing enterprise data today. With the DDL, all GSQL queries are written against the logic model. Some of our customers started with a single machine, and as their data size grows, they migrate to multiple machines leveraging our MPP graph database to scale out performance and storage. Thanks to our data independence query language, their applications are transparent to the scale out architecture change, and migrated seamlessly from single node to a cluster of nodes.

  1. High-Level Control of Graph Traversal

GSQL’s building block is a single SQL-like block with a one-hop pattern. For example, “find all customers who are 23 or more years old and who bought an iPhone” is expressed as follows:

GSQL users can write a series of single GSQL blocks to traverse the graph any number of steps, each step being a one-hop path pattern. In other words,  it is very high-level declarative traversal, simple and clear. It is easy to read and maintain. Our customers without computer science degrees can master it in one or two days and start coding graph algorithms on their domain data.

  1. Fine Control of Graph Traversal

GSQL introduces a set of built-in accumulators, which serve as the vertices’ runtime attributes (a.k.a. properties). These runtime attributes can attach to each and every vertex instance seen during traversal in parallel, thus users can add tags or side effects on vertices during runtime traversal to gather evidences for complex applications such as fraud detection and personalized recommendations. GSQL also has flow control components such as WHILE loops, FOREACH loops, and IF-ELSE/CASE-WHEN branching to allow complex iterative or branching algorithms. Together with local and global accumulators, GSQL really puts the user’s hands on the steering wheel. Indeed, it’s Turing-complete, meaning any algorithm on the graph data can be coded within GSQL, therefore no graph data are needed to be exchanged back and forth between a graph database system and a client-side application which can slows the query performance orders of magnitudes.

  1. Built-in Parallel Semantics to Guarantee High Performance

A pair of innovative ACCUM and POST-ACCUM clauses in combination with built-in accumulator types encode the parallel processing of a vertex set or edge set, within the same blocks used for high-level graph traversal. This means GSQL enables  users to achieve parallel processing of graph data very easily. This is a great advantage of the graph query language to encourage users with parallel thinking without worrying about the implementation details.  Of course, not every graph query needs this powerful feature. PageRank and community detection types of algorithms can take advantage of this feature.

  1. A Highly Expressive Loading Language

The loading language is high expressive and declarative. Our users love it. We see customers learning it in half an hour and use it to integrate 10s to 100s data sources into a single graph model in one day.

  1. Security for Data Sharing and Privacy At The Same Time

The industry’s first multiple-graph model was born in GSQL and helps our customers to achieve data sharing and privacy at the same time.

  1. Support For Queries Calling Queries 

A GSQL can call a query in several places: in the ACCUM and POST-ACCUM clauses, at the statement level, and even in the called query body itself. This gives the most flexibility and simplicity for solving real-life graph traversal problems. A query can also call itself recursively. This feature enables divide-and-conquer, greatly simplify query logics and encourage writing composable queries.

  1. SQL User-Friendly

GSQL re-uses as much as possible of SQL syntax, semantics, and keywords. GSQL can in fact be seen and learned as a chain of single SQL blocks.

To learn more, check out our white paper offering some high level comparison of GSQL with two other graph query language — Gremlin and Cypher.

Got questions? Reach out to us at [email protected].

For those of you going to SIGMOD 2018 (June 10-15, Houston, Texas), I’d love to chat. Stop by TigerGraph’s booth #21  and ask any questions regarding our graph query language GSQL.

 

 

 

 

You Might Also Like

TigerGraph Showcases Unrivaled Performance at Scale

TigerGraph Showcases Unrivaled Performance at Scale

January 12, 2023
How to Create a Visual Graph Analytics Application Using TigerGraph Insights in 30 mins

How to Create a Visual Graph...

November 14, 2022
Turbocharge your business intelligence with TigerGraph’s ML Workbench on TigerGraph Cloud

Turbocharge your business intelligence with TigerGraph’s...

November 14, 2022

Introducing TigerGraph 3.0

July 1, 2020

Everything to Know to Pass your TigerGraph Certification Test

June 24, 2020

Neo4j 4.0 Fabric – A Look Behind the Curtain

February 7, 2020

TigerGraph Blog

  • Categories
    • blogs
      • About TigerGraph
      • Benchmark
      • Business
      • Community
      • Compliance
      • Customer
      • Customer 360
      • Cybersecurity
      • Developers
      • Digital Twin
      • eCommerce
      • Emerging Use Cases
      • Entity Resolution
      • Finance
      • Fraud / Anti-Money Laundering
      • GQL
      • Graph Database Market
      • Graph Databases
      • GSQL
      • Healthcare
      • Machine Learning / AI
      • Podcast
      • Supply Chain
      • TigerGraph
      • TigerGraph Cloud
    • Graph AI On Demand
      • Analysts and Research
      • Customer 360 and Entity Resolution
      • Customer Spotlight
      • Development
      • Finance, Banking, Insurance
      • Keynote
      • Session
    • Video
  • Recent Posts

    • It’s Time to Harness the Power of Graph Technology [Infographic]
    • TigerGraph Showcases Unrivaled Performance at Scale
    • TigerGraph 101 An Introduction to Graph | Jan 26th @ 9am PST
    • Data Science Salon New York
    • Tech For Retail
    TigerGraph

    Product

    SOLUTIONS

    customers

    RESOURCES

    start for free

    TIGERGRAPH DB
    • Overview
    • Features
    • GSQL Query Language
    GRAPH DATA SCIENCE
    • Graph Data Science Library
    • Machine Learning Workbench
    TIGERGRAPH CLOUD
    • Overview
    • Cloud Starter Kits
    • Login
    • FAQ
    • Pricing
    • Cloud Marketplaces
    USEr TOOLS
    • GraphStudio
    • TigerGraph Insights
    • Application Workbenches
    • Connectors and Drivers
    • Starter Kits
    • openCypher Support
    SOLUTIONS
    • Why Graph?
    industry
    • Advertising, Media & Entertainment
    • Financial Services
    • Healthcare & Life Sciences
    use cases
    • Benefits
    • Product & Service Marketing
    • Entity Resolution
    • Customer 360/MDM
    • Recommendation Engine
    • Anti-Money Laundering
    • Cybersecurity Threat Detection
    • Fraud Detection
    • Risk Assessment & Monitoring
    • Energy Management
    • Network & IT Management
    • Supply Chain Analysis
    • AI & Machine Learning
    • Geospatial Analysis
    • Time Series Analysis
    success stories
    • Customer Success Stories

    Partners

    Partner program
    • Partner Benefits
    • TigerGraph Partners
    • Sign Up
    LIBRARY
    • Resources
    • Benchmark
    • Webinars
    Events
    • Trade Shows
    • Graph + AI Summit
    • Million Dollar Challenge
    EDUCATION
    • Training & Certifications
    Blog
    • TigerGraph Blog
    DEVELOPERS
    • Developers Hub
    • Community Forum
    • Documentation
    • Ecosystem

    COMPANY

    Company
    • Overview
    • Careers
    • News
    • Press Release
    • Awards
    • Legal
    • Patents
    • Security and Compliance
    • Contact
    Get Started
    • Start Free
    • Compare Editions
    • Online Demo - Test Drive
    • Request a Demo

    Product

    • Overview
    • TigerGraph 3.0
    • TIGERGRAPH DB
    • TIGERGRAPH CLOUD
    • GRAPHSTUDIO
    • TRY NOW

    customers

    • success stories

    RESOURCES

    • LIBRARY
    • Events
    • EDUCATION
    • BLOG
    • DEVELOPERS

    SOLUTIONS

    • SOLUTIONS
    • use cases
    • industry

    Partners

    • partner program

    company

    • Overview
    • news
    • Press Release
    • Awards

    start for free

    • Request Demo
    • take a test drive
    • SUPPORT
    • COMMUNITY
    • CONTACT
    • Copyright © 2023 TigerGraph
    • Privacy Policy
    • Linkedin
    • Facebook
    • Twitter

    Copyright © 2020 TigerGraph | Privacy Policy

    Copyright © 2020 TigerGraph Privacy Policy

    • SUPPORT
    • COMMUNITY
    • COMPANY
    • CONTACT
    • Linkedin
    • Facebook
    • Twitter

    Copyright © 2020 TigerGraph

    Privacy Policy

    • Products
    • Solutions
    • Customers
    • Partners
    • Resources
    • Company
    • START FREE
    START FOR FREE
    START FOR FREE
    TigerGraph
    PRODUCT
    PRODUCT
    • Overview
    • GraphStudio UI
    • Graph Data Science Library
    TIGERGRAPH DB
    • Overview
    • Features
    • GSQL Query Language
    TIGERGRAPH CLOUD
    • Overview
    • Cloud Starter Kits
    TRY TIGERGRAPH
    • Get Started for Free
    • Compare Editions
    SOLUTIONS
    SOLUTIONS
    • Why Graph?
    use cases
    • Benefits
    • Product & Service Marketing
    • Entity Resolution
    • Customer Journey/360
    • Recommendation Engine
    • Anti-Money Laundering (AML)
    • Cybersecurity Threat Detection
    • Fraud Detection
    • Risk Assessment & Monitoring
    • Energy Management
    • Network Resources Optimization
    • Supply Chain Analysis
    • AI & Machine Learning
    • Geospatial Analysis
    • Time Series Analysis
    industry
    • Advertising, Media & Entertainment
    • Financial Services
    • Healthcare & Life Sciences
    CUSTOMERS
    read all success stories

     

    PARTNERS
    Partner program
    • Partner Benefits
    • TigerGraph Partners
    • Sign Up
    RESOURCES
    LIBRARY
    • Resource Library
    • Benchmark
    • Webinars
    Events
    • Trade Shows
    • Graph + AI Summit
    • Graph for All - Million Dollar Challenge
    EDUCATION
    • TigerGraph Academy
    • Certification
    Blog
    • TigerGraph Blog
    DEVELOPERS
    • Developers Hub
    • Community Forum
    • Documentation
    • Ecosystem
    COMPANY
    COMPANY
    • Overview
    • Leadership
    • Careers  
    NEWS
    PRESS RELEASE
    AWARDS
    START FREE
    Start Free
    • Request a Demo
    • SUPPORT
    • COMMUNITY
    • CONTACT
    Dr. Jay Yu

    Dr. Jay Yu | VP of Product and Innovation

    Dr. Jay Yu is the VP of Product and Innovation at TigerGraph, responsible for driving product strategy and roadmap, as well as fostering innovation in graph database engine and graph solutions. He is a proven hands-on full-stack innovator, strategic thinker, leader, and evangelist for new technology and product, with 25+ years of industry experience ranging from highly scalable distributed database engine company (Teradata), B2B e-commerce services startup, to consumer-facing financial applications company (Intuit). He received his PhD from the University of Wisconsin - Madison, where he specialized in large scale parallel database systems

    Todd Blaschka | COO

    Todd Blaschka is a veteran in the enterprise software industry. He is passionate about creating entirely new segments in data, analytics and AI, with the distinction of establishing graph analytics as a Gartner Top 10 Data & Analytics trend two years in a row. By fervently focusing on critical industry and customer challenges, the companies under Todd's leadership have delivered significant quantifiable results to the largest brands in the world through channel and solution sales approach. Prior to TigerGraph, Todd led go to market and customer experience functions at Clustrix (acquired by MariaDB), Dataguise and IBM.