Skip to content
START FOR FREE
START FOR FREE
  • SUPPORT
  • COMMUNITY
Menu
  • SUPPORT
  • COMMUNITY
MENUMENU
  • Products
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      Watch a TigerGraph Demo

      TIGERGRAPH CLOUD

      • Overview
      • TigerGraph Cloud Suite
      • FAQ
      • Pricing

      USER TOOLS

      • GraphStudio
      • Insights
      • Application Workbenches
      • Connectors and Drivers
      • Starter Kits
      • openCypher Support

      TIGERGRAPH DB

      • Overview
      • GSQL Query Language
      • Compare Editions

      GRAPH DATA SCIENCE

      • Graph Data Science Library
      • Machine Learning Workbench
  • Solutions
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      Watch a TigerGraph Demo

      Solutions

      • Solutions Overview

      INCREASE REVENUE

      • Customer Journey/360
      • Product Marketing
      • Entity Resolution
      • Recommendation Engine

      MANAGE RISK

      • Fraud Detection
      • Anti-Money Laundering
      • Threat Detection
      • Risk Monitoring

      IMPROVE OPERATIONS

      • Supply Chain Analysis
      • Energy Management
      • Network Optimization

      By Industry

      • Advertising, Media & Entertainment
      • Financial Services
      • Healthcare & Life Sciences

      FOUNDATIONAL

      • AI & Machine Learning
      • Time Series Analysis
      • Geospatial Analysis
  • Customers
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      CUSTOMER SUCCESS STORIES

      • Ford
      • Intuit
      • JPMorgan Chase
      • READ MORE SUCCESS STORIES
      • Jaguar Land Rover
      • United Health Group
      • Xbox
  • Partners
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      PARTNER PROGRAM

      • Partner Benefits
      • TigerGraph Partners
      • Sign Up
      TigerGraph partners with organizations that offer complementary technology solutions and services.​
  • Resources
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      BLOG

      • TigerGraph Blog

      RESOURCES

      • Resource Library
      • Benchmarks
      • Demos
      • O'Reilly Graph + ML Book

      EVENTS & WEBINARS

      • Graph+AI Summit
      • Graph for All - Million Dollar Challenge
      • Events &Trade Shows
      • Webinars

      DEVELOPERS

      • Documentation
      • Ecosystem
      • Developers Hub
      • Community Forum

      SUPPORT

      • Contact Support
      • Production Guidelines

      EDUCATION

      • Training & Certifications
  • Company
    • Join the World’s Fastest and Most Scalable Graph Platform

      WE ARE HIRING

      COMPANY

      • Company Overview
      • Leadership
      • Legal Terms
      • Patents
      • Security and Compliance

      CAREERS

      • Join Us
      • Open Positions

      AWARDS

      • Awards and Recognition
      • Leader in Forrester Wave
      • Gartner Research

      PRESS RELEASE

      • Read All Press Releases
      TigerGraph Recognized in 2022 Gartner® Critical Capabilities for Cloud Database Management Systems for Analytical Use Cases
      January 12, 2023
      Read More »

      NEWS

      • Read All News

      A Shock to the System: ShockNet Predicts How Economic Shocks Could Affect the World Economy

      TigerGraph Recognized for the First Time in the 2022 Gartner® Magic Quadrant™ for Cloud Database Management Systems

  • START FREE
    • The World’s Fastest and Most Scalable Graph Platform

      GET STARTED

      • Request a Demo
      • CONTACT US
      • Try TigerGraph
      • START FREE
      • TRY AN ONLINE DEMO

Graph Query Searches

  • Emily McAuliffe
  • June 15, 2020
  • blog, Developers
  • Blog >
  • Graph Query Searches

This article was originally posted on Towards Data Science and is linked to a series, read Part 1 and Part 2. Follow the author, Akash Kaul, on LinkedIn and Medium.

Modeling Publication Data in a Graph

This is part 3 of a series exploring data extraction and modeling. If you’ve followed along with the series so far, welcome back! If you’re new, I’ll give a brief rundown of what we’ve done so far. In Part 1, we explored using NLP and entity extraction on biomedical literature related to Covid-19. In Part 2, we learned how to take our data and model it using TigerGraph Cloud. Check out those articles to get an in-depth look at how we’ve done everything so far.

Now in Part 3, we are going to look at writing graph search queries to easily analyze the data in our graph. You will need a fully formed graph to write queries. You can follow the steps in Part 2 to create a graph from scratch, or you can import the graph we created into TigerGraph Cloud. The graph, along with all of the data we used, can be found here.

To import the graph, create a blank solution following these steps. Once on the homepage of your solution, click Import an Existing Solution.

Import Solution on Homepage

Unfortunately, you still have to map and load the data manually. But, I take you through exactly how to do this in Part 2 under agenda items 4 and 5.

What Are Graph Queries?

Before we start writing queries, we should probably understand what they are. Graph queries are essentially commands that search through a graph and perform some operation. Queries can be used to find certain vertices or edges, do computations, or even update the graph. Since graphs also have visual representations, all of this can also be done with a UI, like the one provided by TigerGraph Cloud. But, when working with large amounts of data or when trying to create fine-tuned graph searches, using a visual interface is very inefficient. Hence, we can write queries to quickly traverse a graph and extract or insert whatever data we want.

Query Structure

GSQL offers lots of different methods for querying. We will focus on searches. At the core of graph searches is something called a SELECT statement. As the name suggests, the select statement is used to select a set of vertices or edges. The SELECT statement comes with several parameters to narrow down the focus of your search.

The FROM clause specifies what type of edge or vertex you are choosing.

The WHERE clause lets you declare specific conditions for the vertices or edges.

The ACCUM and POST-ACCUM clauses let you handle Accumulators, which are special GSQL variables that gather information as you search (the information can be numbers, sets of vertices or edges, etc.).

The HAVING clause, similar to the WHERE clause, lets you provide additional conditions; however, these will be applied after the previous clauses.

The ORDER BY clause lets you order the gathered edges or vertices by some attribute value.

Finally, the LIMIT clause constrains the number of results of your search.

You can find all of these details, along with other parameters and query methods, on the TigerGraph documentation page.

Writing Graph Queries

Almost any search you could think of for a graph can be handled with the SELECT statement and its corresponding clauses. To prove that fact, let’s practice writing some queries.

All of the following queries can be found on my GitHub page.

These queries are in order from simplest to most complex.

Publications With a Given License

Goal: Find all publications that fall under a given license type.

Code:

CREATE QUERY LicensePub(String l) FOR GRAPH MyGraph {
/* Finds all publications with a given license type   
   Sample Inputs: cc0, cc-by, green-oa, cc-by-nc, no-cc  */
Seed = {LICENSE.*};
Pubs = SELECT p
           FROM Seed:s-(PUB_HAS_LICENSE:e)-PUBLICATION:p
           WHERE s.id == l;
PRINT Pubs[Pubs.id] AS Publications;
}

Explanation: Let’s break down what our code is doing. We want to select all Publication vertices that connect to a specific License vertex. So, we traverse from all LICENSE vertices to all PUBLICATION vertices with the condition that the license id is whatever we specify (i.e. cc0, no-cc, etc.). Then, we just print our results. There are two things to notice in our print statement.

1. If we simply write PRINT Pubs , our output will print the Publications with all of their associated data (title, abstract, etc). So, to filter the output data, we can specify which attributes we want using brackets. In our example, we only print out the ids by writing PRINT Pubs[Pubs.id] .

2. The use of the AS statement is purely cosmetic, and it just changes the name of the resulting list that is printed. This is useful for when you are extracting data to be used in other contexts, but is not necessary for writing queries.

Now, let’s save and install our code. When we run it, we get an input box that looks like this:

Interface after running license query

As an example, I entered ‘cc0’ as the license code. When I click run query, I get an image that looks like this:

Resulting publication vertices after running license query

This shows each publication vertex that has the license we specified. But, this view is quite messy. We can instead view the JSON output by clicking on the <…> icon on the left side. The JSON output should look like this.

JSON output for license query

This looks much cleaner! We can also see the effects of our print statement adjustments. The name of the resulting list is “Publications”, and the only vertex attribute printed is the id.

For the following queries, I will only show the JSON output.

Author With Most Publications

Goal: Find the author who has written the most publications.

Code:

CREATE QUERY AuthorMostPubs() FOR GRAPH MyGraph {
/* This query finds the author with the most publications   
Change the limit to see top 'x' authors */
SumAccum<INT> @pubNum;
Seed={AUTHOR.*};
Author = SELECT a
             FROM Seed:a-()-:t
             ACCUM [email protected] +=1
             ORDER BY [email protected] DESC
             LIMIT 1;  
 
PRINT Author;
}

Explanation: We start by selecting AUTHOR vertices. Notice that the SELECT statement looks different here. That’s because we didn’t specify the edge or target vertex. Since we know the AUTHOR vertex connects to only the PUBLICATION vertex, we can use this “lazy” syntax to save ourselves from having to specify the edge name and target vertex.

We also get our first look at accumulators (reference the documentation here). In this case, we use a local accumulator called pubNum. A local accumulator acts as a unique variable for each vertex, and a SumAccum is a type of accumulator that stores cumulative sums.

So how does this accumulator work? Well, as we traverse from each AUTHOR vertex to its connected PUBLICATION vertices, we add to our accumulator. So, during the ACCUM clause, the accumulator stores the number of connections (which is also the number of publications) as a variable in each AUTHOR vertex.

The next step uses the ORDER BY clause. We use this to place the resulting authors in descending order of their accumulator value. Thus, the author with the most publications will be at the top of the list

Finally, we use the LIMIT clause to limit the output to 1 author (the first author in the list).

When we run the function, our output looks something like this:

Output for Author query

Notice the author id is “nan”. This is the id used when a publication has no author. So, we can see that 2437 articles have no author listed. This is good information, but not exactly what we’re after. To view more results, change the limit. As an example, I’ll change the limit to 5.

JSON output for author search, limit 5

Now we can see the most-published author has 173 publications (wow that’s a lot!)

Journal With Most Publications

We can run a similar query to the author search but instead search for the journal with the most publications.

Goal: Find the journal(s) with the most publications

Code:

CREATE QUERY JournalMostPubs() FOR GRAPH MyGraph {
/* This query finds the journal with the most publications     Change the limit to find the top 'x' journals  */
SumAccum<INT> @pubNum;
Seed = {PUBLICATION.*};
Pubs = SELECT t
           FROM Seed:s-(PUB_HAS_JOURNAL) -:t          
           
           ACCUM [email protected] +=1
           ORDER BY [email protected] DESC
           LIMIT 1;
PRINT Pubs[Pubs.id, [email protected]] AS Most_Published_Journal;
}

Explanation: The code is essentially the same as before, but we instead search for journals instead of authors. We also use the print filters described earlier to make our output nicer.

JSON output for journal query

We see that “Arch Virol”, with 205 publications, is the top journal from our search.

Publications with Most References to a Class Type

Goal: Given a class type, find the publications with the most medical terms of that class.

Example class types: DNA, DISEASE, CANCER, ORGANISM, TAXON.

The full list can be found here.

Code:

CREATE QUERY ClassPub(String c, Int k) FOR GRAPH MyGraph{
/* This query finds the top articles related to a class     
   Sample Input: CANCER, 5 */
SumAccum<INT> @entNum;
Seed = {CLASS.*};
Ents = SELECT e        
 
          FROM Seed:s-(ENTITY_HAS_CLASS)-ENTITY:e       
         
          WHERE s.id == c;
Pubs = SELECT p
           FROM Ents:e -(PUB_HAS_ENTITY)-PUBLICATION:p       
       
           ACCUM [email protected] += 1
           ORDER BY [email protected] DESC
           LIMIT k;
PRINT Pubs[Pubs.id, [email protected]];
 }
Explanation: This is our first look at a 2-hop search. The first hop selects all of the entities, or keywords, connected to a given class type, and the second hop finds all the publications with those entities and ranks them. As an example, let’s give our query the class CANCER and select the top 5. Our output looks something like this.
JSON output for class search

We can now see which publications have the most references to cancer as well as the number of references for each publication.

Similar Publications

Goal: Given a paper, find similar papers based on their shared keywords.

We’ll use Jaccard similarity to determine how closely related 2 papers are. This algorithm essentially computes the number of keywords in common between two papers over the total number of keywords. You can read more about the algorithm here. You can see an example of this formula in action, as well as many other cool graph formulas, on the TigerGraph GitHub.

CREATE QUERY SimilarEnt(STRING doi, INT top) FOR GRAPH MyGraph {
/* Use Jaccard Similarity to find top similar articles of a given article based on the key medical terms used   
Sample Input: 10.1186/1471-2164-7-117, 5 */
SumAccum<INT> @intersection_size, @@set_sizeA, @set_sizeB;
SumAccum<FLOAT> @similarity;
VERTEX check;
Seed = {PUBLICATION.*};
Start = SELECT p
            FROM Seed:p
            WHERE p.id == doi
            ACCUM check = p,    
                         @@set_sizeA+=p.outdegree("PUB_HAS_ENTITY");
Subjects = SELECT t
            FROM Start:s-(PUB_HAS_ENTITY)-:t;
Others = SELECT t
            FROM Subjects:s -(PUB_HAS_ENTITY) - :t
            WHERE t!= check
            ACCUM [email protected]_size +=1,                  
                  [email protected]_sizeB = t.outdegree("PUB_HAS_ENTITY")                            
 
            POST-ACCUM [email protected] = [email protected]_size  
                       *1.0/(@@[email protected]_sizeB- 
                       [email protected]_size)
            ORDER BY [email protected] DESC
            LIMIT top;
PRINT Start[Start.id] AS SOURCE_PUBLICATION;  
PRINT Others[[email protected]] AS SIMILAR_PUBLICATIONS;

Explanation: We start by creating 4 accumulators. Each one represents a value used in the Jaccard formula. For our first SELECT statement, we select the publication matching the input doi and collect all of the edges connected to that vertex into an accumulator. For our second statement, we select all of the entities in that publication. For our third statement, we find all of the publications with any number of the entities we just gathered and find the intersection size (the number of entities in common with the original paper). Finally, we calculate our Jaccard index and sort publications with the highest similarities to be at the top of our output list.

Let’s see an example. I used doi=10.1186/s40413–016–0096–1 and top=5.

JSON output for similarity query

We can see our starting publication as well as the top 5 similar publications, each with their respective similarity score.

Conclusion

If you followed along with this article, I applaud you! This material is not easy, and learning a new language like GSQL can be tricky. I hope you found this walkthrough of GSQL queries insightful. I highly recommend reading my other papers to get a better background for everything we’ve covered today. If you’re looking to check out more query algorithms and structures, check out the TigerGraph documentation for GSQL. If you’re looking for more content, stay tuned! I’ll be releasing Part 4 shortly. In this final part of the series, I’ll cover how to use our graph database and queries to output information that we can visually represent using the UI platform Dash from Plotly. With that, if you have followed along for all 4 parts, you will finish a full, end-to-end application!

If you enjoyed this paper, be sure to check out my other papers and follow me for more content like this!

Resources

  1. https://towardsdatascience.com/using-scispacy-for-named-entity-recognition-785389e7918d
  2. https://towardsdatascience.com/linking-documents-in-a-semantic-graph-732ab511a01e
  3. https://gofile.io/d/fyijVS
  4. https://www.youtube.com/watch?v=JARd9ULRP_I
  5. https://docs.tigergraph.com/dev/gsql-ref/querying/accumulators
  6. https://docs.tigergraph.com/dev/gsql-ref/querying/select-statement#select-statement-data-flow
  7. https://github.com/akash-kaul/GSQL-Query-Searches.git
  8. https://allenai.github.io/scispacy/
  9. https://github.com/tigergraph/gsql-graph-algorithms/blob/master/algorithms/schema-free/jaccard_nbor_ap_json.gsql
  10. https://docs.tigergraph.com/
  11. https://plotly.com/dash/

You Might Also Like

TigerGraph Showcases Unrivaled Performance at Scale

TigerGraph Showcases Unrivaled Performance at Scale

January 12, 2023
How to Create a Visual Graph Analytics Application Using TigerGraph Insights in 30 mins

How to Create a Visual Graph...

November 14, 2022
Turbocharge your business intelligence with TigerGraph’s ML Workbench on TigerGraph Cloud

Turbocharge your business intelligence with TigerGraph’s...

November 14, 2022

Introducing TigerGraph 3.0

July 1, 2020

Everything to Know to Pass your TigerGraph Certification Test

June 24, 2020

Neo4j 4.0 Fabric – A Look Behind the Curtain

February 7, 2020

TigerGraph Blog

  • Categories
    • blogs
      • About TigerGraph
      • Benchmark
      • Business
      • Community
      • Compliance
      • Customer
      • Customer 360
      • Cybersecurity
      • Developers
      • Digital Twin
      • eCommerce
      • Emerging Use Cases
      • Entity Resolution
      • Finance
      • Fraud / Anti-Money Laundering
      • GQL
      • Graph Database Market
      • Graph Databases
      • GSQL
      • Healthcare
      • Machine Learning / AI
      • Podcast
      • Supply Chain
      • TigerGraph
      • TigerGraph Cloud
    • Graph AI On Demand
      • Analysts and Research
      • Customer 360 and Entity Resolution
      • Customer Spotlight
      • Development
      • Finance, Banking, Insurance
      • Keynote
      • Session
    • Video
  • Recent Posts

    • It’s Time to Harness the Power of Graph Technology [Infographic]
    • TigerGraph Showcases Unrivaled Performance at Scale
    • TigerGraph 101 An Introduction to Graph | Jan 26th @ 9am PST
    • Data Science Salon New York
    • Tech For Retail
    TigerGraph

    Product

    SOLUTIONS

    customers

    RESOURCES

    start for free

    TIGERGRAPH DB
    • Overview
    • Features
    • GSQL Query Language
    GRAPH DATA SCIENCE
    • Graph Data Science Library
    • Machine Learning Workbench
    TIGERGRAPH CLOUD
    • Overview
    • Cloud Starter Kits
    • Login
    • FAQ
    • Pricing
    • Cloud Marketplaces
    USEr TOOLS
    • GraphStudio
    • TigerGraph Insights
    • Application Workbenches
    • Connectors and Drivers
    • Starter Kits
    • openCypher Support
    SOLUTIONS
    • Why Graph?
    industry
    • Advertising, Media & Entertainment
    • Financial Services
    • Healthcare & Life Sciences
    use cases
    • Benefits
    • Product & Service Marketing
    • Entity Resolution
    • Customer 360/MDM
    • Recommendation Engine
    • Anti-Money Laundering
    • Cybersecurity Threat Detection
    • Fraud Detection
    • Risk Assessment & Monitoring
    • Energy Management
    • Network & IT Management
    • Supply Chain Analysis
    • AI & Machine Learning
    • Geospatial Analysis
    • Time Series Analysis
    success stories
    • Customer Success Stories

    Partners

    Partner program
    • Partner Benefits
    • TigerGraph Partners
    • Sign Up
    LIBRARY
    • Resources
    • Benchmark
    • Webinars
    Events
    • Trade Shows
    • Graph + AI Summit
    • Million Dollar Challenge
    EDUCATION
    • Training & Certifications
    Blog
    • TigerGraph Blog
    DEVELOPERS
    • Developers Hub
    • Community Forum
    • Documentation
    • Ecosystem

    COMPANY

    Company
    • Overview
    • Careers
    • News
    • Press Release
    • Awards
    • Legal
    • Patents
    • Security and Compliance
    • Contact
    Get Started
    • Start Free
    • Compare Editions
    • Online Demo - Test Drive
    • Request a Demo

    Product

    • Overview
    • TigerGraph 3.0
    • TIGERGRAPH DB
    • TIGERGRAPH CLOUD
    • GRAPHSTUDIO
    • TRY NOW

    customers

    • success stories

    RESOURCES

    • LIBRARY
    • Events
    • EDUCATION
    • BLOG
    • DEVELOPERS

    SOLUTIONS

    • SOLUTIONS
    • use cases
    • industry

    Partners

    • partner program

    company

    • Overview
    • news
    • Press Release
    • Awards

    start for free

    • Request Demo
    • take a test drive
    • SUPPORT
    • COMMUNITY
    • CONTACT
    • Copyright © 2023 TigerGraph
    • Privacy Policy
    • Linkedin
    • Facebook
    • Twitter

    Copyright © 2020 TigerGraph | Privacy Policy

    Copyright © 2020 TigerGraph Privacy Policy

    • SUPPORT
    • COMMUNITY
    • COMPANY
    • CONTACT
    • Linkedin
    • Facebook
    • Twitter

    Copyright © 2020 TigerGraph

    Privacy Policy

    • Products
    • Solutions
    • Customers
    • Partners
    • Resources
    • Company
    • START FREE
    START FOR FREE
    START FOR FREE
    TigerGraph
    PRODUCT
    PRODUCT
    • Overview
    • GraphStudio UI
    • Graph Data Science Library
    TIGERGRAPH DB
    • Overview
    • Features
    • GSQL Query Language
    TIGERGRAPH CLOUD
    • Overview
    • Cloud Starter Kits
    TRY TIGERGRAPH
    • Get Started for Free
    • Compare Editions
    SOLUTIONS
    SOLUTIONS
    • Why Graph?
    use cases
    • Benefits
    • Product & Service Marketing
    • Entity Resolution
    • Customer Journey/360
    • Recommendation Engine
    • Anti-Money Laundering (AML)
    • Cybersecurity Threat Detection
    • Fraud Detection
    • Risk Assessment & Monitoring
    • Energy Management
    • Network Resources Optimization
    • Supply Chain Analysis
    • AI & Machine Learning
    • Geospatial Analysis
    • Time Series Analysis
    industry
    • Advertising, Media & Entertainment
    • Financial Services
    • Healthcare & Life Sciences
    CUSTOMERS
    read all success stories

     

    PARTNERS
    Partner program
    • Partner Benefits
    • TigerGraph Partners
    • Sign Up
    RESOURCES
    LIBRARY
    • Resource Library
    • Benchmark
    • Webinars
    Events
    • Trade Shows
    • Graph + AI Summit
    • Graph for All - Million Dollar Challenge
    EDUCATION
    • TigerGraph Academy
    • Certification
    Blog
    • TigerGraph Blog
    DEVELOPERS
    • Developers Hub
    • Community Forum
    • Documentation
    • Ecosystem
    COMPANY
    COMPANY
    • Overview
    • Leadership
    • Careers  
    NEWS
    PRESS RELEASE
    AWARDS
    START FREE
    Start Free
    • Request a Demo
    • SUPPORT
    • COMMUNITY
    • CONTACT
    Dr. Jay Yu

    Dr. Jay Yu | VP of Product and Innovation

    Dr. Jay Yu is the VP of Product and Innovation at TigerGraph, responsible for driving product strategy and roadmap, as well as fostering innovation in graph database engine and graph solutions. He is a proven hands-on full-stack innovator, strategic thinker, leader, and evangelist for new technology and product, with 25+ years of industry experience ranging from highly scalable distributed database engine company (Teradata), B2B e-commerce services startup, to consumer-facing financial applications company (Intuit). He received his PhD from the University of Wisconsin - Madison, where he specialized in large scale parallel database systems

    Todd Blaschka | COO

    Todd Blaschka is a veteran in the enterprise software industry. He is passionate about creating entirely new segments in data, analytics and AI, with the distinction of establishing graph analytics as a Gartner Top 10 Data & Analytics trend two years in a row. By fervently focusing on critical industry and customer challenges, the companies under Todd's leadership have delivered significant quantifiable results to the largest brands in the world through channel and solution sales approach. Prior to TigerGraph, Todd led go to market and customer experience functions at Clustrix (acquired by MariaDB), Dataguise and IBM.