Enhanced Pattern Matching Query Syntax

Enhanced Pattern Matching Query Syntax

In our upcoming 2.4 release, TigerGraph will offer a major syntax extension in GSQL – pattern matching.

Pattern matching enables users to focus on specifying what multi-hop path pattern they want in a query block, without worrying about the direction of the underlying data flow. They can think about the pattern as one query block (SELECT-FROM-WHERE) instead of several individual blocks. The pattern can be fixed length or variable length (using Kleene stars for repeated hops), which significantly shortens the GSQL query length.

In this blog, we give a preview of this pattern matching syntax. And we will hand this feature to you in the upcoming 2.4 release. All examples focus on the FROM clause, where the path pattern is specified and the aliases are bound. Other clauses such as WHERE clause, ACCUM clause etc. are omitted for brevity.

1-Hop Atomic Pattern

Let’s start with the basics. Below we first show a 1-hop fix length pattern, and a 1-hop variable length pattern (star pattern). The description of each pattern is under each FROM clause. For a given edge, if it is undirected, we do not put any decoration around it. For a directed edge, we use “>” or “<” to indicate its direction, such that user can know the source and target vertex types based on the edge definition.

    • 1-hop pattern
      1. FROM X:x – (E1:e1) – Y:y
        • Undirected edge
      2. FROM X:x – (E2>:e2) – Y:y
        • Right directed edge, x binds to the source of E2, y binds to the target of E2.
      3. FROM X:x – (<E3:e3) – Y:y
        • Left directed edge, y binds to the source of E3, x binds to the target of E3.
      4. FROM X:x – (_:e) – Y:y
        • Any undirected edge
      5. FROM X:x – (_>:e) – Y:y
        • Any right directed edge
      6. FROM X:x – (<_:e) – Y:y
        • Any left directed edge
      7. FROM X:x – ((<_|_):e) – Y:y
        • Any left directed or any undirected
      8. FROM X:x – ((E1|E2>|<E3):e) – Y:y
        • Disjunctive 1-hop edge. (x,y) are connected by E1 or E2 or E3.
      9. FROM X:x – () – Y:y
        • any edge (directed or undirected) match this 1-hop pattern
        • Same as this: (<_|_>|_)
    • 1-hop star pattern — repetition of an edge pattern, 0 or more times
      1. FROM X:x – (E1*) – Y:y
      2. FROM X:x – (E2>*) – Y:y
      3. FROM X:x – (<E3*) – Y:y
      4. FROM X:x – (_*) – Y:y
      5. FROM X:x – ((E1|E2>|<E3)*) – Y:y
    • 1-hop star pattern with bounds
      1. FROM X:x – (E1*2..) – Y:y
        • Lower bounds only. There is a chain of at least 2 E1 edges.
      2. FROM X:x – (E2>*..3) – Y:y
        • Upper bounds only. There is a chain of between 0 and 3 E2 edges.
      3. FROM X:x – (<E3*3..5) – Y:y
        • Both Lower and Upper bounds. There is a chain of 3 to 5 E3 edges.
      4. FROM X:x – ((E1|E2>|<E3)*3) – Y:y
        • Exact bound. There is a chain of exactly 3 edges, where each edge is either E1, E2>, or <E3.

 

Multiple Hop Pattern—Succinct Representation

With the above single hop pattern, we can extend it to a multiple hop pattern. Below are some examples to go to 2 and 3 hops.

  • 2-hop pattern
    • FROM X:x-(E1:e1)-Y:y-(E2>:e2)-Z:z
      • If we don’t need to refer to y in any expression, we can use the following syntax sugar to concatenate E1 and E2.
        • FROM X:x-(E1:e1.E2>:e2)-Z:z
      • If we don’t need to refer to e1 or e2 either, we can further simplify the path expression:
        • FROM X:x-(E1.E2>)-Z:z
  • 3-hop pattern
    • FROM X:x-(E2>:e2)-Y:y-(<E3:e3)-Z:z-(E4:e4)-U:u
      • If we only need to reference the end vertices and not any of the intermediate vertices or edges, we can shorten the expression:
        • FROM X:x-(E2>.<E3.E4)-Y:y

 

An Example Based on the LDBC benchmark

The LDBC Social Network Benchmark employs a social forum schema, and has three query workloads to ask different types of queries on this social data.

For example, IC_6 ask the following question:

Given a start Person and some Tag, find the other Tags that occur together with this Tag on Posts that were created by start Person’s friends and friends of friends (excluding start Person). Return top 10 Tags, and the count of Posts that were created by these Persons, which contain both this Tag and the given Tag.

Below is the GSQL implementation using the pattern match syntax.

CREATE QUERY ic_6(VERTEX<Person> personId, STRING tagName) FOR GRAPH ldbc_snb {
  TYPEDEF tuple<STRING tagName, INT postCount> tagStats;

  SetAccum<VERTEX<Post>> @@postAll;
  SumAccum<INT> @postCount;
  HeapAccum<tagStats>(10, postCount DESC, tagName ASC) @@tagStatsTop;

  vPerson = { personId };
  aggPersonPostTag =
    SELECT t3
    FROM vPerson:s
        -((Person_KNOWS_Person>|<Person_KNOWS_Person)*1..2)-Person:t1
        -(<Post_HAS_CREATOR_Person:e2)-Post:t2
        -(Post_HAS_TAG_Tag>:e3)-Tag:t3
    WHERE s != t1
    ACCUM CASE WHEN t3.name == tagName THEN @@postAll += t2 END;

  vPost = { @@postAll };
  vTag =
    SELECT t
    FROM vPost:s-(Post_HAS_TAG_Tag>:e)-Tag:t
    ACCUM CASE WHEN t.name != tagName THEN t.@postCount += 1 END
    POST-ACCUM @@tagStatsTop += tagStats(t.name, t.@postCount);

  PRINT @@tagStatsTop;
}

 

In conclusion, we see significant ease-of-use improvement with the pattern match syntax extension. It more and more guides the user to focus on declarative thinking, instead of imperative thinking. We hope you will enjoy it and please send your valuable feedback to us.