TigerGraph Docs : GSQL 101

TigerGraph Docs : GSQL 101

In this exercise, we will go through the 3-step process of writing GSQL– define a schema, load data, and write a query.


This tutorial is written so that you can follow along and perform the steps on your TigerGraph system as your read.

Get Set

In this tutorial, we will show you how to create a graph schema, load data in your graph, write simple parameterized queries, and run your queries. Before you start, you need to have installed the TigerGraph system, verified that it is working, and cleared out any previous data. It’ll also help to become familiar with our graph terminology.

What is a Graph?

A graph is a collection of data entities and the connections between them.  That is, it’s a network of data entities.

Many people call a data entity a

node

; at TigerGraph we called it a

vertex

. The plural is

vertices

. We call a connection an edge.  Both vertices and edges can have properties or attributes.  The figure below is a visual representation of a graph containing 7 vertices (shown as circles) and 7 edges (the lines).




Figure 1

. Friendship Social Graph

A graph schema is the model which describes the

types

of vertices (nodes) and edge (connections) which can appear in your graph.  The graph above has one type of vertex (person) and one type of edge (friendship).

A schema diagram looks like a small graph, except each node represents one

type

of vertex, and each link represents one

type

of edge.




Figure 2

. Friendship Social Graph Schema

The friendship loop shows that a friendship is between a person and another person.

Data Set

For this tutorial, we will create and query the simple friendship social graph shown in Figure 1. The data for this graph consists of two files in csv (comma-separated values) format. To follow along with this tutorial, please save these two files, person.csv and friendship.csv, to your TigerGraph local disk. In our running example, we use the

/home/tigergraph/ folder to store the two csv files.


person.csv

name,gender,age,state
Tom,male,40,ca
Dan,male,34,ny
Jenny,female,25,tx
Kevin,male,28,az
Amily,female,22,ca
Nancy,female,20,ky
Jack,male,26,fl


friendship.csv

person1,person2,date
Tom,Dan,2017-06-03
Tom,Jenny,2015-01-01
Dan,Jenny,2016-08-03
Jenny,Amily,2015-06-08
Dan,Nancy,2016-01-03
Nancy,Jack,2017-03-02
Dan,Kevin,2015-12-30



Prepare Your TigerGraph Environment

First, let’s check that you can access GSQL.

  1. Open a Linux shell.
  2. Type gsql as below. A GSQL shell prompt should appear as below.


    Linux shell
    $ gsql
    GSQL >

  3. If the GSQL shell does not launch, try reseting the system with “gadmin restart all”. If you need further help, please see the

    TigerGraph Knowledge Base and FAQs

    .

If this is your first time using GSQL, the TigerGraph data store is probably empty.  However, if you or someone else has already been working on the system, there may already be a database.  You can check by listing out the database catalog with the “ls” command. This is what should look like if it is empty:


GSQL shell – an empty database catalog
GSQL > ls
—- Global vertices, edges, and all graphs
Vertex Types:
Edge Types:

Graphs:
Jobs:

Json API version: v2

If the data catalog is not empty, you will need to empty it to start this tutorial. We’ll assume you have your coworkers’ permission.  Use the command DROP ALL to delete all the database data, its schema, and all related definitions. This command takes about a minute to run.


GSQL shell – DROP ALL
GSQL > drop all
Dropping all, about 1 minute …
Abort all active loading jobs
[ABORT_SUCCESS] No active Loading Job to abort.

Shutdown restpp gse gpe …
Graph store /usr/local/tigergraph/gstore/0/ has been cleared!
Everything is dropped.

Restarting TigerGraph


If you need to restart TigerGraph for any reason, use the following command sequence:


Linux shell – Restarting TigerGraph services
# Switch to the user account set up during installation
# The default is user=tigergraph, password=tigergraph
$ su tigergraph
Password:tigergraph

# Start all services
$ gadmin restart -fy

Tip: Running GSQL commands from Linux


You can also run GSQL commands from a Linux shell. To run a single command, just use “gsql” followed by the command line enclosed in single quotes. (The quotes aren’t necessary if there is no parsing ambiguity; it’s safer to just use them.)  For example,


Linux shell – GSQL commands from a Linux shell
# “-g graphname” is need for a given graph
gsql -g social ‘ls’
gsql ‘drop all’
gsql ‘ls’

You can also execute a series of commands which you have stored in a file, by simply invoking “gsql” following by the name of the file.

When you are done, you can exit the GSQL shell with the command “quit” (without the quotes).

Define a Schema



For this tutorial, we will work mostly in the GSQL shell, in interactive mode. A few commands will be from a Linux shell. The first step in creating a GSQL graph is to define its schema. GSQL provides a set of DDL (Data Definition Language) commands, similar to SQL DLL commands, to model vertex types, edge types and a graph.

Create a Vertex Type

Use CREATE VERTEX command to define a vertex type named

person.

Here, PRIMARY_ID is required: each person must have a unique identifier. The rest is the optional list of attributes which characterize each person vertex, in the format

attribute_name  data_type, attribute_name  data_type, ..

.


GSQL command
CREATE VERTEX person (PRIMARY_ID name STRING, name STRING, age INT, gender STRING, state STRING)



We show GSQL keywords in ALL CAPS to highlight them, but they are case-insensitive.

GSQL will confirm the creation of the vertex type.


GSQL shell
GSQL > CREATE VERTEX person (PRIMARY_ID name STRING, name STRING, age INT, gender STRING, state STRING)
The vertex type person is created.
GSQL >

You can create as many vertex types as you need.

Create an Edge Type

Next, use the CREATE … EDGE command to create an edge type named

friendship

. The keyword UNDIRECTED indicates this edge is a bidirectional edge, meaning that information can flow starting from either vertex. If you’d rather have a unidirectional connection where information flows only from the FROM vertex, use the DIRECTED keyword in place of UNDIRECTED.  Here, FROM and TO are required to specify which two vertex types the edge type connects. An individual edge is specifying by giving the primary_ids of its source (FROM) vertex and target (TO) vertex. These are followed by an optional list of attributes, just as in the vertex definition.


GSQL command
CREATE UNDIRECTED EDGE friendship (FROM person, TO person, connect_day DATETIME)

GSQL will confirm the creation of the edge type.


GSQL shell
GSQL > CREATE UNDIRECTED EDGE friendship (FROM person, TO person, connect_day DATETIME)
The edge type friendship is created.
GSQL >

You can create as many edge types as you need.

Create a Graph

Next, use the CREATE GRAPH command to create a graph named

social.

Here, we just list the vertex types and edge types that we want to include in this graph.


GSQL command
CREATE GRAPH social (person, friendship)

GSQL will confirm the creation of the first graph after several seconds, during which it pushes the catalog information to all services, such as the GSE, GPE and RESTPP.


GSQL shell
GSQL > CREATE GRAPH social (person, friendship)

Restarting gse gpe restpp …

Finish restarting services in 16.554 seconds!
The graph social is created.

At this point,
we have created
a

person

vertex type, a

friendship

edge type, and a

social

graph that includes them. You’ve now built your first graph schema! Let’s take a look what’s in the catalog by typing the ”

ls

” command in the GSQL shell.


GSQL shell
GSQL > ls
—- Global vertices, edges, and all graphs
Vertex Types:
VERTEX person(PRIMARY_ID name STRING, name STRING, age INT, gender STRING, state STRING) WITH STATS=”OUTDEGREE_BY_EDGETYPE”
Edge Types:
UNDIRECTED EDGE friendship(FROM person, TO person, connect_day DATETIME)

Graphs:
– Graph social(person:v, friendship:e)
Jobs:

Json API version: v2

Load Data

After creating a graph schema, the next step is to load data into it. The task here is to instruct the GSQL loader how to associate (“map”) the fields in a set of data files to the attributes in your vertex types and edge types of  the graph schema we just defined.

You should have the two data files person.csv and friendship.csv on your local disk. It’s not necessary that they are in the same folder with you.

If you need to exit the GSQL shell for any reason, you can do so by typing “quit” without the quotes.  Type gsql to enter again.

Define a Loading
Job

The loading job below assumes that your data files are in the folder /home/tigergraph. If they are elsewhere, then in the loading job script below replace “/home/tigergraph/person.csv” and “/home/tigergraph/friendship.csv” with their corresponding file path respectively. Assuming you’re (back) in the GSQL shell, enter the following set of commands.


GSQL commands to define a loading job
USE GRAPH social
BEGIN
CREATE LOADING JOB load_social FOR GRAPH social {
DEFINE FILENAME file1=”/home/tigergraph/person.csv”;
DEFINE FILENAME file2=”/home/tigergraph/friendship.csv”;

LOAD file1 TO VERTEX person VALUES ($”name”, $”name”, $”age”, $”gender”, $”state”) USING header=”true”, separator=”,”;
LOAD file2 TO EDGE friendship VALUES ($0, $1, $2) USING header=”true”, separator=”,”;
}
END

Let’s walk through the commands:

  • USE GRAPH social :


    Tells GSQL which graph you want to work with.
  • BEGIN … END :


    Indicates multiple-line mode.  The GSQL shell will treat everything between these markers as a single statement.  These is only needed for interactive mode.  If you run GSQL statements that are stored in a command file, the command interpreter will study your whole file, so it doesn’t need the BEGIN and END hints.
  • CREATE LOADING JOB:


    One loading job can describe the mappings from multiple files to multiple graph objects. Each file must be assigned to a filename variable. The field labels can be either by name or by position. By-name labelling requires a header line in the source file. By-position labelling uses integers to indicate source column position 0, 1,… In the example above, the first LOAD statement refers to the source file columns by name, whereas the second LOAD statement refers to the source file columns by position. Note the following details:

    • The column “name” in file1 gets mapped to two fields, both the PRIMARY_ID and the “name” attribute of the person vertex.
    • In file1, gender comes before age.  In the person vertex, gender comes after age. When loading, state your attributes in the order needed by the target object (in this case, the person vertex).
    • Each LOAD statement has a USING clause.  Here it tells GSQL that both files contain a header (whether we choose to use the names or not, GSQL still needs to know whether to consider the first line as data or not). It also says the column separator is comma. GSQL can handle any single-character separator, not just commas.

When you run the CREATE LOADING JOB statement, GSQL checks for syntax errors and checks that you have data files in the locations specified. If it detects no errors, it compiles and saves your job.


GSQL shell
GSQL > USE GRAPH social
Using graph ‘social’
GSQL > BEGIN
GSQL > CREATE LOADING JOB load_social FOR GRAPH social {
GSQL > DEFINE FILENAME file1=”/home/tigergraph/person.csv”;
GSQL > DEFINE FILENAME file2=”/home/tigergraph/friendship.csv”;
GSQL >
GSQL > LOAD file1 TO VERTEX person VALUES ($”name”, $”name”, $”age”, $”gender”, $”state”) USING header=”true”, separator=”,”;
GSQL > LOAD file2 TO EDGE friendship VALUES ($0, $1, $2) USING header=”true”, separator=”,”;
GSQL > }
GSQL > END
The job load_social is created.

Run a Loading Job

You can now run your loading job to load data into your graph:


GSQL command
RUN LOADING JOB load_social

The result is shown below.


GSQL shell
GSQL > run loading job load_social
[Tip: Use “CTRL + C” to stop displaying the loading status update, then use “SHOW LOADING STATUS jobid” to track the loading progress again]
[Tip: Manage loading jobs with “ABORT/RESUME LOADING JOB jobid”]
Starting the following job, i.e.
JobName: load_social, jobid: social_m1.1528095850854
Loading log: ‘/home/tigergraph/tigergraph/logs/restpp/restpp_loader_logs/social/social_m1.1528095850854.log’

Job “social_m1.1528095850854” loading status
[FINISHED] m1 ( Finished: 2 / Total: 2 )
[LOADED]
+—————————————————————————+
| FILENAME | LOADED LINES | AVG SPEED | DURATION|
|/home/tigergraph/friendship.csv | 8 | 8 l/s | 1.00 s|
| /home/tigergraph/person.csv | 8 | 7 l/s | 1.00 s|
+—————————————————————————+

Notice the location of the loading log file.  The example assumes that you installed TigerGraph in the default location, /home/tigergraph/. In your installation folder is the main product folder, tigergraph.  Within the tigergraph folder are several subfolders, such as logs, document, config, bin, and gstore. If you installed in a different location, say /usr/local/, then you would find the product folder at /usr/local/tigergraph.

Query Using Built-In SELECT Queries

You now have a graph with data!  You can run some simple built-in queries to inspect the data.

Select Vertices

The following GSQL command reports the total number of person vertices. The person.csv data file had 7 lines after the header.


GSQL command
SELECT count() FROM person

Similarly, the following GSQL command reports the total number of friendship edges. The friendship.csv file also had 7 lines after the header.


GSQL command
SELECT count() FROM person-(friendship)->person

The results are illustrated below.


GSQL shell
GSQL > SELECT count() FROM person
[{
“count”: 7,
“v_type”: “person”
}]
GSQL > SELECT count() FROM person-(friendship)->person
[{
“count”: 14,
“e_type”: “friendship”
}]
GSQL >

Edge Count


Why are there 14 edges?  For an undirected edge, GSQL actually creates two edges, one in each direction.


If you want to see the details about a particular set of vertices, you can use “SELECT *” and the WHERE clause to specify a predicate condition.  Here are some statements to try:


GSQL command
SELECT * FROM person WHERE primary_id==”Tom”
SELECT name FROM person WHERE state==”ca”
SELECT name, age FROM person WHERE age > 30

The result is in JSON format as shown below.


GSQL shell
GSQL > SELECT * FROM person WHERE primary_id==”Tom”
[{
“v_id”: “Tom”,
“attributes”: {
“gender”: “male”,
“name”: “Tom”,
“state”: “ca”,
“age”: 40
},
“v_type”: “person”
}]
GSQL > SELECT name FROM person WHERE state==”ca”
[
{
“v_id”: “Amily”,
“attributes”: {“name”: “Amily”},
“v_type”: “person”
},
{
“v_id”: “Tom”,
“attributes”: {“name”: “Tom”},
“v_type”: “person”
}
]
GSQL > SELECT name, age FROM person WHERE age > 30
[
{
“v_id”: “Tom”,
“attributes”: {
“name”: “Tom”,
“age”: 40
},
“v_type”: “person”
},
{
“v_id”: “Dan”,
“attributes”: {
“name”: “Dan”,
“age”: 34
},
“v_type”: “person”
}
]


Select Edges

In similar fashion, we can see details about edges.  To describe an edge, you name the types of vertices and edges in the three parts, with some added punctuation to represent the traversal direction:


GSQL syntax
source_type -(edge_type)-> target_type

Note that the arrow -> is always used, whether it’s an undirected or directed edge. That is because we are describing the direction of the query’s traversal (search) through the graph, not the direction of the edge itself.

We can use the from_id predicate in the WHERE clause to select all friendship edges starting from the vertex identified by the “from_id”.  The keyword ANY to indicate that any edge type or any target vertex type is allowed. The following two queries have the same result


GSQL command
SELECT * FROM person-(friendship)->person WHERE from_id ==”Tom”
SELECT * FROM person-(ANY)->ANY WHERE from_id ==”Tom”

Restrictions on built-in edge select queries


To prevent queries which might return an excessive number of output items, built-in edge queries have the following restrictions:

  1. The source vertex type must be specified.
  2. The from_id condition must be specified.

There is no such restriction for user-defined queries.


The result is shown below.


GSQL
GSQL > SELECT * FROM person-(friendship)->person WHERE from_id ==”Tom”
[
{
“from_type”: “person”,
“to_type”: “person”,
“directed”: false,
“from_id”: “Tom”,
“to_id”: “Dan”,
“attributes”: {“connect_day”: “2017-06-03 00:00:00”},
“e_type”: “friendship”
},
{
“from_type”: “person”,
“to_type”: “person”,
“directed”: false,
“from_id”: “Tom”,
“to_id”: “Jenny”,
“attributes”: {“connect_day”: “2015-01-01 00:00:00”},
“e_type”: “friendship”
}
]

Another way to check the graph’s size is using one of the options of the administrator tool, gadmin. From a Linux shell, enter the command

gadmin status graph -v


Linux shell
[tigergraph@localhost ~]$ gadmin status graph -v
verbose is ON
=== graph ===
[m1 ][GRAPH][MSG ] Graph was loaded (/usr/local/tigergraph/gstore/0/part/): partition size is 4.00KiB, SchemaVersion: 0, VertexCount: 7, NumOfSkippedVertices: 0, NumOfDeletedVertices: 0, EdgeCount: 14
[m1 ][GRAPH][INIT] True
[INFO ][GRAPH][MSG ] Above vertex and edge counts are for internal use which show approximate topology size of the local graph partition. Use DML to get the correct graph topology information
[SUMMARY][GRAPH] graph is ready


Query Using A Parameterized GSQL Query

We just saw how easy and quick it is to run simple built-in queries. However you’ll undoubtedly want to create more customized or complex queries.  GSQL puts maximum power in your hands through parameterized vertex set queries. Parameterized queries let you traverse the graph from one vertex set to an adjacent set of vertices, again and again, performing computations along the way, with built-in parallel execution and handy aggregation operations. You can even have one query call another query.  But we’ll start simple.

A GSQL parameterized query has three steps.


  1. Define

    your query in GSQL. This query will be added to the GSQL catalog.

  2. Install

    one or more queries in the catalog, generating a REST endpoint for each query.

  3. Run

    an installed query, supplying appropriate parameters, either as a GSQL command or by sending an HTTP request to the REST endpoint.

A Simple 1-Hop Query

Now, let’s write our first GSQL query. We’ll display all the direct (1-hop) neighbors of a person, given as an input parameter.


GSQL command
USE GRAPH social
CREATE QUERY hello(VERTEX<person> p) FOR GRAPH social{
  Start = {p};
  Result = SELECT tgt
           FROM Start:s-(friendship:e) ->person:tgt;
  PRINT Result;
}



This query features one SELECT statement.  The SELECT statements here are much more powerful than the ones in built-in queries. Here you can do the following:The query starts by seeding a vertex set “Start” with the person vertex identified by parameter

p

passed in from the query call. The curly braces tell GSQL to construct a set containing the enclosed items.

Next, the SELECT statement describes a 1-hop traversal according to the pattern described in the FROM clause:

Start:s -(friendship:e)-> person:tgt

This is basically the same syntax we used for the built-in select edges query.  Namely, we select all edges beginning from the given source set (Start), which have the given edge type (friendship) and which end at the given vertex type (person). A feature we haven’t seen before is the use of vertex and edge set aliases defined by “:alias”: “s” is the alias for the source vertex set, “e” is the edge set alias, and “tgt” is the target vertex set alias.

Refer back to the initial clause and the assignment (“Result = SELECT tgt”). Here we see the target set’s  alias tgt.  This means that the SELECT statement should return the target vertex set (as filtered and processed by the full set of clauses in the SELECT query block) and assign that output set to the variable called Result.

Last, we print out the Result vertex set, in JSON format.

Create A Query

Rather than defining our query in interactive mode, we can store the query in a file and invoke the file from within the GSQL shell, using the @filename syntax. Copy and paste the above query into a file /home/tigergraph/hello.gsql. Then, enter the GSQL shell and invoke the file using @hello.qsql (

Note that if you are not in the /home/tigergraph folder when you start gsql, then you can use the absolute path to invoke a gsql file. e.g., @/home/tigergraph/hello.gsql

) Then run the “ls” command to see that the query is now in the catalog.


GSQL shell
GSQL > @hello.gsql
Using graph ‘social’
The query hello has been added!
GSQL > ls
—- Graph social
Vertex Types:
VERTEX person(PRIMARY_ID name STRING, name STRING, age INT, gender STRING, state STRING) WITH STATS=”OUTDEGREE_BY_EDGETYPE”
Edge Types:
UNDIRECTED EDGE friendship(from person, to person, connect_day DATETIME)

Graphs:
– Graph social(person:v, friendship:e)
Jobs:
CREATE LOADING JOB load_social FOR GRAPH social {
DEFINE FILENAME file2 = “/home/tigergraph/friendship.csv”;
DEFINE FILENAME file1 = “/home/tigergraph/person.csv”;

LOAD file1 TO VERTEX person VALUES($”name”, $”name”, $”age”, $”gender”, $”state”) USING SEPARATOR=”,”, HEADER=”true”, EOL=”\n”;
LOAD file2 TO EDGE friendship VALUES($0, $1, $2) USING SEPARATOR=”,”, HEADER=”true”, EOL=”\n”;
}

Queries:
– hello(vertex<person> p)

Install a Query

However, the query is not installed yet; it is not ready to run. In the GSQL shell, type the following command to installed the just added query “hello”.


GSQL command
INSTALL QUERY hello



GSQL shell
GSQL > INSTALL QUERY hello
Start installing queries, about 1 minute …
hello query: curl -X GET ‘http://127.0.0.1:9000/query/social/hello?p=VALUE’. Add -H “Authorization: Bearer TOKEN” if authentication is enabled.

[========================================================================================================] 100% (1/1)

It takes about 1 minute for the database to install this new query. Be patient! For queries on large datasets, this small investment pays off many times over in faster query execution, particularly if you will run the query many times, with different parameters. The installation will generate machine instructions and a REST endpoint. After the progress bar reaches 100%, we are ready to run this query.

Run a Query in GSQL

To run a query in GSQL, use “RUN QUERY” followed by the query name and a set of parameter values.


GSQL command – run query examples
RUN QUERY hello(“Tom”)

The result is presented in JSON format.  Tom has two 1-hop neighbors, namely Dan and Jenny.


GSQL shell
GSQL > RUN QUERY hello(“Tom”)
{
“error”: false,
“message”: “”,
“version”: {
“schema”: 0,
“api”: “v2”
},
“results”: [{“Result”: [
{
“v_id”: “Dan”,
“attributes”: {
“gender”: “male”,
“name”: “Dan”,
“state”: “ny”,
“age”: 34
},
“v_type”: “person”
},
{
“v_id”: “Jenny”,
“attributes”: {
“gender”: “female”,
“name”: “Jenny”,
“state”: “tx”,
“age”: 25
},
“v_type”: “person”
}
]}]
}


Run a Query as a REST Endpoint

Under the hood, installing a query will also generate a REST endpoint, so that the parameterized query can be invoked by an http call. In Linux, the curl command is the most popular way to submit an http request. In the example below, the portion that is standard for all queries is shown in

bold

; the portion in normal weight pertains to this particular query and parameter value. The JSON result will be returned to the Linux shell’s standard output. So, our parameterized query becomes a http service!



Linux shell

curl -X GET 'http://localhost:9000/query/social/hello?p=Tom'

Finally, to see the GSQL text of a query in the catalog, you can use


GSQL command – show query example
#SHOW QUERY query_name. E.g.
SHOW QUERY hello

Congratulations! At this point, you have gone through the whole process of defining, installing, and running a query.

A More Advanced Query

Now, let’s do a more advanced query. This time, we are going to learn to use the powerful built-in accumulators, which serves as the

runtime

attributes (properties) attachable to each vertex visited during our traversal on the graph.  Runtime means they exist only while the query is running; they are called accumulators because they are specially designed to gather (accumulate) data during an implicitly parallel processing of the query.


GSQL command file – hello2.gsql
USE GRAPH social
CREATE QUERY hello2 (VERTEX<person> p) FOR GRAPH social{
OrAccum @visited = false;
AvgAccum @@avgAge;
Start = {p};

FirstNeighbors = SELECT tgt
FROM Start:s -(friendship:e)-> person:tgt
ACCUM tgt.@visited += true, s.@visited += true;

SecondNeighbors = SELECT tgt
FROM FirstNeighbors -(:e)-> :tgt
WHERE tgt.@visited == false
POST_ACCUM @@avgAge += tgt.age;

PRINT SecondNeighbors;
PRINT @@avgAge;
}
INSTALL QUERY hello2
RUN QUERY hello2(“Tom”)

In this query we will find all the persons which are exactly 2 hops away from the parameterized input person. Just for fun, let’s also compute the average age of those 2-hop neighbors.

In the standard approach for this kind of graph traversal algorithm, you use a boolean variable to mark the first time that the algorithm “visits” a vertex, so that it knows not to count it again. To fit this need, we’ll define a local accumulator of the type OrAccum. To declare a local accumulator, we prefix an identifier name with a single “@” symbol. Each accumulator type has a default initial value; the default value for boolean accumulators is false. Optionally, you can specify an initial value.

We also want to compute one average, so we will define a global AvgAccum. The identifier for a global accumulator begins with two “@”s.

After defining the Start set, we then have our first one 1-hop traversal. The SELECT and FROM clauses are the same as in our first example, but there is an additional ACCUM clause. The += operator within an ACCUM clause means that for

each

edge matching the FROM clause pattern, we

accumulate

the right-hand-side expression (true) to the left-hand-accumulator (tgt.@visited as well as s.@visited). Note that a source vertex or target vertex may be visited multiple times. Referring to Figure 1, if we start at vertex Tom, there are two edges incidents to Tom, so the ACCUM clause in the first SELECT statement will visit Tom two times. Since the accumulator type is OrAccum, the cumulative effect of the two traversals is the following:

Tom.@visited <== (initial value: false) OR (true) OR (true)

Note that it does not matter which of the two edges was processed first, so this operation is suitable for multithreaded parallel processing. The net effect is that as long as a vertex is visited at least once, it will end up with @visited = true. The result of this first SELECT statement is assigned to the variable FirstNeighbors.

The second SELECT block will do one hop further, starting from the FirstNeighbors vertex set variable, and reaching the 2-hop neighbors. Note that this time, we have omitted the edge type friendship and the target vertex type person from the FROM clause, but we retained the aliases.  If no type is mentioned for an alias, then it is interpreted as ALL types. Since our graph has only one vertex type and one edge type, it is logically the same as if we had specified the types.  The WHERE clause filters out the vertices which have been marked as visited before (the 1-hop neighbors and the starting vertex

p

). This SELECT statement uses POST_ACCUM instead of ACCUM.  The reason is that POST_ACCUM traverses the vertex sets instead of the edge sets, guaranteeing that we do not double-count any vertices.  Here, we accumulate the ages of the 2-hop neighbors to get their average.

Finally, the SecondNeighbors of p are printed out.

This time, we put all of the following GSQL commands into one file hello2.gsql:

  • USE GRAPH social
  • The query definition
  • Installing the query
  • Running the query

We can execute this full set of commands


without


entering the GSQL shell. Please copy and paste the above GSQL commands into a Linux file named /home/tigergraph/hello2.gsql.

In a Linux shell, under /home/tigergraph, type the following:


Linux shell
gsql hello2.gsql

The result is shown as below.


GSQL Query Summary:

  • Queries are installed in the catalog and can have one or more input parameters, enabling reuse of queries.
  • A GSQL query consists of a series of SELECT query blocks, each generating a named vertex set.
  • Each SELECT query block can start traversing the graph from any of the previously defined vertex sets (that is, the sequence does not have to form a linear chain).
  • Accumulators are runtime variables with built-in accumulation operations, for efficient multithreaded computation.
  • Output is in JSON format.


Review: What You’ve Learned

You have learned a lot in GSQL 101!

With just the knowledge from GSQL 101 and a little practice, you should be able to do the following:

  • Create a graph schema containing multiple vertex types and edge types.
  • Define a loading job that takes one or more CSV files and maps the data directly to the vertices and edges of your graph.
  • Write and run simple parameterized queries which start at one vertex and then traverse one or more hops to generate a final vertex set. Make a simple additive computation and return the results.


Want to learn more?