Writing Output to Cloud Storage
This guide shows you how to configure TigerGraph to write query results to AWS S3 or S3-compatible storage systems, such as Ceph, MinIO, or Wasabi.
To write to an S3 bucket, you need:
-
Access Key ID
-
Secret Access Key
-
Endpoint URL for S3-compatible systems
-
Region, if different from the default
For details on how queries write to S3 using the FILE object, see FILE Object.
Configure the S3 Connection
You can configure S3 access in three ways:
-
Cluster-wide using gadmin
-
Per-session using GSQL parameters
-
Via RESTPP headers
Choose the method based on whether you want settings to apply to all users or just your session.
Use gadmin Config (Cluster-Wide)
Set S3 credentials for the entire TigerGraph cluster.
-
Open your terminal.
-
Run these commands:
gadmin config set GPE.QueryOutputS3AWSAccessKeyID <your-access-key-id>
gadmin config set GPE.QueryOutputS3AWSSecretAccessKey <your-secret-access-key>
gadmin config apply -y
gadmin restart gpe -y
-
Replace <your-access-key-id> and <your-secret-access-key> with your S3 credentials.
gadmin supports only the Access Key ID and Secret Access Key.
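The gadmin steps above can be wrapped in a small script so the keys are read from the environment instead of being typed inline. This is a minimal sketch, not an official TigerGraph tool: the function names and the variables S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, and DRY_RUN are hypothetical. Only the four gadmin commands come from this guide.

```shell
# Sketch: apply cluster-wide S3 credentials with the gadmin commands from this guide.
# S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, and DRY_RUN are hypothetical variable names.

run() {
  # Print the command when DRY_RUN is 1 (the default here); otherwise execute it.
  # Note: a dry run prints the credential values as part of the command.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

configure_s3_output() {
  # Refuse to continue if either credential is missing.
  if [ -z "${S3_ACCESS_KEY_ID:-}" ] || [ -z "${S3_SECRET_ACCESS_KEY:-}" ]; then
    echo "error: set S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY first" >&2
    return 1
  fi
  # Stop at the first command that fails.
  run gadmin config set GPE.QueryOutputS3AWSAccessKeyID "$S3_ACCESS_KEY_ID" &&
  run gadmin config set GPE.QueryOutputS3AWSSecretAccessKey "$S3_SECRET_ACCESS_KEY" &&
  run gadmin config apply -y &&
  run gadmin restart gpe -y
}
```

Calling configure_s3_output with DRY_RUN=0 runs the commands for real. Leaving DRY_RUN unset only prints them, which makes it easy to review what would change before restarting GPE.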
Use GSQL Session Parameters
Set credentials and the endpoint URL for your current GSQL session. This method suits users working with their own buckets or testing different S3-compatible systems. Alternatively, you can authenticate using an AWS role by setting s3_assume_role [v4.2.3].
-
Open the GSQL shell.
-
Run these commands:
SET s3_aws_access_key_id = "<your-access-key-id>"
SET s3_aws_secret_access_key = "<your-secret-access-key>"
SET s3_region = "<region>"
SET s3_endpoint = "http://<your-s3-host>"
or
SET s3_assume_role = "<RoleARN>"
SET s3_region = "<region>"
SET s3_endpoint = "http://<your-s3-host>"
-
Replace the placeholders:
-
<your-access-key-id>: Your S3 access key.
-
<your-secret-access-key>: Your S3 secret key.
-
<region>: Your S3 bucket’s AWS region (for example, us-west-2 or eu-central-1).
-
<RoleARN>: Your IAM role’s ARN (for example, arn:aws:iam::123456789012:role/MyRole).
-
<your-s3-host>: Your S3-compatible server’s host and port (for example, 192.168.99.1:8080 for Ceph). The full s3_endpoint value must start with http:// or https:// [v4.2.1].
-
s3_region allows you to explicitly set the S3 bucket region. If s3_region is not set, the default region is us-east-1.
If your S3 bucket is in a different region, set s3_region to match the bucket region. Otherwise, requests to the bucket may fail [v4.2.3].
Use RESTPP Headers (API Calls)
Set S3 parameters in HTTP headers for queries run via RESTPP, such as with curl. This method works without GSQL and is ideal for API integrations or one-off queries.
-
Open your terminal.
-
Run a curl command:
curl -X GET \
-H "GSQL-S3AWSAccessKeyId: <your-access-key-id>" \
-H "GSQL-S3AWSSecretAccessKey: <your-secret-access-key>" \
-H "GSQL-S3Region: <region>" \
-H "GSQL-S3Endpoint: http://<your-s3-host>" \
'http://<tigergraph-host>:14240/restpp/query/<graph-name>/<query-name>?param=<s3-path>'
or
curl -X GET \
-H "GSQL-S3AssumeRole: <RoleARN>" \
-H "GSQL-S3Region: <region>" \
-H "GSQL-S3Endpoint: http://<your-s3-host>" \
'http://<tigergraph-host>:14240/restpp/query/<graph-name>/<query-name>?param=<s3-path>'
-
Replace the placeholders:
-
<your-access-key-id>: Your S3 access key.
-
<your-secret-access-key>: Your S3 secret key.
-
<region>: Your S3 bucket’s AWS region (for example, us-west-2).
-
<RoleARN>: Your IAM role’s ARN (for example, arn:aws:iam::123456789012:role/MyRole).
-
<your-s3-host>: Your S3-compatible server’s host and port (for example, 192.168.99.1:8080). The full GSQL-S3Endpoint value must start with http:// or https://.
-
<tigergraph-host>: Your TigerGraph server address (for example, 10.244.106.233).
-
<graph-name>: Your graph name (for example, ldbc_snb).
-
<query-name>: Your query name (for example, key_test).
-
<s3-path>: The S3 path (for example, s3://my-test-bucket/param_hello_distributedgpr.txt).
-
curl -X GET \
-H "GSQL-S3AWSAccessKeyId: S3_Access_Key_For_Ceph" \
-H "GSQL-S3AWSSecretAccessKey: S3_Secret_Key_For_Ceph" \
-H "GSQL-S3Region: us-east-1" \
-H "GSQL-S3Endpoint: http://192.168.99.1:8080" \
'http://10.244.106.233:14240/restpp/query/ldbc_snb/key_test?param_hello_f=s3://my-test-bucket-1749720243/param_hello_distributedgpr.txt'
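For API integrations, the same request can be assembled from variables rather than edited by hand. The sketch below is illustrative only: the function names and every variable (TG_HOST, TG_GRAPH, TG_QUERY, TG_PARAM, S3_PATH, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_REGION, S3_ENDPOINT) are hypothetical. The header names, port, and URL layout are the ones shown in this guide. It does not URL-encode the S3 path, matching the example above.

```shell
# Sketch: run an installed query through RESTPP with S3 settings passed as headers.
# All function and variable names here are hypothetical.

build_query_url() {
  # $1 TigerGraph host, $2 graph, $3 query, $4 query parameter name, $5 S3 path
  printf 'http://%s:14240/restpp/query/%s/%s?%s=%s\n' "$1" "$2" "$3" "$4" "$5"
}

run_s3_query() {
  url=$(build_query_url "$TG_HOST" "$TG_GRAPH" "$TG_QUERY" "$TG_PARAM" "$S3_PATH")
  # S3_ENDPOINT is expected to include http:// or https://.
  # The region falls back to us-east-1 when S3_REGION is unset.
  curl -X GET \
    -H "GSQL-S3AWSAccessKeyId: $S3_ACCESS_KEY_ID" \
    -H "GSQL-S3AWSSecretAccessKey: $S3_SECRET_ACCESS_KEY" \
    -H "GSQL-S3Region: ${S3_REGION:-us-east-1}" \
    -H "GSQL-S3Endpoint: $S3_ENDPOINT" \
    "$url"
}
```

Keeping the URL construction in its own function makes it possible to print and inspect the request target before any credentials are sent.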
Run Queries in GSQL Shell
Combine session parameters and query execution in a single GSQL command. This method is convenient for scripting or quick tests.
-
Open your terminal.
-
Run this command:
gsql -g <graph-name> "set query_timeout=7200000
set s3_aws_access_key_id = \"<your-access-key-id>\"
set s3_aws_secret_access_key = \"<your-secret-access-key>\"
set s3_region = \"<region>\"
set s3_endpoint = \"http://<your-s3-host>\"
run query <query-name>(\"s3://<bucket-name>/<file-path>\")"
or
gsql -g <graph-name> "set query_timeout=7200000
set s3_assume_role=\"<RoleARN>\";
set s3_region = \"<region>\"
set s3_endpoint = \"http://<your-s3-host>\"
run query <query-name>(\"s3://<bucket-name>/<file-path>\")"
-
Replace the placeholders:
-
<graph-name>: Your graph (for example, ldbc_snb).
-
<your-access-key-id>: Your S3 access key.
-
<your-secret-access-key>: Your S3 secret key.
-
<region>: Your S3 bucket’s AWS region (for example, us-west-2).
-
<RoleARN>: Your IAM role’s ARN (for example, arn:aws:iam::123456789012:role/MyRole).
-
<your-s3-host>: Your S3-compatible server’s host and port (for example, 192.168.99.1:8080). The full s3_endpoint value must start with http:// or https://.
-
<query-name>: Your query (for example, gsql_test).
-
<bucket-name>: Your S3 bucket (for example, my-test-bucket-1749720243).
-
<file-path>: The output file path (for example, param_hello_distributedgpr.txt).
-
gsql -g ldbc_snb "set query_timeout=7200000
set s3_aws_access_key_id = \"S3_Access_Key_For_Ceph\"
set s3_aws_secret_access_key = \"S3_Secret_Key_For_Ceph\"
set s3_region = \"us-east-1\"
set s3_endpoint = \"http://192.168.99.1:8080\"
run query gsql_test(\"s3://my-test-bucket-1749720243/param_hello_distributedgpr.txt\")"
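For scripting, the command text can be generated from environment variables and then handed to gsql in a single call. This is a sketch under stated assumptions: the function names and the variables S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_REGION, and S3_ENDPOINT are hypothetical, and the timeout value is simply copied from the example above.

```shell
# Sketch: build the session settings plus the RUN QUERY line, then pass them to gsql.
# Function and variable names are hypothetical; the SET parameters come from this guide.

build_gsql_commands() {
  # $1 query name, $2 S3 output path (for example s3://bucket/file.txt)
  # The region falls back to us-east-1 when S3_REGION is unset.
  cat <<EOF
set query_timeout=7200000
set s3_aws_access_key_id = "$S3_ACCESS_KEY_ID"
set s3_aws_secret_access_key = "$S3_SECRET_ACCESS_KEY"
set s3_region = "${S3_REGION:-us-east-1}"
set s3_endpoint = "$S3_ENDPOINT"
run query $1("$2")
EOF
}

run_s3_query_in_gsql() {
  # $1 graph name, $2 query name, $3 S3 output path
  gsql -g "$1" "$(build_gsql_commands "$2" "$3")"
}
```

Generating the text with a here-document avoids the escaped quotes needed when the whole command is typed inside one double-quoted string.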
Prevent File Path Conflicts
When multiple cluster nodes write to the same S3 bucket, TigerGraph adds unique prefixes to file paths to avoid conflicts, such as overwriting files. This applies to both AWS S3 and S3-compatible systems.
-
Instance Name: A prefix like GPE_{PartitionId}_{ReplicaId} identifies the node (for example, GPE_0_1 for partition 0, replica 1).
-
Role: For distributed queries, a suffix after the instance name indicates the node's role:
-
.coordinator: The node managing the query.
-
.worker: The node processing the query.
-
Example file names:
-
GPE_0_0.worker.queryResults.csv: Output from the worker node in partition 0, replica 0.
-
GPE_0_1.coordinator.queryResults.csv: Output from the coordinator node in partition 0, replica 1.
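The naming rule can be expressed as a small helper for scripts that need to locate the files afterwards. This is a sketch, not TigerGraph's implementation. The function name is hypothetical, and it assumes the prefix is attached to the file name (the part after the last slash) rather than to the whole path.

```shell
# Sketch: compute the expected object path for one GPE instance and role.
# Assumption: the GPE_{PartitionId}_{ReplicaId}.{role}. prefix is added to the file name only.

prefixed_output_path() {
  # $1 partition id, $2 replica id, $3 role (coordinator or worker), $4 requested S3 path
  dir=${4%/*}     # everything before the last slash, e.g. s3://my-test-bucket
  file=${4##*/}   # file name after the last slash, e.g. queryResults.csv
  printf '%s/GPE_%s_%s.%s.%s\n' "$dir" "$1" "$2" "$3" "$file"
}
```

For example, prefixed_output_path 0 1 coordinator s3://my-test-bucket/queryResults.csv prints s3://my-test-bucket/GPE_0_1.coordinator.queryResults.csv.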
Example Scenario
Suppose you have a 3x2 cluster (3 partitions, 2 replicas each) running a distributed query that writes to s3://my-test-bucket/queryResults.csv. For a Ceph bucket, you set the endpoint (for example, http://192.168.99.1:8080). The output files might include:
-
GPE_0_0.worker.queryResults.csv -
GPE_0_1.coordinator.queryResults.csv
These unique names ensure no conflicts across nodes.
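To see which instance names such a cluster could produce, the partition and replica IDs can be enumerated. This is a sketch with a hypothetical function name. It assumes zero-based IDs, as in the GPE_0_0 and GPE_0_1 examples, and it lists candidates only, since which instances actually write a file depends on the query.

```shell
# Sketch: list candidate GPE instance names for a cluster.
# Assumption: partition and replica IDs start at 0, as in this guide's examples.

list_gpe_instances() {
  # $1 number of partitions, $2 replicas per partition
  p=0
  while [ "$p" -lt "$1" ]; do
    r=0
    while [ "$r" -lt "$2" ]; do
      printf 'GPE_%s_%s\n' "$p" "$r"
      r=$((r + 1))
    done
    p=$((p + 1))
  done
}
```

Running list_gpe_instances 3 2 prints six names, from GPE_0_0 through GPE_2_1. Appending a role and the file name to each one gives the object names to look for in the bucket.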
Troubleshoot Common Issues
-
“Cannot determine target” error: You didn’t set s3_endpoint for an S3-compatible system. Add it in session parameters or RESTPP headers.
-
Connection fails on Ceph, MinIO, or Wasabi: Verify the endpoint URL includes http:// or https://. Check your bucket permissions and credentials.
-
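Region-related errors: If s3_region is not set, the region defaults to us-east-1, so requests to a bucket in another region may fail. Set s3_region (or the GSQL-S3Region header for RESTPP calls) to match the bucket's region. If errors persist, confirm the endpoint is correct.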
-
Backup vs. query output: Backup features support more S3 options, like endpoint. If queries fail, test your bucket with a backup to validate settings.
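A missing URL scheme is easy to catch before a query is submitted. The check below is a minimal sketch with a hypothetical function name. It only verifies that the value starts with http:// or https:// and has something after it, and it does not contact the server.

```shell
# Sketch: reject endpoint values that lack an http:// or https:// scheme.

check_endpoint() {
  # $1 endpoint value, for example http://192.168.99.1:8080
  case "$1" in
    http://?*|https://?*)
      return 0 ;;
    *)
      echo "endpoint must start with http:// or https://: $1" >&2
      return 1 ;;
  esac
}
```

A script can call check_endpoint on the configured value and stop early when it returns a non-zero status, instead of waiting for the query to fail.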