Skip to content
START FOR FREE
START FOR FREE
  • SUPPORT
  • COMMUNITY
  • CONTACT
Menu
  • SUPPORT
  • COMMUNITY
  • CONTACT
MENUMENU
  • Products
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      Watch a TigerGraph Demo

      PRODUCTS

      • Why Graph?
      • Products Overview
      • What's New
      • GraphStudio User Interface
      • Visualization Toolkits
      • GSQL Query Language
      • Connectors

      TigerGRAPH ENTERPRISE

      • Enterprise Overview
      • Features

      TIGERGRAPH CLOUD

      • Cloud Overview
      • Cloud Starter Kits
      • Login/Sign Up
      • FAQ
      • Pricing
      • Marketplaces

      Graph Data Science

      • Graph Data Science Library
      • Machine Learning Workbench

      TRY TIGERGRAPH

      • Get Started for Free
      • Compare Editions
      • Test Drive - Online Demo
      • Watch Demos
      • Request a Demo
  • Solutions
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      Watch a TigerGraph Demo

      Solutions

      • Solutions Overview

      INCREASE REVENUE

      • Customer Journey/360
      • Product Marketing
      • Entity Resolution
      • Recommendation Engine

      MANAGE RISK

      • Fraud Detection
      • Anti-Money Laundering
      • Threat Detection
      • Risk Monitoring

      IMPROVE OPERATIONS

      • Supply Chain Analysis
      • Energy Management
      • Network Optimization

      By Industry

      • Advertising, Media & Entertainment
      • Financial Services
      • Healthcare & Life Sciences

      FOUNDATIONAL

      • AI & Machine Learning
      • Time Series Analysis
      • Geospatial Analysis
  • Customers
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      CUSTOMER SUCCESS STORIES

      • Citrix
      • Ford
      • Intuit
      • JPMorgan Chase
      • READ MORE SUCCESS STORIES
      • Jaguar Land Rover
      • United Health Group
      • Xbox
  • Partners
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      PARTNER PROGRAM

      • Partner Benefits
      • TigerGraph Partners
      • Sign Up
      TigerGraph partners with organizations that offer complementary technology solutions and services.​
  • Resources
    • The World’s Fastest and Most Scalable Graph Platform

      LEARN MORE

      BLOG

      • TigerGraph Blog

      RESOURCES

      • Resource Library
      • Benchmarks
      • Demos
      • O'Reilly Graph + ML Book

      EVENTS & WEBINARS

      • Graph+AI Summit
      • Trade Shows
      • Graph for All - Million Dollar Challenge
      • Webinars

      DEVELOPERS

      • Documentation
      • Ecosystem
      • Developers Hub
      • Community Forum

      SUPPORT

      • Contact Support
      • Production Guidelines

      EDUCATION

      • Certification
      • Training
  • Company
    • Join the World’s Fastest and Most Scalable Graph Platform

      WE ARE HIRING

      COMPANY

      • Company Overview
      • Leadership
      • Legal Terms
      • Patents
      • Security and Compliance

      CAREERS

      • Join Us
      • Open Positions

      AWARDS

      • Awards and Recognition
      • Leader in Forrester Wave
      • Gartner Research

      PRESS RELEASE

      • Read All Press Releases
      TigerGraph Leads Global Organizations into the Future with New Cloud Features that Make Graph Technology More Accessible to All
      June 29, 2022
      Read More »

      NEWS

      • Read All News
      Farnell

      TigerGraph rend la technologie des graphes plus accessible à tous

      AI Authority

      TigerGraph Leads Global Organizations into the Future with New Cloud Features that Make Graph Technology More Accessible to All

  • START FOR FREE
    • The World’s Fastest and Most Scalable Graph Platform

      GET STARTED

      • Request a Demo
      • CONTACT US
      • Try TigerGraph
      • GET STARTED FOR FREE
      • TRY AN ONLINE DEMO

Amazon Neptune, the Truth Revealed

  • Mingxi Wu
  • August 29, 2018
  • Benchmark, blog, Graph Database Market, Graph Databases
  • Blog >
  • Amazon Neptune, the Truth Revealed

In May this year, Amazon announced the General Availability of its cloud graph database service called Amazon Neptune. Here is a comprehensive blog that summarizes its strengths and weaknesses. Last year, we published a benchmark report on Neo4j, Titan and TigerGraph. It is intriguing for us to find out how does Amazon Neptune perform relative to the other three graph databases? To answer this question, we conducted the same benchmark on Amazon Neptune. This blog presents the discoveries revealed by our benchmark.

Executive Summary

Amazon Neptune provides both RDF and Property Graph models. For this benchmark, we focused on the Property Graph model, which uses Gremlin as its query interface.

  • Data Loading Speed: TigerGraph loads 21x to 31.2x faster than Neptune r4.4xlarge instance, and 30x to 57.6x faster than Neptune r4.8large instance.
  • Storage: Amazon Neptune needs 8.0x to 9.6x more disk storage for the same input graph data.
  • 1-hop and 2-hop Graph Queries : TigerGraph is at least 10x faster for 1-hop twitter data set and at least 57x faster for 2-hops for both data sets.
  • 3-hop and 6-hop Queries: For 3-hop queries, Amazon Neptune returns out of memory (OOM) error on 17 out of 20 trials. Neptune could not finish 6-hop queries.
  • Analytic Queries (PageRank, Connected Components, etc.): Not Supported on Neptune, supported in TigerGraph.

*Please note that the loading time for TigerGraph does not include few minutes required for creating the schema – this time is constant and does not vary with the size of data.

We have made our benchmark datasets, queries, input parameters, result files and scripts available via GitHub, so that any reader can reproduce our benchmark results [1].

Setup

Data Set

We use two publicly available data sets. The first one is from graph500.org. It provides a synthetic data generator using Kronecker product. The second one is a well-known Twitter follower data set. For each graph, the raw data are formatted as a single tab-separated edge list:

U1  U3
U1  U4
U2  U3

Table 1. Data sets

Find the data sets here: graph500 , twitter

Hardware & Software

Table 2. Cloud hardware

A. For r4.8xlarge, we attached a 250G EBS-optimized Provisioned IOPS SSD (IO1). It’s the storage we used to install vendor software, host raw data, host graph DB storage to ensure same IOPS during loading and querying. The IOPS we set is 50/Gib for this 250G SSD drive, producing total IOPS of 12,500.

B. Neptune DB instance have 5 classes. Each class has pre-defined configuration settings, such as CPUs, Memory, SSD, etc. Our benchmark is based on the two largest DB instance classes.

C. Amazon Neptune IOPS is observed from the Amazon Cloudwatch console.

Table 3. Amazon Neptune client machine

Tests

First, we compared the data loading performance according to the following measures:

  • Time to load our benchmark data sets. The loading time includes vertex file preparation time for Amazon Neptune as it requires a specific CSV format.
  • Storage size of loaded data.

Table 4. Loading methods

Then we measured the query response times for the following queries:

  • Count all 1-hop-path endpoint vertices for 300 fixed random seeds, timeout set to 3 minutes/query.
  • Count all 2-hop-path endpoint vertices for 300 fixed random seeds, timeout set to 3 minutes/query.
  • Count all 3-hop-path endpoint vertices for 10 fixed random seeds, timeout set to 3 minutes/query.
  • Count all 6-hop-path endpoint vertices for 10 fixed random seeds, timeout set to 3 minutes/query.
  • We attempted to test PageRank and Weakly Connected Component discovery queries, but Amazon Neptune seems to lack the language support for algorithmic queries.

By “fixed random seed”, we mean we did a one-time random selection of N vertices from the graph and saved this list as the repeatable input condition for the tests. Each path query starts from one vertex and finds all the vertices which are at the end of a k-length path. We ran the N queries sequentially. We only count how many path endpoint vertices are discovered rather than returning the set of vertices, so that I/O time is not a factor. However, we also ran cross validation tests to confirm that Amazon Neptune and TigerGraph are discovering the same set of path endpoint vertices.

Results

The figures below report the benchmarking results for loading time, DB storage size, and k-hop-path-neighbor-count query time.

Figure 1. Loading time

Amazon Neptune loading time includes vertex file loading time, edge file loading time, and the vertex file preparation time. TigerGraph can load the edge file directly without the need to create the vertex file. TigerGraph can load the edge file directly without needing a separate vertex file. Vertices as well as edges are created from the edge file in TigerGraph, and the time for creating both is included in the loading time for TigerGraph. 

Figure 2. Disk storage size

For Amazon Neptune, total storage size observed was divided by 6 to account for the replication factor. This ensured an apples to apples comparison of single instance of Amazon Neptune with single instance of TigerGraph for graph storage. 

Figure 3. 1-hop-neighbor-count average query time over 300 fixed random seeds

Figure 4. 2-hop-neighbor-count average query time over 300 fixed random seeds

For the graph500 data, Neptune-1, one of the 300 queries timed out. For Twitter data, Neptune-1 had 63 query timeouts, and Neptune-2 had 60 query timeouts.

Figure 5. 3-hop-neighbor-count average query time over 10 fixed random seeds

For graph500 data, Neptune-1 and Neptune-2 both had 8 of the 10 queries fail due to Out-Of-Memory (OOM) error. For Twitter data set, both Neptune-1 and Neptune-2 had 9 out of 10 queries fail due to OOM.

Table 6. Loading Time for graph500

Table 7. Loading Time for twitter

Neptune requires that a separate vertex file be loaded before loading edges. We counted the time required to extract vertex data from the edge file and to load the vertex file.

Table 8. Average time of K-hop-path-neighbor count query for graph500

Table 9. Average time of K-hop-path-neighbor count query for twitter.

We do not include timeout/OOM queries in Average Time of K-hop-path-neighbor calculation.

Discussion

In this section, we discuss some of the differences between the products which are relevant to these tests.

Amazon Neptune

Neptune provides a bulk loader, which can ingest data from Amazon S3 storage only, so users must first put their data in S3. The maximum storage limit of Neptune is 64TB [2]. For property graphs, data files must be in CSV format. Also, the graph’s property types can only be simple scalar types [3], while TigerGraph property types can be scalar plus complex types such as map, list and set. Neptune also loads vertices and edges separately. We had to prepare for loading by extracting the vertex IDs from the edge file and then writing them to a properly formatted file. We recorded the peak IOPS of the two instances benchmarked. From the Amazon Cloudwatch console, we observed that their IOPS performance is much better than that of the TigerGraph EC2 machine’s SSD. Despite better hardware, the Neptune loading is still 21x – 57.6x slower than TigerGraph’s.

For the 1-hop query, both instances of Neptune can finish each of the 300 random input cases within the 3-minute timeout setting. However, for 2-hop queries, on the graph500 data, Neptune-1 had one query timeout. On the larger Twitter dataset, Neptune-1 had 63 query timeouts, and Neptune-2 had 60 query timeouts. Worse, for 3-hop queries, on the graph500 data, Neptune-1 and Neptune-2 both had 8 queries out of 10 run out of memory, meaning they could not finish the test at all. On the Twitter data set, both Neptune-1 and Neptune-2 had 9 queries run out of memory. Lastly, Amazon Neptune cannot answer any of the 6-hop queries tested. We reiterate that Neptune instances come in fixed configurations, and we tested the largest ones available.

Amazon Neptune does not support VertexProgram from the Gremlin OLAP API [4]. Therefore, we could not run analytic queries such as Weakly Connected Component or PageRank.

Neptune supports server node replication by creating up to 15 read replicas [2]; however, it does not support a distributed database where the data are partitioned across a cluster. In this experiment, we used only a primary node (without read replicas).

A Neptune instance comes with one built-in graph.  It does not allow users to create multiple graphs.

TigerGraph

TigerGraph has a high-level loading language with built-in ETL, so it can accept many different input formats and perform data transformations. One feature is that users can load an edge file without any vertex files if there are no properties on the vertex. This situation applies to the two data sets here.

Not only does TigerGraph load data at least 20x faster than Amazon, but it stores it in less space. This means TigerGraph can store a larger database in the same amount of memory. It also suggests faster data access, because CPU caches will be able to hold a larger portion of the graph.

All of the k-hop queries finished, without memory or timeout problems. Moreover, TigerGraph’s GSQL language can implement OLAP type queries, such as Weakly Connected component and PageRank. Since they are written in standard GSQL, the user is free to customize these queries.

If you are interested in learning more about TigerGraph here are some helpful resources:

  • Download the TigerGraph Developer Edition: https://www.tigergraph.com/developer/
  • Download the GSQL White Paper: https://info.tigergraph.com/gsql
  • Download the Neptune Benchmark: https://info.tigergraph.com/tigergraph_vs_neptune

 

Reference

[1] Script for reproduce Neptune benchmark and TigerGraph benchmark   https://github.com/tigergraph/ecosys/tree/benchmark/benchmark/neptune
https://github.com/tigergraph/ecosys/tree/benchmark/benchmark/tigergraph
[2] Neptune read replica and storage limits https://aws.amazon.com/neptune/faqs/
[3] Neptune CSV load data types https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html
[4] Neptune does not support VertexProgram https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-differences.html

You Might Also Like

Five Indications a Graph Solution Is the Right Fit for Your Data Problem

Five Indications a Graph Solution Is...

June 30, 2022
Podcast: TigerGraph Machine Learning Workbench with Victor Lee

Podcast: TigerGraph Machine Learning Workbench with...

June 28, 2022
Turn Customer Data into Valuable Insights and Improved Operations with Customer 360

Turn Customer Data into Valuable Insights...

June 23, 2022

Introducing TigerGraph 3.0

July 1, 2020

Everything to Know to Pass your TigerGraph Certification Test

June 24, 2020

Neo4j 4.0 Fabric – A Look Behind the Curtain

February 7, 2020

TigerGraph Blog

  • Categories
    • blogs
      • About TigerGraph
      • Benchmark
      • Business
      • Community
      • Customer
      • Customer 360
      • Cybersecurity
      • Developers
      • Digital Twin
      • eCommerce
      • Emerging Use Cases
      • Entity Resolution
      • Finance
      • Fraud / Anti-Money Laundering
      • GQL
      • Graph Database Market
      • Graph Databases
      • GSQL
      • Healthcare
      • Machine Learning / AI
      • Podcast
      • Supply Chain
      • TigerGraph
      • TigerGraph Cloud
    • Graph AI On Demand
      • Keynote
      • Session
    • Video
  • Recent Posts

    • Evanta CDO Executive Summit – Chicago
    • FIMA Europe 2022
    • Data. AI. Precision. Innovation. For Pharma, Healthcare & Partners
    • Gartner IT Symposium/Xpo
    • Gartner IT Symposium
    TigerGraph

    Product

    SOLUTIONS

    customers

    RESOURCES

    start for free

    PRODUCT
    • Overview
    • What's New
    • Toolkits
    • Connectors
    • Data Science Library
    TIGERGRAPH DB
    • Overview
    • No-Code Graph Analytics
    • Features
    • GSQL Query Language
    TIGERGRAPH CLOUD
    • Overview
    • Cloud Starter Kits
    • Login
    • FAQ
    • Pricing
    • Cloud Marketplaces
    GRAPHSTUDIO
    • Overview
    • Take a Test Drive
    TRY NOW
    • Get Started for Free
    • Compare Editions
    • License FAQ
    SOLUTIONS
    • Why Graph?
    industry
    • Advertising, Media & Entertainment
    • Financial Services
    • Healthcare & Life Sciences
    use cases
    • Benefits
    • Product & Service Marketing
    • Entity Resolution
    • Customer 360/MDM
    • Recommendation Engine
    • Anti-Money Laundering
    • Cybersecurity Threat Detection
    • Fraud Detection
    • Risk Assessment & Monitoring
    • Energy Management
    • Network & IT Management
    • Supply Chain Analysis
    • AI & Machine Learning
    • Geospatial Analysis
    • Time Series Analysis
    success stories
    • Customer Success Stories

    Partners

    Partner program
    • Partner Benefits
    • TigerGraph Partners
    • Sign Up
    LIBRARY
    • Resources
    • Benchmark
    • Webinars
    Events
    • Trade Shows
    • Graph + AI Summit
    • Million Dollar Challenge
    EDUCATION
    • TigerGraph Academy
    • Certification
    Blog
    • TigerGraph Blog
    DEVELOPERS
    • Developers Hub
    • Community Forum
    • Documentation
    • Ecosystem

    COMPANY

    Company
    • Overview
    • Careers
    • News
    • Press Release
    • Awards
    • Legal
    • Patents
    Get Started
    • Get TigerGraph
    • Compare Editions
    • Online Demo - Test Drive
    • Request a Demo

    Product

    • Overview
    • TigerGraph 3.0
    • TIGERGRAPH DB
    • TIGERGRAPH CLOUD
    • GRAPHSTUDIO
    • TRY NOW

    customers

    • success stories

    RESOURCES

    • LIBRARY
    • Events
    • EDUCATION
    • BLOG
    • DEVELOPERS

    SOLUTIONS

    • SOLUTIONS
    • use cases
    • industry

    Partners

    • partner program

    company

    • Overview
    • news
    • Press Release
    • Awards

    start for free

    • Request Demo
    • take a test drive
    • SUPPORT
    • COMMUNITY
    • CONTACT
    • Copyright © 2022 TigerGraph
    • Privacy Policy
    • Linkedin
    • Facebook
    • Twitter

    Copyright © 2020 TigerGraph | Privacy Policy

    Copyright © 2020 TigerGraph Privacy Policy

    • SUPPORT
    • COMMUNITY
    • COMPANY
    • CONTACT
    • Linkedin
    • Facebook
    • Twitter

    Copyright © 2020 TigerGraph

    Privacy Policy

    • Products
    • Solutions
    • Customers
    • Partners
    • Resources
    • Company
    • START FOR FREE
    START FOR FREE
    START FOR FREE
    TigerGraph
    PRODUCT
    PRODUCT
    • Overview
    • GraphStudio UI
    • Graph Data Science Library
    TIGERGRAPH DB
    • Overview
    • Features
    • GSQL Query Language
    TIGERGRAPH CLOUD
    • Overview
    • Cloud Starter Kits
    TRY TIGERGRAPH
    • Get Started for Free
    • Compare Editions
    SOLUTIONS
    SOLUTIONS
    • Why Graph?
    use cases
    • Benefits
    • Product & Service Marketing
    • Entity Resolution
    • Customer Journey/360
    • Recommendation Engine
    • Anti-Money Laundering (AML)
    • Cybersecurity Threat Detection
    • Fraud Detection
    • Risk Assessment & Monitoring
    • Energy Management
    • Network Resources Optimization
    • Supply Chain Analysis
    • AI & Machine Learning
    • Geospatial Analysis
    • Time Series Analysis
    industry
    • Advertising, Media & Entertainment
    • Financial Services
    • Healthcare & Life Sciences
    CUSTOMERS
    read all success stories

     

    PARTNERS
    Partner program
    • Partner Benefits
    • TigerGraph Partners
    • Sign Up
    RESOURCES
    LIBRARY
    • Resource Library
    • Benchmark
    • Webinars
    Events
    • Trade Shows
    • Graph + AI Summit
    • Graph for All - Million Dollar Challenge
    EDUCATION
    • TigerGraph Academy
    • Certification
    Blog
    • TigerGraph Blog
    DEVELOPERS
    • Developers Hub
    • Community Forum
    • Documentation
    • Ecosystem
    COMPANY
    COMPANY
    • Overview
    • Leadership
    • Careers  
    NEWS
    PRESS RELEASE
    AWARDS
    START FOR FREE
    Start For Free
    • Try TigerGraph Now
    • Request a Demo
    • SUPPORT
    • COMMUNITY
    • CONTACT
    Dr. Jay Yu

    Dr. Jay Yu | VP of Product and Innovation

    Dr. Jay Yu is the VP of Product and Innovation at TigerGraph, responsible for driving product strategy and roadmap, as well as fostering innovation in graph database engine and graph solutions. He is a proven hands-on full-stack innovator, strategic thinker, leader, and evangelist for new technology and product, with 25+ years of industry experience ranging from highly scalable distributed database engine company (Teradata), B2B e-commerce services startup, to consumer-facing financial applications company (Intuit). He received his PhD from the University of Wisconsin - Madison, where he specialized in large scale parallel database systems

    Todd Blaschka | COO

    Todd Blaschka is a veteran in the enterprise software industry. He is passionate about creating entirely new segments in data, analytics and AI, with the distinction of establishing graph analytics as a Gartner Top 10 Data & Analytics trend two years in a row. By fervently focusing on critical industry and customer challenges, the companies under Todd's leadership have delivered significant quantifiable results to the largest brands in the world through channel and solution sales approach. Prior to TigerGraph, Todd led go to market and customer experience functions at Clustrix (acquired by MariaDB), Dataguise and IBM.