Linking Documents in a Semantic Graph Database

  • Emily McAuliffe
  • June 2, 2020
  • blog, Developers

Originally posted on Towards Data Science by Akash Kaul. Follow Akash on LinkedIn.

Part 1 of this blog series reviews how the data is extracted before being used in TigerGraph. Review Part 1 here: https://towardsdatascience.com/using-scispacy-for-named-entity-recognition-785389e7918d

1. Building a Graph Using Publication Metadata

This project is a continuation of a previous project. We used scispaCy, an NLP package for Python, to extract medical keywords from the abstracts of a collection of articles on Covid-19. Check out that post for an in-depth guide on how we extracted the data. The goal of this project is to build a graph to connect the publications to the entities we extracted. We are using a graph because it allows us to efficiently display and analyze our highly connected data. If you aren’t already familiar with graphs, I suggest checking out this video. To create our graph, we’ll be using a free, UI-based graphing platform from TigerGraph called TigerGraph Cloud.

2. Create a TigerGraph Account

Before you do anything else, you will first need to create a TigerGraph account. TigerGraph has a great video that takes you through the steps to create your account and build an empty graph. Follow along with the steps in the video, and load up your solution in GraphStudio. After the solution is loaded, you should reach this homepage.

3. Designing a Graph Schema

We’ll start by clicking the Design Schema tab. You should see a blank page that looks something like this.

ADDING VERTICES

Next, we’ll create some vertices. Our central vertex type will be called “PUBLICATION”. This vertex will store data related to the publication like the doi, the title, and the URL. These specific properties for a vertex type are known as attributes. To create the vertex, press the + icon. Fill in the data as follows.

You’re free to edit the names of the vertices and their attributes to whatever names you like, as well as customize the icon and color for each vertex type. Just make sure the attribute types are correct.

There are some publication attributes, like the license type and journal of publication, that we didn’t list here. That’s because ideally, attributes are traits that will be UNIQUE for each vertex of a given type. Since multiple publications can have the same license, journal, etc., it’s best to leave those as SEPARATE VERTEX TYPES. We can still easily connect those vertices in our design schema.
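To make that reasoning concrete, here is a minimal Python sketch of what happens when a repeated trait is pulled out into its own vertex type. The rows, DOIs, and journal names are invented for illustration, and the column names are assumptions rather than the exact headers of the data files.

```python
# Hypothetical publication rows; all values are invented for illustration.
rows = [
    {"doi": "10.1000/a1", "journal": "Journal A"},
    {"doi": "10.1000/b2", "journal": "Journal A"},
    {"doi": "10.1000/c3", "journal": "Journal B"},
]

# Each distinct journal becomes exactly one vertex, however many
# publications appeared in it.
journal_vertices = sorted({row["journal"] for row in rows})

# Each row contributes one edge linking its publication to its journal.
pub_journal_edges = [(row["doi"], row["journal"]) for row in rows]


def publications_in(journal):
    """Collect the publications connected to the given journal vertex."""
    return [doi for doi, name in pub_journal_edges if name == journal]


print(journal_vertices)
print(publications_in("Journal A"))
```

Two publications share the single "Journal A" vertex, so the connection between them is visible in the graph instead of being hidden in a repeated string attribute.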

Let’s continue creating our vertices. We’ll need 3 new vertices for highly repetitive publication traits (license, author, and journal), a vertex for the entities we extracted previously, and a vertex for the classes of those entities. These vertices should look like this.

Journal Vertex

Author Vertex

License Vertex

Entity Vertex

Class Vertex

With all of our vertices now created, our graph should look something like this.

It looks pretty good, but it’s missing something. We still need to connect our vertices with edges!

If you haven’t already, make sure to publish your graph using the up arrow next to the + sign. This ensures your graph is saved.

ADDING EDGES

To add an edge, click the right arrow icon and select the two vertices that you want the edge to connect. For example, if we want to connect the author vertex to the publication vertex, we press the arrow, then click on the author icon and the publication icon. We should get a pop-up that looks like this:

TigerGraph automatically fills in the source and target vertex, so all we have to do is name it. It’s helpful to name the edge after the two vertices it connects. For this edge, let’s call it “PUB_HAS_AUTHOR”. Our edge should now look like this:

Edge between Publication and Author

Now we just repeat the process for our remaining edges.

Edge between Publication and Journal

Edge between Publication and License
Edge between Entity and Publication
Edge between Entity and Class

Note that all of our edges are undirected. This is because the relationship between any two vertices goes both ways (for example, a publication is tied to an author, and the author is also tied to that publication).
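One way to picture an undirected edge is as a connection stored in both directions. The Python sketch below is only an illustration of that idea, not a description of how TigerGraph stores edges internally, and the vertex ids are invented.

```python
from collections import defaultdict

# Hypothetical vertex ids, invented for illustration.
edges = [
    ("pub-1", "author-A"),
    ("pub-1", "author-B"),
    ("pub-2", "author-A"),
]

# Record each undirected edge under both of its endpoints.
neighbors = defaultdict(set)
for left, right in edges:
    neighbors[left].add(right)
    neighbors[right].add(left)

# The same edges answer both questions.
print(sorted(neighbors["pub-1"]))     # authors of pub-1
print(sorted(neighbors["author-A"]))  # publications by author-A
```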

After we add all of these edges, our final graph should look something like this:

Finished design schema
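For readers who prefer text to screenshots, the finished schema can be summarized in a few lines of Python. GraphStudio builds the schema for you, so none of this is needed to follow along. Only PUBLICATION, its three attributes, and PUB_HAS_AUTHOR come from the walkthrough above; every other vertex name, edge name, attribute, and type is an assumption. The generated strings follow the general shape of GSQL `CREATE VERTEX` and `CREATE UNDIRECTED EDGE` statements, but treat them as illustrative rather than as what GraphStudio emits.

```python
# Vertex types with (attribute, type) pairs; the first pair is used as the id.
# Names other than PUBLICATION and its attributes are assumptions.
VERTICES = {
    "PUBLICATION": [("doi", "STRING"), ("title", "STRING"), ("url", "STRING")],
    "AUTHOR": [("name", "STRING")],
    "JOURNAL": [("name", "STRING")],
    "LICENSE": [("name", "STRING")],
    "ENTITY": [("name", "STRING")],
    "CLASS": [("name", "STRING")],
}

# Undirected edges as (source, target). Only PUB_HAS_AUTHOR is named in the
# walkthrough; the other edge names are hypothetical.
EDGES = {
    "PUB_HAS_AUTHOR": ("PUBLICATION", "AUTHOR"),
    "PUB_HAS_JOURNAL": ("PUBLICATION", "JOURNAL"),
    "PUB_HAS_LICENSE": ("PUBLICATION", "LICENSE"),
    "ENTITY_IN_PUB": ("ENTITY", "PUBLICATION"),
    "ENTITY_HAS_CLASS": ("ENTITY", "CLASS"),
}


def vertex_ddl(name, attrs):
    """Render one vertex type as a GSQL-style statement."""
    (id_name, id_type), rest = attrs[0], attrs[1:]
    cols = [f"PRIMARY_ID {id_name} {id_type}"]
    cols += [f"{attr} {kind}" for attr, kind in rest]
    return f"CREATE VERTEX {name} ({', '.join(cols)})"


def edge_ddl(name, endpoints):
    """Render one undirected edge type as a GSQL-style statement."""
    source, target = endpoints
    return f"CREATE UNDIRECTED EDGE {name} (FROM {source}, TO {target})"


# Every edge must connect two vertex types that actually exist.
for edge_name, (source, target) in EDGES.items():
    if source not in VERTICES or target not in VERTICES:
        raise ValueError(f"{edge_name} references an unknown vertex type")

statements = [vertex_ddl(name, attrs) for name, attrs in VERTICES.items()]
statements += [edge_ddl(name, ends) for name, ends in EDGES.items()]
print("\n".join(statements))
```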

4. Uploading & Mapping our Data

With our graph design finished, we now need to upload our data. You can find all of the files that we need to upload here. I’ll briefly explain the contents of each file.

sample.csv: contains general information about each publication (like the doi, URL, journal, license, etc.).

normalizedAuthors.csv: contains a normalized list of the doi and authors of each publication.

Entity_*.csv: each of these four files contains the entities and classes extracted using one of the prebuilt models in scispaCy. If you want to see how these medical terms were extracted, check out my last article.

To upload these files, click the Map Data to Graph tab and then hit the icon that looks like a file with a + sign. Now add all the files you downloaded to the server and click ADD.

This is where the tricky part begins. You’ll need to manually map each file to every vertex and edge that corresponds to a column in that file. For example, the normalizedAuthors file has the columns “doi” and “author”. This means the file has to be mapped to the publication vertex (where the doi attribute lives), the author vertex, and the edge that connects the publication and author vertices. To map the file, click the crossed arrows icon, select your file, and select a vertex or edge. The mapping for the normalizedAuthors file should look like this:

And each individual mapping should look like this:

Mapping normalizedAuthors.csv to Author vertex
Mapping normalizedAuthors.csv to edge connecting Author and Publication
Mapping normalizedAuthors.csv to Publication vertex
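If it helps to see the same mapping outside the UI, here is a rough Python sketch of what mapping normalizedAuthors.csv accomplishes: each row feeds one publication vertex, one author vertex, and one edge between them. The “doi” and “author” column names come from the file, but the rows below are invented stand-ins, and the code only imitates the result. It is not how GraphStudio performs the mapping.

```python
import csv
import io

# Stand-in for normalizedAuthors.csv; the rows are invented for illustration.
SAMPLE = """doi,author
10.1000/a1,Author One
10.1000/a1,Author Two
10.1000/b2,Author One
"""

publications = set()   # publication vertex ids, taken from the doi column
authors = set()        # author vertex ids, taken from the author column
pub_has_author = []    # one publication-author edge per row

for row in csv.DictReader(io.StringIO(SAMPLE)):
    publications.add(row["doi"])
    authors.add(row["author"])
    pub_has_author.append((row["doi"], row["author"]))

print(len(publications), "publications")
print(len(authors), "authors")
print(len(pub_has_author), "edges")
```

Repeated values collapse into a single vertex, while every row still yields its own edge. That is why one file has to be mapped three times.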

Again, we repeat these steps for our other files as shown.

Total mapping of sample.csv

Mapping sample.csv to Journal Vertex

Mapping sample.csv to edge connecting Publication and Journal

Mapping sample.csv to Publication vertex

Mapping sample.csv to edge connecting Publication and License

Mapping sample.csv to License vertex

Now we map the entity files. I’ll show the mapping for one file since it’s the same for all four.

Total mapping for entity file

Mapping entity file to Class vertex
Mapping entity file to edge connecting Entity and Class
Mapping entity file to Entity vertex
Mapping entity file to edge connecting Entity and Publication
Mapping entity file to Publication vertex

You’ve just completed the hardest part of this project. With the mapping done, our graph should look something like this:

Hit the publish button to save all of your hard work!

5. Load Data

Once the data is mapped, loading the data is quite easy. Click the Load Data tab and then hit the play button to start loading your data. This process should take about a minute. Once it’s done, you can move on to visualizing your graph.

6. Visually Exploring your Graph

Now for the fun part (although I hope the whole project has been fun thus far). With your design created and your data loaded, you can now visually explore your graph. To do so, hit the Explore Graph tab. There’s a lot you can do on this page (the full documentation can be found here), but let’s look at a couple of specific examples.

Searching for a vertex

You can search for a specific vertex using the vertex id, or you can find a random vertex of a specific type. We’ll do the latter. For our example, let’s pick 5 vertices of the type “PUBLICATION”. You should see 5 vertices pop up.

Example search for 5 vertices of type Publication

You can hover over each vertex to see all of its attributes. You can also double-click any vertex to bring up its immediate connections.

Visual after double-clicking a Publication vertex

If your output looks messy, you can click and drag vertices around to clean up your screen.

Expand from Vertices

If you change from the Search option (magnifying glass) to the Expand from Vertices option (triangle-looking symbol), you can expand from a vertex to other vertices beyond the immediate connections. To make sure your computer doesn’t overload, change the expanding edge limit to something small like 4 or 5. Now, let’s see what happens when we expand on one of the entities we just revealed.

Expand from Entity vertex through all edge types, towards all vertex types

We can see that upon expansion of our entity vertex, the class of the entity, as well as other papers that also have that same entity, are pulled up. This is a great example of why we used a graph in the first place. We can easily search our graph for publications, entities, authors, licenses, etc., and with a few clicks, we can show how different publications are connected, which publications share common entities, and much, much more.
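The expansion we just performed can be approximated in a few lines of Python. This is a sketch under stated assumptions: the vertex ids and the class label are invented, and the way the edge limit picks which neighbors to keep (here, simply the first ones stored) is a guess, since GraphStudio may choose differently.

```python
from collections import defaultdict

# Hypothetical undirected edges, invented for illustration.
edges = [
    ("entity:fever", "class:DISEASE"),
    ("entity:fever", "pub-1"),
    ("entity:fever", "pub-2"),
    ("entity:fever", "pub-3"),
    ("entity:cough", "pub-2"),
]

# Store every edge under both endpoints so expansion works from either side.
adjacency = defaultdict(list)
for left, right in edges:
    adjacency[left].append(right)
    adjacency[right].append(left)


def expand(vertex, edge_limit):
    """Return at most edge_limit neighbors of a vertex, in stored order."""
    return adjacency.get(vertex, [])[:edge_limit]


def related_publications(pub):
    """Find other publications that share at least one entity with pub."""
    related = set()
    for entity in adjacency.get(pub, []):
        if not entity.startswith("entity:"):
            continue
        for other in adjacency.get(entity, []):
            if other.startswith("pub-") and other != pub:
                related.add(other)
    return sorted(related)


print(expand("entity:fever", 5))
print(expand("entity:fever", 2))
print(related_publications("pub-2"))
```

Expanding from the entity returns its class and the publications that mention it, and a second hop from a publication back through its entities gives the related papers.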

Further Exploration

I won’t cover the other graph exploration options or how you can create queries (which are essentially code versions of the visual expansions we just used) in this article. Stay tuned for Part 3 of this series, where I will take you through writing some basic search queries for our graph. Until then, I’ll leave them as challenge problems for you to tackle on your own. If you’re interested, you can read about queries here.

Conclusion

Graphs offer unparalleled benefits for mapping and analyzing highly connected data. Through our exploration of TigerGraph Cloud, you have learned how to visually design, map, and explore your own fully-fledged graph. You’ve made a huge first step into the world of graph databases and gained a considerable advantage for your future projects. Graph databases can be implemented on everything from small-scale projects like this one to larger structures like Facebook and Google Search. Graphs possess incredible power, a power that you now have access to.

Resources

  1. https://towardsdatascience.com/using-scispacy-for-named-entity-recognition-785389e7918d
  2. https://www.youtube.com/watch?v=vJcxRjJ982k
  3. https://www.youtube.com/watch?v=JARd9ULRP_I
  4. https://gofile.io/d/GSLyHb
  5. https://docs.tigergraph.com/ui/graphstudio/explore-graph
  6. https://docs.tigergraph.com/ui/graphstudio/write-queries


    Todd Blaschka is a veteran in the enterprise software industry. He is passionate about creating entirely new segments in data, analytics and AI, with the distinction of establishing graph analytics as a Gartner Top 10 Data & Analytics trend two years in a row. By fervently focusing on critical industry and customer challenges, the companies under Todd's leadership have delivered significant quantifiable results to the largest brands in the world through channel and solution sales approach. Prior to TigerGraph, Todd led go to market and customer experience functions at Clustrix (acquired by MariaDB), Dataguise and IBM.