Summary

An automatic, generic knowledge graph framework that builds itself from unstructured textual data and is queried in natural language, applied here to a large financial stock dataset.

Overview

An automatic, generic knowledge graph framework that builds itself from unstructured textual data and uses natural language to query that data.

Inspiration

A huge amount of unstructured textual data is generated every day. Enterprises, individuals, news media, and others all create volumes of unstructured data that are hard to analyze and understand in a structured way.

Business users and individuals have no simple way to analyze unstructured data without considerable effort spent on data transformation, schema design, and curation. Moreover, querying and searching through the data takes additional effort and cannot easily be done without the necessary skills or an intermediary application.

This problem led to the creation of Project Athena (named after the Greek goddess of knowledge and wisdom). I picked a financial news dataset with close to 1 million records and ran it through the application.

TigerGraph is the best fit for this problem, with its high performance, elegantly designed GSQL, and rich data science library.

What it does
  • What if there was an intelligent system that automatically understood the entities (people, locations, objects) found in unstructured data, along with the relationships among them, and built a big knowledge graph from them?
  • What if there was a system that allowed us to search and query that data using natural language?
  • What if the system was generically applicable regardless of the domain of the data?

Consider, for example, that the intelligent system automatically creates a financial knowledge graph when run on years of unstructured financial data, and a medical knowledge graph when run on a pile of unstructured medical data.

Consider, too, that we are able to query the knowledge graph just by typing in natural language, without having to rely on additional querying tools:

  • “Who bought Agilent Technologies Inc” or
  • “Who all are investing in Russia” or
  • “Who bought Microsoft Corp between the years 2014 and 2016” or
  • “Who had beaten the market in 2014” or
  • even nested queries like “Who bought the companies that bought Facebook Inc”
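
To make the nested query concrete, here is a minimal sketch of how a question of the form "who bought the companies that bought X" can reduce to two hops over a graph of (subject, relation, object) triples. It is an illustration only, not Project Athena's implementation, and it does not use TigerGraph or GSQL. The company names, the relation labels, and the helpers `build_index` and `who` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; every name is invented
# for illustration and does not come from the financial news dataset.
TRIPLES = [
    ("Beta Holdings", "bought", "Acme Corp"),
    ("Delta Partners", "bought", "Acme Corp"),
    ("Gamma Fund", "bought", "Beta Holdings"),
    ("Omega Capital", "bought", "Delta Partners"),
    ("Gamma Fund", "invests_in", "Acme Corp"),
]


def build_index(triples):
    """Index triples as (relation, object) -> set of subjects."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(relation, obj)].add(subject)
    return index


def who(index, relation, targets):
    """Return every subject linked by `relation` to any entity in `targets`."""
    found = set()
    for target in targets:
        # .get avoids creating empty entries for unknown targets.
        found |= index.get((relation, target), set())
    return found


index = build_index(TRIPLES)

# "Who bought Acme Corp" is a single hop.
direct_buyers = who(index, "bought", {"Acme Corp"})

# "Who bought the companies that bought Acme Corp" feeds the first
# answer set back in as the targets of a second hop.
indirect_buyers = who(index, "bought", direct_buyers)

print(sorted(direct_buyers))
print(sorted(indirect_buyers))
```

In this sketch the natural language step is assumed to have already mapped the question to a relation and a target entity; the traversal only shows why nested questions need no extra tooling once the graph exists.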