Sink Your Teeth Into FIBO with A Native Parallel Graph Database

Authored by TigerGraph Solutions and Engineering

Financial institutions have massive volumes of highly interconnected, structured data that they need to ingest in real time and use for a wide range of queries and reports, from simple fund transactions to machine-learning-driven fraud detection and risk analysis. The FIBO knowledge graph ontology was developed to provide a standard format for representing and interchanging such data, and it was designed for use with triple stores (RDF databases). However, while RDF databases are great for modeling and exchanging data, property graph databases offer a more efficient and intuitive object-oriented approach to storing and querying it. On large graphs, query speed can be a deal-breaker for RDF databases, and it remains a challenge for most property graph databases.

Good news! TigerGraph, the fastest and most scalable graph database analytics platform, can easily ingest large and complex ontologies such as FIBO and perform both transactional workloads and full-graph analytics at blazing speed.

1. Loading the FIBO Ontology into TigerGraph

After multiple TigerGraph implementations for customers working with semantic technologies, we were curious about loading the Financial Industry Business Ontology (FIBO) into our property graph. It turned out to be very easy, and the exercise gave us a newfound appreciation of how interoperable different graph technologies are and how the knowledge encoded in FIBO can be leveraged in TigerGraph applications. This article gives an overview of the process.

FIBO is a large and comprehensive financial ontology built by leading semantic web experts over the last decade. Its genesis was the banking crisis of 2008, which exposed the need for a common semantic model that would let the financial industry understand and share its data and reach a common understanding of risk. FIBO is published and supported by the EDM Council, a financial industry consortium.

FIBO is a complex collection of ontologies. Its sheer number of ontologies, classes, properties, and constraints has been known to challenge even semantic technologies built specifically to handle the W3C specifications. In the following sections we show how to create an RDF graph schema, map FIBO in N-Quads format to that schema, load FIBO, explore it, and use it to annotate data in TigerGraph applications.

Figure 1: W3C Layer Cake Diagram

2. TigerGraph RDF Schema for FIBO

FIBO is represented in W3C OWL, which is defined on top of RDFS, which is in turn layered on RDF (see Figure 1). RDF is a very flexible representation for building graphs and ontologies; interestingly, however, RDF itself has a fixed graph data structure, and that structure can be implemented as a graph schema in TigerGraph.

We found that directly defining an RDF graph schema in TigerGraph made it relatively trivial to load FIBO and make it available within TigerGraph. Because the approach is based on RDF, it can also load any RDF data, OWL ontologies, and SKOS vocabularies. The logical data model of RDF (see Figure 2) served as a guideline for creating the TigerGraph graph schema for RDF. We based the schema on the logical relationships between RDF Resource, Subject, Predicate, Object, Statement, and Graph, which makes it easily recognizable to semantic web practitioners.

The TigerGraph RDF graph schema has four vertex types: Statement, Resource, Predicate, and Graph. Each Statement vertex is connected to the Resource, Predicate, and Graph vertex types via subject, predicate, object, and graph edges (see Figure 3); the subject and object edges both lead to Resource vertices. This structure also makes it easy to analyze FIBO itself with GSQL-based graph algorithms, for example, to find paths between FIBO concepts or to identify the minimal set of FIBO needed to support a particular application.

Note: All the screenshots for Figures 3 to 8 are from GraphStudio, TigerGraph’s visual design tool for graph modeling and querying.

Figure 2: RDF Logical Data Model

Figure 3: The TigerGraph RDF Graph Schema
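To make the schema concrete, the following is a minimal Python sketch (not GSQL, and not how TigerGraph stores data internally) of the same shape: every RDF statement becomes its own Statement record whose subject, predicate, object, and graph fields play the role of the four edge types. A breadth-first search over the subject and object links then finds a shortest chain of statements connecting two concepts, the kind of path analysis mentioned above. The class and method names and the `ex:` identifiers are hypothetical; in TigerGraph the equivalent traversal would be written as a GSQL query.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One Statement vertex; each field stands in for one outgoing edge."""
    subject: str    # "subject" edge -> Resource
    predicate: str  # "predicate" edge -> Predicate
    obj: str        # "object" edge -> Resource
    graph: str      # "graph" edge -> Graph


class RdfGraph:
    """Hypothetical in-memory stand-in for the four-vertex-type RDF schema."""

    def __init__(self):
        self.statements = set()
        # Resource -> Statements that use it as subject or object.
        self.by_resource = defaultdict(set)

    def add(self, s, p, o, g):
        st = Statement(s, p, o, g)
        self.statements.add(st)
        self.by_resource[s].add(st)
        self.by_resource[o].add(st)
        return st

    def path(self, start, goal):
        """Shortest chain of statements linking two resources (BFS), or None."""
        prev = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                chain = []
                while prev[node] is not None:
                    node, st = prev[node]
                    chain.append(st)
                return chain[::-1]
            for st in self.by_resource[node]:
                # Step to the resource at the other end of the statement.
                nxt = st.obj if st.subject == node else st.subject
                if nxt.startswith('"'):
                    continue  # do not route paths through literal values
                if nxt not in prev:
                    prev[nxt] = (node, st)
                    queue.append(nxt)
        return None


g = RdfGraph()
g.add("ex:StockCorporation", "rdfs:subClassOf", "ex:Corporation", "ex:g")
g.add("ex:Corporation", "rdfs:subClassOf", "ex:LegalPerson", "ex:g")
g.add("ex:Shareholder", "ex:holdsSharesIn", "ex:StockCorporation", "ex:g")
route = g.path("ex:Shareholder", "ex:LegalPerson")
print([st.predicate for st in route])
# ['ex:holdsSharesIn', 'rdfs:subClassOf', 'rdfs:subClassOf']
```

Keeping each statement as its own record, rather than collapsing it into a direct subject-to-object link, is what lets predicates and named graphs be treated as first-class vertices in the reified schema.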

3. Mapping and Ingesting FIBO

The EDM Council releases FIBO in multiple RDF serializations, including N-Quads. The N-Quads format is convenient because TigerGraph’s CSV parser can read it, and its subject, predicate, object, and graph columns map easily to the TigerGraph RDF schema. Figure 4 shows the FIBO N-Quads file as a CSV data source.

Figure 4: FIBO N-Quads file processed as CSV
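Each N-Quads line holds a subject, predicate, object, and optional graph label separated by whitespace, so splitting it into the four columns is mostly mechanical, except that literals can themselves contain spaces. The following simplified Python tokenizer sketches that split with a regular expression. It is not TigerGraph’s CSV parser and not a complete implementation of the W3C N-Quads grammar (it does no escape decoding or IRI validation); the `parse_nquad` name and the `example.org` IRIs are illustrative.

```python
import re

# One N-Quads term: an IRI, a blank node, or a quoted literal with an
# optional language tag (@en) or datatype (^^<...>).
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z][A-Za-z0-9-]*|\^\^<[^>]*>)?'
)


def parse_nquad(line, default_graph=""):
    """Split one N-Quads line into (subject, predicate, object, graph)."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    terms = TERM.findall(line)  # the trailing " ." matches no term
    if len(terms) == 3:
        terms.append(default_graph)  # statement in the default graph
    if len(terms) != 4:
        raise ValueError(f"not a quad: {line!r}")
    return tuple(terms)


print(parse_nquad(
    '<https://example.org/A> '
    '<http://www.w3.org/2000/01/rdf-schema#label> '
    '"stock corporation"@en <https://example.org/g> .'
))
```

The resulting four-tuple corresponds to the subject, predicate, object, and graph columns that are mapped onto the RDF schema.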

We can use GraphStudio to easily map, or associate, the columns in the FIBO file with the entities and attributes of the TigerGraph graph schema. Figure 5A shows a sample mapping of FIBO N-Quads to the predicate edge type between Statement and Predicate. Figure 5B drills down to show how individual FIBO columns are mapped to individual edge attributes. The Concat ‘helper’ function concatenates the subject, predicate, and object to form the identifier for Statement.

Figure 5B: Mapping individual elements from FIBO RDF to attributes of a TigerGraph edge.

The complete mapping used to load FIBO into TigerGraph is shown in Figure 6. The maps are displayed as dotted lines between the FIBO source document and the edge types in the RDF schema. The mapping consists of four maps, one each to the subject, predicate, object, and graph edge types of the RDF schema. The primary key of the Statement vertex is formed by concatenating the subject, predicate, and object identifiers.

Figure 6: Complete Mapping of FIBO N-Quads to the RDF Graph Schema
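The row-level logic behind the four maps and the Concat helper can be sketched in Python as follows. This is an illustration, not the loading job GraphStudio generates: the `Edge` record, the function names, the `|` separator, and the assumption that every edge points from the Statement vertex to its target are all hypothetical choices made for the example.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    edge_type: str  # subject, predicate, object, or graph
    from_id: str    # primary key of the Statement vertex
    to_type: str    # vertex type at the other end
    to_id: str      # identifier taken from the N-Quads column


# Target vertex type for each of the four maps
# (direction Statement -> target is an assumption).
EDGE_TARGETS = {
    "subject": "Resource",
    "predicate": "Predicate",
    "object": "Resource",
    "graph": "Graph",
}


def statement_id(s, p, o, sep="|"):
    # Mirrors the Concat helper. The separator is an assumption; it keeps
    # keys unambiguous if angle brackets are stripped before concatenation.
    return sep.join((s, p, o))


def map_row(s, p, o, g):
    """Apply the four maps to one N-Quads row: one edge per map."""
    sid = statement_id(s, p, o)
    values = {"subject": s, "predicate": p, "object": o, "graph": g}
    return [Edge(name, sid, target, values[name])
            for name, target in EDGE_TARGETS.items()]


for edge in map_row("<ex:A>", "<ex:p>", "<ex:B>", "<ex:g>"):
    print(edge)
```

Because the key is built from the subject, predicate, and object only, a triple that appears in two named graphs would map to a single Statement vertex with two graph edges, assuming the loader upserts vertices by primary key.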

4. Exploring FIBO in GraphStudio

Once FIBO is loaded, users can explore it in GraphStudio (see Figure 7). They can also write queries to perform inference or machine learning, for example, to implement graph algorithms, to classify data in terms of FIBO classes, or to integrate data from other sources such as Legal Entity Identifiers and ultimate beneficiary links. It is also easy to use GSQL to implement the OWL/RDFS inferences used by FIBO.

Figure 7: Corporation Subgraph of FIBO Rendered in GraphStudio
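As an illustration of the kind of OWL/RDFS inference mentioned above, the following Python sketch forward-chains two RDFS entailment rules from the W3C RDF Semantics specification, rdfs11 (subClassOf is transitive) and rdfs9 (an instance of a class is also an instance of its superclasses), until no new triples appear. It is a naive fixed-point loop over compact `prefix:name` strings, not GSQL and not FIBO’s full OWL semantics; the `ex:` terms are hypothetical. A GSQL version would express the same joins as traversals over Statement vertices.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply two RDFS rules until a fixed point is reached:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    while True:
        sub = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        new = set()
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))  # rdfs11
        for x, p, a in inferred:
            if p == TYPE:
                for a2, b in sub:
                    if a == a2:
                        new.add((x, TYPE, b))  # rdfs9
        if new <= inferred:
            return inferred  # nothing new: fixed point reached
        inferred |= new


facts = {
    ("ex:StockCorporation", SUBCLASS, "ex:Corporation"),
    ("ex:Corporation", SUBCLASS, "ex:LegalPerson"),
    ("ex:acme", TYPE, "ex:StockCorporation"),
}
closure = rdfs_closure(facts)
print(sorted(o for s, p, o in closure if s == "ex:acme" and p == TYPE))
# ['ex:Corporation', 'ex:LegalPerson', 'ex:StockCorporation']
```

Each pass is quadratic in the number of subClassOf triples, which is fine for a sketch; a production version would index subclass links by class.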

5. FIBO Data Privacy using MultiGraph

MultiGraph, a capability unique to TigerGraph, makes it possible to share the FIBO ontology among multiple graph instances. MultiGraph lets an administrator define multiple graph data domains, each with its own set of authorized users and roles. These graph domains can overlap: different graphs can share some data while keeping other data private. In the case of FIBO, every graph domain should have access to the ontology itself, but each graph may also hold other data, especially financial account data, that must be tightly controlled. MultiGraph is designed to cover the risk, compliance, and privacy requirements of heavily regulated industries such as finance, healthcare, and pharmaceuticals. It supports multi-tenancy, fine-grained privileges, overlapping graphs, and hierarchical subgraphs.
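The overlap described above can be illustrated with a toy Python model. It is not TigerGraph’s MultiGraph API or its privilege system: the `GraphDomain` class, the account vertex types, the user names, and the single read role are hypothetical. Every domain includes the shared RDF schema types that hold FIBO, while each keeps its own account data private.

```python
from dataclasses import dataclass, field

# RDF schema vertex types holding FIBO; shared by every graph domain.
SHARED_ONTOLOGY = frozenset({"Statement", "Resource", "Predicate", "Graph"})


@dataclass
class GraphDomain:
    name: str
    private_types: frozenset                   # e.g. account data
    readers: set = field(default_factory=set)  # users with a read role

    @property
    def vertex_types(self):
        return SHARED_ONTOLOGY | self.private_types


def can_read(user, domain, vertex_type):
    """A user reaches a vertex type only through a domain they are granted."""
    return user in domain.readers and vertex_type in domain.vertex_types


def visible_types(user, domains):
    """Union of vertex types across every domain the user may read."""
    return frozenset().union(
        *(d.vertex_types for d in domains if user in d.readers))


retail = GraphDomain("Retail", frozenset({"RetailAccount"}), {"alice"})
wealth = GraphDomain("Wealth", frozenset({"WealthAccount"}), {"bob"})
print(can_read("alice", retail, "Resource"))       # True: shared FIBO data
print(can_read("alice", wealth, "WealthAccount"))  # False: not granted
print(sorted(visible_types("alice", [retail, wealth])))
```

In this model, both users see the same FIBO vertices, but neither can reach the other domain’s account type.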

Conclusion

TigerGraph easily ingests FIBO, one of the largest and most complex ontologies in the world. TigerGraph provides multiple advantages for financial institutions, including 1) finding multi-hop paths between FIBO concepts, 2) determining the minimum set of FIBO needed to support a particular set of data, and 3) integrating FIBO with a high-performance and scalable hybrid transactional/analytical platform.

TigerGraph lets you have your cake and eat it too: its ability to customize its graph representation to interoperate easily with complex financial enterprise ontologies such as FIBO, alongside its graph solutions for real-time recommendations, fraud detection, risk analysis, and much more, gives businesses a competitive advantage. This adaptability, together with the flexibility to scale up and scale out as needed, lets TigerGraph span everything from massive enterprise-wide knowledge graphs to real-world business solutions. TigerGraph’s FIBO-fueled financial fabric enables enterprise customers to continuously catalog and harmonize their data in financial industry-standard knowledge graphs, and to use MultiGraph to selectively share that data with business customers operating real-time business systems, which in turn share insights back to the enterprise knowledge graph.