This transcript is edited from the TigerGraph Connections podcast episode published on June 28, 2022, with Victor Lee, VP of Machine Learning and AI at TigerGraph.
Corey Tomlinson: I’m very happy to have Victor Lee, VP of Machine Learning and AI at TigerGraph, as my guest to talk about the release of the TigerGraph Machine Learning Workbench, among other things. So Victor, before we get started, can you talk a little bit about your background and how you ended up at TigerGraph?
Victor Lee: Yeah, thanks Corey! Thanks for inviting me to be here. I’ve been at TigerGraph for eight years if you count my first year as a consultant. I started out as an electrical engineer, worked in Silicon Valley when it was about silicon, and then I was out of tech for five years while I allowed my wife to pursue her career. We moved out of state. In the later stages of my tech career I had been doing technology transfer, which means my company had developed some sophisticated technology and we were selling it and licensing it to other companies. My job was to explain that technology by packaging it up as software design kits, providing documentation, and providing training. That’s something that I’ve kind of continued to do throughout my career.
I was actually teaching English as a second language to adults, completely outside tech. Then we moved again and I went back to grad school, and because I was no longer in the Bay Area … I’m in Ohio … I decided to switch sort of sideways to software. I’d always been interested in organizing and analyzing data, so I started looking at databases, and there was some exciting work going on with one of the professors at Kent State University on data mining on graphs, so I learned about graph algorithms and data mining. Graphs are a fascinating way to look at your data, and graph algorithms and data mining are a way to get more out of the data, to discover more from the data that you already have. That became my research area, where I got my Ph.D. I then taught as a visiting professor at John Carroll University for three years, had the opportunity to join TigerGraph, jumped at it, and it’s been really exciting.
So I’ve been able to combine my academic interests with my previous work experience and get to develop this new technology to the point where now I’m leading our company’s efforts in graph data science and graph machine learning.
Corey: When we were talking about this episode and thinking about what we were going to discuss, you mentioned to me how often you need to answer the questions “Why graph?” and, more specifically for us, “Why TigerGraph?” Before we get into the Machine Learning Workbench and talk about the product, could you provide a short answer to those questions?
Victor: Sure, so why graph? Using SQL and table-based databases has been very powerful. It’s been the most successful model for the past 40 years, but other models have developed because you want different tools for different tasks. So what is the graph doing for you that a more tabular relational database is not doing for you? A graph lets you follow connections at volume efficiently. That volume could be in breadth, where I have lots of people and they’re connected to lots of people and I want to explore that efficiently, or in depth, where I want to follow a long chain of connections and see where that takes me. A graph database is designed to handle both efficiently, and that efficiency can be orders of magnitude at times, even a thousand times faster than trying to do it in a tabular database. So that’s the fundamental reason. If connections, relationships, and correlations are important on a person-by-person, entity-by-entity, product-by-product basis, where each one can connect to different things and you care about those differences and you want to look at things individually, then a graph is going to be your best solution, because it’s going to give you that performance, a thousand times or even hundreds of thousands of times faster in some cases.
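To make the idea of following connections in breadth and in depth concrete, here is a minimal Python sketch. It is an illustration only, not TigerGraph code: the `friends` graph and the `neighbors_within` helper are hypothetical names invented for this example, and a plain in-memory dictionary stands in for a graph database.

```python
from collections import deque

def neighbors_within(graph, start, max_hops):
    """Breadth-first search: map each vertex reachable from `start`
    in at most `max_hops` hops to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if hops[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(vertex, ()):
            if neighbor not in hops:  # first visit gives the shortest hop count
                hops[neighbor] = hops[vertex] + 1
                queue.append(neighbor)
    return hops

# Hypothetical toy graph: each person maps to the people they are connected to.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

reach = neighbors_within(friends, "ann", 2)
print(reach)
```

Raising `max_hops` follows a longer chain of connections (depth), while a vertex with many neighbors widens each step (breadth). In a relational database each extra hop would typically be another self-join on a relationship table.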
Why TigerGraph? Because there are a couple of different types of graphs. There are so-called knowledge graphs, which are great for representing knowledge. They’re not as great at actually working with the data and analyzing it. If you want to build a library of what you know in a very structured way, a knowledge graph using RDF was designed for that.
If you want to analyze data, if you want to do a lot of heavy computation, which is usually what most businesses want to do, then a property graph is better, and that’s what TigerGraph is. If you have a lot of data and if you want to do more sophisticated analytics, then performance is key, and that’s where TigerGraph sells, because we were designed for massively parallel processing and designed as a distributed graph. Our query language was designed for algorithms and analytics. So when you start to look and say, “Okay, I need a graph,” you probably want a property graph to do more computational and analytical workloads. That’s where TigerGraph shines, and our customers have migrated towards us because of that capability.
Corey: Thanks Victor, that’s foundational and it leads us to why we’re actually here today talking about the Machine Learning Workbench. Can you tell me a little bit about it and what problems does it solve for our customers?
Victor: We developed the Machine Learning Workbench to make it easier for data scientists to take advantage of graph in a very low-effort, familiar way, rather than having them come to us and learn a lot of new formats and new ways of working with data. The value of graph has been proven not just for knowledge querying, not just for simple pattern matching, but for machine learning and for data science. The reason why is that those relationships, which are fundamental to what a graph does, are information. Graphs are, in a sense, presenting a more concise … distilled … that’s the word I’m looking for. They really distill the information that you’re looking for.
Sometimes it’s right on the surface layer. Sometimes you run a graph algorithm to extract it. For example, one of the well-known algorithms is PageRank, which gives a score to each vertex for how much influence it has within the graph structure. That score depends on what your graph looks like, where you sit in the graph, who’s connected to whom, etc. Every vertex has a different score – that’s what a data scientist calls a feature. It’s something I know about that vertex that distinguishes it from others. The more features you have, and the more those features describe the characteristics you’re looking for, the better the predictive model you can make.
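As an illustration of a graph algorithm producing a per-vertex feature, the following is a minimal pure-Python sketch of PageRank by power iteration. It is not TigerGraph’s built-in implementation; the `pagerank` function, the `links` toy graph, and the damping factor of 0.85 (a commonly used default) are assumptions made for this example.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `out_links` maps every vertex to the list of vertices it points to;
    every target must also appear as a key. Scores sum to 1.
    """
    vertices = list(out_links)
    n = len(vertices)
    score = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Score held by vertices with no outgoing edges is spread evenly.
        dangling = sum(score[v] for v in vertices if not out_links[v])
        new_score = {v: (1.0 - damping) / n + damping * dangling / n
                     for v in vertices}
        for v in vertices:
            targets = out_links[v]
            if targets:
                share = damping * score[v] / len(targets)
                for t in targets:
                    new_score[t] += share
        score = new_score
    return score

# Hypothetical toy graph: a, b, and d all point to c; c points back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

scores = pagerank(links)

# One feature row per vertex, ready to join with other attributes.
features = {v: {"pagerank": s} for v, s in scores.items()}
print(features)
```

Here `c` ends up with the highest score because three vertices point to it, and `a` scores well only because `c` points to it. That dependence on position in the graph is what makes the score usable as a feature alongside a vertex’s ordinary attributes.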
So-called supervised machine learning is trying to make models that either predict the current state of events when you don’t know everything or predict what will happen in the future, and the more you can extract from your current data to understand it better, the more accurate predictions you can make. People use this for product recommendations, they use it to detect fraud and other criminal activity, and they use it to try to optimize operations. There are a lot of ways that machine learning can be used.
We also hear about it being used for natural language processing, where you’re understanding or producing human language. That’s another specialized case of prediction: trying to predict what the meaning of this word is and how to use it. So graph is giving you more insight into your data than you would otherwise have, by following those connections, by following the chain of connections, or by looking at the breadth of connections and understanding your surroundings. With a so-called 360 view that gives you more information, you get more accurate machine learning models.
So the reason why you want to use graph is clear. The question is then “how do you use graph?” So there isn’t yet a standard graph query language – if we have time we could get to talking about the effort to standardize the query language. But right now graph is new to a lot of people. They need some time to learn it. They have to learn a query language. There are other APIs, there are graph algorithms, there are graph machine learning techniques. There are potentially a lot of things a data scientist has to learn before they’re ready to take advantage of it.
The Machine Learning Workbench is designed to reduce that effort, so that literally in a day they could see benefits. Now, first of all, they have to construct their graph; I’m talking about after they have the graph. Once they have their graph and want to run a graph algorithm, we have a built-in library and we have a Python interface for it, because data scientists all know the Python language. Now you need to select a model. We support that. We export the data for you, again with a simple Python command. We’ve got sample data, and we have so-called notebooks written in Python that lay out all the operations for taking graph data and graph features and training a model.
Do you want to use a conventional model? Fine. Do you want to use one of the new graph neural network models? Fine. We’ve got sample notebooks, with examples of fraud detection, again, where it runs right out of the box, and so in a day their data scientists could see results and then start to build on those. So that’s really why we did the Machine Learning Workbench based on the environment that they already use: Python and Jupyter notebooks, adding libraries written in Python for the graph operations, a built-in high-speed data connector, sample notebooks – it’s all there.
Corey: So pivoting off of that, for somebody who’s maybe interested in the Machine Learning Workbench and trying it out, what advice do you have for them to get started with it?
Victor: I’d say just download the product if you want to use the on-prem version. We’re going to have a cloud version available also. Go through the tutorial to get the general flow. We have videos, we have sample notebooks, and we have consultants who can help you. You’ll want to use it with your own data and know what task you want to do. A lot of people will want to compare how well they perform this task or this prediction using their current method with how well they do it using graph. We welcome that comparison. It depends on your data and on your particular task. Everybody’s going to have a slightly different result.
We love to see new use cases, so we’d be happy to hear from people who are trying it out to see what they are trying to do. We’ve got community help. We have professional services for customers. To start, follow the tutorial and then see how to design a graph for your data; some people out there already have a graph and they’ll have a jump start.
Corey: I’m not going to let you off the hook on something you said earlier. You mentioned the standard graph query language. Can you talk a little bit about the efforts to standardize and create a universal query language for graph?
Victor: This effort started about three years ago. There were a couple of companies that saw the need for one standard query language. The way that the graph database market developed, there are a couple of different companies offering sometimes similar, sometimes different solutions, but they all have different query languages, and customers were telling us it’d be great if there was one standard. They’d like to be able to move from one product to another, or compare products, a little bit more easily.
Customers need people who know how to work with graph databases. It builds the market. So it actually lifts all boats, as we say. Having a standard language makes graph databases more acceptable. TigerGraph was not actually one of those companies, but we joined the effort fairly soon after it had started – we joined in late 2018. I was just in a meeting earlier today, a two-week meeting that’s taking place in hybrid form. It’s physically in Germany and I’m here in the Eastern time zone. I was up at 3 a.m. virtually attending this standards meeting. We’re getting close, working with people who at other times are competitors. But when we’re in that room, we’re working together to try to make one common standard that’s going to help everybody: help the customers and help us to build a bigger and more successful market. I guess one question I’ll sort of preemptively answer, because people have asked me: is this standard going to be like GSQL, TigerGraph’s current language? Is it going to be like Cypher, which is one of the others, and perhaps the best-known language right now because it’s been out the longest?
The answer is it’s neither, because this is the future. We’re looking ahead. We all have our legacy of what our product started out to be, and we’re looking ahead at what capabilities we want customers to have in the future. So we’re all challenging ourselves to see what we can do. What will we have in 2023? The syntax reflects that: it’s not exactly GSQL, it’s not exactly Cypher, it’s not exactly SQL. It’s a new language. It’s got similarities with all these languages. We are planning a migration path, and our competitors are planning migration paths, because we want people to be able to move to that language as easily as possible.
Corey: It’s really fascinating to hear competitors coming together working for the common good of their customers. I want to thank you for taking a moment to join us and talk about the Machine Learning Workbench.
Victor: Thank you so much Corey, definitely looking forward to the next time we can do this.