Building YouTube/TikTok Recommendations in TigerGraph

This article will demonstrate how you can build your own analytical video recommendation system using GraphStudio through TGCloud.

I have provided the source files below, so you can follow along with my project from start to finish using only your web browser.

Create a Default Solution

First, visit tgcloud.io and register for a free account. Then, log in and click on the “My Solutions” tab.

Next, click on “Create Solution” in the upper-right corner.

 

Create a blank solution, and make sure the TigerGraph version is 3.1.1 (compatibility with later versions is not guaranteed).

For instance settings, keep everything at the free defaults.

For solution settings, the only required field is the password.

Once you have confirmed your solution settings, you will be brought back to the dashboard. Wait until your solution status is “Ready”, and then open it through GraphStudio.

Downloading Project Files

Now that your instance is ready, you can download a copy of my solution and two basic CSV datasets. [Download them here]

Import Solution

From the GraphStudio landing page, click on “Import an Existing Solution” and select the downloaded file solution.tar.gz.

 

Select “Overwrite” when presented with an import warning.

In the top-left corner, select the “TikTok” graph.

Understanding the Graph Schema

We will now step through each tab in the left-hand menu before analyzing the data. First, click on “Design Schema”, which defines four vertex types:

  • user is identified by their username, and they also have attributes communityID (INT) and country (STRING)
  • video is identified by its videoID, and it also has attribute country (STRING)
  • tag and audio are both identified by their content

Here are the interactions (edges) between these vertices:

user can follow another user

CREATE DIRECTED EDGE following (FROM user, TO user)

user can watch a video

CREATE UNDIRECTED EDGE watched (
FROM user, 
TO video, 
percentWatched INT, 
commented BOOL, 
liked BOOL)

video can contain identifiable audio (each audio is stored as an integer ID)

CREATE UNDIRECTED EDGE VIDEO_HAS_AUDIO (FROM video, TO audio)

video can point to a tag that describes its content (e.g. finance, pets, sports)

CREATE UNDIRECTED EDGE VIDEO_HAS_TAG (FROM video, TO tag)
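You can also reason about the schema outside GraphStudio by expressing the same vertices and edges as plain data structures. The minimal Python sketch below does this. The class and field names (User, Video, Watched, Graph, followers_of) are hypothetical and only mirror the schema above. They are not part of the downloadable solution.

```python
from dataclasses import dataclass

# Hypothetical in-memory mirror of the TikTok graph schema (illustration only).

@dataclass(frozen=True)
class User:
    username: str          # primary ID
    country: str
    community_id: int = 0  # filled in later by community detection

@dataclass(frozen=True)
class Video:
    video_id: str          # primary ID
    country: str

@dataclass(frozen=True)
class Watched:
    # Undirected user-video edge with its three attributes
    username: str
    video_id: str
    percent_watched: int
    commented: bool
    liked: bool

@dataclass
class Graph:
    users: dict[str, User]
    videos: dict[str, Video]
    following: set[tuple[str, str]]    # DIRECTED edge: (follower, followee)
    watched: list[Watched]
    video_tags: dict[str, set[str]]    # VIDEO_HAS_TAG, keyed by video_id
    video_audios: dict[str, set[int]]  # VIDEO_HAS_AUDIO, audio content is an INT

    def followers_of(self, username: str) -> set[str]:
        # Direction matters because "following" is a directed edge
        return {src for src, dst in self.following if dst == username}
```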

Loading Data + Setup

Now we will load the two downloaded CSVs. First, click on “Map Data to Graph”. The files are already mapped, but they still need to be uploaded.

Click “Add data file”.

Upload the CSVs to the server and then click “Cancel” (they will still be added).

Then, proceed to the “Load Data” tab and press the start button.

 

(You may have to wait a bit before this button lights up and becomes clickable.)

Finally, proceed to the “Write Queries” tab and click on “Install All Queries”.

 

Running Both Analytics Queries

The first query, algoLouvain, is responsible for community detection within the data. Every community is assigned a unique communityID, and this query fills every user’s communityID attribute based on the network of people that they follow. This will be used later to promote recommendations of videos that other users in your community have interacted with.

(algoLouvain is adapted from our Community Graph Algorithm Library)

The output of this query prints a map of all the community sizes and the corresponding IDs within each size group. You can run it by clicking on the play button icon in the header.

 

The query parameter only affects the printed output; the backend ID assignment works the same way regardless.
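Louvain-style community detection moves vertices between communities whenever that raises the partition's modularity. It then collapses each community into a single vertex and repeats. The Python function below computes the modularity score that this process maximizes. It is a minimal sketch, not the library's implementation. It assumes the following edges are treated as undirected and that every vertex already has a community ID. The function name is illustrative.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to its community ID.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where
    L_c = edges inside c, d_c = total degree of c, m = total edge count.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

For example, two triangles joined by a single edge score 5/14 (about 0.357) when each triangle is its own community. Putting all six vertices in one community scores 0.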

The main query is recommendVideo. It accepts two parameters: targetUsername (the user to recommend videos to) and minWatchPercent (the minimum percentage of a video that must be watched for it to count as a view).

targetUsername should be one of leo, ben, matt, don, or joe.
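Besides the play button, installed GSQL queries can be called over HTTP through TigerGraph's REST++ endpoint. This lets an application call recommendVideo directly. The sketch below only builds the request. It assumes the /query/<graph>/<query> path on port 9000 and bearer-token authentication. The hostname and token are placeholders, so check both against your instance's settings before relying on this.

```python
from urllib.parse import urlencode, quote
from urllib.request import Request

def build_recommend_request(host, token, target_username, min_watch_percent,
                            graph="TikTok", query="recommendVideo"):
    """Build (but do not send) a GET request for an installed query.

    Assumption: REST++ serves installed queries at
    https://<host>:9000/query/<graph>/<query> and accepts a bearer token.
    """
    params = urlencode({"targetUsername": target_username,
                        "minWatchPercent": min_watch_percent})
    url = f"https://{host}:9000/query/{quote(graph)}/{quote(query)}?{params}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

To send the request, pass it to urllib.request.urlopen and parse the JSON body.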

The output will be a sorted collection of all the videos in the database, ordered by a per-video interest score that represents how applicable a video is to the target user’s current interests and community.

Output of calling the query with targetUsername = leo and minWatchPercent = 50:

{
  "attributes": {
    "@matchingTagsAndAudios": 2,
    "@score": 5.5,
    "country": "Canada"
  },
  "v_id": "6",
  "v_type": "video"
},
{
  "attributes": {
    "@matchingTagsAndAudios": 1,
    "@score": 3.5,
    "country": "USA"
  },
  "v_id": "5",
  "v_type": "video"
},
... (lists all videos)

Note: a score of -1 means the target user has already watched the video.
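The final ordering fits in a few lines of Python. This is a hypothetical sketch, because the article does not show how recommendVideo assigns -1. The function overrides the score of every already-watched video and then sorts all videos from highest to lowest score.

```python
def rank_videos(scores, watched_ids):
    """Order (video_id, score) pairs by interest score, highest first.

    scores: dict video_id -> accumulated score.
    watched_ids: videos the target user already watched; these are
    reported as -1 (an assumed reconstruction of the convention above).
    """
    final = {vid: (-1.0 if vid in watched_ids else s)
             for vid, s in scores.items()}
    # Highest score first; ties broken by video ID for a deterministic order
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))
```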

Understanding the Main Query

Now that the functionality has been demonstrated, we will look at a few of the ways these video interest scores are calculated from relationships in the graph.

First, we establish a singleton vertex set only containing the target user:

allUsers = {user.*};
targetUser = SELECT s
             FROM allUsers:s
             WHERE s.username == targetUsername
             ACCUM targetUserCountry = s.country,
                   targetUserCommunity = s.communityID;

Then, we build a profile of the target user by storing the tags and audios of the videos they watched:

SetAccum<STRING> @@targetUserTags;
SetAccum<INT> @@targetUserAudios;

// Videos the target user watched at least minWatchPercent of
watchedVideos = SELECT t
                FROM targetUser:s - (watched:e) - video:t
                WHERE e.percentWatched >= minWatchPercent;

fillTags = SELECT t
           FROM watchedVideos:s - (VIDEO_HAS_TAG:e) - tag:t
           ACCUM @@targetUserTags += t.content;

fillAudio = SELECT t
            FROM watchedVideos:s - (VIDEO_HAS_AUDIO:e) - audio:t
            ACCUM @@targetUserAudios += t.content;
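Here is the same profile-building step as a small Python sketch over in-memory data. The tuple layout and function name are assumptions made for illustration. Python sets play the role of SetAccum, so a tag that appears on several watched videos is stored only once.

```python
def build_profile(watched, video_tags, video_audios, target, min_watch_percent):
    """Collect the tags and audios of videos the target user actually watched.

    watched: list of (username, video_id, percent_watched) tuples.
    video_tags / video_audios: video_id -> set of tag strings / audio INT IDs.
    Returns (watched video IDs, tag set, audio set).
    """
    seen = {vid for user, vid, pct in watched
            if user == target and pct >= min_watch_percent}
    tags, audios = set(), set()
    for vid in seen:
        tags |= video_tags.get(vid, set())
        audios |= video_audios.get(vid, set())
    return seen, tags, audios
```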

With this interest profile, we can start identifying other videos (containing the target tags and audios) that the user would probably also enjoy:

// Per-vertex accumulators
SumAccum<FLOAT> @matchingTagsAndAudios;
SumAccum<FLOAT> @score;

// Every matching tag/audio in a video adds 2 points
allVids = {video.*};
giveScores = SELECT s
             FROM allVids:s - (VIDEO_HAS_TAG:e) - tag:t
             ACCUM IF @@targetUserTags.contains(t.content) THEN
                      s.@score += 2,
                      s.@matchingTagsAndAudios += 1
                   END;

giveScores = SELECT s
             FROM allVids:s - (VIDEO_HAS_AUDIO:e) - audio:t
             ACCUM IF @@targetUserAudios.contains(t.content) THEN
                      s.@score += 2,
                      s.@matchingTagsAndAudios += 1
                   END;

// Bonus point if 2 or more tags/audios match the user's preference
giveBonusForSimilar = SELECT s
                      FROM allVids:s
                      WHERE s.@matchingTagsAndAudios > 1
                      ACCUM s.@score += 1;
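To make the point arithmetic concrete, here is a Python sketch of the two tag/audio passes plus the bonus. It assumes each video's tags and audios are available as sets, and the names are hypothetical. A video with two matches earns 2 × 2 + 1 = 5 points. In the sample output, video 6 has two matches and a score of 5.5, so its remaining 0.5 presumably comes from the bonuses described next.

```python
def tag_audio_scores(video_tags, video_audios, user_tags, user_audios):
    """Per-video score from tag/audio overlap with the user's profile.

    Each matching tag or audio adds 2 points and increments a match
    counter; a video with more than one match earns 1 bonus point.
    Videos with no tags or audios never appear in the result (score 0).
    """
    scores, matches = {}, {}
    for vid in set(video_tags) | set(video_audios):
        n = (len(video_tags.get(vid, set()) & user_tags)
             + len(video_audios.get(vid, set()) & user_audios))
        matches[vid] = n
        scores[vid] = 2.0 * n + (1.0 if n > 1 else 0.0)
    return scores, matches
```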

The communityID values assigned by algoLouvain also come into play.

Videos that members of the target user's community have liked or commented on receive bonus points (0.5 for a like and 0.5 for a comment):

// Bump videos liked/commented on within a user's community
allUsers = {user.*};
communityUsers = SELECT s
                 FROM allUsers:s
                 WHERE s.communityID == targetUserCommunity;

communityLiked = SELECT t
                 FROM communityUsers:s - (watched:e) - video:t
                 WHERE e.liked
                 ACCUM t.@score += 0.5;

communityCommented = SELECT t
                     FROM communityUsers:s - (watched:e) - video:t
                     WHERE e.commented
                     ACCUM t.@score += 0.5;
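The community bonus can be sketched in Python in the same way. The field layout and function name are assumptions. As in the GSQL, the target user belongs to their own community, so their own likes and comments are counted too.

```python
from collections import defaultdict

def community_bonus(users_community, watched, target_community):
    """+0.5 per like and +0.5 per comment from users in the target community.

    users_community: username -> communityID (as assigned by algoLouvain).
    watched: list of (username, video_id, liked, commented) tuples.
    """
    bonus = defaultdict(float)
    for user, vid, liked, commented in watched:
        if users_community.get(user) != target_community:
            continue  # only members of the target community contribute
        if liked:
            bonus[vid] += 0.5
        if commented:
            bonus[vid] += 0.5
    return dict(bonus)
```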

Finally, we bump the scores of videos liked by users who liked a video in common with the target user. See the related diagram below:

// Videos the target user liked
likedVideos = SELECT t
              FROM targetUser:s - (watched:e) - video:t
              WHERE e.liked;

// Users who liked at least one of those videos
relatedUsers = SELECT t
               FROM likedVideos:s - (watched:e) - user:t
               WHERE e.liked;

// Everything those users liked gets a small bump
relatedVideos = SELECT t
                FROM relatedUsers:s - (watched:e) - video:t
                WHERE e.liked
                ACCUM t.@score += 0.25;
 

[Diagram: each arrow represents a hop]
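Here is the three-hop co-liker bonus as a Python sketch, with hypothetical names. A GSQL ACCUM clause runs once per matching edge, so a video liked by three related users gains 0.75 points. The sketch reproduces that behavior. Like the query, it includes the target user in the related-user set.

```python
from collections import defaultdict

def co_liker_bonus(likes, target):
    """+0.25 per like from users who share at least one liked video with target.

    likes: set of (username, video_id) pairs where the watched edge has liked = true.
    The bonus accrues once per (related user, liked video) pair.
    """
    target_liked = {vid for user, vid in likes if user == target}   # hop 1
    related = {user for user, vid in likes if vid in target_liked}   # hop 2
    bonus = defaultdict(float)
    for user, vid in likes:                                          # hop 3
        if user in related:
            bonus[vid] += 0.25
    return dict(bonus)
```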

These are just a few of the relationships that can be analyzed to better tailor video content to individual users and their interests.

My exact point system is quite arbitrary and simplistic. For real-world applications, it should be refined by observing how it behaves on larger datasets.

If you have any questions, feel free to ask others in the community forum or on Discord.
