Here are the vertices and interactions (edges) that we will observe…
A user can follow another user
CREATE DIRECTED EDGE following (FROM user, TO user)
A user can watch a video
CREATE UNDIRECTED EDGE watched (
FROM user,
TO video,
percentWatched INT,
commented BOOL,
liked BOOL)
A video can contain identifiable audio (stored as an integer ID)
CREATE UNDIRECTED EDGE VIDEO_HAS_AUDIO (FROM video, TO audio)
A video can point to a tag that describes its content (e.g., finance, pets, sports)
CREATE UNDIRECTED EDGE VIDEO_HAS_TAG (FROM video, TO tag)

Loading Data + Setup

Now we will load the two downloaded CSVs. First, click on “Map Data to Graph”. The files are already mapped, but they still need to be uploaded.
Click “Add data file”
Upload the CSVs to the server and then click “Cancel” (they will still be added)
Then, proceed to the “Load Data” tab and press the start button
(You may have to wait a bit for this button to light up and be pressable)
Finally, proceed to the “Write Queries” tab and click on “Install All Queries”
Running Both Analytics Queries

The first query, algoLouvain, is responsible for community detection within the data. Every community is assigned a unique communityID, and this query fills in every user’s communityID attribute based on the network of people they follow. This is used later to promote videos that other users in your community have interacted with.
(algoLouvain is adapted from our Community Graph Algorithm Library)
This query prints a map from each community size to the IDs of the communities of that size. You can run it by clicking the play button icon in the header.
The query parameter affects only the printed output; the backend ID assignment always works the same way.
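To illustrate the shape of that printed map, here is a minimal Python sketch that groups community IDs by community size. It assumes the communityID values have already been assigned: it does not perform Louvain community detection itself, and the username-to-communityID dictionary is a hypothetical input standing in for the user vertices.

```python
from collections import defaultdict
from typing import Dict, List


def community_size_map(user_community: Dict[str, int]) -> Dict[int, List[int]]:
    """Map each community size to the IDs of the communities of that size.

    user_community is a hypothetical stand-in for the user vertices:
    username -> communityID, as filled in by community detection.
    """
    # Count how many users ended up in each community.
    members: Dict[int, int] = defaultdict(int)
    for community_id in user_community.values():
        members[community_id] += 1

    # Invert the counts: size -> list of community IDs with that size.
    by_size: Dict[int, List[int]] = defaultdict(list)
    for community_id, size in members.items():
        by_size[size].append(community_id)

    # Sort sizes and the IDs inside each group so the output is deterministic.
    return {size: sorted(ids) for size, ids in sorted(by_size.items())}


if __name__ == "__main__":
    assignments = {"leo": 1, "ben": 1, "matt": 2, "don": 2, "joe": 3}
    print(community_size_map(assignments))  # {1: [3], 2: [1, 2]}
```

Sorting is only there to make the sketch's output stable; the real query may print groups in any order.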
The main query is recommendVideo. It accepts two parameters: targetUsername (the user to recommend videos to) and minWatchPercent (the minimum watch percentage that counts as a view).
targetUsername should be one of leo, ben, matt, don, or joe
The output is a collection of all the videos in the database, sorted from highest to lowest by a per-video interest score that represents how relevant a video is to the target user’s current interests and community.
Output of calling the query with leo and a 50% minimum watch percentage:
{
  "attributes": {
    "@matchingTagsAndAudios": 2,
    "@score": 5.5,
    "country": "Canada"
  },
  "v_id": "6",
  "v_type": "video"
},
{
  "attributes": {
    "@matchingTagsAndAudios": 1,
    "@score": 3.5,
    "country": "USA"
  },
  "v_id": "5",
  "v_type": "video"
},
... (lists all videos)

Note: if a video’s score is -1, it means the target user already watched it.
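The final ordering step can be sketched in Python as follows. This is an illustration, not the query's actual code: the score dictionary and watched set are hypothetical inputs, and since the walkthrough does not show how the -1 is applied, overriding watched videos after scoring is an assumption.

```python
from typing import Dict, List, Set, Tuple


def rank_videos(scores: Dict[str, float], watched: Set[str]) -> List[Tuple[str, float]]:
    """Return (video_id, score) pairs sorted by descending interest score.

    Assumption: videos the target user already watched are overridden to -1
    after scoring, so they sink to the bottom of the list.
    """
    final = {vid: (-1.0 if vid in watched else score) for vid, score in scores.items()}
    # Highest score first; ties broken by video ID for a stable order.
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))


print(rank_videos({"5": 3.5, "6": 5.5, "7": 4.0}, watched={"7"}))
# [('6', 5.5), ('5', 3.5), ('7', -1.0)]
```

Using -1 as a sentinel keeps watched videos in the result (as the sample output does) while guaranteeing they rank below any unwatched video, because every scoring rule only adds non-negative points.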
Understanding the Main Query

Now that the functionality has been demonstrated, we will dive into a few of the ways these video interest scores are calculated from the relationships in our graph data.
First, we establish a singleton vertex set containing only the target user:
allUsers = {user.*};
targetUser = SELECT s
             FROM allUsers:s
             WHERE s.username == targetUsername
             ACCUM targetUserCountry = s.country,
                   targetUserCommunity = s.communityID;

Then, we build a profile of this target user, storing the tags and audios from the videos they watched:
SetAccum<STRING> @@targetUserTags;
SetAccum<INT> @@targetUserAudios;

// Only videos watched past the threshold count toward the profile
watchedVideos = SELECT t
                FROM targetUser:s - (watched:e) - video:t
                WHERE e.percentWatched >= minWatchPercent;

fillTags = SELECT t
           FROM watchedVideos:s - (VIDEO_HAS_TAG:e) - tag:t
           ACCUM @@targetUserTags += t.content;

fillAudio = SELECT t
            FROM watchedVideos:s - (VIDEO_HAS_AUDIO:e) - audio:t
            ACCUM @@targetUserAudios += t.content;

With this interest profile, we can start identifying other videos (containing the target tags and audios) that the user would probably also enjoy:
// Per-vertex variables
SumAccum<FLOAT> @matchingTagsAndAudios;
SumAccum<FLOAT> @score;

// Every matching tag/audio in a video adds 2 points
allVids = {video.*};
giveScores = SELECT s
             FROM allVids:s - (VIDEO_HAS_TAG:e) - tag:t
             ACCUM IF @@targetUserTags.contains(t.content) THEN
                     s.@score += 2,
                     s.@matchingTagsAndAudios += 1
                   END;

giveScores = SELECT s
             FROM allVids:s - (VIDEO_HAS_AUDIO:e) - audio:t
             ACCUM IF @@targetUserAudios.contains(t.content) THEN
                     s.@score += 2,
                     s.@matchingTagsAndAudios += 1
                   END;

// Bonus point if 2 or more tags/audios match the user's preferences
giveBonusForSimilar = SELECT s
                      FROM allVids:s
                      WHERE s.@matchingTagsAndAudios > 1
                      ACCUM s.@score += 1;

The communityID generated by the previous query also comes into play. Videos that members of the target user’s community have liked or commented on receive bonus points:
// Bump videos liked/commented on within a user's community
allUsers = {user.*};
communityUsers = SELECT s
                 FROM allUsers:s
                 WHERE s.communityID == targetUserCommunity;

communityLiked = SELECT t
                 FROM communityUsers:s - (watched:e) - video:t
                 WHERE e.liked
                 ACCUM t.@score += 0.5;

communityCommented = SELECT t
                     FROM communityUsers:s - (watched:e) - video:t
                     WHERE e.commented
                     ACCUM t.@score += 0.5;

Finally, we bump the scores of videos liked by users who liked a video in common with the target user. See the related diagram below:
likedVideos = SELECT t
              FROM targetUser:s - (watched:e) - video:t
              WHERE e.liked;

relatedUsers = SELECT t
               FROM likedVideos:s - (watched:e) - user:t
               WHERE e.liked;

relatedVideos = SELECT t
                FROM relatedUsers:s - (watched:e) - video:t
                WHERE e.liked
                ACCUM t.@score += 0.25;
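[Diagram: the three-hop path from the target user to the videos they liked, to other users who liked those videos, to the videos those users liked. Each arrow represents a hop.]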
These are just a few of the relationships that can be analyzed to better tailor video content to individual users and their interests.
My exact point system is quite arbitrary and simplistic. For real-world applications, it should be refined by observing its behavior and tendencies on larger datasets.
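One way to make that kind of tuning easier is to pull each point value into a named weight. The Python sketch below models the profile-building, tag/audio matching, multi-match bonus, and community steps with the walkthrough's values as defaults. The Weights and Video types, the field names, and the precomputed community like/comment counts are all hypothetical simplifications of the graph, not part of the original queries.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Weights:
    """Point values from the walkthrough, pulled out so they can be tuned."""
    per_match: float = 2.0          # each tag/audio shared with the user's profile
    multi_match_bonus: float = 1.0  # added once when more than one tag/audio matches
    community_like: float = 0.5     # per like from a member of the user's community
    community_comment: float = 0.5  # per comment from a member of the user's community


@dataclass
class Video:
    """Hypothetical in-memory stand-in for a video vertex and its edges."""
    vid: str
    tags: Set[str] = field(default_factory=set)
    audios: Set[int] = field(default_factory=set)
    community_likes: int = 0     # community members who liked this video
    community_comments: int = 0  # community members who commented on it


def build_profile(
    watch_log: Iterable[Tuple[str, int]],
    videos: Dict[str, Video],
    min_watch_percent: int,
) -> Tuple[Set[str], Set[int]]:
    """Collect tags/audios from videos the target user watched enough of."""
    tags: Set[str] = set()
    audios: Set[int] = set()
    for vid, percent_watched in watch_log:
        if percent_watched >= min_watch_percent:
            tags |= videos[vid].tags
            audios |= videos[vid].audios
    return tags, audios


def score_video(
    video: Video,
    profile_tags: Set[str],
    profile_audios: Set[int],
    weights: Weights = Weights(),
) -> float:
    """Tag/audio matches, then the multi-match bonus, then community engagement."""
    matches = len(video.tags & profile_tags) + len(video.audios & profile_audios)
    score = weights.per_match * matches
    if matches > 1:
        score += weights.multi_match_bonus
    score += weights.community_like * video.community_likes
    score += weights.community_comment * video.community_comments
    return score
```

For example, with a profile of tag "pets" and audio 7, a video tagged "pets" that uses audio 7 and was liked by one community member scores 2 × 2 + 1 + 0.5 = 5.5 under the default weights. Passing a different Weights instance lets you compare rankings without touching the scoring logic. Weights is frozen, so sharing one default instance across calls is safe.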
If you have any questions, feel free to ask others in the community forum or on Discord.