Cayley: graphs in Go
June 25th, 2014 | Published in Google Open Source
Four years ago this July, Google acquired Metaweb, bringing Freebase and linked open data to Google. It’s been astounding to watch the growth of the Knowledge Graph and how it has improved Google search to delight users every day.
When I moved to New York last year, I saw just how far the concepts of Freebase and its data had spread through Google’s worldwide offices. I began to wonder how the concepts would advance if developers everywhere could work with similar tools. However, there wasn’t a graph store available that was fast, free, and easy to get started with.
With the Freebase data already public and universally accessible, it was time to make it useful, and that meant writing some code as a side project.
So today we are excited to release Cayley, an open source graph database.
Cayley is a spiritual successor to graphd; it shares a similar query strategy for speed. While not an exact replica of its predecessor, it brings its own features to the table:
• RESTful API
• Multiple (modular) backend stores, such as LevelDB and MongoDB
• Multiple (modular) query languages
• Easy to get started
• Simple to build on top of as a library
and of course
• Open Source
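As a rough illustration of the RESTful API item in the list above, the sketch below only builds the HTTP request that would carry a query to a running instance. The host and port (`localhost:64210`), the `/api/v1/query/<language>` path, and the helper name `buildQueryRequest` are assumptions made for this example and are not stated in this post; check them against the documentation for your version.

```javascript
// Sketch only: describe the HTTP request for sending a query to a Cayley instance.
// Assumptions (not from the post): the instance listens on localhost:64210 and
// accepts POST /api/v1/query/<language>. buildQueryRequest is a hypothetical helper.
function buildQueryRequest(baseUrl, language, query) {
  return {
    // Trim trailing slashes so the path is joined with exactly one "/".
    url: baseUrl.replace(/\/+$/, "") + "/api/v1/query/" + encodeURIComponent(language),
    // The query text travels as the raw request body.
    options: { method: "POST", body: query },
  };
}

const request = buildQueryRequest(
  "http://localhost:64210/",
  "gremlin",
  'g.V("Humphrey Bogart").In("name").All()'
);
console.log(request.url);
```

On a runtime that provides `fetch`, the result could be sent with `fetch(request.url, request.options)`; keeping request construction separate from sending makes the sketch usable without a live server.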
Cayley is written in Go, which was a natural choice. For a backend service that depends on speed and concurrent access, Go seemed like a good fit. Go did not disappoint; with a fantastic standard library and easy access to open source libraries from the community, the necessary building blocks were already there. Combined with Go’s concurrency patterns, which are easier to use effectively than their counterparts in C, this made a performance-competitive successor to graphd a reality.
To get a sense of Cayley, check out the I/O Bytes video we created where we “Build A Small Knowledge Graph”. The video includes a quick introduction to graph stores as well as an example of processing Freebase and Schema.org linked data.
You can also check out the demo dataset in a live instance running on Google App Engine. It’s running with the sample dataset in the repository: 30,000 movies and their actors, roles, and directors, using the Freebase film schema. For a more-than-trivial query, try running the following code, both as a query and as a visualization; what you’ll see is the neighborhood of the given actor and how the actors who co-star with that actor interact with each other:
// Reusable path: from an actor to the films that actor performs in.
costar = g.M().In("/film/performance/actor").In("/film/film/starring")

// Pair each given actor name (tagged "source") with each co-star name (tagged "target").
function getCostars(x) {
  return g.V(x).As("source").In("name")
          .Follow(costar).FollowR(costar)
          .Out("name").As("target")
}

function getActorNeighborhood(primary_actor) {
  actors = getCostars(primary_actor).TagArray()
  seen = {}
  for (a in actors) {
    g.Emit(actors[a])
    seen[actors[a].target] = true
  }
  // Leave the primary actor out of the co-star list.
  seen[primary_actor] = false
  actor_list = []
  for (actor in seen) {
    if (seen[actor]) {
      actor_list.push(actor)
    }
  }
  // Emit links between co-stars, keeping one direction per pair.
  getCostars(actor_list).Intersect(g.V(actor_list)).ForEach(function(d) {
    if (d.source < d.target) {
      g.Emit(d)
    }
  })
}

getActorNeighborhood("Humphrey Bogart")
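To make the shape of that traversal easier to follow without a running instance, here is a minimal sketch in plain JavaScript that mimics it over a tiny in-memory list of triples. Everything in it is an assumption for illustration: the films and actors are made up, and `incoming`, `outgoing`, `costarNames`, and `neighborhood` are hypothetical helpers, not part of Cayley. Unlike the query above, it also leaves out the primary actor's link to itself.

```javascript
const STARRING = "/film/film/starring";
const ACTOR = "/film/performance/actor";

// Made-up cast lists; each film links to performances, each performance to an actor.
const cast = {
  "/film/1": ["Actor A", "Actor B"],
  "/film/2": ["Actor A", "Actor C"],
  "/film/3": ["Actor B", "Actor C"],
  "/film/4": ["Actor C", "Actor D"],
};
const triples = []; // [subject, predicate, object]
for (const [film, names] of Object.entries(cast)) {
  names.forEach((name, i) => {
    const perf = film + "/perf/" + i;
    const actor = "/actor/" + name.slice(-1).toLowerCase();
    triples.push([film, STARRING, perf], [perf, ACTOR, actor], [actor, "name", name]);
  });
}

// Subjects that point at any of `nodes` via `pred` (plays the role of .In(pred)).
function incoming(nodes, pred) {
  const set = new Set(nodes);
  return [...new Set(triples.filter(([s, p, o]) => p === pred && set.has(o)).map(([s]) => s))];
}

// Objects that any of `nodes` point at via `pred` (plays the role of .Out(pred)).
function outgoing(nodes, pred) {
  const set = new Set(nodes);
  return [...new Set(triples.filter(([s, p]) => p === pred && set.has(s)).map(([, , o]) => o))];
}

// Name -> actor -> films -> all actors in those films -> names, minus the actor itself.
function costarNames(name) {
  const films = incoming(incoming(incoming([name], "name"), ACTOR), STARRING);
  const coActors = outgoing(outgoing(films, STARRING), ACTOR);
  return outgoing(coActors, "name").filter((n) => n !== name);
}

// Edges from the primary actor to each co-star, plus edges among the co-stars.
function neighborhood(primary) {
  const costars = costarNames(primary);
  const edges = costars.map((c) => [primary, c]);
  for (const source of costars) {
    for (const target of costarNames(source)) {
      // source < target keeps one direction of each co-star pair.
      if (costars.includes(target) && source < target) edges.push([source, target]);
    }
  }
  return edges;
}

console.log(neighborhood("Actor A"));
```

With this toy data, Actor D appears only alongside Actor C, so D stays outside Actor A's neighborhood, while the B and C pairing from the third film shows up as a link between two co-stars.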
By Barak Michener, Software Engineer, Knowledge NYC