October 9th, 2012 | Published in Google App Engine
Cross-posted with the Google Developers Blog
Today’s guest blogger is Aleem Mawani, co-founder of Streak, a startup alum of Y Combinator, a Silicon Valley incubator. Streak is a CRM tool built into Gmail. Aleem shares his experience building and scaling their product using Google Cloud Platform.
Everyone relies on email to get work done – yet most people use separate applications from their email to help them with various business processes. Streak fixes this problem by letting you do sales, hiring, fundraising, bug tracking, product development, deal flow, project management and almost any other business process right inside Gmail. In this post, I want to illustrate how we have used Google Cloud Platform to build Streak quickly, scalably and with the ability to deeply analyze our data.
We use several Google technologies on the backend of Streak:
- App Engine to serve our app
- App Engine Datastore to persist user data
- Memcache to make operations fast
- BigQuery to analyze our logs and power dashboards
- App Engine Search API to let users sift through their data
- Prediction API to machine learn over user data
- Google Translate API to translate our app to over 40 languages.
Our core learning is that you should use the best tool for the job. No one technology will be able to solve all your data storage and access needs. Instead, for each type of functionality, you should use a different service. In our case, we aggressively mirror our data in all the services mentioned above. For example, although the source of truth for our user data is in the App Engine Datastore, we mirror that data in the App Engine Search API so that we can provide full text search, Gmail style, to our users. We also mirror that same data in BigQuery so that we can power internal dashboards.
App Engine - We use App Engine for Java primarily to serve our application to the browser and mobile clients in addition to serving our API. App Engine is the source of truth for all our data, so we aggressively cache using Memcache. We also use Objectify to simplify access to the Datastore, which I highly recommend.
Google Cloud Storage - We mirror all of our Datastore data as well as all our log data in Cloud Storage, which acts as a conduit to other Google cloud services. It lets us archive the data as well as push it to BigQuery and the Prediction API.
BigQuery - Pushing the data into BigQuery allows us to run non-realtime queries that can help generate useful business metrics and slice user data to better understand how our product is getting used. Not only can we run complex queries over our Datastore data but also over all of our log data. This is incredibly powerful for analyzing the request patterns to App Engine. We can answer questions like:
- Which requests cost us the most money?
- What is the average response time for every URL on our site over the last 3 days?
BigQuery helps us monitor error rates in our application. We process all of our log data with debug statements, as well as something called an “error type” for any request that fails. If it’s a known error, we'll log something sensible, and we log the exception type if we haven’t seen it before. This is beneficial because we built a dashboard that queries BigQuery for the most recent errors in the last hour grouped by error type. Whenever we do a release, we can monitor error rates in the application really easily.
|A Streak dashboard powered by BigQuery showing current usage statistics|
Google Cloud Platform also makes our application better for our users. We take advantage of the App Engine Search API and again mirror our data there. Users can then query their Streak data using the familiar Gmail full text search syntax, for example, “before:yesterday name:Foo”. Since we also push our data to the Prediction API, we can help users throughout our app by making smart suggestions. In Streak, we train models based on which emails users have categorized into different projects. Then, when users get a new email, we can suggest the most likely box that the email belongs to.
One issue that arises is how to keep all these mirrored data sets in sync. It works differently for each service based on the architecture of the service. Here’s a simple breakdown:
Having these technologies easily available to us has been a huge help for Streak. It makes our products better and helps us understand our users. Streak’s user base grew 30% every week for 4 consecutive months after launch, and we couldn’t have scaled this easily without Google Cloud Platform. To read more details on why Cloud Platform makes sense for our business, check out our case study and our post on the Google Enterprise blog.
-Contributed by Aleem Mawani, co-founder of Streak