This post is part of Who's at Google I/O, a series of guest blog posts written by developers who are appearing in the Developer Sandbox at Google I/O. It's also cross-posted to the Google Code blog which has similar posts for all sorts of Google developer products.
Mojo Helpdesk from Metadot is an RDBMS-based Rails application for ticket tracking and management that can handle millions of tickets. We are migrating this application to run on Google App Engine (GAE), Java, and Google Web Toolkit (GWT). We were motivated to make this move because of the application’s need for scalability in data management and request handling, the benefits from access to GAE’s services and administrative tools, and GWT’s support for easy development of a rich application front-end.
In this post, we focus on GAE and share some techniques that have been useful in the migration process.
Task failure management
Our application makes heavy use of the Task Queue service, and must detect and manage tasks that are being retried multiple times but aren’t succeeding. To do this, we extended
Deferred, which allows easy task definition and deployment. We defined a new
Task abstraction, which implements an extended
Deferrable and requires that every Task implement an
onFailure method. Our extension of
Deferred then terminates a Task permanently if it exceeds a threshold on retries, and calls its
This allows permanent task failure to be reliably exposed as an application-level event, and handled appropriately. (Similar techniques could be used to extend the new official Deferred API).
Mojo Helpdesk needs to run many types of batch jobs, and
appengine-mapreduce is of great utility. However, we often want to map over a filtered subset of Datastore entities, and our map implementations are JDO-based (to enforce consistent application semantics), so we don’t need low-level Entities prefetched.
So, we made two extensions to the mapper libraries. First, we support the specification of filters on the mapper’s Datastore sharding and fetch queries, so that a job need not iterate over all the entities of a Kind. Second, our mapper fetch does a keys-only Datastore query; only the keys are provided to the map method, then the full data objects are obtained via JDO. These changes let us run large JDO-based mapreduce jobs with much greater efficiency.
Supporting transaction semantics
The Datastore supports transactions only on entities in the same entity group. Often, operations on multiple entities must be performed atomically, but grouping is infeasible due to the contention that would result. We make heavy use of transactional tasks to circumvent this restriction. (If a task is launched within a transaction, it will be run if and only if the transaction commits). A group of activities performed in this manner – the initiating method and its transactional tasks – can be viewed as a “transactional unit” with shared semantics.
We have made this concept explicit by creating a framework to support definition, invocation, and automatic logging of transactional units. (The
Task abstraction above is used to identify cases where a transactional task does not succeed). All Datastore-related application actions – both in RPC methods and "offline" activities like mapreduce – use this framework. This approach has helped to make our application robust, by enforcing application-wide consistency in transaction semantics, and in the process, standardizing the events and logging which feed the app’s workflow systems.
To support join-like functionality, we can exploit multi-valued Entity properties (list properties) and the query support they provide. For example, a
Ticket includes a list of associated
Tag IDs, and
Tag objects include a list of
Ticket IDs they’re used with. This lets us very efficiently fetch, for example, all
Tickets tagged with a conjunction of keywords, or any Tags that a set of tickets has in common. (We have found the use of "index entities" to be effective in this context). We also store derived counts and categorizations in order to sidestep Datastore restrictions on query formulation.
These patterns have helped us build an app whose components run efficiently and robustly, interacting in a loosely coupled manner.
Amy (@amygdala) has recently co-authored (with Daniel Guermeur) a book on Google App Engine and GWT application development. She has worked at several startups, in academia, and in industrial R&D labs; consults and does technical training and course development in web technologies; and is a contributor to the @thinkupapp open source project.
Posted by Scott Knaster, Editor