Posted by Corinna Cortes and Alfred Spector, Google Research
Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our publications offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google.
In an effort to highlight some of our work, we periodically select a number of publications to be featured on this blog. We first posted a set of papers on this blog in mid-2010 and subsequently discussed them in more detail in the following blog postings. In a second round, we highlighted new noteworthy papers from the latter half of 2010 and again in 2011. This time we honor the influential papers authored or co-authored by Googlers during all of 2012, which represent roughly 6% of our total publications. Choosing was tough, so we may have left out some important papers; do see the publications list to review the complete group.
In the coming weeks we will be offering a more in-depth look at some of these publications, but here are the summaries:
Algorithms and Theory
Online Matching with Stochastic Rewards
Aranyak Mehta*, Debmalya Panigrahi [FOCS'12]
Online advertising is inherently stochastic: value is realized only if the user clicks on the ad, while the ad platform knows only the probability of the click. This paper is the first to introduce the stochastic nature of the rewards to the rich algorithmic field of online allocations. The core algorithmic problem it formulates is online bipartite matching with stochastic rewards, with known click probabilities. The main result is an online algorithm which obtains a large fraction of the optimal value. The paper also shows the difficulty introduced by the stochastic nature, by showing how it behaves very differently from the classic (non-stochastic) online matching problem.
Matching with our Eyes Closed
Gagan Goel*, Pushkar Tripathi* [FOCS'12]
In this paper we propose a simple randomized algorithm for finding a matching in a large graph. Unlike most solutions to this problem, our approach does not rely on building large combinatorial structures (like blossoms) but works completely locally. We analyze the performance of our algorithm and show that it does significantly better than the greedy algorithm. In doing so we improve a celebrated 18-year-old result by Aronson et al.
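To make the idea of a purely local randomized matching concrete, here is a minimal sketch. It is an illustration under our own assumptions, not necessarily the exact procedure analyzed in the paper: the function name `random_local_matching` and the adjacency-dictionary input format are hypothetical, and the graph is assumed to be undirected (every edge listed in both directions).

```python
import random


def random_local_matching(adjacency, seed=None):
    """Return a maximal matching as a dict {vertex: partner}.

    adjacency: dict mapping each vertex to an iterable of its neighbours.
    The graph is assumed to be undirected (symmetric adjacency lists).
    """
    rng = random.Random(seed)
    partner = {}
    order = list(adjacency)
    rng.shuffle(order)  # visit vertices in a uniformly random order
    for u in order:
        if u in partner:
            continue  # u was already matched by an earlier vertex
        # Only local information is used: u's currently unmatched neighbours.
        free = [v for v in adjacency[u] if v != u and v not in partner]
        if free:
            v = rng.choice(free)
            partner[u] = v
            partner[v] = u
    return partner
```

No global structure is built here: each step looks only at one vertex and its neighbours, and the result is always a maximal matching, because an edge with two unmatched endpoints would have been taken when either endpoint was visited.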
Simultaneous Approximations for Adversarial and Stochastic Online Budgeted Allocation
Vahab Mirrokni*, Shayan Oveis Gharan, Morteza Zadimoghaddam, [SODA'12]
In this paper, we study online algorithms that simultaneously perform well in worst-case and average-case instances, or equivalently algorithms that perform well in both stochastic and adversarial models at the same time. This is motivated by online allocation of queries to advertisers with budget constraints. Stochastic models are not robust enough to deal with traffic spikes and adversarial models are too pessimistic. While several algorithms have been proposed for these problems, each was known to perform well in only one of the two models; we present new results for a single algorithm that works well in both.
Economics and EC
Polyhedral Clinching Auctions and the Adwords Polytope
Gagan Goel*, Vahab Mirrokni*, Renato Paes Leme [STOC'12]
Budgets play a major role in ad auctions where advertisers explicitly declare budget constraints. Very little is known in auctions about satisfying such budget constraints while keeping incentive compatibility and efficiency. The problem becomes even harder in the presence of complex combinatorial constraints over the set of feasible allocations. We present a class of ascending-price auctions addressing this problem for a very general class of (polymatroid) allocation constraints including the AdWords problem with multiple keywords and multiple slots.
HCI
Backtracking Events as Indicators of Usability Problems in Creation-Oriented Applications
David Akers*, Robin Jeffries*, Matthew Simpson*, Terry Winograd [TOCHI '12]
Backtracking events such as undo can be useful automatic indicators of usability problems for creation-oriented applications such as word processors and photo editors. Our paper presents a new cost-effective usability evaluation method based on this insight.
Talking in Circles: Selective Sharing in Google+
Sanjay Kairam, Michael J. Brzozowski*, David Huffaker*, Ed H. Chi*, [CHI'12]
This paper explores why so many people share selectively on Google+: to protect their privacy but also to focus and target their audience. People use Circles to support these goals, organizing contacts by life facet, tie strength, and interest.
Information Retrieval
Online selection of diverse results
Debmalya Panigrahi, Atish Das Sarma, Gagan Aggarwal*, and Andrew Tomkins*, [WSDM'12]
We consider the problem of selecting subsets of items that are simultaneously diverse in multiple dimensions, which arises in the context of recommending interesting content to users. We formally model this optimization problem, identify its key structural characteristics, and use these observations to design an extremely scalable and efficient algorithm. We prove that the algorithm always produces a nearly optimal solution and also perform experiments on real-world data that indicate that the algorithm performs even better in practice than the analytical guarantees.
Machine Learning
Large Scale Distributed Deep Networks
Jeffrey Dean, Greg S. Corrado*, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng, NIPS 2012.
In this paper, we examine several techniques to improve the time to convergence for neural networks and other models trained by gradient-based methods. The paper describes a system we have built that exploits both model-level parallelism (by partitioning the nodes of a large model across multiple machines) and data-level parallelism (by having multiple replicas of a model process different training data and coordinating the application of updates to the model state through a centralized-but-partitioned parameter server system). Our results show that very large neural networks can be trained effectively and quickly on large clusters of machines.
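The data-parallel half of this design can be sketched in a few lines. The toy below is only an illustration under our own assumptions, not the system described in the paper: the names `ShardedParameterServer`, `replica_gradient` and `train` are hypothetical, the model is a one-variable linear regression, and the replicas take turns in a single process instead of running concurrently on separate machines.

```python
class ShardedParameterServer:
    """Holds the model parameters, partitioned across several shards."""

    def __init__(self, params, num_shards):
        self.shards = [dict() for _ in range(num_shards)]
        for i, (name, value) in enumerate(sorted(params.items())):
            self.shards[i % num_shards][name] = value

    def fetch(self):
        """Return a copy of all parameters, gathered from every shard."""
        merged = {}
        for shard in self.shards:
            merged.update(shard)
        return merged

    def apply(self, grads, learning_rate):
        """Each shard applies the gradient entries for the parameters it owns."""
        for shard in self.shards:
            for name in shard:
                if name in grads:
                    shard[name] -= learning_rate * grads[name]


def replica_gradient(params, batch):
    """Mean squared-error gradient for the toy model y = w * x + b."""
    grad_w = grad_b = 0.0
    for x, y in batch:
        error = params["w"] * x + params["b"] - y
        grad_w += 2 * error * x
        grad_b += 2 * error
    return {"w": grad_w / len(batch), "b": grad_b / len(batch)}


def train(data, num_replicas=2, steps=200, learning_rate=0.05):
    server = ShardedParameterServer({"w": 0.0, "b": 0.0}, num_shards=2)
    # Each replica sees a different slice of the training data.
    batches = [data[i::num_replicas] for i in range(num_replicas)]
    for _ in range(steps):
        for batch in batches:  # replicas take turns in this single-process toy
            grads = replica_gradient(server.fetch(), batch)
            server.apply(grads, learning_rate)
    return server.fetch()
```

For example, `train([(0, 1), (1, 3), (2, 5), (3, 7)])` should return parameters close to `w = 2` and `b = 1`, since the data lie exactly on the line y = 2x + 1.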
Open Problem: Better Bounds for Online Logistic Regression
Brendan McMahan* and Matthew Streeter*, COLT/ICML’12 Joint Open Problem Session, JMLR: Workshop and Conference Proceedings.
One of the goals of research at Google is to help point out important open problems–precise questions that are interesting academically but also have important practical ramifications. This open problem is about logistic regression, a widely used algorithm for predicting probabilities (what is the probability an email message is spam, or that a search ad will be clicked). We show that in the simple one-dimensional case, much better results are possible than current theoretical analysis suggests, and we ask whether our results can be generalized to arbitrary logistic regression problems.
Spectral Learning of General Weighted Automata via Constrained Matrix Completion
Borja Balle and Mehryar Mohri*, NIPS 2012.
Learning weighted automata from finite samples drawn from an unknown distribution is a central problem in machine learning and computer science in general, with a variety of applications in text and speech processing, bioinformatics, and other areas. This paper presents a new family of algorithms for tackling this problem for which it proves learning guarantees. The algorithms introduced combine ideas from two different domains: matrix completion and spectral methods.
Machine Translation
Improved Domain Adaptation for Statistical Machine Translation
Wei Wang*, Klaus Macherey*, Wolfgang Macherey*, Franz Och* and Peng Xu*, [AMTA'12]
Research in domain adaptation for machine translation has mostly focused on a single domain. We present a simple and effective domain adaptation infrastructure that makes an MT system with a single translation model capable of providing adapted, close-to-upper-bound domain-specific accuracy while preserving the generic translation accuracy. Large-scale experiments on 20 language pairs for patent and generic domains show the viability of our approach.
Multimedia and Computer Vision
Reconstructing the World’s Museums
Jianxiong Xiao and Yasutaka Furukawa*, [ECCV '12]
Virtual navigation and exploration of large indoor environments (e.g., museums) have been so far limited to either blueprint-style 2D maps that lack photo-realistic views of scenes, or ground-level image-to-image transitions, which are immersive but ill-suited for navigation. This paper presents a novel vision-based 3D reconstruction and visualization system to automatically produce clean and well-regularized texture-mapped 3D models for large indoor scenes, from ground-level photographs and 3D laser points. For the first time, we enable users to easily browse a large scale indoor environment from a bird’s-eye view, locate specific room interiors, fly into a place of interest, view immersive ground-level panoramas, and zoom out again, all with seamless 3D transitions.
The intervalgram: An audio feature for large-scale melody recognition
Thomas C. Walters*, David Ross*, Richard F. Lyon*, [CMMR'12]
Intervalgrams are small images that summarize the structure of short segments of music by looking at the musical intervals between the notes present in the music. We use them for finding cover songs – different pieces of music that share the same underlying composition. We do this by comparing ‘heatmaps’ which look at the similarity between intervalgrams from different pieces of music over time. If we see a strong diagonal line in the heatmap, it’s good evidence that the songs are musically similar.
General and Nested Wiberg Minimization
Dennis Strelow*, [CVPR'12]
Eriksson and van den Hengel’s CVPR 2010 paper showed that Wiberg’s least squares matrix factorization, which effectively eliminates one matrix from the factorization problem, could be applied to the harder case of L1 factorization. Our paper generalizes their approach beyond factorization to general nonlinear problems in two sets of variables, like perspective structure-from-motion. We also show that with our generalized method, one Wiberg minimization can be nested inside another, effectively eliminating two of three sets of unknowns, and we demonstrate this idea using projective structure-from-motion.
Calibration-Free Rolling Shutter Removal
Matthias Grundmann*, Vivek Kwatra*, Daniel Castro, Irfan Essa*, International Conference on Computational Photography ’12. Best paper.
Mobile phones and current-generation DSLRs contain an electronic rolling shutter, capturing each frame one row of pixels at a time. Consequently, if the camera moves during capture, it will cause image distortions ranging from shear to wobble. We propose a calibration-free solution based on a novel parametric mixture model to correct these rolling shutter distortions in videos, which enables real-time rolling shutter rectification as part of YouTube’s video stabilizer.
Natural Language Processing
Vine Pruning for Efficient Multi-Pass Dependency Parsing
Alexander Rush, Slav Petrov*, The 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL ’12), Best Paper Award.
Being able to accurately analyze the grammatical structure of sentences is crucial for language understanding applications such as machine translation or question answering. In this paper we present a method that is up to 200 times faster than existing methods and enables the grammatical analysis of text in large-scale applications. The key idea is to perform the analysis in multiple coarse-to-fine passes, resolving easy ambiguities first and tackling the harder ones later on.
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
Oscar Tackstrom, Ryan McDonald*, Jakob Uszkoreit*, North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL ’12), Best Student Paper Award.
This paper studies how to build meaningful cross-lingual word clusters, i.e., clusters containing lexical items from two languages that are coherent along some abstract dimension. This is done by coupling distributional statistics learned from huge amounts of language-specific data with constraints generated from parallel corpora. The resulting clusters are used to improve the accuracy of multi-lingual syntactic parsing for languages without any training resources.
Networks
How to Split a Flow
Tzvika Hartman*, Avinatan Hassidim*, Haim Kaplan*, Danny Raz*, Michal Segalov*, [INFOCOM '12]
Decomposing a flow into a small number of paths is a very important task that arises in various network optimization mechanisms. In this paper we develop an approximation algorithm for this problem that has both provable worst-case performance guarantees and good practical behavior.
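To show what decomposing a flow into paths means, here is a sketch of the textbook path-stripping baseline. It is not the approximation algorithm developed in the paper. The function name `decompose_flow` and the edge-dictionary input are hypothetical, and the sketch assumes an acyclic flow with integer amounts that satisfies flow conservation at every intermediate node.

```python
def decompose_flow(flow, source, sink):
    """Split an acyclic source-to-sink flow into weighted paths.

    flow: dict {(u, v): amount}, assumed acyclic and flow-conserving.
    Returns a list of (path, amount) pairs whose amounts sum to the flow value.
    """
    residual = {edge: amount for edge, amount in flow.items() if amount > 0}
    paths = []
    while True:
        # Walk from the source along edges that still carry flow.
        path = [source]
        while path[-1] != sink:
            here = path[-1]
            step = next((v for (u, v) in residual if u == here), None)
            if step is None:
                break
            path.append(step)
        if path[-1] != sink:
            break  # no flow left to route out of the source
        edges = list(zip(path, path[1:]))
        amount = min(residual[edge] for edge in edges)  # bottleneck on this path
        for edge in edges:
            residual[edge] -= amount
            if residual[edge] == 0:
                del residual[edge]
        paths.append((path, amount))
    return paths
```

Each iteration removes at least one edge, so this baseline uses at most as many paths as there are edges. It makes no attempt to keep the number of paths small, which is the problem the paper addresses.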
Deadline-Aware Datacenter TCP (D2TCP)
Balajee Vamanan, Jahangir Hasan*, T. N. Vijaykumar, [SIGCOMM '12]
Some of our most important products like search and ads operate under soft-real-time constraints. They are architected and fine-tuned to return results to users within a few hundred milliseconds. Deadline-Aware Datacenter TCP is a research effort into making the datacenter networks deadline aware, thus improving the performance of such key applications.
Trickle: Rate Limiting YouTube Video Streaming
Monia Ghobadi, Yuchung Cheng*, Ankur Jain*, Matt Mathis* [USENIX '12]
Trickle is a server-side mechanism that streams YouTube video smoothly to reduce bursts and buffer bloat. It paces the video stream by placing an upper bound on TCP’s congestion window based on the streaming rate and the round-trip time. In initial evaluation Trickle reduces the TCP loss rate by up to 43% and the RTT by up to 28%. Given the promising results we are deploying Trickle to all YouTube servers.
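As a rough illustration of how a window bound can follow from the streaming rate and the round-trip time, the sketch below computes the number of segments needed to carry one round trip's worth of video. This is back-of-the-envelope arithmetic under our own assumptions, not the deployed implementation: the function name `cwnd_clamp_segments`, the default segment size and the rounding are all hypothetical.

```python
import math


def cwnd_clamp_segments(streaming_rate_bps, rtt_ms, mss_bytes=1460):
    """Upper bound on the congestion window, in segments.

    streaming_rate_bps: target video streaming rate in bits per second.
    rtt_ms: measured round-trip time in milliseconds.
    mss_bytes: assumed maximum segment size (hypothetical default).
    """
    # Bytes that must be in flight per round trip to sustain the target rate.
    bytes_per_rtt = streaming_rate_bps * rtt_ms / 8000.0
    # Never clamp below one segment, or the connection would stall.
    return max(1, math.ceil(bytes_per_rtt / mss_bytes))
```

For instance, an 8 Mbit/s stream over a 100 ms round trip needs 100,000 bytes in flight, which is 100 segments of 1,000 bytes each. A window capped near that value cannot send faster than the video is consumed.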
Social Systems
Look Who I Found: Understanding the Effects of Sharing Curated Friend Groups
Lujun Fang*, Alex Fabrikant*, Kristen LeFevre*, [Web Science '12]. Best Student Paper award.
In this paper, we studied the impact of the Google+ circle-sharing feature, which allows individual users to share (publicly and privately) pre-curated groups of friends and contacts. We specifically investigated the impact on the growth and structure of the Google+ social network. In the course of the analysis, we identified two natural categories of shared circles (“communities” and “celebrities”). We also observed that the circle-sharing feature is associated with the accelerated densification of community-type circles.
Software Engineering
AddressSanitizer: A Fast Address Sanity Checker
Konstantin Serebryany*, Derek Bruening*, Alexander Potapenko*, Dmitry Vyukov*, [USENIX ATC '12].
The paper “AddressSanitizer: A Fast Address Sanity Checker” describes a dynamic tool that finds memory corruption bugs in C or C++ programs with only a 2x slowdown. The major feature of AddressSanitizer is simplicity — this is why the tool is very fast.
Speech
Japanese and Korean Voice Search
Mike Schuster*, Kaisuke Nakajima*, IEEE International Conference on Acoustics, Speech, and Signal Processing [ICASSP'12].
“Japanese and Korean voice search” explains in detail how the Android voice search systems for these difficult languages were developed. We describe how to segment statistically to be able to handle infinite vocabularies without out-of-vocabulary words, how to handle the lack of spaces between words for language modeling and dictionary generation, and how to deal best with multiple ambiguities during evaluation scoring of reference transcriptions against hypotheses. The combination of techniques presented led to high quality speech recognition systems–as of 6/2013 Japanese and Korean are #2 and #3 in terms of traffic after the US.
Google’s Cross-Dialect Arabic Voice Search
Fadi Biadsy*, Pedro J. Moreno*, Martin Jansche*, IEEE International Conference on Acoustics, Speech, and Signal Processing [ICASSP 2012].
This paper describes Google’s automatic speech recognition systems for recognizing several Arabic dialects spoken in the Middle East, with the potential to reach more than 125 million users. We suggest solutions for challenges specific to Arabic, such as the diacritization problem, where short vowels are not written in Arabic text. We conduct experiments to identify the optimal manner in which acoustic data should be clustered among dialects.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey Hinton*, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew W. Senior*, Vincent Vanhoucke*, Patrick Nguyen, Tara Sainath, Brian Kingsbury, Signal Processing Magazine (2012)
Survey paper on the DNN breakthrough in automatic speech recognition accuracy.
Statistics
Empowering Online Advertisements by Empowering Viewers with the Right to Choose
Max Pashkevich*, Sundar Dorai-Raj*, Melanie Kellar*, Dan Zigmond*, Journal of Advertising Research, vol. 52 (2012).
YouTube’s TrueView in-stream video advertising format (a form of skippable in-stream ads) can improve the online video viewing experience for users without sacrificing advertising value for advertisers or content owners.
Structured Data
Efficient Spatial Sampling of Large Geographical Tables
Anish Das Sarma*, Hongrae Lee*, Hector Gonzalez*, Jayant Madhavan*, Alon Halevy*, [SIGMOD '12].
This paper presents fundamental results for the “thinning problem”: determining appropriate samples of data to be shown on specific geographical regions and zoom levels. This problem is widely applicable to a number of cloud-based geographic visualization systems such as Google Maps and Fusion Tables, and the developed algorithms are part of the Fusion Tables backend. The SIGMOD 2012 paper was selected among the best papers of the conference and invited to a special best-papers issue of TODS.
Systems
Spanner: Google’s Globally-Distributed Database
James C. Corbett*, Jeffrey Dean*, Michael Epstein*, Andrew Fikes*, Christopher Frost*, JJ Furman*, Sanjay Ghemawat*, Andrey Gubarev*, Christopher Heiser*, Peter Hochschild*, Wilson Hsieh*, Sebastian Kanthak*, Eugene Kogan*, Hongyi Li*, Alexander Lloyd*, Sergey Melnik*, David Mwaura*, David Nagle*, Sean Quinlan*, Rajesh Rao*, Lindsay Rolig*, Dale Woodford*, Yasushi Saito*, Christopher Taylor*, Michal Szymaniak*, Ruth Wang*, [OSDI '12]
This paper shows how a new time API and its implementation can provide the abstraction of tightly synchronized clocks, even on a global scale. We describe how we used this technology to build a globally-distributed database that supports a variety of powerful features: non-blocking reads in the past, lock-free snapshot transactions, and atomic schema changes.
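One way to picture a time API of this kind is a clock that reports an interval instead of a single instant, with a writer that waits until its chosen timestamp is definitely in the past. The sketch below is a toy under our own assumptions, not Spanner's implementation: the names `Interval`, `UncertainClock` and `commit` are hypothetical, and the uncertainty bound is a fixed constant supplied by the caller.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float


class UncertainClock:
    """Toy clock that reports time as an interval of width 2 * epsilon."""

    def __init__(self, epsilon_seconds, now_fn=time.monotonic):
        self.epsilon = epsilon_seconds
        self._now = now_fn

    def now(self):
        t = self._now()
        return Interval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp):
        """True once `timestamp` has definitely passed on every clock."""
        return self.now().earliest > timestamp


def commit(clock, apply_write, sleep_fn=time.sleep):
    """Choose a timestamp, wait out the uncertainty, then apply the write."""
    commit_ts = clock.now().latest
    while not clock.after(commit_ts):
        # Poll in small steps; the fallback covers a zero uncertainty bound.
        sleep_fn(clock.epsilon / 2 or 0.001)
    apply_write(commit_ts)
    return commit_ts
```

Under these assumptions the wait lasts roughly twice the uncertainty bound, so the tighter the clocks are synchronized, the cheaper it becomes to hand out timestamps that agree with real-time order.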