- Part 9

About: Research Blog

Profile:

Website

https://plus.google.com/101673966767287570260

Contact:

Posts by Research Blog

New ways to add Reminders in Inbox by Gmail

Jun 17, 2015

Posted by Dave Orr, Google Research Product Manager

Last week, Inbox by Gmail opened up and improved many of your favorite features, including two new ways to add Reminders.

First up, when someone emails you a to-do, Inbox can now suggest adding a Reminder so you don’t forget. Here’s how it looks if your spouse emails you and asks you to buy milk on the way home:

To help you add Reminders, the Google Research team used natural language understanding technology to teach Inbox to recognize to-dos in email.

And much like Gmail and Inbox get better when you report spam, your feedback helps improve these suggested Reminders. You can accept or reject them with a single click:

The other new way to add Reminders in Inbox is to create Reminders in Google Keep–they will appear in Inbox with a link back to the full note in Google Keep.

Hopefully, this little extra help gets you back to what matters more quickly and easily. Try the new features out, and as always, let us know what you think using the feedback link in the app.

Read | No Comments | Tags: Google Research

Google Computer Vision research at CVPR 2015

Jun 7, 2015

Posted by Vincent Vanhoucke, Google Research Scientist

Much of the world’s data is in the form of visual media. In order to utilize meaningful information from multimedia and deliver innovative products, such as Google Photos, Google builds machine-learning systems that are designed to enable computer perception of visual input, in addition to pursuing image and video analysis techniques focused on image/scene reconstruction and understanding.

This week, Boston hosts the 2015 Conference on Computer Vision and Pattern Recognition (CVPR 2015), the premier annual computer vision event comprising the main CVPR conference and several co-located workshops and short courses. As a leader in computer vision research, Google will have a strong presence at CVPR 2015, with many Googlers presenting publications in addition to hosting workshops and tutorials on topics covering image/video annotation and enhancement, 3D analysis and processing, development of semantic similarity measures for visual objects, synthesis of meaningful composites for visualization/browsing of large image/video collections and more.

Learn more about some of our research in the list below (Googlers highlighted in blue). If you are attending CVPR this year, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. Members of the Jump team will also have a prototype of the camera on display and will be showing videos produced using the Jump system on Google Cardboard.

Tutorials:
Applied Deep Learning for Computer Vision with Torch
Koray Kavukcuoglu, Ronan Collobert, Soumith Chintala

DIY Deep Learning: a Hands-On Tutorial with Caffe
Evan Shelhamer, Jeff Donahue, Yangqing Jia, Jonathan Long, Ross Girshick

ImageNet Large Scale Visual Recognition Challenge Tutorial
Olga Russakovsky, Jonathan Krause, Karen Simonyan, Yangqing Jia, Jia Deng, Alex Berg, Fei-Fei Li

Fast Image Processing With Halide
Jonathan Ragan-Kelley, Andrew Adams, Fredo Durand

Open Source Structure-from-Motion
Matt Leotta, Sameer Agarwal, Frank Dellaert, Pierre Moulon, Vincent Rabaud

Oral Sessions:
Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection
George Papandreou, Iasonas Kokkinos, Pierre-André Savalle

Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time
Richard A. Newcombe, Dieter Fox, Steven M. Seitz

Show and Tell: A Neural Image Caption Generator
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell

Visual Vibrometry: Estimating Material Properties from Small Motion in Video
Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Frédo Durand, William T. Freeman

Fast Bilateral-Space Stereo for Synthetic Defocus
Jonathan T. Barron, Andrew Adams, YiChang Shih, Carlos Hernández

Poster Sessions:
Learning Semantic Relationships for Better Action Retrieval in Images
Vignesh Ramanathan, Congcong Li, Jia Deng, Wei Han, Zhen Li, Kunlong Gu, Yang Song, Samy Bengio, Charles Rosenberg, Li Fei-Fei

FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff, Dmitry Kalenichenko, James Philbin

A Mixed Bag of Emotions: Model, Predict, and Transfer Emotion Distributions
Kuan-Chuan Peng, Tsuhan Chen, Amir Sadovnik, Andrew C. Gallagher

Best-Buddies Similarity for Robust Template Matching
Tali Dekel, Shaul Oron, Michael Rubinstein, Shai Avidan, William T. Freeman

Articulated Motion Discovery Using Pairs of Trajectories
Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari

Reflection Removal Using Ghosting Cues
YiChang Shih, Dilip Krishnan, Frédo Durand, William T. Freeman

P3.5P: Pose Estimation with Unknown Focal Length
Changchang Wu

MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching
Xufeng Han, Thomas Leung, Yangqing Jia, Rahul Sukthankar, Alexander C. Berg

Inferring 3D Layout of Building Facades from a Single Image
Jiyan Pan, Martial Hebert, Takeo Kanade

The Aperture Problem for Refractive Motion
Tianfan Xue, Hossein Mobahei, Frédo Durand, William T. Freeman

Video Magnification in Presence of Large Motions
Mohamed Elgharib, Mohamed Hefeeda, Frédo Durand, William T. Freeman

Robust Video Segment Proposals with Painless Occlusion Handling
Zhengyang Wu, Fuxin Li, Rahul Sukthankar, James M. Rehg

Ontological Supervision for Fine Grained Classification of Street View Storefronts
Yair Movshovitz-Attias, Qian Yu, Martin C. Stumpe, Vinay Shet, Sacha Arnoud, Liron Yatziv

VIP: Finding Important People in Images
Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra

Fusing Subcategory Probabilities for Texture Classification
Yang Song, Weidong Cai, Qing Li, Fan Zhang

Beyond Short Snippets: Deep Networks for Video Classification
Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici

Workshops:
THUMOS Challenge 2015
Program organizers include: Alexander Gorban, Rahul Sukthankar

DeepVision: Deep Learning in Computer Vision 2015
Invited Speaker: Rahul Sukthankar

Large Scale Visual Commerce (LSVisCom)
Panelist: Luc Vincent

Large-Scale Video Search and Mining (LSVSM)
Invited Speaker and Panelist: Rahul Sukthankar
Program Committee includes: Apostol Natsev

Vision meets Cognition: Functionality, Physics, Intentionality and Causality
Program Organizers include: Peter Battaglia

Big Data Meets Computer Vision: 3rd International Workshop on Large Scale Visual Recognition and Retrieval (BigVision 2015)
Program Organizers include: Samy Bengio
Includes speaker Christian Szegedy – “Scalable approaches for large scale vision”

Observing and Understanding Hands in Action (Hands 2015)
Program Committee includes: Murphy Stein

Fine-Grained Visual Categorization (FGVC3)
Program Organizers include: Anelia Angelova

Large-scale Scene Understanding Challenge (LSUN)
Winners of the Scene Classification Challenge: Julian Ibarz, Christian Szegedy and Vincent Vanhoucke
Winners of the Caption Generation Challenge: Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan

Looking from above: when Earth observation meets vision (EARTHVISION)
Technical Committee includes: Andreas Wendel

Computer Vision in Vehicle Technology: Assisted Driving, Exploration Rovers, Aerial and Underwater Vehicles
Invited Speaker: Andreas Wendel
Program Committee includes: Andreas Wendel

Women in Computer Vision (WiCV)
Invited Speaker: Mei Han

ChaLearn Looking at People (sponsor)

Fine-Grained Visual Categorization (FGVC3) (sponsor)

Read | No Comments | Tags: Google Research

Announcing the 2015 Google European Doctoral Fellows

Jun 5, 2015

Posted by David Harper, University Relations and Beate List, Research Programs

In 2009, Google created the PhD Fellowship program to recognize and support outstanding graduate students doing exceptional work in Computer Science and related disciplines. The following year, we launched the program in Europe as the Google European Doctoral Fellowship program. Alumni of the European program occupy a variety of positions including faculty positions (Ofer Meshi, Cynthia Liem), academic research positions (Roland Angst, Carola Doerr née Winzen) and positions in industry (Yair Adato, Peter Hosek, Neil Houlsby).

Reflecting our continuing commitment to building strong relations with the European academic community, we are delighted to announce the 2015 Google European Doctoral Fellows. The following fifteen fellowship recipients were selected from an outstanding set of PhD students nominated by our partner universities:

Heike Adel, Fellowship in Natural Language Processing (University of Munich)
Thang Bui, Fellowship in Speech Technology (University of Cambridge)
Victoria Caparrós Cabezas, Fellowship in Distributed Systems (ETH Zurich)
Nadav Cohen, Fellowship in Machine Learning (The Hebrew University of Jerusalem)
Josip Djolonga, Fellowship in Probabilistic Inference (ETH Zurich)
Jakob Julian Engel, Fellowship in Computer Vision (Technische Universität München)
Nikola Gvozdiev, Fellowship in Computer Networking (University College London)
Felix Hill, Fellowship in Language Understanding (University of Cambridge)
Durk Kingma, Fellowship in Deep Learning (University of Amsterdam)
Massimo Nicosia, Fellowship in Statistical Natural Language Processing (University of Trento)
George Prekas, Fellowship in Operating Systems (École Polytechnique Fédérale de Lausanne)
Roman Prutkin, Fellowship in Graph Algorithms (Karlsruhe Institute of Technology)
Siva Reddy, Fellowship in Multilingual Semantic Parsing (The University of Edinburgh)
Immanuel Trummer, Fellowship in Structured Data Analysis (École Polytechnique Fédérale de Lausanne)
Margarita Vald, Fellowship in Security (Tel Aviv University)

This group of students represent the next generation of researchers who will endeavor to solve some of the most interesting challenges in Computer Science. We offer our congratulations, and look forward to their future contributions to the research community with high expectation.

Read | No Comments | Tags: Google Research

A Multilingual Corpus of Automatically Extracted Relations from Wikipedia

Jun 2, 2015

Posted by Shankar Kumar, Google Research Scientist and Manaal Faruqui, Carnegie Mellon University PhD candidate

In Natural Language Processing, relation extraction is the task of assigning a semantic relationship between a pair of arguments. As an example, a relationship between the phrases “Ottawa” and “Canada” is “is the capital of”. These extracted relations could be used in a variety of applications ranging from Question Answering to building databases from unstructured text.

While relation extraction systems work accurately for English and a few other languages, where tools for syntactic analysis such as parsers, part-of-speech taggers and named entity analyzers are readily available, there is relatively little work in developing such systems for most of the world’s languages where linguistic analysis tools do not yet exist. Fortunately, because we do have translation systems between English and many other languages (such as Google Translate), we can translate text from a non-English language to English, perform relation extraction and project these relations back to the foreign language.

Relation extraction in a Spanish sentence using the cross-lingual relation extraction pipeline.

In Multilingual Open Relation Extraction Using Cross-lingual Projection, that will appear at the 2015 Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL HLT 2015), we use this idea of cross-lingual projection to develop an algorithm that extracts open-domain relation tuples, i.e. where an arbitrary phrase can describe the relation between the arguments, in multiple languages from Wikipedia. In this work, we also evaluated the performance of extracted relations using human annotations in French, Hindi and Russian.

Since there is no such publicly available corpus of multilingual relations, we are releasing a dataset of automatically extracted relations from the Wikipedia corpus in 61 languages, along with the manually annotated relations in 3 languages (French, Hindi and Russian). It is our hope that our data will help researchers working on natural language processing and encourage novel applications in a wide variety of languages. More details on the corpus and the file formats can be found in this README file.

We wish to thank Bruno Cartoni, Vitaly Nikolaev, Hidetoshi Shimokawa, Kishore Papineni, John Giannandrea and their teams for making this data release possible. This dataset is licensed by Google Inc. under the Creative Commons Attribution-ShareAlike 3.0 License.

Read | No Comments | Tags: Google Research

Sergey and Larry awarded the Seoul Test-of-Time Award from WWW 2015

May 22, 2015

Posted by Andrei Broder, Google Distinguished Scientist

Today, at the 24th International World Wide Web Conference (WWW) in Florence, Italy, our company founders, Sergey Brin and Larry Page, received the inaugural Seoul Test-of-Time Award for their 1998 paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, which introduced Google to the world at the 7th WWW conference in Brisbane, Australia. I had the pleasure and honor to accept the award on behalf of Larry and Sergey from Professor Chin-Wan Chung, who led the committee that created the award.

Except for the fact that I was myself in Brisbane, it is hard to believe that Google began just as a two-student research project at Stanford University 17 years ago with the goal to “produce much more satisfying search results than existing systems.” Their paper presented two breakthrough concepts: first, using a distributed system built on inexpensive commodity hardware to deal with the size of the index, and second, using the hyperlink structure of the Web as a powerful new relevance signal. By now these ideas are common wisdom, but their paper continues to be very influential: it has over 13,000 citations so far and more are added every day.

Since those beginnings Google has continued to grow, with tools that enable small business owners to reach customers, help long lost friends to reunite, and empower users to discover answers. We keep pursuing new ideas and products, generating discoveries that both affect the world and advance the state-of-the-art in Computer Science and related disciplines. From products like Gmail, Google Maps and Google Earth Engine to advances in Machine Intelligence, Computer Vision, and Natural Language Understanding, it is our continuing goal to create useful tools and services that benefit our users.

Larry and Sergey sent a video message to the conference expressing their thanks and their encouragement for future research, in which Sergey said “There is still a ton of work left to do in Search, and on the Web as a whole and I couldn’t think of a more exciting time to be working in this space.” I certainly share this view, and was very gratified by the number of young computer scientists from all over the world that came by the Google booth at the conference to share their thoughts about the future of search, and to explore the possibility of joining our efforts.

Read | No Comments | Tags: Google Research

Tone: An experimental Chrome extension for instant sharing over audio

May 19, 2015

Posted by Alex Kauffmann, Interaction Researcher, and Boris Smus, Software Engineer

Sometimes in the course of exploring new ideas, we’ll stumble upon a technology application that gets us excited. Tone is a perfect example: it’s a Chrome extension that broadcasts the URL of the current tab to any machine within earshot that also has the extension installed. Tone is an experiment that we’ve enjoyed and found useful, and we think you may as well.

As digital devices have multiplied, so has the complexity of coordinating them and moving stuff between them. Tone grew out of the idea that while digital communication methods like email and chat have made it infinitely easier, cheaper, and faster to share things with people across the globe, they’ve actually made it more complicated to share things with the people standing right next to you. Tone aims to make sharing digital things with nearby people as easy as talking to them.

The first version was built in an afternoon for fun (which resulted in numerous rickrolls), but we increasingly found ourselves using it to share documents with everyone in a meeting quickly, to exchange design files back and forth while collaborating on UI design, and to contribute relevant links without interrupting conversations.

Tone provides an easy-to-understand broadcast mechanism that behaves like the human voice—it doesn’t pass through walls like radio or require pairing or addressing. The initial prototype used an efficient audio transmission scheme that sounded terrible, so we played it beyond the range of human hearing. However, because many laptop microphones and nearly all video conferencing systems are optimized for voice, it improved reliability considerably to also include a minimal DTMF-based audible codec. The combination is reliable for short distances in the majority of audio environments even at low volumes, and it even works over Hangouts.

Because it’s audio based, Tone behaves like speech in interesting ways. The orientation of laptops relative to each other, the acoustic characteristics of the space, the particular speaker volume and mic sensitivity, and even where you’re standing will all affect Tone’s reliability. Not every nearby machine will always receive every broadcast, just like not everyone will always hear every word someone says. But resending is painless and debugging generally just requires raising the volume. Many groups at Google have found that the tradeoffs between ease and reliability worthwhile—it is our hope that small teams, students in classrooms, and families with multiple computers will too.

To get started, first install the Tone extension for Chrome. Then simply open a tab with the URL you want to share, make sure your volume is on, and press the Tone button. Your machine will then emit a short sequence of beeps. Nearby machines receive a clickable notification that will open the same tab. Getting everyone on the same page has never been so easy!

Read | No Comments | Tags: Google Research

Paper to Digital in 200+ languages

May 6, 2015

Posted by Dmitriy Genzel and Ashok Popat, Research Scientists and Dhyanesh Narayanan, Product Manager

Many of the world’s important sources of information – books, newspapers, magazines, pamphlets, and historical documents – are not digital. Unlike digital documents, these paper-based sources of information are difficult to search through or edit, or worse, completely inaccessible to some people. Part of the solution is scanning, getting a digital image of the page, but raw image pixels aren’t yet recognized as textual content from the computer’s point of view.

Optical Character Recognition (OCR) technology aims to turn pictures of text into computer text that can be indexed, searched, and edited. For some time, Google Drive has provided OCR capabilities. Recently, we expanded this state-of-the-art technology to support all of the world’s major languages – that’s over 200 languages in more than 25 writing systems. This technology is available to users in 2 easy steps:

1. Upload a scanned document in its current form (say, as an image or PDF). The example below shows a scanned document in Hindi uploaded to a user’s Drive account as a PNG.

2. Right-click on the document in the Drive interface, and select ‘Open with’ -> ‘Google Docs’.

This opens a Google document with the original image followed by the extracted text.

You don’t even need to specify which language the document is in; the system will determine that automatically. Or, you can use the Google Drive API for more explicit control over the language detection in documents. For example, here is an invocation of the Drive API in Python:

The OCR capability in Drive is also available in the Drive App for Android.

To make this possible, engineering teams across Google pursued an approach to OCR focused on broad language coverage, with a goal of designing an architecture that could potentially work with all existing languages and writing systems. We do this in part by using Hidden Markov Models (HMMs) to make sense of the input as a whole sequence, rather than first trying to break it apart into pieces. This is similar to how modern speech recognition systems recognize audio input.

OCR and speech recognition share some challenges – like dealing with background “noise,” different languages, and low-quality inputs. But some challenges are specific to OCR: the variety of typefaces, the different types of scanners and cameras, and the need to work on older material that may contain archaic orthographic and linguistic elements. In addition to utilizing HMMs, we leveraged many of the same technologies used in the Google Handwriting Input app to allow automatic learning of features and to give preference to more likely output, as well as minimum-error-rate training to allow effective combination of multiple sources of information, and modern methods in machine learning to minimize manual design and maximize use of data. We also take advantage of advances in internationalization and typesetting, by using synthetic data in our training.

Currently, the OCR works best on cleanly scanned, high-resolution documents in the most commonly used typefaces. We are working to improve performance on poor quality scans and challenging text layouts. Give it a try and let us know how it works for you.

Read | No Comments | Tags: Google Research

Google Handwriting Input in 82 languages on your Android mobile device

Apr 15, 2015

Posted by Thomas Deselaers, Daniel Keysers, Henry Rowley, Li-Lun Wang, Victor Cărbune, Ashok Popat, Dhyanesh Narayanan, Handwriting Team, Google Research

Entering text on mobile devices is still considered inconvenient by many; touchscreen keyboards, although much improved over the years, require a lot of attention to hit the right buttons. Voice input is an option, but there are situations where it is not feasible, such as in a noisy environment or during a meeting. Using handwriting as an input method can allow for natural and intuitive input method for text entry which complements typing and speech input methods. However, until recently there have been many languages where enabling this functionality presented significant challenges.

Today we launched Google Handwriting Input, which lets users handwrite text on their Android mobile device as an additional input method for any Android app. Google Handwriting Input supports 82 languages in 20 distinct scripts, and works with both printed and cursive writing input with or without a stylus. Beyond text input, it also provides a fun way to enter hundreds of emojis by drawing them (simply press and hold the ‘enter’ button to switch modes). Google Handwriting Input works with or without an Internet connection.

By building on large-scale language modeling, robust multi-language OCR, and incorporating large-scale neural-networks and approximate nearest neighbor search for character classification, Google Handwriting Input supports languages that can be challenging to type on a virtual keyboard. For example, keyboards for ideographic languages (such as Chinese) are often based on a particular dialect of the language, but if a user does not know that dialect, they may be hard to use. Additionally, keyboards for complex script languages (like many South Asian languages) are less standardized and may be unfamiliar. Even for languages where virtual keyboards are more widely used (like English or Spanish), some users find that handwriting is more intuitive, faster, and generally more comfortable.

Writing ‘Hello’ in Chinese, German, and Tamil.

Google Handwriting Input is the result of many years of research at Google. Initially, cloud based handwriting recognition supported the Translate Apps on Android and iOS, Mobile Search, and Google Input Tools (in Chrome, ChromeOS, Gmail and Docs, translate.google.com, and the Docs symbol picker). However, other products required recognizers to run directly on an Android device without an Internet connection. So we worked to make recognition models smaller and faster for use in Android handwriting input methods for Simplified and Traditional Chinese, Cantonese, and Hindi, as well as multi-language support in Gesture Search. Google Handwriting Input combines these efforts, allowing recognition both on-device and in the cloud (by tapping on the cloud icon) in any Android app.

You can install Google Handwriting Input from the Play Store here. More information and FAQs can be found here.

Read | No Comments | Tags: Google Research

Beyond Short Snippets: Deep Networks for Video Classification

Apr 8, 2015

Posted by Software Engineers George Toderici and Sudheendra Vijayanarasimhan

Convolutional Neural Networks (CNNs) have recently shown rapid progress in advancing the state of the art of detecting and classifying objects in static images, automatically learning complex features in pictures without the need for manually annotated features. But what if one wanted not only to identify objects in static images, but also analyze what a video is about? After all, a video isn’t much more than a string of static images linked together in time.

As it turns out, video analysis provides even more information to the object detection and recognition task performed by CNN’s by adding a temporal component through which motion and other information can be also be used to improve classification. However, analyzing entire videos is challenging from a modeling perspective because one must model variable length videos with a fixed number of parameters. Not to mention that modeling variable length videos is computationally very intensive.

In Beyond Short Snippets: Deep Networks for Video Classification, to be presented at the 2015 Computer Vision and Pattern Recognition conference (CVPR 2015), we¹ evaluated two approaches – feature pooling networks and recurrent neural networks (RNNs) – capable of modeling variable length videos with a fixed number of parameters while maintaining a low computational footprint. In doing so, we were able to not only show that learning a high level global description of the video’s temporal evolution is very important for accurate video classification, but that our best networks exhibited significant performance improvements over previously published results on the Sports 1 million dataset (Sports-1M).

In previous work, we employed 3D-convolutions (meaning convolutions over time and space) over short video clips – typically just a few seconds – to learn motion features from raw frames implicitly and then aggregate predictions at the video level. For purposes of video classification, the low level motion features were only marginally outperforming models in which no motion was modeled.

To understand why, consider the following two images which are very similar visually but obtain drastically different scores from a CNN model trained on static images:

Slight differences in object poses/context can change the predicted class/confidence of CNNs trained on static images.

Since each individual video frame forms only a small part of the video’s story, static frames and short video snippets (2-3 secs) use incomplete information and could easily confuse subtle fine-grained distinctions between classes (e.g: Tae Kwon Do vs. Systema) or use portions of the video irrelevant to the action of interest.

To get around this frame-by-frame confusion, we used feature pooling networks that independently process each frame and then pool/aggregate the frame-level features over the entire video at various stages. Another approach we took was to utilize an RNN (derived from Long Short Term Memory units) instead of feature pooling, allowing the network itself to decide which parts of the video are important for classification. By sharing parameters through time, both feature pooling and RNN architectures are able to maintain a constant number of parameters while capturing a global description of the video’s temporal evolution.

In order to feed the two aggregation approaches, we compute an image “pixel-based” CNN model, based on the raw pixels in the frames of a video. We processed videos for the “pixel-based” CNNs at one frame per second to reduce computational complexity. Of course, at this frame rate implicit motion information is lost.

To compensate, we incorporate explicit motion information in the form of optical flow – the apparent motion of objects across a camera’s viewfinder due to the motion of the objects or the motion of the camera. We compute optical flow images over adjacent frames to learn an additional “optical flow” CNN model.

Left: Image used for the pixel-based CNN; Right: Dense optical flow image used for optical flow CNN

The pixel-based and optical flow based CNN model outputs are provided as inputs to both the RNN and pooling approaches described earlier. These two approaches then separately aggregate the frame-level predictions from each CNN model input, and average the results. This allows our video-level prediction to take advantage of both image information and motion information to accurately label videos of similar activities even when the visual content of those videos varies greatly.

Badminton (top 25 videos according to the max-pooling model). Our methods accurately label all 25 videos as badminton despite the variety of scenes in the various videos because they use the entire video’s context for prediction.

We conclude by observing that although very different in concept, the max-pooling and the recurrent neural network methods perform similarly when using both images and optical flow. Currently, these two architectures are the top performers on the Sports-1M dataset. The main difference between the two was that the RNN approach was more robust when using optical flow alone on this dataset. Check out a short video showing some example outputs from the deep convolutional networks presented in our paper.

1 Research carried out in collaboration with University of Maryland, College Park PhD student Joe Yue-Hei Ng and University of Texas at Austin PhD student Matthew Hausknecht, as part of a Google Software Engineering Internship^↩

Read | No Comments | Tags: Google Research

Skill maps, analytics and more with Google’s Course Builder 1.8

Apr 6, 2015

Posted by Adam Feldman, Product Manager and Pavel Simakov, Technical Lead, Course Builder Team

Over the past couple of years, Google’s Course Builder has been used to create and deliver hundreds of online courses on a variety of subjects (from sustainable energy to comic books), making learning more scalable and accessible through open source technology. With the help of Course Builder, over a million students of all ages have learned something new.

Today, we’re increasing our commitment to Course Builder by bringing rich, new functionality to the platform with a new release. Of course, we will also continue to work with edX and others to contribute to the entire ecosystem.

This new version enables instructors and students to understand prerequisites and skills explicitly, introduces several improvements to the instructor experience, and even allows you to export data to Google BigQuery for in depth analysis.

Drag and drop, simplified tabs, and student feedback

We’ve made major enhancements to the instructor interface, such as simplifying the tabs and clarifying which part of the page you’re editing, so you can spend more time teaching and less time configuring. You can also structure your course on the fly by dragging and dropping elements directly in the outline.

Additionally, we’ve added the option to include a feedback box at the bottom of each lesson, making it easy for your students to tell you their thoughts (though we can’t promise you’ll always enjoy reading them).

Skill Mapping

You can now define prerequisites and skills learned for each lesson. For instance, in a course about arithmetic, addition might be a prerequisite for the lesson on multiplying numbers, while multiplication is a skill learned. Once an instructor has defined the skill relationships, they will have a consolidated view of all their skills and the lessons they appear in, such as this list for Power Searching with Google:

Instructors can then enable a skills widget that shows at the top of each lesson and which lets students see exactly what they should know before and after completing a lesson. Below are the prerequisites and goals for the Thinking More Deeply About Your Search lesson. A student can easily see what they should know beforehand and which lessons to explore next to learn more.

Skill maps help a student better understand which content is right for them. And, they lay the groundwork for our future forays into adaptive and personalized learning. Learn more about Course Builder skill maps in this video.

Analytics through BigQuery

One of the core tenets of Course Builder is that quality online learning requires a feedback loop between instructor and student, which is why we’ve always had a focus on providing rich analytical information about a course. But no matter how complete, sometimes the built-in reports just aren’t enough. So Course Builder now includes a pipeline to Google BigQuery, allowing course owners to issue super-fast queries in a SQL-like syntax using the processing power of Google’s infrastructure. This allows you to slice and dice the data in an infinite number of ways, giving you just the information you need to help your students and optimize your course. Watch these videos on configuring and sending data.

To get started with your own course, follow these simple instructions. Please let us know how you use these new features and what you’d like to see in Course Builder next. Need some inspiration? Check out our list of courses (and tell us when you launch yours).

Keep on learning!

Read | No Comments | Tags: Google Research

Google Computer Science Capacity Awards

Mar 16, 2015

By Maggie Johnson, Director of Education and University Relations and Chris Busselle, Google.org

One of Google’s goals is to surface successful strategies that support the expansion of high-quality Computer Science (CS) programs at the undergraduate level. Innovations in teaching and technologies, while additionally ensuring better engagement of women and underrepresented minority students, is necessary in creating inclusive, sustainable, and scalable educational programs.

To address issues arising from the dramatic increase in undergraduate CS enrollments, we recently launched the Computer Science Capacity Awards program. For this three-year program, select educational institutions were invited to contribute proposals for innovative, inclusive, and sustainable approaches to address current scaling issues in university CS educational programs.

Today, after an extensive proposal review process, we are pleased to announce the recipients of the Capacity Awards program:

Carnegie Mellon University – Professor Jacobo Carrasquel
Alternate Instructional Model for Introductory Computer Science Classes
CMU will develop a new instructional model consisting of two optional mini lectures per week given by the instructor, and problem-solving sessions with flexible group meetings that are coordinated by undergraduate and graduate teaching assistants.

Duke University – Professor Jeffrey Forbes
North Carolina State University – Professor Kristy Boyer
University of North Carolina – Professor Ketan Mayer-Patel
RESEARCH TRIANGLE PEER TEACHING FELLOWS: Scalable Evidence-Based Peer Teaching for Improving CS Capacity and Diversity
The project hopes to increase CS retention and diversity by developing a highly scalable, effective, evidence-based peer training program across three universities in the North Carolina Research Triangle.

Mount Holyoke College – Professor Heather Pon-Barry
MaGE (Megas and Gigas Educate): Growing Computer Science Capacity at Mount Holyoke College
Mount Holyoke’s MaGE program includes a plan to grow enrollment in introductory CS courses, particularly for women and other underrepresented groups. The program also includes a plan of action for CS students to educate, mentor, and support others in inclusive ways.

George Mason University – Professor Jeff Offutt
SPARC: Self-PAced Learning increases Retention and Capacity
George Mason University wants to replace the traditional course model for CS-1 and CS-2 with an innovative teaching model of self- paced introductory programming courses. Students will periodically demonstrate competency with practical skills demonstrations similar to those used in martial arts.

Rutgers University – Professor Andrew Tjang
Increasing the Scalability and Diversity in the Face of Large Growth in Computer Science Enrollment
Rutger’s program addresses scalability issues with technology tools, as well as collaborative spaces. It also emphasizes outreach to Rutgers’ women’s college and includes original research on success in CS programs to create new courses that cater to the changing environment.

University of California, Berkeley – Professor John DeNero
Scaling Computer Science through Targeted Engagement
Berkeley’s program plans to increase Software Engineering and UI Design enrollment by 500 total students/year, as well as increase the number of women and underrepresented minority CS majors by a factor of three.

Each of the selected schools brings a unique and innovative approach to addressing current scaling issues, and we are excited to collaborate in developing concrete strategies to develop sustainable and inclusive educational programs. Stay tuned over the coming year, where we will report on program recipients’ progress and share results with the broader CS education community.

Read | No Comments | Tags: Google Research

Announcing the Google MOOC Focused Research Awards

Mar 9, 2015

Posted by Maggie Johnson, Director of Education and University Relations, and Aimin Zhu, University Relations Manager, APAC

Last year, Google and Tsinghua University hosted the 2014 APAC MOOC Focused Faculty Workshop, an event designed to share, brainstorm and generate ideas aimed at fostering MOOC innovation. As a result of the ideas generated at the workshop, we solicited proposals from the attendees for research collaborations that would advance important topics in MOOC development.

After expert reviews and committee discussions, we are pleased to announce the following recipients of the MOOC Focused Research Awards. These awards cover research exploring new interactions to enhance learning experience, personalized learning, online community building, interoperability of online learning platforms and education accessibility:

“MOOC Visual Analytics” – Michael Ginda, Indiana University, United States
“Improvement of students’ interaction in MOOCs using participative networks” – Pedro A. Pernías Peco, Universidad de Alicante, Spain
“Automated Analysis of MOOC Discussion Content to Support Personalised Learning” – Katrina Falkner, The University of Adelaide, Australia
“Extending the Offline Capability of Spoken Tutorial Methodology” – Kannan Moudgalya, Indian Institute of Technology Bombay, India
“Launching the Pan Pacific ISTP (Information Science and Technology Program) through MOOCs” – Yasushi Kodama, Hosei University, Japan
“Fostering Engagement and Social Learning with Incentive Schemes and Gamification Elements in MOOCs” – Thomas Schildhauer, Alexander von Humboldt Institute for Internet and Society, Germany
“Reusability Measurement and Social Community Analysis from MOOC Content Users” – Timothy K. Shih, National Central University, Taiwan

In order to further support these projects and foster collaboration, we have begun pairing the award recipients with Googlers pursuing online education research as well as product development teams.

Google is committed to supporting innovation in online learning at scale, and we congratulate the recipients of the MOOC Focused Research Awards. It is our belief that these collaborations will further develop the potential of online education, and we are very pleased to work with these researchers to jointly push the frontier of MOOCs.

Read | No Comments | Tags: Google Research

A step closer to quantum computation with Quantum Error Correction

Mar 4, 2015

Posted by Julian Kelly, Rami Barends, and Austin Fowler, Quantum Electronics Engineers

Computer scientists have dreamt of large-scale quantum computation since at least 1994 — the hope is that quantum computers will be able to process certain calculations much more quickly than any classical computer, helping to solve problems ranging from complicated physics or chemistry simulations to solving optimization problems to accelerating machine learning tasks.

One of the primary challenges is that quantum memory elements (“qubits”) have always been too prone to errors. They’re fragile and easily disturbed — any fluctuation or noise from their environment can introduce memory errors, rendering the computations useless. As it turns out, getting even just a small number of qubits together to repeatedly perform the required quantum logic operations and still be nearly error-free is just plain hard. But our team has been developing the quantum logic operations and qubit architectures to do just that.

In our paper “State preservation by repetitive error detection in a superconducting quantum circuit”, published in the journal Nature, we describe a superconducting quantum circuit with nine qubits where, for the first time, the qubits are able to detect and effectively protect each other from bit errors. This quantum error correction (QEC) can overcome memory errors by applying a carefully choreographed series of logic operations on the qubits to detect where errors have occurred.

Photograph of the device containing nine quantum bits (qubits). Each qubit interacts with its neighbors to protect them from error.

So how does QEC work? In a classical computer, we can monitor bits directly to detect errors. However, qubits are much more fickle — measuring a qubit directly will collapse entanglement and superposition states, removing the quantum elements that make it useful for computation.

To get around this, we introduce additional ‘measurement’ qubits, and perform a series of quantum logic operations that look at the ‘measurement’ and ‘data’ qubits in combination. By looking at the state of these pairwise combinations (using quantum XOR gates), and performing some careful cross-checking, we can pull out just enough information to detect errors without altering the information in any individual qubit.

The basics of error correction. ‘Measurement’ qubits can detect errors on ‘data’ qubits through the use of quantum XOR gates.

We’ve also shown that storing information in five qubits works better than just storing it in one, and that with nine qubits the error correction works even better. That’s a key result — it shows that the quantum logic operations are trustworthy enough that by adding more qubits, we can detect more complex errors that otherwise may cause algorithmic failure.

While the basic physical processes behind quantum error correction are feasible, many challenges remain, such as improving the logic operations behind error correction and testing protection from phase-flip errors. We’re excited to tackle these challenges on the way towards making real computations possible.

Read | No Comments | Tags: Google Research

Large-Scale Machine Learning for Drug Discovery

Mar 2, 2015

Posted by Patrick Riley and Dale Webster, Google Research and Bharath Ramsundar, Google Research Intern and Stanford Ph.D. candidate

Discovering new treatments for human diseases is an immensely complicated challenge; Even after extensive research to develop a biological understanding of a disease, an effective therapeutic that can improve the quality of life must still be found. This process often takes years of research, requiring the creation and testing of millions of drug-like compounds in an effort to find a just a few viable drug treatment candidates. These high-throughput screens are often automated in sophisticated labs and are expensive to perform.

Recently, deep learning with neural networks has been applied in virtual drug screening^1,2,3, which attempts to replace or augment the high-throughput screening process with the use of computational methods in order to improve its speed and success rate.⁴ Traditionally, virtual drug screening has used only the experimental data from the particular disease being studied. However, as the volume of experimental drug screening data across many diseases continues to grow, several research groups have demonstrated that data from multiple diseases can be leveraged with multitask neural networks to improve the virtual screening effectiveness.

In collaboration with the Pande Lab at Stanford University, we’ve released a paper titled “Massively Multitask Networks for Drug Discovery“, investigating how data from a variety of sources can be used to improve the accuracy of determining which chemical compounds would be effective drug treatments for a variety of diseases. In particular, we carefully quantified how the amount and diversity of screening data from a variety of diseases with very different biological processes can be used to improve the virtual drug screening predictions.

Using our large-scale neural network training system, we trained at a scale 18x larger than previous work with a total of 37.8M data points across more than 200 distinct biological processes. Because of our large scale, we were able to carefully probe the sensitivity of these models to a variety of changes in model structure and input data. In the paper, we examine not just the performance of the model but why it performs well and what we can expect for similar models in the future. The data in the paper represents more than 50M total CPU hours.

This graph shows a measure of prediction accuracy (ROC AUC is the area under the receiver operating characteristic curve) for virtual screening on a fixed set of 10 biological processes as more datasets are added.

One encouraging conclusion from this work is that our models are able to utilize data from many different experiments to increase prediction accuracy across many diseases. To our knowledge, this is the first time the effect of adding additional data has been quantified in this domain, and our results suggest that even more data could improve performance even further.

Machine learning at scale has significant potential to accelerate drug discovery and improve human health. We look forward to continued improvement in virtual drug screening and its increasing impact in the discovery process for future drugs.

Thank you to our other collaborators David Konerding (Google), Steven Kearnes (Stanford), and Vijay Pande (Stanford).

References:

1. Thomas Unterthiner, Andreas Mayr, Günter Klambauer, Marvin Steijaert, Jörg Kurt Wegner, Hugo Ceulemans, Sepp Hochreiter. Deep Learning as an Opportunity in Virtual Screening. Deep Learning and Representation Learning Workshop: NIPS 2014

2. Dahl, George E, Jaitly, Navdeep, and Salakhutdinov, Ruslan. Multi-task neural networks for QSAR predictions. arXiv preprint arXiv:1406.1231, 2014.

3. Ma, Junshui, Sheridan, Robert P, Liaw, Andy, Dahl, George, and Svetnik, Vladimir. Deep neural nets as a method for quantitative structure-activity relationships. Journal of Chemical Information and Modeling, 2015.

4. Peter Ripphausen, Britta Nisius, Lisa Peltason, and Jürgen Bajorath. Quo Vadis, Virtual Screening? A Comprehensive Survey of Prospective Applications. Journal of Medicinal Chemistry 2010 53 (24), 8461-8467

Read | No Comments | Tags: Google Research

From Pixels to Actions: Human-level control through Deep Reinforcement Learning

Feb 25, 2015

Posted by Dharshan Kumaran and Demis Hassabis, Google DeepMind, London

Remember the classic videogame Breakout on the Atari 2600? When you first sat down to try it, you probably learned to play well pretty quickly, because you already knew how to bounce a ball off a wall in real life. You may have even worked up a strategy to maximise your overall score at the expense of more immediate rewards. But what if you didn’t possess that real-world knowledge — and only had the pixels on the screen, the control paddle in your hand, and the score to go on? How would you, or equally any intelligent agent faced with this situation, learn this task totally from scratch?

This is exactly the question that we set out to answer in our paper “Human-level control through deep reinforcement learning”, published in Nature this week. We demonstrate that a novel algorithm called a deep Q-network (DQN) is up to this challenge, excelling not only at Breakout but also a wide variety of classic videogames: everything from side-scrolling shooters (River Raid) to boxing (Boxing) and 3D car racing (Enduro). Strikingly, DQN was able to work straight “out of the box” across all these games – using the same network architecture and tuning parameters throughout and provided only with the raw screen pixels, set of available actions and game score as input.

The results: DQN outperformed previous machine learning methods in 43 of the 49 games. In fact, in more than half the games, it performed at more than 75% of the level of a professional human player. In certain games, DQN even came up with surprisingly far-sighted strategies that allowed it to achieve the maximum attainable score—for example, in Breakout, it learned to first dig a tunnel at one end of the brick wall so the ball could bounce around the back and knock out bricks from behind.

Video courtesy of Atari Inc. and Mnih et al. “Human-level control through deep reinforcement learning”, Nature 26 Feb. 2015.

So how does it work? DQN incorporated several key features that for the first time enabled the power of Deep Neural Networks (DNN) to be combined in a scalable fashion with Reinforcement Learning (RL)—a machine learning framework that prescribes how agents should act in an environment in order to maximize future cumulative reward (e.g., a game score). Foremost among these was a neurobiologically inspired mechanism, termed “experience replay,” whereby during the learning phase DQN was trained on samples drawn from a pool of stored episodes—a process physically realized in a brain structure called the hippocampus through the ultra-fast reactivation of recent experiences during rest periods (e.g., sleep). Indeed, the incorporation of experience replay was critical to the success of DQN: disabling this function caused a severe deterioration in performance.

Comparison of the DQN agent with the best reinforcement learning methods in the literature. The performance of DQN is normalized with respect to a professional human games tester (100% level) and random play (0% level). Note that the normalized performance of DQN, expressed as a percentage, is calculated as: 100 X (DQN score – random play score)/(human score – random play score). Error bars indicate s.d. across the 30 evaluation episodes, starting with different initial conditions. Figure courtesy of Mnih et al. “Human-level control through deep reinforcement learning”, Nature 26 Feb. 2015.

This work offers the first demonstration of a general purpose learning agent that can be trained end-to-end to handle a wide variety of challenging tasks, taking in only raw pixels as inputs and transforming these into actions that can be executed in real-time. This kind of technology should help us build more useful products—imagine if you could ask the Google app to complete any kind of complex task (“Okay Google, plan me a great backpacking trip through Europe!”).

We also hope this kind of domain general learning algorithm will give researchers new ways to make sense of complex large-scale data creating the potential for exciting discoveries in fields such as climate science, physics, medicine and genomics. And it may even help scientists better understand the process by which humans learn. After all, as the great physicist Richard Feynman famously said: “What I cannot create, I do not understand.”

Read | No Comments | Tags: Google Research

previous page · next page

Google Data

About: Research Blog

Profile:

Website

Contact:

Posts by Research Blog

New ways to add Reminders in Inbox by Gmail

Google Computer Vision research at CVPR 2015

Announcing the 2015 Google European Doctoral Fellows

A Multilingual Corpus of Automatically Extracted Relations from Wikipedia

Sergey and Larry awarded the Seoul Test-of-Time Award from WWW 2015

Tone: An experimental Chrome extension for instant sharing over audio

Paper to Digital in 200+ languages

Google Handwriting Input in 82 languages on your Android mobile device

Beyond Short Snippets: Deep Networks for Video Classification

Skill maps, analytics and more with Google’s Course Builder 1.8

Google Computer Science Capacity Awards

Announcing the Google MOOC Focused Research Awards

A step closer to quantum computation with Quantum Error Correction

Large-Scale Machine Learning for Drug Discovery

From Pixels to Actions: Human-level control through Deep Reinforcement Learning

Categories

Tags