<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Google Data &#187; Peter Norvig</title>
	<atom:link href="https://googledata.org/author/peter-norvig/feed/" rel="self" type="application/rss+xml" />
	<link>https://googledata.org</link>
	<description>Everything Google: News, Products, Services, Content, Culture</description>
	<lastBuildDate>Wed, 18 Mar 2015 21:09:38 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.7.5</generator>
	<item>
		<title>Hear, here.  A Sample of Audio Processing at Google.</title>
		<link>https://googledata.org/google-research/hear-here-a-sample-of-audio-processing-at-google/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=hear-here-a-sample-of-audio-processing-at-google</link>
		<comments>https://googledata.org/google-research/hear-here-a-sample-of-audio-processing-at-google/#comments</comments>
		<pubDate>Wed, 14 Feb 2007 17:54:00 +0000</pubDate>
		<dc:creator><![CDATA[Peter Norvig]]></dc:creator>
				<category><![CDATA[Google Research]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Posted by Shumeet Baluja, Michele Covell, Pedro Moreno &#038; Eugene Weinstein. Text isn't the only source of information on the web!  We've been working on a variety of projects related to audio and visual recognition.  One of the fundamental constraints tha...]]></description>
				<content:encoded><![CDATA[<span class="byline-author">Posted by Shumeet Baluja, Michele Covell, Pedro Moreno & Eugene Weinstein</span><br /><br />Text isn't the only source of information on the web!  We've been working on a variety of projects related to audio and visual recognition.  One of the fundamental constraints that we have in designing systems at Google is the huge amount of data that we need to process rapidly.  A few of the research papers that have come out of this work are shown here.  <br /><br />In the first pair of papers, to be presented at the 2007 International Conference on Acoustics, Speech and Signal Processing (<a href="http://www.esprockets.com/papers/Baluja_Covell_ICASSP2007_1888.pdf">Waveprint Overview</a>, <a href="http://www.esprockets.com/papers/detection_CB.pdf">Waveprint-for-Known-Audio</a>), we show how computer vision processing techniques, combined with large-scale data stream processing, can create an efficient system for recognizing audio that has been degraded by various means such as cell phone playback, lossy compression, echoes, time-dilation (as found on the radio), competing noise, etc. <br /><br />It is also fun and surprising to see how often in research the same problem can be approached from a completely different perspective.  In the third paper to be presented at ICASSP-2007 (<a href="http://pmoreno.googlepages.com/music.pdf">Music Identification with WFST</a>) we explore how acoustic modeling techniques commonly used in speech recognition, and finite state transducers used to represent and search large graphs, can be used in the problem of music identification.  
Our approach learns a common alphabet of music sounds (which we call music-phones) and represents large song collections as a big graph where efficient search is possible.<br /><br />Perhaps one of the most interesting aspects of audio recognition goes beyond the matching of degraded signals, and instead attempts to capture meaningful notions of similarity.  In our paper presented at the International Joint Conference on Artificial Intelligence (<a href="http://www.esprockets.com/papers/IJCAI07_BalujaCovell.pdf">Music Similarity</a>), we describe a system that learns relevant similarities in music signals, while maintaining efficiency by using these learned models to create customized hashing functions.  <br /><br />We're extending these pieces of work in a variety of ways, not only in the learning algorithms used, but also in the application areas.  If you're interested in joining Google Research and working on these projects, be sure to drop us a line.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21224994-117147989383764198?l=googleresearch.blogspot.com' alt='' /></div>]]></content:encoded>
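The hashing idea underlying these systems can be sketched in miniature. The toy min-hash below is our own illustration, not code from the papers, and every name in it is hypothetical; it shows how similar feature sets (say, from a clean and a degraded copy of the same audio) yield signatures that mostly agree, so short signatures can stand in for exact comparison:

```java
import java.util.Random;
import java.util.Set;

// Toy min-hash sketch: feature sets with high overlap agree on most
// minimum hash values, so a short signature approximates set similarity.
public class MinHash {
    // Build a signature of numHashes minimum values under random hash functions.
    public static int[] signature(Set<Integer> features, int numHashes, long seed) {
        Random rng = new Random(seed);
        int[] sig = new int[numHashes];
        for (int h = 0; h < numHashes; h++) {
            int a = rng.nextInt(1 << 20) | 1;  // odd multiplier for a simple hash family
            int b = rng.nextInt(1 << 20);
            int min = Integer.MAX_VALUE;
            for (int f : features) {
                min = Math.min(min, (a * f + b) & 0x7fffffff);
            }
            sig[h] = min;
        }
        return sig;
    }

    // Fraction of matching signature positions estimates Jaccard similarity.
    public static double similarity(int[] s1, int[] s2) {
        int match = 0;
        for (int i = 0; i < s1.length; i++) if (s1[i] == s2[i]) match++;
        return (double) match / s1.length;
    }

    public static void main(String[] args) {
        Set<Integer> clean = Set.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Set<Integer> degraded = Set.of(1, 2, 3, 4, 5, 6, 7, 8, 11, 12); // high overlap
        Set<Integer> other = Set.of(100, 200, 300, 400, 500);
        int[] a = signature(clean, 128, 42);
        int[] b = signature(degraded, 128, 42);
        int[] c = signature(other, 128, 42);
        System.out.println(similarity(a, b) > similarity(a, c)); // true
    }
}
```

The real systems operate on learned spectrogram features rather than raw integers, but the retrieval trick is the same: compare compact signatures instead of full signals.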
			<wfw:commentRss>https://googledata.org/google-research/hear-here-a-sample-of-audio-processing-at-google/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Research Picks for Videos of the Year</title>
		<link>https://googledata.org/google-research/google-research-picks-for-videos-of-the-year/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=google-research-picks-for-videos-of-the-year</link>
		<comments>https://googledata.org/google-research/google-research-picks-for-videos-of-the-year/#comments</comments>
		<pubDate>Mon, 11 Dec 2006 22:58:00 +0000</pubDate>
		<dc:creator><![CDATA[Peter Norvig]]></dc:creator>
				<category><![CDATA[Google Research]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Posted by Peter Norvig. Everyone else is giving you year-end top ten lists of their favorite movies, so we thought we'd give you ours, but we're skipping Cars and The Da Vinci Code and giving you autonomous cars and open source code.  Our top twenty (we ...]]></description>
				<content:encoded><![CDATA[<span class="byline-author">Posted by Peter Norvig</span><br /><br />Everyone else is giving you year-end top ten lists of their favorite movies, so we thought we'd give you <a href="http://video.google.com/videosearch?q=Google+engedu"><b>ours</b></a>, but we're skipping <i>Cars</i> and <i>The Da Vinci Code</i> and giving you autonomous cars and open source code.  Our top twenty (we couldn't stop at ten):<br /><ol><br /><li><a href="http://video.google.com/videoplay?docid=8594517128412883394&q=engedu">Winning the DARPA Grand Challenge</a>: Sebastian Thrun stars in the heartwarming drama of a little car that could.<br /></li><li><a href="http://video.google.com/videoplay?docid=-7497796873809571567&amp;q=engedu">The Graphing Calculator Story</a>: A thriller starring Ron Avitzur as the engineer who snuck into the Apple campus to write code.<br /></li><li><a href="http://video.google.com/videoplay?docid=1996321846673788606&q=engedu">Should Google Go Nuclear?</a>: Robert Bussard (former Asst. 
Director of the AEC) talks about inertial electrostatic fusion.<br /></li><li><a href="http://video.google.com/videoplay?docid=-6972678839686672840&amp;q=engedu">A New Way to Look at Networking</a>: Van Jacobson as the old pro discovering that the old problems have not gone away.<br /></li><li><a href="http://video.google.com/videoplay?docid=-6459339159268485356&q=engedu">Python 3000</a>: Guido van Rossum always looks on the bright side of life in this epic look at the future of Python.<br /></li><li><a href="http://video.google.com/videoplay?docid=7951038502689013454">How to Survive a Robot Uprising</a>: Daniel Wilson stars in this sci-fi horror story.<br /></li><li><a href="http://video.google.com/videoplay?docid=-6363477088541627991&amp;q=engedu">The New "Bill of Rights of Information Society"</a>: Raj Reddy talks about how to get the right information to the right people at the right time.<br /></li><li><a href="http://video.google.com/videoplay?docid=448441135356213813&q=engedu">Practical Common Lisp</a>: In this foreign film, Peter Seibel introduces the audience to a new language. Subtitles in parentheses.<br /></li><li><a href="http://video.google.com/videoplay?docid=3897010229726822034&amp;q=engedu">Debugging Backwards in Time</a>: Starring Bil Lewis in this sequel to <i>Back to the Future</i>.<br /></li><li><a href="http://video.google.com/videoplay?docid=-5699448884004201579&q=engedu">Building Large Systems at Google</a>: Narayanan Shivakumar takes us behind the scenes to see how Google builds large distributed systems.  
Like <i>Charlie and the Chocolate Factory</i> but without the Oompa-Loompas.<br /></li><li><a href="http://video.google.com/videoplay?docid=-6459171443654125383&amp;q=Google+engedu">The Science and Art of User Experience at Google</a>: Jen Fitzpatrick continues the behind-the-scenes look.<br /></li><li><a href="http://video.google.com/videoplay?docid=7423822399040727959&q=Google+engedu">Universally Accessible Demands Accessibility for All of Humanity</a>: MacArthur "Genius Award" Fellow Jim Fruchterman talks about accessibility for the blind and others.<br /></li><li><a href="http://video.google.com/videoplay?docid=-8220394453782681101&amp;q=watson+google">DNA and the Brain</a>: Nobel Laureate James Watson explains how the key to understanding the brain is in our genes.<br /></li><li><a href="http://video.google.com/videoplay?docid=383231378223541436&q=authors+google">Steve Wozniak</a>: This one-man show is playing to boffo reviews.<br /></li><li><a href="http://video.google.com/videoplay?docid=3763187418980465301">Jane Goodall</a>: The celebrated primatologist discusses her mission to empower individuals to improve the environment.<br /></li><li><a href="http://video.google.com/videoplay?docid=-7704388615049492068&amp;q=engedu">Computers Versus Common Sense</a>: Doug Lenat reprises his role as the teacher trying to get computers to understand.<br /></li><li><a href="http://video.google.com/videoplay?docid=-5108715140883188059">The Google Story</a>: David Vise talks about his book on Google.<br /></li><li><a href="http://video.google.com/videoplay?docid=-4381488634998231167">The Search</a>: John Battelle talks about his book on Google.<br /></li><li><a href="http://video.google.com/videoplay?docid=8211813884612792878&q=engedu">The Archimedes Palimpsest</a>: Like <i>Da Vinci Code</i>, only true.<br /></li><li><a href="http://video.google.com/videoplay?docid=6127548813950043200&amp;q=engedu">The Paradox of Choice - Why More is Less</a>: With Barry Schwartz.  
Hmm, maybe I should have made this a top <i>three</i> list?<br /></li></ol><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21224994-116587801847653170?l=googleresearch.blogspot.com' alt='' /></div>]]></content:encoded>
			<wfw:commentRss>https://googledata.org/google-research/google-research-picks-for-videos-of-the-year/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>And the Awards Go To &#8230;</title>
		<link>https://googledata.org/google-research/and-the-awards-go-to/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=and-the-awards-go-to</link>
		<comments>https://googledata.org/google-research/and-the-awards-go-to/#comments</comments>
		<pubDate>Fri, 22 Sep 2006 16:25:00 +0000</pubDate>
		<dc:creator><![CDATA[Peter Norvig]]></dc:creator>
				<category><![CDATA[Google Research]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Posted by Proud Googlers. We're usually a modest bunch, but we couldn't help but let you know about some honors and awards bestowed on Googlers recently: Ramakrishnan Srikant is the winner of the 2006 ACM SIGKDD Innovation Award for his work on prunin...]]></description>
				<content:encoded><![CDATA[<span class="byline-author">Posted by Proud Googlers</span><br /><br />We're usually a modest bunch, but we couldn't help but let you know about some honors and awards bestowed on Googlers recently:<br /><ul><br /><li> Ramakrishnan Srikant is the winner of the 2006 ACM <a href="http://www.acm.org/sigs/sigkdd/awards.php#2006i">SIGKDD Innovation Award</a> for his work on pruning techniques for the discovery of association rules, and for developing new data mining approaches that respect the privacy of people in the database.<br /><br /><li> Henry Rowley and Shumeet Baluja, along with CMU professor Takeo Kanade, received the Longuet-Higgins prize for "a contribution which has stood the test of time," namely their 1996 paper <i><a href="http://www.ri.cmu.edu/pubs/pub_2678.html">Neural Network based face detection</a></i>.  The award was given at the 2006 Computer Vision and Pattern Recognition (<a href="http://www.cvpr.org/2006/">CVPR</a>) Conference.<br /><br /><li> <i>Team Smartass</i>, consisting of Christopher Hendrie, Derek Kisman, Ambrose Feinstein and Daniel Wright, won first place in the ICFP (International Conference on Functional Programming) <a href="http://icfpcontest.org/scoreboard.shtml">programming contest</a>, using a combination of C++, Haskell and 2D. Third place went to <i>Can't Spell Awesome without ASM</i>, a team consisting of Google engineer Jon Dethridge, former Google interns Ralph Furmaniak and Tomasz Czajka, and Reid Barton of Harvard. They got the judges at the functional programming conference to admit "Assembler is not too shabby."<br /><br /><li> Peter Norvig was named a Berkeley <a href="http://www.eecs.berkeley.edu/alumni/distinguished.cs.shtml#outstanding">Distinguished Alumni</a> in Computer Science, and gave the keynote <a href="http://norvig.com/speech.html">commencement address</a>.  
We'd also like to congratulate  Prabhakar Raghavan, Head of Yahoo Research, who was a co-recipient of this award.<br /><br /><li> Simon Quellen Field's book <i><a href="http://www.amazon.com/gp/product/1556526105">Return of Gonzo Gizmos</a></i> was a selection of the <a href="http://www.sciambookclub.com">Scientific American Book Club</a>.<br /><br /><li> Google summer intern Rion Snow (along with Stanford professors Dan Jurafsky and Andrew Ng) got the best paper award at the 2006 <a href="http://www.acl2006.mq.edu.au/">ACL/COLING</a> (computational linguistics) conference for his paper titled <i><a href="http://ai.stanford.edu/~rion/papers/semtax_acl06.pdf">Semantic taxonomy induction from heterogenous evidence</a></i>.<br /><br /><li> Google summer intern Lev Reyzin won the outstanding student paper award at <a href="http://www.icml2006.org/icml2006/16648.html">ICML</a> (International Conference on Machine Learning) for work with Rob Schapire of Princeton on <i><a href="http://www.icml2006.org/icml_documents/camera-ready/095_How_Boosting_the_Mar.pdf">How Boosting the Margin Can Also Boost Classifier Complexity</a></i>.<br /><br /><li> As we <a href="http://googleresearch.blogspot.com/2006/06/interactive-tv-conference-and-best.html">mentioned earlier</a>, Michael Fink, Michele Covell and Shumeet Baluja won a best paper award for <i>Social- and Interactive-Television Applications Based on Real-Time Ambient-Audio Identification</i>.<br /><br /><li> <b>Update 13 Oct 2006:</b> <a href="http://www.technologyreview.com/TR35/Profile.aspx?Cand=T&TRID=437">Paul Rademacher</a> has been named one of the top innovators under 35 by MIT's <a href="http://www.technologyreview.com/">Technology Review</a>. 
He was cited for his mashup of Google Maps and Craig's List housing data at <a href="http://housingmaps.com">housingmaps.com</a>.<br /><br /><li> <b>Update 31 Oct 2006:</b> We forgot Alon Halevy, who won the <a href="http://www.informatik.uni-trier.de/~ley/db/conf/vldb/vldb2006.html">VLDB 10 Year Best Paper Award</a> for <i>Querying Heterogeneous Information Sources Using Source Descriptions</i> with Anand Rajaraman and Joann J. Ordille.<br /><br /></ul><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21224994-115894742981077088?l=googleresearch.blogspot.com' alt='' /></div>]]></content:encoded>
			<wfw:commentRss>https://googledata.org/google-research/and-the-awards-go-to/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>All Our N-gram are Belong to You</title>
		<link>https://googledata.org/google-research/all-our-n-gram-are-belong-to-you/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=all-our-n-gram-are-belong-to-you</link>
		<comments>https://googledata.org/google-research/all-our-n-gram-are-belong-to-you/#comments</comments>
		<pubDate>Thu, 03 Aug 2006 18:26:00 +0000</pubDate>
		<dc:creator><![CDATA[Peter Norvig]]></dc:creator>
				<category><![CDATA[Google Research]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Posted by Alex Franz and Thorsten Brants, Google Machine Translation Team. Here at Google Research we have been using word n-gram models for a variety of R&#38;D projects, such as statistical machine translation, speech recognition, spelling correction, ...]]></description>
				<content:encoded><![CDATA[<span class="byline-author">Posted by Alex Franz and Thorsten Brants, Google Machine Translation Team</span><br /><br />Here at Google Research we have been using word <a href="http://en.wikipedia.org/wiki/N-gram">n-gram models</a> for a variety of R&amp;D projects, such as <a href="http://googleresearch.blogspot.com/2006/04/statistical-machine-translation-live.html">statistical machine translation</a>, speech recognition, <a href="http://www.google.com/jobs/britney.html">spelling correction</a>, entity detection, information extraction, and others.  While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing <a href="http://labs.google.com/papers/mapreduce.html">infrastructure</a> to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of <i>one trillion words</i> from public Web pages.<br /><br />We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That's why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times.<br /><br />Watch for an announcement at the Linguistics Data Consortium (<a href="http://www.ldc.upenn.edu">LDC</a>), who will be distributing it soon, and then order your set of 6 DVDs. 
And <a href="mailto:ngrams@google.com">let us hear from you</a> - we're excited to hear what you will do with the data, and we're always interested in feedback about this dataset, or other potential datasets that might be useful for the research community.<br /><br /><b>Update (22 Sept. 2006):</b> The LDC now has the <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13">data available</a> in their catalog. The counts are as follows:<br /><pre><br />File sizes: approx. 24 GB compressed (gzip'ed) text files<br /><br />Number of tokens:    1,024,908,267,229<br />Number of sentences:    95,119,665,584<br />Number of unigrams:         13,588,391<br />Number of bigrams:         314,843,401<br />Number of trigrams:        977,069,902<br />Number of fourgrams:     1,313,818,354<br />Number of fivegrams:     1,176,470,663<br /></pre><br />The following is an example of the 3-gram data contained in this corpus:<br /><pre><br />ceramics collectables collectibles 55<br />ceramics collectables fine 130<br />ceramics collected by 52<br />ceramics collectible pottery 50<br />ceramics collectibles cooking 45<br />ceramics collection , 144<br />ceramics collection . 247<br />ceramics collection &lt;/S&gt; 120<br />ceramics collection and 43<br />ceramics collection at 52<br />ceramics collection is 68<br />ceramics collection of 76<br />ceramics collection | 59<br />ceramics collections , 66<br />ceramics collections . 60<br />ceramics combined with 46<br />ceramics come from 69<br />ceramics comes from 660<br />ceramics community , 109<br />ceramics community . 212<br />ceramics community for 61<br />ceramics companies . 53<br />ceramics companies consultants 173<br />ceramics company ! 4432<br />ceramics company , 133<br />ceramics company . 
92<br />ceramics company &lt;/S&gt; 41<br />ceramics company facing 145<br />ceramics company in 181<br />ceramics company started 137<br />ceramics company that 87<br />ceramics component ( 76<br />ceramics composed of 85<br />ceramics composites ferrites 56<br />ceramics composition as 41<br />ceramics computer graphics 51<br />ceramics computer imaging 52<br />ceramics consist of 92<br /></pre><br />The following is an example of the 4-gram data in this corpus:<br /><pre><br />serve as the incoming 92<br />serve as the incubator 99<br />serve as the independent 794<br />serve as the index 223<br />serve as the indication 72<br />serve as the indicator 120<br />serve as the indicators 45<br />serve as the indispensable 111<br />serve as the indispensible 40<br />serve as the individual 234<br />serve as the industrial 52<br />serve as the industry 607<br />serve as the info 42<br />serve as the informal 102<br />serve as the information 838<br />serve as the informational 41<br />serve as the infrastructure 500<br />serve as the initial 5331<br />serve as the initiating 125<br />serve as the initiation 63<br />serve as the initiator 81<br />serve as the injector 56<br />serve as the inlet 41<br />serve as the inner 87<br />serve as the input 1323<br />serve as the inputs 189<br />serve as the insertion 49<br />serve as the insourced 67<br />serve as the inspection 43<br />serve as the inspector 66<br />serve as the inspiration 1390<br />serve as the installation 136<br />serve as the institute 187<br />serve as the institution 279<br />serve as the institutional 461<br />serve as the instructional 173<br />serve as the instructor 286<br />serve as the instructors 161<br />serve as the instrument 614<br />serve as the instruments 193<br />serve as the insurance 52<br />serve as the insurer 82<br />serve as the intake 70<br />serve as the integral 68<br /></pre><div class="blogger-post-footer"><img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/21224994-115463142334550661?l=googleresearch.blogspot.com' alt='' /></div>]]></content:encoded>
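The published counts were produced on Google's distributed infrastructure, but the core counting-and-cutoff step is simple. Purely as a toy single-machine illustration (class and method names are ours, not Google code; the released dataset used n = 5 and a cutoff of 40, while tiny values are used here):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Single-machine sketch of n-gram counting with a minimum-count cutoff,
// mirroring the "count everything, then discard rare n-grams" step.
public class NgramCounts {
    public static Map<String, Integer> count(List<String> tokens, int n, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            // Join n consecutive tokens into one n-gram key.
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        // Discard n-grams below the cutoff, as the released dataset does.
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("to", "be", "or", "not", "to", "be");
        // Only the bigram "to be" appears at least twice.
        System.out.println(count(tokens, 2, 2)); // {to be=2}
    }
}
```

At trillion-word scale the same map/reduce structure applies: the inner loop becomes the map phase emitting n-grams, and the cutoff is applied after summing counts in the reduce phase.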
			<wfw:commentRss>https://googledata.org/google-research/all-our-n-gram-are-belong-to-you/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interactive TV: Conference and Best Paper</title>
		<link>https://googledata.org/google-research/interactive-tv-conference-and-best-paper/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=interactive-tv-conference-and-best-paper</link>
		<comments>https://googledata.org/google-research/interactive-tv-conference-and-best-paper/#comments</comments>
		<pubDate>Tue, 06 Jun 2006 17:13:00 +0000</pubDate>
		<dc:creator><![CDATA[Peter Norvig]]></dc:creator>
				<category><![CDATA[Google Research]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Posted by Michele Covell &#38; Shumeet Baluja, Research Scientists. Euro ITV (the interactive television conference) took place in Athens last week.  The presentations included a diverse collection of user studies, new application areas, and exploratory ...]]></description>
				<content:encoded><![CDATA[<span class="byline-author">Posted by Michele Covell &amp; Shumeet Baluja, Research Scientists</span><br /><br />Euro ITV (the interactive television conference) took place in Athens last week.  The presentations included a diverse collection of user studies, new application areas, and exploratory business models. One of the main themes was the integration of multiple information sources.  For example, during a time-out in a live sporting event, some viewers may enjoy reviewing highlight footage, while others may prefer to switch to a parallel program to view player profiles and statistics before being automatically returned to the soccer match once play is back underway.<br /><br />Other papers explored the idea of selecting and recommending videos.  When many videos are available, such as through IPTV or digital cable, we see a <i>heavy-tailed distribution</i> of content accesses (much like that on the internet).  There are a small number of popular channels, but the combined viewings from thousands of "niche" channels outweigh the popular channels.  As on the web, the problem that arises from this situation is one of discovery. A <i>TV guide</i> type resource is not practical; methods like collaborative filtering can help.  Nonetheless, new ideas and interfaces are needed.  <br /><br />We also presented our work at the conference.  <a href="http://www.mangolassi.org/covell/pubs/euroITV-2006.pdf">Our paper [pdf]</a> (which received the <b>best paper</b> award :) focused on using broadcast viewing to automatically present relevant information on a web browser.  We showed how to sample the ambient sound emitted from a TV and automatically determine what is being watched from a small signature of the sound -- all with complete privacy and minuscule effort.  
The system could keep up with users while they channel surf, presenting them with a real-time forum about a live political debate one minute and an ad-hoc chat room for a sporting event the next. And all of this would be done without users ever having to type or to even know the name of the program or channel being viewed.  Taking this further, we could collect snippets from the web describing the actors appearing in a movie or present maps of locales within the movie as it takes place (no matter if users are watching it as a live broadcast or as a recorded broadcast).<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21224994-114962138526611754?l=googleresearch.blogspot.com' alt='' /></div>]]></content:encoded>
			<wfw:commentRss>https://googledata.org/google-research/interactive-tv-conference-and-best-paper/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extra, Extra &#8211; Read All About It: Nearly All Binary Searches and Mergesorts are Broken</title>
		<link>https://googledata.org/google-research/extra-extra-read-all-about-it-nearly-all-binary-searches-and-mergesorts-are-broken/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=extra-extra-read-all-about-it-nearly-all-binary-searches-and-mergesorts-are-broken</link>
		<comments>https://googledata.org/google-research/extra-extra-read-all-about-it-nearly-all-binary-searches-and-mergesorts-are-broken/#comments</comments>
		<pubDate>Fri, 02 Jun 2006 15:34:00 +0000</pubDate>
		<dc:creator><![CDATA[Peter Norvig]]></dc:creator>
				<category><![CDATA[Google Research]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Posted by Joshua Bloch, Software Engineer. I remember vividly Jon Bentley's first Algorithms lecture at CMU, where he asked all of us incoming Ph.D. students to write a binary search, and then dissected one of our implementations in front of the class.  ...]]></description>
				<content:encoded><![CDATA[<span class="byline-author">Posted by Joshua Bloch, Software Engineer</span><br /><br />I remember vividly Jon Bentley's first Algorithms lecture at CMU, where he asked all of us incoming Ph.D. students to write a binary search, and then dissected one of our implementations in front of the class.  Of course it was broken, as were most of our implementations.  This made a real impression on me, as did the treatment of this material in his wonderful <i>Programming Pearls</i> (Addison-Wesley, 1986; Second Edition, 2000).  The key lesson was to carefully consider the invariants in your programs.<br /><br />Fast forward to 2006.  I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of <i>Programming Pearls</i> contains a bug.  Once I tell you what it is, you will understand why it escaped detection for two decades.  Lest you think I'm picking on Bentley, let me tell you how I discovered the bug: The version of binary search that I wrote for the JDK contained the same bug. It was reported to Sun recently when it broke someone's program, after lying in wait for nine years or so.<br /><br />So what's the bug?  Here's a standard binary search, in Java.  
(It's the one that I wrote for <code>java.util.Arrays</code>):<br /><br /><pre>1:     public static int binarySearch(int[] a, int key) {<br />2:         int low = 0;<br />3:         int high = a.length - 1;<br />4:<br />5:         while (low <= high) {<br />6:             int mid = (low + high) / 2;<br />7:             int midVal = a[mid];<br />8:<br />9:             if (midVal < key)<br />10:                 low = mid + 1;<br />11:             else if (midVal > key)<br />12:                 high = mid - 1;<br />13:             else<br />14:                 return mid; // key found<br />15:         }<br />16:         return -(low + 1);  // key not found.<br />17:     }<br /></pre><br />The bug is in this line:<br /><pre> 6:             int mid = (low + high) / 2;</pre><br />In <i>Programming Pearls</i> Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer."  On the face of it, this assertion might appear correct, but it fails for large values of the <code>int</code> variables <code>low</code> and <code>high</code>. Specifically, it fails if the sum of <code>low</code> and <code>high</code> is greater than the maximum positive <code>int</code> value (2<sup>31</sup> - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results.  In Java, it throws <code>ArrayIndexOutOfBoundsException</code>.<br /><br />This bug can manifest itself for arrays whose length (in elements) is 2<sup>30</sup> or greater (roughly a billion elements).  This was inconceivable back in the '80s, when <i>Programming Pearls</i> was written, but it is common these days at Google and other places.  In <i>Programming Pearls</i>, Bentley says "While the first binary search was published in 1946, the first binary search that works correctly for all values of <span style="font-style: italic;">n</span> did not appear until 1962."  
The truth is, very few correct versions have ever been published, at least in mainstream programming languages.<br /><br />So what's the best way to fix the bug? Here's one way:<br /><pre> 6:             int mid = low + ((high - low) / 2);</pre><br />Probably faster, and arguably as clear, is:<br /><pre> 6:             int mid = (low + high) >>> 1;</pre><br />In C and C++ (where you don't have the <code>>>></code> operator), you can do this:<br /><pre> 6:             mid = ((unsigned int)low + (unsigned int)high) >> 1;</pre><br />And now we <i>know</i> the binary search is bug-free, right?  Well, we strongly suspect so, but we don't know. It is not sufficient merely to prove a program correct; you have to test it too.  Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible.  With concurrent programs, it's even worse: You have to test for all internal states, which is, for all practical purposes, impossible.<br /><br />The binary-search bug applies equally to mergesort, and to other divide-and-conquer algorithms.  If you have any code that implements one of these algorithms, fix it now before it blows up.  The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.<br /><br />We programmers need all the help we can get, and we should never assume otherwise.  Careful design is great.  Testing is great.  Formal methods are great.  Code reviews are great. Static analysis is great.  But none of these things alone are sufficient to eliminate bugs: They will always be with us.  A bug can exist for half a century despite our best efforts to exterminate it. 
We must program carefully, defensively, and remain ever vigilant.<br /><br /><span style="font-weight: bold;">Update 17 Feb 2008</span>: Thanks to Antoine Trux, Principal Member of Engineering Staff at Nokia Research Center Finland, for pointing out that the original proposed fix for C and C++ (line 6) was not guaranteed to work by the relevant C99 standard (<span style="font-style: italic;">INTERNATIONAL STANDARD - ISO/IEC - 9899 - Second edition - 1999-12-01</span>, Section 3.4.3.3), which says that if you add two signed quantities and get an overflow, the result is undefined. The older C Standard, C89/90, and the C++ Standard are both identical to C99 in this respect. Now that we've made this change, we <span style="font-style: italic;">know</span> that the program is correct ;)<h3>Resources</h3><br /><ul> <li> <a href="http://www.google.com/url?sa=D&amp;q=http%3A%2F%2Fnetlib.bell-labs.com%2Fcm%2Fcs%2Fpearls%2F" ><i>Programming Pearls</i></a> - Highly recommended.  Get a copy today!  </li><li> <a href="http://www.google.com/url?sa=D&amp;q=http%3A%2F%2Fbugs.sun.com%2Fbugdatabase%2Fview_bug.do%3Fbug_id%3D5045582" > The Sun bug report describing this bug in the JDK</a>  </li><li> <a href="http://www.google.com/url?sa=D&amp;q=http%3A%2F%2Fwww.di.unipi.it%2F%7Eruggieri%2FPapers%2Fsemisum.pdf" > A 2003 paper by Salvatore Ruggieri</a> discussing a related but more general (and perhaps less interesting) problem: computing the average of two numbers of arbitrary sign.  The paper does not discuss performance, and its solution is not fast enough for use in the inner loop of a mergesort. </li> </ul>]]></content:encoded>
			<wfw:commentRss>https://googledata.org/google-research/extra-extra-read-all-about-it-nearly-all-binary-searches-and-mergesorts-are-broken/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First Robots</title>
		<link>https://googledata.org/google-research/first-robots/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=first-robots</link>
		<comments>https://googledata.org/google-research/first-robots/#comments</comments>
		<pubDate>Thu, 23 Mar 2006 02:28:00 +0000</pubDate>
		<dc:creator><![CDATA[Peter Norvig]]></dc:creator>
				<category><![CDATA[Google Research]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Posted by  Sumit Agarwal, Maryam Kamvar, &#038; Michael Stoppelman With 4 seconds left to go, the Team Cheesy Poofs robot shouldered its way onto the 3 foot platform, pivoted 90 degrees into scoring position, and rapid-fired 10 balls directly into the 3-poi...]]></description>
				<content:encoded><![CDATA[<span class="byline-author">Posted by  Sumit Agarwal, Maryam Kamvar, & Michael Stoppelman </span><br /><br />With 4 seconds left to go, the Team <a href="http://team254.bcp.org/">Cheesy Poofs</a> robot shouldered its way onto the 3 foot platform, pivoted 90 degrees into scoring position, and rapid-fired 10 balls directly into the 3-point goal. They won the match, and the Google Silicon Valley Regional Championship for US FIRST, a non-profit "For the Inspiration and Recognition of Science and Technology" (<a href="http://www.usfirst.org/">FIRST</a>).<br /><br />Google jumped at the opportunity to sponsor this organization after Dean Kamen (inventor of the Segway and the first implantable dialysis pump) spoke to a packed Google audience about his lifelong crusade to improve education in the United States. Dean founded US FIRST over 15 years ago, and from humble beginnings in the Northeast, FIRST has now grown to involve over 60,000 high school students all over the United States and the world.<br /><br />FIRST was a natural partner for Google, given their focus on science and technology, their passion for changing the world for the better, and their single-minded focus on making education fun for students. When the final buzzer rang at the recent championship match the students jumped and hugged like they'd won the Superbowl. And in a way, they had. This event has all the excitement, tension, and drama of a major sporting event and then some.<br /><br />Beyond sponsoring the FIRST tournament, Google also funded half a dozen teams in the Bay Area, ranging from East Palo Alto High School to Notre Dame High School. Several dozen employees also served as team mentors, meeting the students once a week to help construct the competition robots over the frantic six-week design/build cycle. 
Others volunteered at the Regional event as judges, coordinators, and referees, and plenty of Googlers were on hand to watch the exciting matches.<br /><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://googleresearch.blogspot.com/uploaded_images/first06-778391.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://googleresearch.blogspot.com/uploaded_images/first06-777007.jpg" border="0" alt="" /></a><br /><br />We congratulate all the teams at the regional tournament for their hard work and innovation. We wish the six Bay Area teams who qualified for the finals in Atlanta the best of luck. Bring home the gold!]]></content:encoded>
			<wfw:commentRss>https://googledata.org/google-research/first-robots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hiring: The Lake Wobegon Strategy</title>
		<link>https://googledata.org/google-research/hiring-the-lake-wobegon-strategy/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=hiring-the-lake-wobegon-strategy</link>
		<comments>https://googledata.org/google-research/hiring-the-lake-wobegon-strategy/#comments</comments>
		<pubDate>Sat, 11 Mar 2006 21:28:00 +0000</pubDate>
		<dc:creator><![CDATA[Peter Norvig]]></dc:creator>
				<category><![CDATA[Google Research]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Posted by Peter Norvig, Director, Google ResearchYou know the Google story: small start-up of highly-skilled programmers in a garage grows into a large international company. But how do you maintain the skill level while roughly doubling in size each y...]]></description>
				<content:encoded><![CDATA[<span class="byline-author">Posted by Peter Norvig, Director, Google Research</span><br /><br />You know the Google story: small start-up of highly-skilled programmers in a garage grows into a large international company. But how do you maintain the skill level while roughly doubling in size each year? We rely on the Lake Wobegon Strategy, which says <i>only hire candidates who are above the mean of your current employees.</i> An alternative strategy (popular in the dot-com boom period) is to justify a hire by saying "this candidate is clearly better than at least one of our current employees."  The following graph compares the mean employee skill level of two strategies: hire-above-the-mean (or Lake Wobegon) in blue and hire-above-the-min in red.  I ran a simulation of 1000 candidates with skill level sampled uniformly from the 0 to 100th percentile (but evaluated by the interview process with noise of &plusmn;15%) starting from a core team of 10 employees with mean 75 and min 65.  You can see how hire-above-the-min leads to a precipitous drop in skill level; one we've been able to avoid.<br /><br /><a href="http://googleresearch.blogspot.com/uploaded_images/chart-728362.gif"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://googleresearch.blogspot.com/uploaded_images/chart-727755.gif" border="0" alt="" /></a><br /><br />Another hiring strategy we use is <i>no hiring manager</i>.  Whenever you give project managers responsibility for hiring for their own projects they'll take the best candidate in the pool, even if that candidate is sub-standard for the company, because every manager wants some help for their project rather than no help.  That's why we do all hiring at the company level, not the project level.  First we decide which candidates are above the hiring threshold, and then we decide what projects they can best contribute to. 
The orange line in the graph above is a simulation of the hiring-manager strategy, with the same candidates and the same number of hires as the no-hiring-manager strategy in blue. Employees are grouped into pools of random size from 2 to 14, and the hiring manager chooses the best one. We're pleased that these little simulations show our hiring strategy is on top. You can learn more about our <a href="http://www.google.com/jobs/working.html">hiring and working philosophy</a>.]]></content:encoded>
			<wfw:commentRss>https://googledata.org/google-research/hiring-the-lake-wobegon-strategy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
