Eight Ways to Help Google News Better Crawl Your Site
February 11th, 2009 | Published in Google News
Posted by Abe Epton, News Online Operations Team
From time to time, publishers ask us what they can do to improve their listings in Google News. The following are eight of the most frequent, and useful, pieces of advice we give out. Why eight? Because at Google, we love powers of 2.
* Keep the article body clean
For various reasons, when crawling an article, Google News checks to make sure it can find the article body. If your article body is broken up by tags, ads, sidebars or other non-article content, we may not be able to detect the actual article body, and may reject your article as a result. In addition, if you place the beginning of your article's body near the title in the HTML, we'll be more likely to extract the correct title and snippet.
* Make sure article URLs are permanent and unique
If you reuse article URLs, our system may have difficulty crawling and categorizing your stories. In addition, make sure your article URLs have at least three digits that don't resemble a year (for example, 5232 is OK, but 2008 is not). You can get around this requirement by submitting your articles in News Sitemaps. Also, please note that session IDs can confuse our crawler: when the same page is served under several URLs that differ only by session ID, we may not realize that those distinct URLs actually point to the same page. You can learn more about some of these requirements here.
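As a rough illustration of the two URL checks described above, here is a minimal Python sketch. It is not the crawler's actual logic. The definition of "resembles a year" (a four-digit number from 1900 to 2099) and the list of session parameter names in SESSION_PARAMS are assumptions made for this example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session-style query parameters; adjust for your own CMS.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def has_non_year_number(url: str) -> bool:
    """Return True if the URL contains a run of 3+ digits that is not year-like."""
    for run in re.findall(r"\d+", url):
        if len(run) < 3:
            continue
        # Assumption: a four-digit number from 1900 to 2099 "resembles a year".
        if len(run) == 4 and 1900 <= int(run) <= 2099:
            continue
        return True
    return False


def strip_session_params(url: str) -> str:
    """Drop session-style query parameters so each article has one stable URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, has_non_year_number("http://example.com/news/article5232.html") is true, while a URL whose only number is 2008 fails the check.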
* Take advantage of stock tickers in Sitemaps
Google News Sitemaps allow publishers to specify stock ticker symbols for companies mentioned in individual articles. Using these symbols helps us better identify the subjects of your articles. You can read more about the format we use for this data here.
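The sketch below shows one way a publisher might generate a Sitemap entry carrying ticker symbols. The element name news:stock_tickers, the namespace URIs, and the EXCHANGE:SYMBOL ticker style are assumptions based on the News Sitemap schema and should be verified against the format documentation linked above. Other fields a real News Sitemap entry needs are left out to keep the example short.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the current Sitemap documentation.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles):
    """Build a partial News Sitemap string with stock tickers for each article.

    `articles` is a list of dicts with "loc" (URL) and "tickers" (list of str).
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        # Assumed format: a comma-separated list of EXCHANGE:SYMBOL pairs.
        tickers = ET.SubElement(news, f"{{{NEWS_NS}}}stock_tickers")
        tickers.text = ", ".join(article["tickers"])
    return ET.tostring(urlset, encoding="unicode")
```

Calling build_news_sitemap with a single article dict returns one urlset element containing a url entry whose news block lists that article's tickers.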
* Check your encoding
We occasionally see articles that declare themselves to be encoded in one format (say, UTF-8) and are actually encoded in another (say, ISO 8859-1). Don't do this. It hurts us.
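A quick way to catch the most common form of this mismatch before publishing is to decode the page bytes with the declared charset and see whether that fails. The helper below is a minimal sketch with a hypothetical function name. It only catches strict encodings such as UTF-8, because ISO 8859-1 accepts any byte sequence, so a page declared as ISO 8859-1 but actually encoded as UTF-8 will pass this check even though it renders incorrectly.

```python
def decodes_as_declared(raw: bytes, declared: str) -> bool:
    """Return True if `raw` decodes without error using the declared charset."""
    try:
        raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are not valid in the declared charset.
        # LookupError: the declared charset name is unknown to Python.
        return False
    return True
```

For instance, the bytes of "café" encoded as ISO 8859-1 fail the check when the page declares UTF-8.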
* Make your article publication dates explicit
In order to help our crawler determine the correct date, please make the actual publication date of your articles explicit. You can do this by placing the article date and time in the HTML, between the title and the body. Also, you can remove other dates from the HTML of the article page, and add the required tag to articles in your News Sitemap. Dates on article pages can be in most common formats, but for sitemaps, we ask that you use the W3C format; e.g. 2008-12-29T06:30:00Z.
Note that the article times and dates displayed on Google News reflect the time at which we originally crawled the articles, and may not be the same as the publication date.
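For the Sitemap date, a small helper can convert a timezone-aware publication time to the W3C format shown above. This is a sketch using only the Python standard library; the function name is hypothetical, and it normalizes every timestamp to UTC so the output always ends in Z.

```python
from datetime import datetime, timedelta, timezone


def to_w3c_utc(published: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if published.tzinfo is None:
        # A naive datetime is ambiguous, so refuse to guess its time zone.
        raise ValueError("publication time must be timezone-aware")
    return published.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# An article published at 01:30 in a UTC-5 newsroom.
eastern = timezone(timedelta(hours=-5))
print(to_w3c_utc(datetime(2008, 12, 29, 1, 30, tzinfo=eastern)))
# prints 2008-12-29T06:30:00Z
```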
* Keep original content separate from press releases
If your site produces original content and distributes press releases that you'd like us to crawl, make sure to separate your original news content from your press releases by creating two different sections on your site. As you may know, Google News labels press releases distinctly in order to alert our users that the article they're about to read is a press release. If your original news sections have links to press releases, adding the rel="nofollow" attribute to all links that point to your press release articles will ensure that they're labeled correctly. You can learn more about this attribute here.
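To audit an original-news page for press release links that lack the attribute, something like the following sketch could be used. It assumes, purely for illustration, that press releases live under a /press-releases/ path on your site; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PressReleaseLinkChecker(HTMLParser):
    """Collect links to the press release section that lack rel="nofollow"."""

    def __init__(self, press_prefix="/press-releases/"):
        super().__init__()
        # Assumption: all press releases share this URL path prefix.
        self.press_prefix = press_prefix
        self.missing_nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        # rel may hold several space-separated tokens, e.g. "nofollow external".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if href.startswith(self.press_prefix) and "nofollow" not in rel_tokens:
            self.missing_nofollow.append(href)


def links_missing_nofollow(html, press_prefix="/press-releases/"):
    """Return hrefs of press release links without rel="nofollow"."""
    checker = PressReleaseLinkChecker(press_prefix)
    checker.feed(html)
    checker.close()
    return checker.missing_nofollow
```

An empty result means every press release link found on the page already carries the attribute.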
* Format your images properly
To help Google News identify your images and crawl them along with your articles, use fairly large images with reasonable aspect ratios and descriptive captions. Make sure to place them near their respective article titles on the page and make the images inline and non-clickable. Images in the JPEG format are more likely to be crawled correctly.
* Article Titles in Google News
In order for Google News to crawl the correct titles for your articles, make sure the title you want appears in both the title tag and as the headline on the article page. In addition, don't hyperlink the headline on the article page - after all, your reader is already there! And it's always a good idea to have links that point to your articles use the article title as anchor text.
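The title advice can be checked mechanically. The sketch below assumes the on-page headline is the page's single h1 element and that the title tag should match it exactly after whitespace is normalized; both are simplifying assumptions for this example, not documented crawler behavior, and the names are hypothetical.

```python
from html.parser import HTMLParser


class TitleHeadlineParser(HTMLParser):
    """Capture the <title> text, the <h1> text, and whether the <h1> holds a link."""

    def __init__(self):
        super().__init__()
        self._current = None  # "title" or "h1" while inside that element
        self.title = ""
        self.headline = ""
        self.headline_linked = False

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
        elif tag == "a" and self._current == "h1":
            self.headline_linked = True

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.headline += data


def check_title(html):
    """Report whether title and headline match and whether the headline is linked."""
    parser = TitleHeadlineParser()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split())
    headline = " ".join(parser.headline.split())
    return {
        "titles_match": bool(title) and title == headline,
        "headline_linked": parser.headline_linked,
    }
```

Sites that append a site name to the title tag would need a looser comparison than the exact match used here.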
If you found these suggestions helpful, you might also want to check out our more general Webmaster Guidelines. The Webmaster Guidelines aren't necessarily specific to Google News, but much of the wisdom you'll find there can help make your site Google News-friendly. Our Publisher Help Center contains lots more information about many of these topics. And you can always check out the Google News Help Forum to give us feedback on these suggestions, and share other tips and advice with webmasters and News users.