May 17th, 2007 | Published in Google Custom Search
If you've created a Custom Search Engine, you've likely encountered the term 'URL patterns': the method for specifying sites and pages to include in a search engine. You can create a Custom Search Engine without ever giving URL patterns much thought, but there's added control that comes with understanding how they work and how best to use them. To that end, here are a few tips on creating useful URL patterns.
The simplest type of pattern is the URL of a specific page. You can also use an asterisk (or star, in techie vernacular) as a wildcard in your pattern to specify a swath of URLs by identifying their common characters. Here's how it works: rather than add each page of www.site.com to your search engine, you can merely add www.site.com/* and we will include every URL that begins with www.site.com/. Similarly, rather than individually adding each subdomain of site.com, you can use the star to include them all: *.site.com will apply to www.site.com, server.site.com, etc.
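To make the star's behavior concrete, here is a minimal Python sketch of wildcard matching. It is only an illustration and not how Custom Search itself is implemented: the function names are hypothetical, and it assumes each star stands for any run of characters while everything else must match literally.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a star pattern into a regular expression (illustrative only)."""
    # Escape each literal piece so '.' and other characters are not treated
    # as regex syntax, then let every star match any run of characters.
    literal_pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(literal_pieces))

def matches(pattern: str, url: str) -> bool:
    """Return True if the whole URL fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(url) is not None

print(matches("www.site.com/*", "www.site.com/about.html"))  # True
print(matches("*.site.com", "server.site.com"))              # True
print(matches("www.site.com/*", "server.site.com/about"))    # False
```

The sketch requires the entire URL to fit the pattern, which is why the trailing star matters in www.site.com/*.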
Because most users adding sites to their search engines expect the subpages to be included, we've made the addition of the star the default behavior. This is what we mean in the Sites tab of the control panel when we say 'Include all pages whose address contains this URL.' If you leave that option selected, we will add a star to the end of whatever URL you provide. Additionally, if your URL is just a domain (like site.com/) with no subdomain (like www.site.com/ or server.site.com/), we will add a star at the beginning. This means:
- site.com/ becomes *.site.com/*
- server.foo.com/ becomes server.foo.com/*
- othersite.co.uk/ becomes *.othersite.co.uk/* (yep, we handle two-level top-level domains correctly)
If you'd like to control exactly where stars get added, you can select 'Include just the specific page or URL pattern I have entered.'
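The default star handling described above can be sketched in a few lines of Python. This is a rough approximation under stated assumptions, not the actual control panel logic: the function name and the tiny TWO_LEVEL_SUFFIXES set are hypothetical (a real system would need a complete suffix list), and the sketch adds the leading star whenever the host has no subdomain, even if a path follows.

```python
# Hypothetical, deliberately tiny set of two-level suffixes for illustration.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}

def add_default_stars(url: str) -> str:
    """Mimic the 'Include all pages whose address contains this URL' default."""
    host, slash, path = url.partition("/")
    labels = host.split(".")

    # Work out how many labels belong to the top-level suffix.
    suffix_len = 2 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 1

    # A host with exactly one label in front of the suffix has no subdomain.
    has_no_subdomain = len(labels) == suffix_len + 1
    if has_no_subdomain and not host.startswith("*"):
        host = "*." + host

    pattern = host + slash + path
    # Always finish with a star so subpages are included.
    if not pattern.endswith("*"):
        pattern += "*"
    return pattern

for url in ("site.com/", "server.foo.com/", "othersite.co.uk/"):
    print(url, "becomes", add_default_stars(url))
```

Run on the three examples from the list, the sketch reproduces the patterns shown there.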
Another way to use patterns is to identify and distill common characters among a set of pages you'd like to include in your index. For example, if I know that all car review pages on my favorite consumer reviews site contain the word 'reviews' followed by 'cars' in the URL, I can create the pattern www.reviewsite.com/*reviews*cars and include these pages. (I just need to be careful that only the pages I want contain these terms, or I may include more than I bargained for.) These types of patterns are especially useful when creating search refinements.
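One way to avoid including more than you bargained for is to try a pattern against a list of known URLs before adding it. The Python sketch below does that for the car reviews pattern. The sample URLs are made up for illustration, and the matcher rests on an assumption: a star matches any run of characters and a pattern with no trailing star must match through to the end of the URL.

```python
import re

def matches(pattern: str, url: str) -> bool:
    """Illustrative matcher: each star matches any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, url) is not None

pattern = "www.reviewsite.com/*reviews*cars"

# Hypothetical URLs standing in for pages on the reviews site.
candidates = [
    "www.reviewsite.com/reviews/cars",
    "www.reviewsite.com/reviews/2007/cars",
    "www.reviewsite.com/reviews/toy-cars",         # probably not wanted
    "www.reviewsite.com/cars/reviews",             # words in the other order
    "www.reviewsite.com/reviews/cars/sedan.html",  # text after 'cars'
]

included = [url for url in candidates if matches(pattern, url)]
for url in included:
    print("included:", url)
```

Under these assumptions the first three URLs are included, among them the toy-cars page, which is exactly the kind of surprise worth catching early.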
The star has one more application: before a top-level domain like .edu or .ca. This is useful when you'd like to refine a search by country: for example, you can create a pattern like *.ca and refine your search to Canadian web sites only. That said, you cannot otherwise insert a star into the second-level domain, as in www.*.com or www.dogs*.com. For a few more concrete examples of URL patterns, check out our documentation.
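If you generate patterns programmatically, a quick check on star placement can catch the unsupported forms early. The Python sketch below encodes one reading of the rule, and that reading is an assumption rather than the documented behavior: in the host part of a pattern, a star is accepted only when it is the entire leftmost label (as in *.site.com or *.ca), while stars in the path are left alone. The function name is hypothetical.

```python
def host_star_is_allowed(pattern: str) -> bool:
    """Check star placement in the host part of a pattern (assumed rule)."""
    # Everything before the first slash is treated as the host.
    host = pattern.split("/", 1)[0]
    first, *rest = host.split(".")

    # No star may appear in any label after the leftmost one.
    if any("*" in label for label in rest):
        return False

    # The leftmost label is either a lone star or contains no star at all.
    return first == "*" or "*" not in first

for pattern in ("*.ca", "*.site.com", "www.site.com/*", "www.*.com", "www.dogs*.com"):
    verdict = "ok" if host_star_is_allowed(pattern) else "not supported"
    print(pattern, "->", verdict)
```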
The careful creation of URL patterns allows you greater control over the content of your Custom Search Engines, and can be particularly helpful when breaking up a site's content for refinements. Patterns may also help reduce the overall number of sites in your Custom Search Engine account, so look for redundant URLs that can be included more economically by the use of, shall we say, a little star power.