Tech Tip: Collection Math
October 12th, 2006 | Published in Google Enterprise
We were talking today (well, actually we were IM'ing over Google Talk in response to an email, but that's beside the point) about how, although a lot of interesting information finds its way to our blog, we haven't been using it as a forum to share the technical tips and tricks we come across for our enterprise products. So this marks the first in an ongoing series of "tech tips" for the Google Search Appliance, Google Mini, Google Earth, Google Apps, etc.
One powerful feature of the Google Search Appliance is the ability to create multiple collections. Collections are logical views of information in the index, as defined by URL patterns. This allows you, for example, to index the entire contents of your intranet, but then divide it up into logical groups of content. One approach may be to divide it up functionally, with a collection for Finance, one for HR, Engineering, Sales, Support, etc. However, as you start to break down your content into logical groups, it's often necessary to let a given group of users search across several of those collections at the same time. You might want a salesperson to search not only the 'sales' collection but also the 'marketing' collection. Or there might be some general content, like corporate policies and holiday schedules, that should be available to everyone.
For this, you could create lots of unique collections with duplicate rules, but that requires more ongoing maintenance. Luckily, the Google Search Appliance supports the use of logical AND and OR operators on the collection parameter.
To specify which collection you want to search over, you set the site parameter on the GET request. The following is a simple GET request to the Google Search Appliance where the collection specified is one named 'all_content':
http://search.corp.mycompany.com/search?q=query+string
&site=all_content
&client=default_frontend
&output=xml_no_dtd
&proxystylesheet=default_frontend
Now, what if we wanted to do a query and ask for anything in either the 'sales' or 'marketing' collection? You can use the boolean OR [|] operator on the site parameter. So your GET request would be:
http://search.corp.mycompany.com/search?q=query+string
&site=sales|marketing
&client=default_frontend
&output=xml_no_dtd
&proxystylesheet=default_frontend
What if you wanted to only return information that was in both the 'engineering' and 'support' collections? You can use the boolean AND [.] operator on the site parameter:
http://search.corp.mycompany.com/search?q=query+string
&site=engineering.support
&client=default_frontend
&output=xml_no_dtd
&proxystylesheet=default_frontend
Using the boolean AND [.] and boolean OR [|] operators makes working with collections more powerful and significantly lowers ongoing maintenance effort as content changes and collection definitions evolve. Give it a shot!
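If you build these requests from a script, the collection math can be wrapped in a small helper. The following is only a rough sketch in Python using the standard library: the function names (collection_expr, search_url) are made up for illustration and are not part of the appliance, and the host and parameter values are simply the ones from the example requests in this post.

```python
from urllib.parse import urlencode

# Host taken from the example requests in this post; replace with your appliance.
BASE_URL = "http://search.corp.mycompany.com/search"


def collection_expr(collections, operator="|"):
    """Join collection names with '|' (boolean OR) or '.' (boolean AND)."""
    if operator not in ("|", "."):
        raise ValueError("operator must be '|' (OR) or '.' (AND)")
    if not collections:
        raise ValueError("at least one collection is required")
    return operator.join(collections)


def search_url(query, collections, operator="|", frontend="default_frontend"):
    """Build a GET URL whose site parameter combines several collections."""
    params = [
        ("q", query),
        ("site", collection_expr(collections, operator)),
        ("client", frontend),
        ("output", "xml_no_dtd"),
        ("proxystylesheet", frontend),
    ]
    # safe="|" leaves the OR operator unescaped so the URL reads like the
    # examples in this post; spaces in the query become '+'.
    return BASE_URL + "?" + urlencode(params, safe="|")


if __name__ == "__main__":
    # Anything in either 'sales' or 'marketing'
    print(search_url("query string", ["sales", "marketing"], operator="|"))
    # Only content that is in both 'engineering' and 'support'
    print(search_url("query string", ["engineering", "support"], operator="."))
```

Keeping the operator as an explicit argument means a single-collection search is just a list with one name in it, and the set of collections for a given group of users can live in configuration rather than in duplicated collection rules.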