With the 3.0 release of DotNetNuke, way back in March of 2005, search was implemented in the project after a hiatus during the 2.* releases.
Since then, not much has changed in search, yet it remains a very mysterious system for most users and developers. I hope to clear up some of that mystery with this blog post.
As a DNN site administrator, you most likely won't worry about how search works until users come to you asking why they aren't seeing the results they expect. This blog post explains how search works so you can answer their questions.
A few things to note:
1. The search implementation within DNN is pretty basic: it counts the number of times a term is found within a body of "text" (more on this later).
2. Modules differ in whether, and how, they implement the interfaces needed to interact with the core DNN Search provider. Modules that don't implement ISearchable won't be included in the core search.
3. Each module chooses what content it provides to the Search provider for indexing. It might pass everything associated with a particular object, or only specific values from that object; the latter can limit how effective search is for that module.
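To make point 3 concrete, here is a small Python sketch comparing a module that hands the indexer only an object's title with one that hands over every text field. The Article class and the helper functions are hypothetical, not DNN code, and the word match is deliberately simplified. A term that only appears in the body can't be found in the first case.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical data object a module might expose for searching."""
    title: str
    summary: str
    body: str


def content_title_only(article: Article) -> str:
    # The module hands the indexer only the title.
    return article.title


def content_everything(article: Article) -> str:
    # The module hands the indexer every text field of the object.
    return " ".join([article.title, article.summary, article.body])


def is_findable(term: str, indexed_text: str) -> bool:
    # Simplified whole-word match; real tokenizing rules are not modeled here.
    return term.lower() in indexed_text.lower().split()


article = Article("Search in DNN", "How indexing works", "The scheduler runs the indexer")
print(is_findable("scheduler", content_title_only(article)))  # False
print(is_findable("scheduler", content_everything(article)))  # True
```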
The scheduler:
The Search indexer runs on a schedule defined under the Host/Schedule options. Most sites I've seen run the indexer every 30 minutes, so if you change content within your modules and expect it to show up in search results immediately, you will likely be disappointed. To get around this you can shorten the time between executions of the indexer, though I wouldn't recommend running it too frequently if you have a lot of content on your website: the more often it runs, the more often each module hits the database to load its content.
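As a rough illustration of that trade-off, the sketch below estimates how many content loads the indexer triggers per day. It assumes, for simplicity, that each searchable module instance loads its content from the database once per run; the function name and the count of 20 modules are made up for the example.

```python
MINUTES_PER_DAY = 24 * 60


def module_loads_per_day(interval_minutes: int, searchable_modules: int) -> int:
    """Estimate daily content loads, assuming one load per module per indexer run."""
    runs_per_day = MINUTES_PER_DAY // interval_minutes
    return runs_per_day * searchable_modules


for interval in (30, 10, 1):
    print(interval, module_loads_per_day(interval, 20))
```

Under that assumption, going from the common 30 minute interval down to 1 minute multiplies the load by 30.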
One way to force your content to be indexed is to go to the Schedule page under the Host menu, edit the Search Indexer task, then disable it and save. Edit the task again and re-enable it; this should force the indexer to fire immediately. If this still isn't getting your content indexed as you expect, you can clear the Search tables in the database and have them repopulate completely the next time the indexer runs. I wrote a blog post on how to do this quite a while ago (http://weblogs.asp.net/christoc/archive/2006/06/26/DotNetNuke-Daily-Tip-_2300_3-6_2F00_26_2F00_06-Clear-Search-Tables.aspx).
There is also a "re-index" option on the Host/Search Settings page, though personally I've always found it a bit flaky, so I use the approach above to force my content to re-index.
A basic overview of how the indexer works:
When the indexer job fires, it makes a request to each module on the website that supports the ISearchable interface. Each of these modules returns a collection of SearchItemInfo objects, assuming it has any content to return. How a module populates these objects is entirely up to the module. As a developer, it is important to give each of these objects a unique SearchKey value; otherwise the indexer will log an exception.
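Here is a minimal Python sketch of that collection step under a simplified model. A module "supports search" if it exposes a hypothetical get_search_items() method (standing in for ISearchable), and keys are treated as unique per module, which is an assumption. Logging the duplicate and keeping the first item is also an assumed policy for the sketch, not necessarily what DNN does internally.

```python
import logging
from dataclasses import dataclass
from datetime import datetime

log = logging.getLogger("search_indexer")


@dataclass
class SearchItem:
    """Hypothetical stand-in for an object a module returns to the indexer."""
    module_id: int
    search_key: str
    content: str
    last_updated: datetime


def collect_items(modules):
    """Gather search items from every module that has a get_search_items() method."""
    collected = {}
    for module in modules:
        get_items = getattr(module, "get_search_items", None)
        if get_items is None:
            continue  # not searchable: the core search never sees its content
        for item in get_items():
            key = (item.module_id, item.search_key)
            if key in collected:
                # Duplicate SearchKey: log it and keep the first item (assumed policy).
                log.error("Duplicate SearchKey %r in module %d", item.search_key, item.module_id)
                continue
            collected[key] = item
    return collected


class HtmlModule:
    def get_search_items(self):
        when = datetime(2007, 1, 15)
        return [
            SearchItem(1, "intro", "Welcome text", when),
            SearchItem(1, "intro", "Oops, same key", when),  # logged, then skipped
        ]


class ImageModule:  # no get_search_items(), so it is skipped
    pass


items = collect_items([HtmlModule(), ImageModule()])
print(len(items))  # 1
```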
The search indexer then processes each object returned by each module. It checks whether the object has already been indexed by comparing the object's last updated date with the last updated date stored in the SearchItem table. If the dates differ, the indexer re-indexes the object.
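The date comparison can be sketched like this. The dict standing in for the SearchItem table is hypothetical, and the handling of items that have never been indexed is an assumption rather than something described above.

```python
from datetime import datetime


def needs_indexing(search_key, last_updated, indexed_dates):
    """Decide whether an item returned by a module must be (re)indexed.

    indexed_dates is a hypothetical dict standing in for the SearchItem table,
    mapping each search key to the last updated date stored on the previous run.
    """
    stored = indexed_dates.get(search_key)
    if stored is None:
        return True  # assumed: an item the index has never seen gets indexed
    return stored != last_updated  # dates differ, so the indexing is refreshed


indexed = {"news-1": datetime(2007, 3, 1)}
print(needs_indexing("news-1", datetime(2007, 3, 1), indexed))  # False: unchanged
print(needs_indexing("news-1", datetime(2007, 3, 5), indexed))  # True: edited since
print(needs_indexing("news-2", datetime(2007, 3, 5), indexed))  # True: brand new
```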
If the indexer finds an item in the SearchItem table that wasn't returned by a module, it presumes that item has been deleted and removes all of its indexing. This is a key point that most module developers miss: if you don't pass back an item, DNN assumes it no longer exists and removes it from the index. This behavior has changed in Cambrian; stay tuned for more Cambrian blog posts this year.
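The deletion rule boils down to a set difference. In this hypothetical sketch the keys are plain strings, and it shows the pitfall from the module developer's side: a module that only returns its newest articles on a run causes the older ones to drop out of the index.

```python
def stale_keys(indexed_keys, returned_keys):
    """Keys present in the index but not returned by any module on this run.

    Per the pre-Cambrian behavior described above, these items are presumed
    deleted and all of their indexing is removed.
    """
    return set(indexed_keys) - set(returned_keys)


# A module that only hands back its two newest articles on this run:
indexed = {"article-1", "article-2", "article-3"}
returned = {"article-2", "article-3"}
print(sorted(stale_keys(indexed, returned)))  # ['article-1'] is dropped from the index
```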
Indexing an item's content basically consists of parsing out its individual words, storing each word in the SearchWord table, and then creating a reference between that word and the item in the SearchItemWord table.
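A simplified Python sketch of that indexing step follows. The two dicts are hypothetical in-memory stand-ins for the SearchWord and SearchItemWord tables, and the lowercase \w+ tokenizing rule is assumed. Storing a per-item occurrence count with each reference is also an assumption, made so that the counting used when search results are built is cheap.

```python
import re
from collections import Counter


def index_item(item_id, text, search_words, search_item_words):
    """Add one item's words to in-memory stand-ins for the search tables.

    search_words:      dict word -> word_id (stand-in for SearchWord)
    search_item_words: dict (item_id, word_id) -> occurrence count
                       (stand-in for SearchItemWord)
    """
    words = re.findall(r"\w+", text.lower())
    for word, count in Counter(words).items():
        # Reuse the word's id if it is already stored, otherwise assign the next one.
        word_id = search_words.setdefault(word, len(search_words) + 1)
        search_item_words[(item_id, word_id)] = count


search_words = {}
search_item_words = {}
index_item(1, "Search the search tables", search_words, search_item_words)
print(search_item_words[(1, search_words["search"])])  # 2
```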
Search results process:
When you search for an individual term, DNN checks whether that word exists in the SearchWord table; if it does, it looks up which items in the SearchItem table contain the word. It counts the number of times the word is found in each item and returns that item as a search result. Each search result also carries a relevance number, which is usually 4 digits long and is built as follows.
Each time the word is found in a particular item, the count is incremented: 1001 means your search term was found once in an item, and 1050 means it was found 50 times.
If you search for multiple terms, the only difference in the process is how the relevance number is built. If you search for two terms and both are found once in an item, the relevance number is 2002. If the two terms are found a total of 3 times (once for term 1 and twice for term 2), the relevance is 2003. The key points are these:
For the first digit (X) of the relevance (XYYY), X is the number of search terms found in a particular item: if you search for 3 terms and all three are found, X is 3.
For the other three digits (YYY) of the relevance (XYYY), YYY is the total number of times any of the search terms were found in the item.
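Putting the search results process together, here is a sketch of how the relevance number described above could be computed as X * 1000 + YYY. The index dict (word to item to occurrence count) is a hypothetical stand-in for the SearchWord/SearchItemWord lookups, and collapsing duplicate or differently cased search terms is an assumption.

```python
def relevance_scores(terms, index):
    """Score every item containing at least one term as X * 1000 + YYY.

    index: hypothetical dict mapping word -> {item_id: occurrences in that item}.
    X   = number of distinct search terms found in the item
    YYY = total occurrences of those terms in the item
    """
    terms_found = {}
    total_hits = {}
    for term in dict.fromkeys(t.lower() for t in terms):  # de-duplicate, keep order
        for item_id, count in index.get(term, {}).items():
            terms_found[item_id] = terms_found.get(item_id, 0) + 1
            total_hits[item_id] = total_hits.get(item_id, 0) + count
    return {item_id: terms_found[item_id] * 1000 + total_hits[item_id] for item_id in terms_found}


index = {"dnn": {1: 1, 2: 50}, "search": {1: 2}}
print(relevance_scores(["dnn"], index))            # {1: 1001, 2: 1050}
print(relevance_scores(["DNN", "search"], index))  # {1: 2003, 2: 1050}
```

Note that the XYYY layout only reads cleanly while the total stays below 1000; past that, the occurrence count spills into the digit that is supposed to hold X.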
Suppose you search for 3 terms and a result comes back with a relevance number of 3099. This tells you that all three terms were found in that result, but nothing about how many times each individual term was found: the first term might have been found 97 times and the other two once each (97 + 1 + 1 = 99), or each term might have been found 33 times. Given the basic nature of the core DNN search provider, you don't get this information.
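The ambiguity is easy to reproduce with a hypothetical one-item version of the same formula, X * 1000 + YYY, where each list entry is how many times one search term was found:

```python
def relevance(counts_per_term):
    """Relevance for one item from per-term hit counts (0 means the term wasn't found)."""
    terms_found = sum(1 for c in counts_per_term if c > 0)
    return terms_found * 1000 + sum(counts_per_term)


print(relevance([97, 1, 1]))    # 3099
print(relevance([33, 33, 33]))  # 3099
```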
Hopefully this post has given you a bit more insight into how the core search functionality works. If you're interested in learning more, open up the solution: because DotNetNuke is open source, you can learn a lot by reading the code and stepping through it.
A note of thanks to Charles Nurse for reviewing this blog post for accuracy and providing me some feedback before I posted it! :)