DNN Community Blog



Integrating with Search – Introducing ModuleSearchBase

DNN Platform 7.1 introduced the best Search ever in DNN; have a look at my previous blog post to learn all about the new Search. This blog series is dedicated to giving developers more insight into integrating their modules with Search.

This post is organized as an FAQ, for ease of reading and to answer common questions from module developers. The key takeaway is that ISearchable has been deprecated and replaced with ModuleSearchBase, which brings lots of awesomeness.

Let’s start by learning a bit more about the Site Crawler.

Site Crawler

Is Site Crawler new?

No. It’s the “Search Engine Scheduler” from the Community Edition (CE), renamed to “Search: Site Crawler”.


What’s new in Site Crawler?

Before 7.1, it called every module that implemented ISearchable to obtain search content, and indexed that content so users could search it. It also handled RSS syndication.

Now it does more: it has a Tab Indexer, a Module Metadata Indexer and a Module Content Indexer. Below is a summary from the Schedule History.

[Screenshot: Site Crawler summary from the Schedule History]

Is the Site Crawler site-specific?

No. The scheduled job goes through all the sites in the installation.

What is the Tab Indexer?

The Tab Indexer is part of the Site Crawler. Its job is to collect information about every page defined in all sites, including host pages. It stores the page name, title, description, keywords, taxonomy tags, etc.

What is the Module Metadata Indexer?

It is very similar to the Tab Indexer, except that it works at the module level.

What is the Module Content Indexer?

This is also part of the Site Crawler, and it is the most important indexer. It is responsible for getting content from modules. It calls modules that implement either ISearchable or the new ModuleSearchBase.

ModuleSearchBase

What is ModuleSearchBase?

It’s the new interface (actually an abstract class) that module developers need to inherit from to integrate better with the new Search.

Is ModuleSearchBase better than ISearchable?

Yes. It is more efficient because it has the concept of deltas. ISearchable required modules to provide all their content every time. ModuleSearchBase only asks for content that has changed since the last run.
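To make the difference concrete, here is a minimal sketch in Python, used only as runnable shorthand (real DNN modules are written in C#). The ContentItem type and both functions are hypothetical, not DNN APIs. full_crawl models the ISearchable approach of handing over everything on every run; delta_crawl models the ModuleSearchBase approach of handing over only items modified at or after the previous run.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ContentItem:
    # Hypothetical module content row; not a DNN type.
    item_id: int
    title: str
    last_modified_utc: datetime


def full_crawl(items):
    """ISearchable-style: hand over every item on every run."""
    return list(items)


def delta_crawl(items, since_utc):
    """ModuleSearchBase-style: hand over only items changed at or after since_utc."""
    return [item for item in items if item.last_modified_utc >= since_utc]


items = [
    ContentItem(1, "Welcome", datetime(2013, 8, 1, tzinfo=timezone.utc)),
    ContentItem(2, "Pricing", datetime(2013, 8, 20, tzinfo=timezone.utc)),
    ContentItem(3, "Contact", datetime(2013, 9, 2, tzinfo=timezone.utc)),
]
last_run_utc = datetime(2013, 8, 15, tzinfo=timezone.utc)

print(len(full_crawl(items)))                                  # 3
print([i.item_id for i in delta_crawl(items, last_run_utc)])   # [2, 3]
```

With thousands of items and only a handful of edits between runs, the delta approach keeps each crawl proportional to the amount of change rather than to the size of the module's content.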

Where should I implement ModuleSearchBase?

It should be implemented by the class specified as the BusinessControllerClass in the module’s manifest. Below is an example manifest from the Html module. As always, you must also list “Searchable” as one of the SupportedFeatures.

[Screenshot: Html module manifest showing the BusinessControllerClass and the Searchable supported feature]

Do you have an example of ModuleSearchBase implementation?

Indeed, have a look at the Html module.

[Screenshot: ModuleSearchBase implementation in the Html module]

C# does not allow multiple inheritance of base classes; what can I do if my BusinessControllerClass already derives from another base class?

Ideally, you should keep the BusinessControllerClass clean and not inherit from any other base class. In this case, you’d need to remove the other base class and inherit from ModuleSearchBase instead. You can still implement any number of interfaces, though.

How many methods do I need to implement in ModuleSearchBase?

Just one - GetModifiedSearchDocuments.
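As we understand the 7.1 API, the C# member to override is declared as `public abstract IList<SearchDocument> GetModifiedSearchDocuments(ModuleInfo moduleInfo, DateTime beginDateUtc)`; please check it against the DNN version you target. The Python sketch below only models the shape of that contract with the `abc` module. The trimmed ModuleInfo and SearchDocument classes and HypotheticalArticleController are illustrative stand-ins, not DNN types.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class ModuleInfo:
    # Trimmed stand-in for DNN's ModuleInfo; only a few fields are modelled.
    module_id: int
    module_def_id: int
    portal_id: int
    tab_id: int


@dataclass
class SearchDocument:
    # Trimmed stand-in for DNN's SearchDocument.
    unique_key: str
    title: str
    body: str
    modified_time_utc: datetime


class ModuleSearchBase(ABC):
    """Model of the abstract class: exactly one abstract method."""

    @abstractmethod
    def get_modified_search_documents(
        self, module_info: ModuleInfo, begin_date_utc: datetime
    ) -> List[SearchDocument]:
        ...


class HypotheticalArticleController(ModuleSearchBase):
    """Hypothetical BusinessControllerClass: implementing the one method is enough."""

    def get_modified_search_documents(self, module_info, begin_date_utc):
        # Real code would query the module's own storage for changes since begin_date_utc.
        return []


controller = HypotheticalArticleController()
docs = controller.get_modified_search_documents(
    ModuleInfo(module_id=101, module_def_id=74, portal_id=0, tab_id=55),
    datetime(2013, 9, 1, tzinfo=timezone.utc),
)
print(docs)  # []
```

Because the base class declares a single abstract method, a subclass becomes instantiable as soon as it implements that method; the base class itself cannot be instantiated.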

Backwards Compatibility

Do I have to implement ModuleSearchBase, or can I stick with the old ISearchable?

Well, you can continue to use ISearchable; we have made sure that the new Search is backwards compatible with ISearchable. You won’t be able to take advantage of deltas, though. You will also miss out on other cool features such as localization, granular security trimming, etc.

What happened to SearchItemInfo?

It’s still present to support ISearchable. We map most of the properties of SearchItemInfo into the new SearchDocument.

Which properties of SearchItemInfo are not ported over?

HitCount and ImageFileId.

GetModifiedSearchDocuments

What does this method return?

Essentially a collection of SearchDocuments. It should return SearchDocuments for new, changed and deleted content for your module.

What parameters are passed to this method?

ModuleInfo and BeginDate. The BeginDate is in UTC. You should return content that was added, changed or deleted between the BeginDate and the current time.
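A rough sketch of the selection logic follows. It assumes the module stores a last-modified timestamp in UTC and performs soft (logical) deletes by stamping a deletion time. ArticleRow, SearchDocument and modified_documents are hypothetical Python stand-ins. The is_active flag mirrors the IsActive property on DNN's SearchDocument, which we believe modules can set to false to have an entry removed from the index; verify that behaviour on your version before relying on it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ArticleRow:
    # Hypothetical storage row; deleted_utc is set by a soft (logical) delete.
    article_id: int
    title: str
    body: str
    last_modified_utc: datetime
    deleted_utc: Optional[datetime] = None


@dataclass
class SearchDocument:
    # Trimmed stand-in; is_active=False models asking the indexer to drop the entry.
    unique_key: str
    title: str
    body: str
    modified_time_utc: datetime
    is_active: bool = True


def modified_documents(rows: List[ArticleRow], begin_date_utc: datetime,
                       now_utc: Optional[datetime] = None) -> List[SearchDocument]:
    """Return documents for rows added, changed or soft-deleted in [begin, now]."""
    if begin_date_utc.tzinfo is None:
        raise ValueError("begin_date_utc must be a timezone-aware UTC datetime")
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)
    docs = []
    for row in rows:
        # A soft delete counts as a change at the moment of deletion.
        changed_at = row.deleted_utc or row.last_modified_utc
        if begin_date_utc <= changed_at <= now_utc:
            docs.append(SearchDocument(
                unique_key=f"article-{row.article_id}",
                title=row.title,
                body=row.body,
                modified_time_utc=changed_at,
                is_active=row.deleted_utc is None,
            ))
    return docs


utc = timezone.utc
rows = [
    ArticleRow(1, "Old", "unchanged", datetime(2013, 8, 1, tzinfo=utc)),
    ArticleRow(2, "Edited", "new text", datetime(2013, 9, 3, tzinfo=utc)),
    ArticleRow(3, "Gone", "removed", datetime(2013, 8, 1, tzinfo=utc),
               deleted_utc=datetime(2013, 9, 4, tzinfo=utc)),
]
docs = modified_documents(rows, datetime(2013, 9, 1, tzinfo=utc),
                          now_utc=datetime(2013, 9, 5, tzinfo=utc))
print([(d.unique_key, d.is_active) for d in docs])
# [('article-2', True), ('article-3', False)]
```

Rejecting naive datetimes is a deliberate choice in this sketch: an accidental local-time value fails loudly instead of silently shifting the window. Rows that are physically deleted cannot be reported this way; they need a re-index.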

How is this method executed?

This method is called periodically by the Site Crawler, which is a scheduled job. It is called for each and every module instance whose module implements either ISearchable or ModuleSearchBase.

Is it possible for this method to be called more than once during a single run of the Site Crawler?

Yes. It gets called for every instance of a module (e.g. every Html module on your sites); however, the ModuleInfo passed is different each time.
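The following Python sketch models a single crawler pass to show why the same method runs several times: one call per module instance, each with its own ModuleInfo. simulate_crawl and HypotheticalHtmlController are illustrative only and do not reflect how the Site Crawler is actually implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModuleInfo:
    # Trimmed stand-in for DNN's ModuleInfo.
    module_id: int
    tab_id: int


class HypotheticalHtmlController:
    """Hypothetical controller that records which module instances it was asked about."""

    def __init__(self, content_by_module):
        self.content_by_module = content_by_module
        self.calls = []

    def get_modified_search_documents(self, module_info, begin_date_utc):
        self.calls.append(module_info.module_id)
        # A real controller would also filter by begin_date_utc; omitted here.
        text = self.content_by_module.get(module_info.module_id)
        if not text:
            return []
        # Key on the instance so documents from different instances stay distinct.
        return [{"unique_key": f"html-{module_info.module_id}",
                 "tab_id": module_info.tab_id,
                 "body": text}]


def simulate_crawl(controller, instances, begin_date_utc):
    """Rough model of one Site Crawler pass: one call per module instance."""
    docs = []
    for module_info in instances:
        docs.extend(controller.get_modified_search_documents(module_info, begin_date_utc))
    return docs


instances = [ModuleInfo(101, tab_id=20), ModuleInfo(102, tab_id=21), ModuleInfo(103, tab_id=21)]
controller = HypotheticalHtmlController({101: "Welcome", 102: "About us", 103: "Contact"})
docs = simulate_crawl(controller, instances, datetime(2013, 9, 1, tzinfo=timezone.utc))
print(controller.calls)                  # [101, 102, 103]
print([d["unique_key"] for d in docs])   # ['html-101', 'html-102', 'html-103']
```

Because each document is built from the ModuleInfo of the call that produced it, two instances on the same page still yield separate entries.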

How about packages that have more than one module definition in the manifest?

If your package specifies Searchable as a SupportedFeature, this method is called for all module definitions defined in that manifest. In these situations (e.g. the Blogs module), there is usually one main module and one or more helper modules. You should return an empty collection of SearchDocuments for the helper modules and the real content for the main module; otherwise you’d be creating duplicate data in the Search index. The old ISearchable worked this way as well. You can differentiate between the main and helper modules by their module definition ID (ModuleDefID).
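A minimal sketch of the main/helper split follows. It assumes hypothetical module definition IDs: 310 for the main view and 311 for a helper. In a real module you would resolve the main definition's ID at runtime rather than hard-code it. HypotheticalBlogController and the trimmed ModuleInfo are stand-ins, not DNN types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModuleInfo:
    # Trimmed stand-in for DNN's ModuleInfo.
    module_id: int
    module_def_id: int


# Hypothetical ID of the main definition; resolve it at runtime in real code.
MAIN_DEF_ID = 310


class HypotheticalBlogController:
    def __init__(self, posts_by_module):
        self.posts_by_module = posts_by_module

    def get_modified_search_documents(self, module_info, begin_date_utc):
        if module_info.module_def_id != MAIN_DEF_ID:
            # Helper definitions show the same posts; indexing them would duplicate entries.
            return []
        return [
            {"unique_key": f"post-{post_id}", "title": title}
            for post_id, title in self.posts_by_module.get(module_info.module_id, [])
        ]


controller = HypotheticalBlogController({7: [(1, "Hello DNN"), (2, "Search in 7.1")]})
since = datetime(2013, 9, 1, tzinfo=timezone.utc)
main_docs = controller.get_modified_search_documents(ModuleInfo(7, MAIN_DEF_ID), since)
helper_docs = controller.get_modified_search_documents(ModuleInfo(8, 311), since)
print(len(main_docs), len(helper_docs))  # 2 0
```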

How do I troubleshoot if this doesn’t get called?

It’s likely that you did not specify the SupportedFeature in the manifest, or that ModuleSearchBase is not implemented in the BusinessControllerClass. Your best bet is to execute this stored procedure to make sure your module is listed: exec GetSearchModules 9999 (replace 9999 with your PortalId).

Is there anything specific I need to consider while writing this method?

Yes. Since this method is executed in the context of the Scheduler, there is no HttpContext, so you should not use anything that depends on it, e.g. PortalSettings, the current user, etc.
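To illustrate the point, the sketch below uses a Python context variable as a hypothetical stand-in for request-scoped state such as HttpContext or PortalSettings. Nothing sets it during a scheduled run, so the fragile builder gets None, while the robust builder takes the portal, page and module IDs from the ModuleInfo argument (in C#, ModuleInfo exposes PortalID, TabID and ModuleID). Both builder functions are illustrative only.

```python
import contextvars
from dataclasses import dataclass

# Hypothetical stand-in for request-scoped state; a scheduler thread never sets it.
current_portal_id = contextvars.ContextVar("current_portal_id", default=None)


@dataclass(frozen=True)
class ModuleInfo:
    # Trimmed stand-in for DNN's ModuleInfo.
    module_id: int
    portal_id: int
    tab_id: int


def build_document_fragile(module_info, body):
    # Anti-pattern: relies on request state that is absent during a scheduled crawl.
    return {"portal_id": current_portal_id.get(),
            "module_id": module_info.module_id,
            "body": body}


def build_document(module_info, body):
    # Everything comes from the ModuleInfo argument, so no web request is needed.
    return {"portal_id": module_info.portal_id,
            "tab_id": module_info.tab_id,
            "module_id": module_info.module_id,
            "body": body}


info = ModuleInfo(module_id=101, portal_id=3, tab_id=42)
print(build_document_fragile(info, "text")["portal_id"])  # None
print(build_document(info, "text")["portal_id"])          # 3
```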

Conclusion

Module developers are strongly encouraged to start using the new ModuleSearchBase (in 7.1+) instead of the old ISearchable.

I am planning to write many more mini-blogs on this topic, with lots of details and insights, so stay tuned!

Comments

Jay Mathis
Ash

In the QA section, you mentioned that if a module does not have a module title, then it won't be indexed. That is a major miss in my opinion. Often times, admins will clear out the module title field because they don't want the title displayed as content. Or, they will use a container with no Title element and they inevitably leave the module title as the default text.

Is there any way to get around this? I've been using a GoogleSearchEngine widget for search in the past and I would like to switch to the DNN search if possible, but this is a potential deal breaker if it isn't indexing modules with no title.
Jay Mathis Wednesday, September 04, 2013 10:01 AM
Ash Prasad
@Jay - One quick way to deal with this problem is to use a space " " as the title; this will prevent the exception. In fact, there is better news here: we've gone ahead and fixed the problem in 7.1.2. Here is the issue: https://dnntracker.atlassian.net/browse/DNN-3558
Ash Prasad Tuesday, September 10, 2013 8:08 PM
Horacio Judeikin
Hi,

"It should return SearchDocuments for new, changed and deleted content for your module."
I guess I'm missing something obvious. Does this mean, the module should be updated to implement logical deletes for its items? Otherwise, how can GetModifiedSearchDocuments() know which items were deleted?
Even implementing logical deletes (which I guess is not the common case for most modules out there), how to deal with items physically deleted?
How to update the index in this case (that for sure will happen)? Forcing a reindex for items created or updated since DateTime.MinValue?

Thanks
Horacio Judeikin Friday, September 13, 2013 12:07 PM
Ash Prasad
@Horacio - Modules may perform soft (logical) deletes, which can be used to return those items in the GetModifiedSearchDocuments callback. For modules that don't implement soft deletes, site admins can periodically issue a re-index (admin > search admin > re-index). You may also access some of the Search APIs in the internal namespaces, e.g. InternalSearchController.Instance.DeleteSearchDocument. The only downside of using internal APIs is that they are not guaranteed to stay consistent. In this case, many Search APIs are still under the Internals namespace because they are not compatible with a web farm environment. I am talking just about these APIs; Search in general is fully compatible with web farms. Directly accessing these APIs from different web heads may result in Lucene file locking issues.
Ash Prasad Friday, September 13, 2013 4:07 PM
Ken Ingram
Ash, where does the Site Crawler store the results of the index? In dbo.SearchItem? The Schedule History says it indexed 310 items but nothing comes up in a search and there is nothing in any of the tables that start with dbo.Search* (except SearchCommonWords and SearchStopWords). I also executed "exec GetSearchModules 0" and see the module I setup with ModuleSearchBase. Just need some more troubleshooting tips please.
Ken Ingram Monday, September 16, 2013 10:38 PM
cathal connolly
@Ken, since 7.1.0 the search has been Lucene-based, so you don't check the database; the data is stored in the Lucene index in App_Data/Search. You can use a tool such as http://code.google.com/p/luke/ to see what the index contains.
cathal connolly Tuesday, September 17, 2013 11:13 AM
Ken Ingram
@Cathal, thank you for the reply. I downloaded Luke but still cannot find the index. There is no "search" folder under app_data. Is there a configuration file somewhere that shows where the indexes are stored?
Ken Ingram Tuesday, September 17, 2013 1:03 PM
cathal connolly
@Ken - there should be one; e.g. when I use Luke, I point it to C:\websites\dnn720test\App_Data\Search (it combines various files from that directory to generate the index). If you do not have a Search folder, that suggests your permissions are incorrect and the application cannot create it, so it is indexing documents but failing to write them to the index. Please ensure you have "modify" permissions for your site from the root folder down.
cathal connolly Tuesday, September 17, 2013 1:14 PM
Stephen Lim
If a site is content localized (suppose we have the same content in English and French), what is the suggested way to submit the title to the indexer during GetModifiedSearchDocuments method call? Do we return 2 SearchDocuments entries each with its own localized title or does the scheduler call the GetModifiedSearchDocuments repeatedly for each language in the portal?
Stephen Lim Friday, May 30, 2014 8:50 AM
Ash Prasad
If you have the same content in both languages, it is recommended to pass culture as string.empty in the SearchDocument that you return in GetModifiedSearchDocuments call. This will ensure that content will be found in a language-neutral way, meaning both English and French pages should be able to find it.
Ash Prasad Friday, May 30, 2014 11:46 AM
Mike Savely
Ash, we are having problems with the DNN Search Results module filtering results from a module using ModuleSearchBase. The module we created is being indexed and the DNN Search Results module shows results from our module. The problem is when the Source link for our module on the Search Results page is clicked. The search results are not filtered down to our module. If the Source link is clicked for Pages or Documents (DNN sources), the search results are filtered to those sources.
Is there anything we need to implement in our module for the Search Results module to properly recognize our module as a filterable source?
Mike Savely Thursday, July 31, 2014 2:07 PM


Copyright 2014 by DNN Corp | Terms of Use | Privacy | Design by Parker Moore Design