Everyone is going to love Search in DNN 7.1. It was rebuilt from the ground up to provide fast, accurate, efficient, secure and locale-aware access to content in just a few keystrokes. Check this out!
History
The platform has had Search capabilities for a long time. It was based on a module crawler that iterated over the pages and modules in the site and indexed their content. Modules needed to implement a specific interface (ISearchable) to have their content stored for searching, and SQL Server served as the datastore. While the old Search was functional, it lacked speed, accuracy, and relevance.
The commercial editions used a different search engine that could also index URLs and files. The URL Crawler was fundamentally different from the platform search in that it parsed HTML pages and followed links. The File Crawler indexed the content of files (Microsoft Office documents, PDFs, etc.).
Objectives of the New Search
To start, supporting two different Search architectures across editions was very taxing and didn’t allow us to showcase DNN in the most favourable light. Differentiation at the feature level makes a lot more sense. Moreover, it was time to enhance the feature set given the importance of Search to so many activities.
We also needed a more efficient Search API for modules. For example, we added the concept of deltas, where modules can recognize changes in content since the last indexing run instead of indexing all of the content all of the time. This change was implemented in the HTML module in the Core and in all the DNN Social modules. We look forward to other modules in the Community and the DNN Store following suit soon. Modules that implement only the old interface will remain backwards compatible; however, those down-level modules won’t be able to take advantage of new features such as deltas, tagging, ranking, and permissions until they switch over to the new API.
What’s New
The new Search has many built-in features in the platform, as well as a number of additions in the commercial editions and DNN Social (covered at the end). Below are highlights of the new features available in all DNN product editions:
Preview
Results are shown in real time as you type, appearing as soon as you pause. Partial words can also be searched. Previews are designed to help you refine your keywords so you can find things with fewer keystrokes. Previews also contain direct links to the actual content, so you can reach it directly from a single place.
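Partial-word preview is commonly implemented as a prefix lookup against the index’s sorted term dictionary. The Python sketch below shows that idea only; it is not DNN or Lucene.Net code, and `prefix_matches` and the sample terms are hypothetical.

```python
import bisect

def prefix_matches(sorted_terms, partial):
    """Return every indexed term that starts with `partial` (case-insensitive).

    `sorted_terms` must be a lowercase, alphabetically sorted list, like the
    term dictionary a search index keeps.
    """
    partial = partial.lower()
    start = bisect.bisect_left(sorted_terms, partial)  # first candidate position
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(partial):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
    return matches

terms = sorted(["search", "searching", "seattle", "secure", "site", "skin"])
print(prefix_matches(terms, "Sea"))  # ['search', 'searching', 'seattle']
```

Because all terms sharing a prefix sit next to each other in sorted order, one binary search plus a short scan is enough, which is what keeps as-you-type lookups cheap.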
Speed
Results are returned blazingly fast. Thanks to the new Lucene.Net repository, there is no overhead from SQL queries or out-of-process calls. Lucene.Net is a best-in-class full-text search engine library designed from the ground up for very fast search.
Site-Scoped
Results are scoped by default to a specific site’s content. In the commercial editions it is also possible to expand the scope to include results for other sites that are part of a site group.
Locale-Aware
Internationalization and localization have been a key priority for the DNN Platform over the last couple of years. Starting with 6.2, we have provided the Platform in six languages out of the box. Search is no different: content is indexed and found based on language/culture. For example, a French (France) page can only be found when search is executed from a French (France) page, while culture-neutral pages can be found from any language.
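One way to express this culture rule is a filter that keeps pages in the searcher’s culture plus culture-neutral pages. This is a hypothetical Python model, not DNN’s implementation; representing a culture-neutral page with an empty culture string is an assumption.

```python
from dataclasses import dataclass

@dataclass
class IndexedPage:
    title: str
    culture: str  # e.g. "fr-FR"; "" marks a culture-neutral page (assumption)

def visible_for_culture(pages, search_culture):
    """Keep pages in the searcher's culture plus all culture-neutral pages."""
    return [p for p in pages if p.culture in ("", search_culture)]

pages = [
    IndexedPage("Accueil", "fr-FR"),
    IndexedPage("Home", "en-US"),
    IndexedPage("Logo", ""),
]
print([p.title for p in visible_for_culture(pages, "fr-FR")])  # ['Accueil', 'Logo']
```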
Efficient Indexing
Thanks to delta-based module indexing, new content can now be indexed and searched quickly.
Ranking and Relevance
Certain index fields, such as title, tags, and keywords, are ranked higher than the general body of content. This lets users find the most relevant content first. (More on this in a future blog post.)
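Field-level ranking is usually achieved by giving matches in some fields a larger weight (a boost) than matches in the body. The sketch below uses made-up boost values and a plain weighted term count; Lucene’s real scoring also factors in term rarity and field length, so treat this as an illustration of boosting only.

```python
# Hypothetical per-field boosts; DNN's actual weights are not given here.
FIELD_BOOSTS = {"title": 3.0, "tags": 2.5, "keywords": 2.0, "body": 1.0}

def score(doc, term):
    """Sum boost * occurrences of `term` over each field of `doc`."""
    term = term.lower()
    total = 0.0
    for field_name, boost in FIELD_BOOSTS.items():
        words = doc.get(field_name, "").lower().split()
        total += boost * words.count(term)
    return total

docs = {
    "a": {"title": "Search tips", "body": "how to use the site"},
    "b": {"title": "Release notes", "body": "search search is faster"},
}
ranked = sorted(docs, key=lambda k: score(docs[k], "search"), reverse=True)
print(ranked)  # ['a', 'b']
```

Document "a" wins with a single title hit (3.0) over two body hits in "b" (2.0), which is the behavior the boosted fields are meant to produce.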
Accuracy
We support most of the Lucene query syntax, such as boolean, wildcard, and fuzzy queries. Any combination of these can be used for more precise results.
Filter by Tags
One or more tags can be specified to restrict searches.
Filter by Content Modification Time
Results can be restricted by the time they were modified, with several options provided out of the box.
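Tag and modification-time filters both act as conditions applied to the result list. In this hypothetical Python sketch, a result must carry every requested tag (whether DNN combines multiple tags with AND or OR is an assumption) and, optionally, have been modified within a given window; `filter_results` and `Result` are illustrative names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Result:
    title: str
    modified: datetime
    tags: set = field(default_factory=set)

def filter_results(results, now, required_tags=(), modified_within=None):
    """Keep results that carry every required tag and, if modified_within
    is given, were modified no earlier than now - modified_within."""
    required = set(required_tags)
    kept = []
    for r in results:
        if not required <= r.tags:
            continue  # missing at least one requested tag
        if modified_within is not None and now - r.modified > modified_within:
            continue  # modified outside the requested time window
        kept.append(r)
    return kept

now = datetime(2013, 6, 15)
results = [
    Result("Blog post", datetime(2013, 6, 14), {"dnn", "search"}),
    Result("Old forum thread", datetime(2012, 1, 1), {"dnn", "search"}),
    Result("Release notes", datetime(2013, 6, 10), {"dnn"}),
]
recent = filter_results(results, now, {"search"}, timedelta(days=7))
print([r.title for r in recent])  # ['Blog post']
```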
Personalized
We’ve gone to great lengths to make Search personalized by design. For example, we perform security trimming on search results to ensure users can only access the content for which they have permission.
Item Level Permissions
Modules that manage collections of content sometimes require each item in the collection to have its own permissions defining who can view it. The new search API lets a developer supply view permissions for each item, ensuring that search results are trimmed appropriately and 100% personalized.
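Security trimming with item-level permissions boils down to dropping any hit whose allowed roles do not intersect the current user’s roles. The model below is hypothetical: DNN’s permission system is richer than a set of role names, and `SearchHit` and `trim_by_permissions` are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchHit:
    title: str
    view_roles: frozenset  # roles permitted to view this individual item

def trim_by_permissions(hits, user_roles):
    """Security trimming: keep only hits the user holds at least one view role for."""
    roles = set(user_roles)
    return [h for h in hits if h.view_roles & roles]

hits = [
    SearchHit("Public FAQ", frozenset({"All Users"})),
    SearchHit("Board minutes", frozenset({"Administrators"})),
    SearchHit("Member guide", frozenset({"Registered Users", "Administrators"})),
]
visitor = trim_by_permissions(hits, {"All Users", "Registered Users"})
print([h.title for h in visitor])  # ['Public FAQ', 'Member guide']
```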
Highlighted Results (aka “Hit Highlighting”)
Search keywords are highlighted in the results. In fact, synonyms are highlighted as well.
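A minimal highlighter can wrap whole-word, case-insensitive matches of the query terms (including any synonyms) in `<b>` tags. This regex-based Python sketch is an assumption for illustration; production highlighters such as Lucene’s work on analyzed tokens rather than raw text, and this version assumes the input is plain text.

```python
import re

def highlight(text, terms):
    """Wrap each whole-word, case-insensitive match of any term in <b> tags."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return pattern.sub(r"<b>\1</b>", text)  # \1 keeps the original casing

print(highlight("DNN was formerly DotNetNuke.", ["dnn", "dotnetnuke"]))
# <b>DNN</b> was formerly <b>DotNetNuke</b>.
```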
Near-Real-Time Results
Search can be performed while indexing is in progress. The results may be just a tad behind (which is why we call it Near-Real-Time); however, there is no need to wait for the entire indexing operation to complete.
Synonyms
Site Admins can define Synonym Groups containing words with the same meaning. For example, by placing “DNN” and “DotNetNuke” in the same group, a search for “DotNetNuke” also returns content containing “DNN”, and vice versa. Synonyms can be configured per site.
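Synonym groups can be applied by expanding the query: if any query term belongs to a group, every word in that group is added. Whether DNN expands at query time or at index time is not stated here, so the query-time approach below is an assumption, and `expand_query` is a hypothetical name.

```python
def expand_query(terms, synonym_groups):
    """Return the lowercased query terms plus all words from any matching group."""
    lowered = [t.lower() for t in terms]
    expanded = list(lowered)
    for group in synonym_groups:
        group_lower = [w.lower() for w in group]
        if any(t in group_lower for t in lowered):
            for w in group_lower:
                if w not in expanded:  # avoid duplicate terms
                    expanded.append(w)
    return expanded

groups = [["DNN", "DotNetNuke"], ["car", "automobile"]]
print(expand_query(["dotnetnuke", "skin"], groups))  # ['dotnetnuke', 'skin', 'dnn']
```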
Ignore Words
Site Admins can specify words (e.g. a competitor’s name or profanity) to skip during indexing. A standard list of English “stop words” (a, an, the, not, etc.) is provided. These words can be configured per site and per language.
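Ignore words are typically dropped during tokenization, before terms reach the index. The sketch below keys hypothetical ignore lists by (site ID, culture) to mirror the per-site, per-language configuration; the tokenizer and the data layout are assumptions, not DNN’s.

```python
import re

# Hypothetical ignore-word lists keyed by (site_id, culture).
IGNORE_WORDS = {
    (0, "en-US"): {"a", "an", "the", "not"},
}

def tokenize_for_index(text, site_id, culture):
    """Lowercase the text, split it into word tokens, and drop ignore words."""
    ignore = IGNORE_WORDS.get((site_id, culture), set())
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in ignore]

print(tokenize_for_index("The search is not an afterthought", 0, "en-US"))
# ['search', 'is', 'afterthought']
```

A site with no configured list (site 1 here) keeps every token, which is why the lookup falls back to an empty set.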
Page Level Metadata
Page-level attributes such as name, title, description, and keywords are indexed by default, which makes pages easy to locate within your site’s information architecture. Similarly, module-level attributes such as the header and footer are also indexed by default.
HTML Tag Attribute Support
Content stored in HTML tag attributes such as ALT and TITLE is now indexed by the search crawler. This allows you to more easily locate resources such as images, video, and links on your site.
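Indexing attribute text means pulling ALT and TITLE values out of the markup alongside the visible text. This sketch uses Python’s standard `html.parser` module; which attributes are collected and how the chunks are joined are assumptions made for illustration, not a description of DNN’s crawler.

```python
from html.parser import HTMLParser

class AttributeTextExtractor(HTMLParser):
    """Collect visible text plus the values of ALT and TITLE attributes."""

    INDEXED_ATTRS = {"alt", "title"}  # HTMLParser lowercases attribute names

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.INDEXED_ATTRS and value:
                self.chunks.append(value)

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def extract_indexable_text(html):
    parser = AttributeTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.chunks)

html = ('<p>Our team <img src="team.jpg" ALT="Team photo at DNN World"> '
        '<a href="/x" title="Contact us">here</a></p>')
print(extract_indexable_text(html))
# Our team Team photo at DNN World Contact us here
```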
New APIs for Modules
To index their content, modules need to inherit from a very simple abstract class, ModuleSearchBase, and implement just one method: IList<SearchDocument> GetModifiedSearchDocuments(ModuleInfo modInfo, DateTime beginDate).
Modules can also replace their module-specific searching with a new API named ModuleSearch (more in a future blog post).
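DNN’s ModuleSearchBase is a C# class. To show the delta pattern its single method encodes, here is a Python analog: an abstract base class with one method that returns only documents modified after `begin_date`. `SearchDocument` is reduced to a few hypothetical fields, `module_info` is a plain dict, and `HtmlLikeModule` is invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SearchDocument:
    # Simplified stand-in; DNN's real SearchDocument carries more fields.
    unique_key: str
    title: str
    body: str
    modified: datetime

class ModuleSearchBase(ABC):
    """Python analog of the C# abstract class: a single method to implement."""

    @abstractmethod
    def get_modified_search_documents(self, module_info, begin_date):
        """Return only the documents changed after begin_date (the delta)."""

class HtmlLikeModule(ModuleSearchBase):
    """Hypothetical module backed by an in-memory dict of items."""

    def __init__(self, items):
        self.items = items  # key -> (title, body, last_modified)

    def get_modified_search_documents(self, module_info, begin_date):
        return [
            SearchDocument(f"{module_info['module_id']}-{key}", title, body, modified)
            for key, (title, body, modified) in self.items.items()
            if modified > begin_date  # skip content already indexed
        ]

module = HtmlLikeModule({
    1: ("Welcome", "Hello and welcome", datetime(2013, 5, 1)),
    2: ("News", "DNN 7.1 is out", datetime(2013, 6, 20)),
})
delta = module.get_modified_search_documents({"module_id": 42}, datetime(2013, 6, 1))
print([d.unique_key for d in delta])  # ['42-2']
```

Only the item changed since the last run is returned, so the indexer does work proportional to what changed rather than to the module’s total content.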
Extensible
The architecture of the new search is based on the generic concept of crawlers which are designed to harvest specific types of content and feed them into the Lucene.Net repository for indexing. There is no limit to the number of specialized crawlers that can be created and utilized as part of search.
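The crawler architecture can be pictured as several independent harvesters feeding one index. In this hypothetical Python sketch, each `Crawler` subclass yields (key, text) pairs and a plain dictionary stands in for the Lucene.Net repository as a tiny inverted index; none of these names come from DNN.

```python
from abc import ABC, abstractmethod

class Crawler(ABC):
    """A crawler harvests one kind of content and yields (key, text) pairs."""

    @abstractmethod
    def harvest(self):
        ...

class ModuleCrawler(Crawler):
    def __init__(self, modules):
        self.modules = modules  # module_id -> text

    def harvest(self):
        for module_id, text in self.modules.items():
            yield f"module:{module_id}", text

class FileCrawler(Crawler):
    def __init__(self, files):
        self.files = files  # path -> extracted text

    def harvest(self):
        for path, text in self.files.items():
            yield f"file:{path}", text

def build_index(crawlers):
    """Feed every crawler's output into one inverted index: term -> set of keys."""
    index = {}
    for crawler in crawlers:
        for key, text in crawler.harvest():
            for term in text.lower().split():
                index.setdefault(term, set()).add(key)
    return index

index = build_index([
    ModuleCrawler({7: "Welcome to our DNN site"}),
    FileCrawler({"guide.pdf": "DNN admin guide"}),
])
print(sorted(index["dnn"]))  # ['file:guide.pdf', 'module:7']
```

Adding a new content source only requires another `Crawler` subclass; `build_index` never changes, which is the extensibility the architecture aims for.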
Still More Capabilities in the Commercial Edition
File Crawling
Microsoft Office and PDF documents are indexed using the File Crawler. Starting with DNN 7.1, we have eliminated the need to install PDFBox or the bloated IKVM library for .NET. This should be a big relief for admins!
Instead, we use standard iFilters to index Office, PDF, and other document types. Windows also uses iFilters in its Desktop Search, so it’s a well-proven technology for efficiently handling various document types. Note that PDF indexing requires installing the Adobe IFilter separately from DotNetNuke (more in a future blog post).
URL Crawling
The URL Crawler was modified to leverage the new architecture. Our expectation is that module developers will start to use the new interface for Search, as it allows site owners to get the optimal benefit from their content. This means that most module content will eventually be indexed by the Module Crawler. However, the URL Crawler will still play an important role in indexing all content that is not part of a module. This includes content that is part of your site skin, navigation, etc., as well as content located on other sites that you want to federate into your search results.
Search in DNN Social
All the DNN Social modules (Answers, Ideas, Blogs, Social Events, Discussions, etc.) have been modified to leverage the new Search APIs.
As you can see, we’ve put a lot of effort into delivering a super robust, fast, and functional search. We’ve enabled many of its new capabilities in the community edition and deliver even more in the commercial products. We look forward to hearing your feedback as you take it for a spin. Happy searching!!