Starting with DNN 7.1, the new Search uses Lucene as its indexing and querying engine.
Lucene stores its index as files and works much like a file-based NoSQL database. To dig into this database you need a specialized Java tool called "Luke". Luke is mostly used to troubleshoot issues with Search, especially when you want to know how Lucene stores your content internally.
Downloading Luke
Download the “lukeall-3.5.0.jar” file from https://code.google.com/p/luke/downloads/list
Running Luke
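Provided you have a current version of Java installed, you should be able to launch Luke by double-clicking the downloaded jar file. If double-clicking does nothing, running java -jar lukeall-3.5.0.jar from a command prompt in the download folder should also start it.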
Pointing to Search Folder
In Luke, point to your site's App_Data\Search folder. Also make sure you check the “Open in Read-Only mode” checkbox before clicking OK. This allows your DNN site to keep using the database while you use Luke.
Lucene Database
Here is what the typical Lucene files look like:
The write.lock file is created by Lucene to ensure that only one process can write to these files. Deleting this file is generally not recommended; if you must delete it, recycle the app pool first.
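Before opening the index in Luke, it can help to see which files are in the folder and whether a write.lock file is present. The following is a minimal Python sketch, not part of DNN or Luke. The function name `describe_index_folder` and the example site path are assumptions made for illustration, and the script only reads the directory listing.

```python
from pathlib import Path


def describe_index_folder(folder):
    """Return the file names in an index folder and whether write.lock exists.

    Only the directory listing is read; no Lucene file is opened or modified.
    """
    path = Path(folder)
    names = sorted(p.name for p in path.iterdir() if p.is_file())
    return {"files": names, "has_write_lock_file": "write.lock" in names}


if __name__ == "__main__":
    # Hypothetical location; replace with your own site's search folder.
    site_index = Path(r"C:\inetpub\wwwroot\dnn\App_Data\Search")
    if site_index.is_dir():
        info = describe_index_folder(site_index)
        print(info["files"])
        print("write.lock present:", info["has_write_lock_file"])
```

The presence of the file alone does not prove that a process is writing at that moment, so treat the result as a hint rather than a definitive lock check.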
Initial View
A few things stand out right from the initial view. Below are some of the important fields:
Number of documents
Lucene works at the document level. DNN converts each of its entities into one or more documents. Every page in DNN becomes one document in Lucene, and so does every module. Each item within a module (e.g. an Article) in turn becomes a document of its own. In short, everything becomes a document in Lucene; it is a document store.
"Number of documents" shows the total number of such documents (excluding the ones that are deleted). The screenshot shows 167, a very small number that comes from a brand new Evoq Content installation. On a real website this number will typically be in the thousands.
Number of fields
Every document consists of one or more fields. If you think of a document as a row in a database table, the fields are its columns. These fields support common types such as numeric, boolean, string, etc. The beauty of a typical NoSQL database is that its schema is very flexible: two documents can have very different sets of fields.
"Number of fields" shows the total number of unique fields across its documents.
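To make the relationship between documents and fields concrete, here is a small Python sketch that models documents as plain dictionaries. It is only an analogy, not how Lucene stores data on disk, and the field names and values are invented for illustration.

```python
# Toy stand-ins for Lucene documents: each one maps field name -> value.
# Note that the documents do not share the same set of fields.
documents = [
    {"title": "Home", "url": "/home", "body": "Welcome to the site"},
    {"title": "Release notes", "body": "Fixed a bug", "authorid": 12},
    {"title": "Contact", "url": "/contact"},
]


def unique_fields(docs):
    """Collect the distinct field names used across all documents."""
    names = set()
    for doc in docs:
        names.update(doc.keys())
    return sorted(names)


print(len(documents))                 # number of documents: 3
print(unique_fields(documents))       # ['authorid', 'body', 'title', 'url']
print(len(unique_fields(documents)))  # number of fields: 4
```

The count of unique field names across every document corresponds to what the overview reports as the number of fields.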
Number of terms
These are the keywords that Lucene extracts from the text provided by DNN.
Deleted Documents
Deleted documents in Lucene are not removed immediately; they are first marked as deleted. Physical removal is done later by a different process. The number of deleted documents is shown in the “Has deletions? /Optimized?” label: the number in parentheses next to Yes is the number of deleted documents. In this example it is 5.
So the total number of documents in this database is 167 + 5 = 172.
Documents View
The second tab lets you go through the documents present in the database.
The arrows let you move back and forth. You can also type a document id, from 0 up to the highest id (171 in this case), and press Enter to see the content of that document directly.
As noted above, this database has 172 documents. Document ids are indexed starting at 0: the id of the first document is 0, the second is 1, and so on. In this example you can go through document ids 0 to 171 (which is 172 - 1).
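The arithmetic behind the id range can be written down in a few lines. This Python sketch uses the numbers from the example; the function name `document_id_range` is made up for illustration.

```python
def document_id_range(live_docs, deleted_docs):
    """Return (total, first_id, last_id) for an index.

    Deleted documents still occupy an id until they are physically
    removed, so they count towards the total.
    """
    total = live_docs + deleted_docs
    if total == 0:
        raise ValueError("the index holds no documents")
    return total, 0, total - 1


total, first_id, last_id = document_id_range(live_docs=167, deleted_docs=5)
print(total, first_id, last_id)  # 172 0 171
```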
Seeing a document
Typing a document id or moving back and forth shows the contents of that document.
The field names are listed in the bottom pane along with the values as stored in Lucene. Please note that Lucene might convert text to its stemmed version. Even if one document on your site contained jumped, jumping, and jump, Lucene might store all three as ‘jump’. This process is called ‘stemming’.
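The idea of stemming can be illustrated with a deliberately naive suffix stripper. This Python sketch is not the algorithm Lucene uses; real stemmers apply many more rules and exceptions. The function name `toy_stem` and its suffix list are assumptions chosen only to reproduce the jump example.

```python
def toy_stem(word):
    """Strip a few common English suffixes (illustration only)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["jumped", "jumping", "jump"]
print([toy_stem(w) for w in words])  # ['jump', 'jump', 'jump']
```

Because all three forms collapse to the same stored term, a field value shown in Luke may not match the exact wording on your site.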
Running Queries
Luke has limited support for running custom queries. Follow the steps below:
- Go to the Search tab
- Type the keyword you want to search for
- Ensure the “….KeywordAnalyzer” is selected as the analyzer
- Select the field you want to search in
- Click the Search button
- The results should show in the bottom pane
- Clicking any row in the results will take you to that specific document in the Documents tab
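When choosing the keyword, it helps to know that Lucene's KeywordAnalyzer treats the entire input as a single token, so what you type is compared as-is with the terms stored in the selected field. The Python sketch below models that behaviour with a toy inverted index. It is not Luke's or Lucene's code, and the sample terms are invented; they are assumed to be stored lowercased and stemmed.

```python
from collections import defaultdict


def build_index(docs):
    """Map each stored term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def keyword_search(index, text):
    """Look up the query text as one exact term: no lowercasing, no splitting."""
    return sorted(index.get(text, set()))


# Invented sample data: document id -> terms stored for the "title" field.
title_terms = {0: ["home"], 1: ["release", "note"], 2: ["contact"]}
index = build_index(title_terms)

print(keyword_search(index, "release"))       # [1]
print(keyword_search(index, "Release"))       # []  (case differs from stored term)
print(keyword_search(index, "release note"))  # []  (two words are not one term)
```

Under these assumptions, typing the term exactly as it is stored, one word at a time, gives the best chance of a match.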
Alternate way to query
At times the Search tab may not find anything, which can be very frustrating. Don’t lose heart, there is an alternate way:
- Go to the Overview tab
- On the bottom-left pane, select the field where you think your keyword resides (title in this example)
- Click on “Show top terms”. Adjust the “number of terms” drop-down as needed.
- Find your keyword in the list of top terms and open its documents from there. This will take you to the Documents tab and list all the documents that contain this keyword.
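Conceptually, this route ranks the terms of one field and then lists the documents behind a chosen term. The following Python sketch mimics that on invented data, ranking terms by how many documents contain them. The function names `top_terms` and `docs_with_term` are hypothetical and do not correspond to anything in Luke.

```python
from collections import Counter


def top_terms(docs, field, limit=10):
    """Return the most frequent terms of one field as (term, doc_count) pairs."""
    counts = Counter()
    for fields in docs.values():
        # Count each term once per document.
        counts.update(set(fields.get(field, [])))
    return counts.most_common(limit)


def docs_with_term(docs, field, term):
    """Return the ids of documents whose field contains the term."""
    return sorted(doc_id for doc_id, fields in docs.items()
                  if term in fields.get(field, []))


# Invented sample data: document id -> field name -> stored terms.
docs = {
    0: {"title": ["home"]},
    1: {"title": ["release", "note"]},
    2: {"title": ["release", "plan"], "body": ["home"]},
}

print(top_terms(docs, "title", limit=1))         # [('release', 2)]
print(docs_with_term(docs, "title", "release"))  # [1, 2]
```

Browsing the term list first shows you the exact stored form of a keyword, which sidesteps guessing how it was stemmed or cased.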