The 7.1 release of the DNN Platform introduced greatly improved 404 handling, which appears to have been well received by the DNN Community in general. I have already posted about the new 404 Page Not Found handling in DNN, but one thing that post did not cover was how to find out which URLs requested on your site have resulted in 404 errors.
Any 7.1 site using Advanced URL mode captures 404 errors and logs them to the DNN Event Log under the Event Log type ‘Http Error Code 404 Page Not Found’ (the underlying type key, used in the SQL query later in this post, is ‘Page_not_found_404’).
The following is an example of what you might see on your Admin->Event Viewer page after filtering the list for 404 errors:
TabId:
PortalAlias: dnndev.me/dnn710
OriginalUrl: /dnn710/deadpage
Referer: http://somepage.com/somepage
Url: http://dnndev.me/dnn710/deadpage
UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
HostAddress: 192.168.0.106
HostName: 192.168.0.106
Server Name: dnndev
The event entry is self-explanatory: you have the Portal Alias that was identified for the 404, the Original Url as requested (this corresponds to the Request URI, i.e. the URL without the host or domain name), the Referer field (populated if the request came from a link on another page) and the full URL itself.
The UserAgent field identifies the browser or other client that requested the URL. This is an important one, because it will often reveal whether the 404 pages are being found by search engine bots or by regular visitors.
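As a rough illustration of that bot-versus-visitor split, here is a sketch that uses the same XML extraction technique as the query later in this post to count 404 entries by client type. The ‘bot’, ‘crawl’ and ‘spider’ patterns are my own heuristic rather than anything DNN defines, and the LIKE matches are only case-insensitive if your database collation is (most default collations are).

```sql
-- Rough split of 404 entries into crawlers vs. other clients, based on the
-- UserAgent log property. The LIKE patterns are a heuristic, not a complete
-- list of bot user agents.
SELECT ClientType, COUNT(*) AS Hits
FROM (
    SELECT CASE
               WHEN UserAgent LIKE '%bot%'
                 OR UserAgent LIKE '%crawl%'
                 OR UserAgent LIKE '%spider%' THEN 'Bot'
               ELSE 'Other'   -- browsers, plus entries with no UserAgent value
           END AS ClientType
    FROM (
        -- .value() returns the first matching PropertyValue as nvarchar
        SELECT convert(xml, LogProperties).value(
                   '(/LogProperties/LogProperty[PropertyName="UserAgent"]/PropertyValue)[1]',
                   'nvarchar(max)') AS UserAgent
        FROM {databaseOwner}{objectQualifier}eventLog
        WHERE LogTypeKey = 'Page_not_found_404'
    ) AS ua
) AS classified
GROUP BY ClientType;
```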
The ‘TabId’ value is empty because no matching page was found, so the TabId could not be determined. Where a 404 error is raised by custom module code, the TabId may be known even though a 404 error is still returned. An example would be a blog module returning a 404 error for a non-existent blog post: the blog page itself exists, but the requested post content cannot be found.
Extracting a list of URLs that returned a 404
A common requirement for analysing the 404 errors on a site is getting a complete list of the URLs that have resulted in a 404. This information is contained within the DNN Event Log, but it’s not immediately clear how to extract this data.
With that in mind, I wrote the following piece of SQL, which can be copied and pasted into the Host->SQL page of your DNN site to give you an easy-to-copy list of the data in table format. You can, of course, also run it directly in a SQL query tool such as SQL Server Management Studio, though you will have to replace the ‘{databaseOwner}’ and ‘{objectQualifier}’ tokens with the relevant values for your site (by default, ‘dbo.’ and an empty string).
-- List every 404 entry in the DNN Event Log, pulling the Url, Referer and
-- UserAgent values out of the XML stored in the LogProperties column
Select LogPortalId, LogPortalName, LogCreateDate
, convert(xml, logProperties).query('data(/LogProperties/LogProperty/PropertyValue[../PropertyName="Url"])') as Url
, convert(xml, logProperties).query('data(/LogProperties/LogProperty/PropertyValue[../PropertyName="Referer"])') as Referer
, convert(xml, logProperties).query('data(/LogProperties/LogProperty/PropertyValue[../PropertyName="UserAgent"])') as UserAgent
from {databaseOwner}{objectQualifier}eventLog
where LogTypeKey = 'Page_not_found_404'
The above SQL code shows a technique that is very useful for extracting information buried in the DNN Event Log. The Event Log works well as a generic store for a wide variety of event types because each entry's details are stored as XML, but that same design makes the information difficult to extract with a SQL query tool. The solution is to convert the column to the xml data type and then use XPath queries to retrieve the specific values required.
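To see the technique in isolation, the following sketch runs against an xml variable rather than the EventLog table, so it can be tried in any SQL Server query window. The sample document is an assumption: it is shaped the way the XPath expressions in the query above expect (a LogProperties root containing LogProperty elements with PropertyName and PropertyValue children) and uses values from the example entry earlier. It also shows the .value() method, which returns a plain nvarchar instead of an XML fragment and is therefore easier to filter and group on.

```sql
-- Assumed sample: a LogProperties document in the shape the XPath
-- expressions above expect, built from the example 404 entry
DECLARE @props xml = N'<LogProperties>
  <LogProperty><PropertyName>OriginalUrl</PropertyName><PropertyValue>/dnn710/deadpage</PropertyValue></LogProperty>
  <LogProperty><PropertyName>Referer</PropertyName><PropertyValue>http://somepage.com/somepage</PropertyValue></LogProperty>
  <LogProperty><PropertyName>Url</PropertyName><PropertyValue>http://dnndev.me/dnn710/deadpage</PropertyValue></LogProperty>
</LogProperties>';

SELECT
    -- .query() with data() returns the matching value as an xml fragment
    @props.query('data(/LogProperties/LogProperty/PropertyValue[../PropertyName="Url"])') AS UrlAsXml,
    -- .value() needs a single node, hence the (...)[1], and returns a typed scalar
    @props.value('(/LogProperties/LogProperty[PropertyName="Url"]/PropertyValue)[1]', 'nvarchar(max)') AS UrlAsText;
```

Both columns should contain the same Url; the difference is only in the data type that comes back.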
Of course, this query only filters on the log type. You can easily add conditions to pull back the data for a specific portal or for a specific date/time range. You can also filter on the value of a specific ‘LogProperty’, for example to retrieve all the 404 errors recorded for the Googlebot user agent. However, take care when writing these queries: the Event Log can be quite large, and there are no database indexes on any of the fields I have mentioned. For large, active sites, it would be prudent to restore a backup of the live database to a local server for this type of analysis. That way you avoid loading the live database with large queries and limit the potential for slow responses to site visitors.
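Here is a sketch of such a filtered query. The portal ID (0) and the July 2013 date range are placeholder values to adjust for your site, and the Googlebot match is a simple LIKE on the extracted UserAgent value. It uses .value() rather than .query() so that the extracted values come back as nvarchar and can be compared directly.

```sql
-- Sketch: 404 entries for one portal and date range where the UserAgent
-- mentions Googlebot. Portal ID and dates are placeholders.
WITH NotFound AS (
    SELECT LogPortalId, LogPortalName, LogCreateDate,
           convert(xml, LogProperties).value(
               '(/LogProperties/LogProperty[PropertyName="Url"]/PropertyValue)[1]',
               'nvarchar(max)') AS Url,
           convert(xml, LogProperties).value(
               '(/LogProperties/LogProperty[PropertyName="UserAgent"]/PropertyValue)[1]',
               'nvarchar(max)') AS UserAgent
    FROM {databaseOwner}{objectQualifier}eventLog
    WHERE LogTypeKey = 'Page_not_found_404'
      AND LogPortalId = 0                 -- placeholder portal
      AND LogCreateDate >= '20130701'     -- placeholder range start (inclusive)
      AND LogCreateDate <  '20130801'     -- placeholder range end (exclusive)
)
SELECT LogPortalId, LogPortalName, LogCreateDate, Url, UserAgent
FROM NotFound
WHERE UserAgent LIKE '%Googlebot%'
ORDER BY LogCreateDate DESC;
```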
Running the above query on a test site returned the following table, which is much more useful for aggregating large amounts of data than the native XML format of the EventLog table.
LogPortalId | LogPortalName | LogCreateDate | Url | Referer | UserAgent
--- | --- | --- | --- | --- | ---
0 | dnn710 | 7/22/2013 9:22:56 AM | http://dnndev.me/dnn710/deadpage | | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
0 | dnn710 | 7/22/2013 9:26:35 AM | http://dnndev.me/dnn710/deadpage | http://somepage.com/somepage | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
0 | dnn710 | 7/22/2013 9:26:12 AM | http://dnndev.me/dnn710/deadpage | | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
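Once the values have been extracted, aggregating them is ordinary SQL. As one example, this sketch counts the 404 entries per requested Url so that the most frequently hit dead URLs appear first; the FirstSeen and LastSeen columns are my own addition, intended to help judge whether a URL is still being requested.

```sql
-- Sketch: number of 404 entries per requested Url, most frequent first
SELECT Url,
       COUNT(*)           AS Hits,
       MIN(LogCreateDate) AS FirstSeen,
       MAX(LogCreateDate) AS LastSeen
FROM (
    -- Extract the Url property as nvarchar so it can be grouped on
    SELECT LogCreateDate,
           convert(xml, LogProperties).value(
               '(/LogProperties/LogProperty[PropertyName="Url"]/PropertyValue)[1]',
               'nvarchar(max)') AS Url
    FROM {databaseOwner}{objectQualifier}eventLog
    WHERE LogTypeKey = 'Page_not_found_404'
) AS NotFound
GROUP BY Url
ORDER BY Hits DESC;
```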
Conclusion
I hope this is useful, both for showing how to extract information relating to 404 errors and for showing an easy way to drill down into Event Log data for any type of event. Let me know via the comments if you have any questions about the technique or the 404 error logging.