Murphy’s law has it that real issues in DotNetNuke (core, modules, skins) don’t manifest themselves until you’ve rolled out to production. We can test all we want, but the fact of the matter is that this has become a highly complex ecosystem with many interacting bits, and it is next to impossible to test for all scenarios. And I’m pretty sure the kind folks at the Corp’s helpdesk will agree with me on this. So let’s not kid ourselves: sh*t happens. It’s our ability to deal with it that separates the good from the bad (or the experienced from the inexperienced), IMHO. It is also the reason we have something called dogfooding: the last phase of testing, where you use your own product (dogfood) in your own production environment. This is after you’ve tested it on a staging server, of course. It’s rare, but sometimes it does give rise to issues.

After I rolled out DNN 6.1.1 on my own site and had begun dogfooding my own DMX 6, my server ground to a halt. Lucky for me I did this over Thanksgiving, so it was fairly quiet on the interwebs. The first major stumbling block in analyzing what was happening was the fact that the eventlog had exploded. I mean, my 250-odd MB SQL database had grown to 4 GB, and that was just one table: the EventLog table.
First aid
As the site ground to a halt I needed to do two things: get the site running again and analyze what was happening. But at this point the system is more or less frozen, with SQL Server pegged at 99% CPU. OK, first I took the site offline to bring the server back down to earth. Then I proceeded to remove all but the last 100,000 records from the EventLog. I could probably have deleted more, but this brought the DB down to a reasonable size (something like 1 GB). Back up the DB, zip it, and bring it down to my dev environment for analysis. Final change to the live DB: truncate the EventLog table and shrink the DB again. To get the site working I brought it back online and disabled logging of GENERAL_EXCEPTION in the EventLog (go to the Event Log settings and edit that row). Now SQL Server is no longer working overtime. At this point we have to realize the error is still happening, but the fact that it is no longer logged prevents SQL Server from going wild.
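For the record, the cleanup boils down to a few statements like the ones below. This is a sketch rather than the exact script I ran; it assumes the object names used in the rest of this article (PortalCore_EventLog and the new_bring2mind_net database) and that LogEventID increases with time, so adjust for your own install:

-- Keep only the most recent 100,000 log records
DELETE FROM PortalCore_EventLog
WHERE LogEventID NOT IN (
  SELECT TOP 100000 LogEventID
  FROM PortalCore_EventLog
  ORDER BY LogEventID DESC
)
GO
-- Once you have backed up a copy for analysis: empty the table and reclaim the space
TRUNCATE TABLE PortalCore_EventLog
GO
DBCC SHRINKDATABASE (new_bring2mind_net)
GO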
Analysis
For those of you who’ve not been here before, the EventLog table has one major drawback: the actual details of the errors are stored in an NTEXT field as an XML blob. This prevents us from doing any SQL analysis on the data. For one, I’d like to get an idea of which error is prevalent. That requires comparing the details between errors, so we need to get at the contents of the XML. Given that it’s NTEXT, SQL Server will refuse to operate on it: it is stored as blobs and can’t be compared. Converting it to NVARCHAR(MAX) is impossible on a serious volume of data (I tried, and the process just blocked). I also tried exporting to XML to open up in XML Spy, for instance, but this lacks the analysis capabilities that I need. It was useful to run through the first few records to get a first impression, though. You can export to XML using a utility called “bcp”, or bulk copy. This is a tool that comes with SQL Server, so you run it from a cmd window:
"C:\Program Files\Microsoft SQL Server\100\Tools\Binn\bcp" "select top 100 LogProperties from new_bring2mind_net.dbo.PortalCore_EventLog" queryout D:\Websites\local.bring2mind.net\eventlog.xml -U "test" -P "test" -c
Here “new_bring2mind_net” is the name of the database, and “test” and “test” are the username and password. Now all you need to do is open the resulting file and stick a <foo> and </foo> around the contents to make it a valid XML file. Open it in XML Spy (or a similar tool) and you can walk through the log. But you’ll quickly notice that the actual details of the event (in the “Message” property) are so large that they make it difficult to “see” what is happening in grid view.
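To give you an idea of what is in there: each record’s blob holds a list of LogProperty nodes along the lines of the (made-up) fragment below. The element names are the ones the shredding script further down relies on; the values and the exact wrapper element will differ per install.

<LogProperty>
  <PropertyName>Message</PropertyName>
  <PropertyValue>System.NullReferenceException: Object reference not set ... (stack trace)</PropertyValue>
</LogProperty>
<LogProperty>
  <PropertyName>RawURL</PropertyName>
  <PropertyValue>/Default.aspx?TabId=123</PropertyValue>
</LogProperty>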
So how do we actually use the eventlog for analysis? Well, we need to get the XML data into SQL so we can run queries, and that is something SQL Server 2005 and later can do. The approach I have pursued is to (1) create a table for the log properties, (2) run a cursor over the EventLog and (3) read the LogProperties as XML and push that into the new table. So let’s look at some code. First we create the new table:
CREATE TABLE PortalCore_EventLogProperties (
  LogEventID INT NOT NULL,
  PropertyName VARCHAR(50) NOT NULL,
  PropertyValue NVARCHAR(MAX)
)
GO
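One optional extra that I didn’t use in my original run: the migration loop below checks whether a (LogEventID, PropertyName) pair has already been inserted, so an index on those two columns should speed that check up considerably on larger volumes:

-- Speeds up the NOT EXISTS check in the migration loop below
CREATE INDEX IX_PortalCore_EventLogProperties ON PortalCore_EventLogProperties (LogEventID, PropertyName)
GO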
With the table that will receive our data in place, now for our bit of magic:
-- Walk the EventLog with a cursor, shred each LogProperties XML blob
-- and insert one row per property into the new table
DECLARE @LogProperties XML
DECLARE @LogEventID INT
DECLARE c CURSOR FAST_FORWARD FOR
  SELECT LogProperties, LogEventID FROM PortalCore_EventLog
OPEN c
FETCH NEXT FROM c
INTO @LogProperties, @LogEventID
WHILE @@FETCH_STATUS = 0
BEGIN
  -- Progress indicator so you can see how far along the script is
  PRINT @LogEventID
  INSERT INTO PortalCore_EventLogProperties (LogEventID, PropertyName, PropertyValue)
  SELECT
    @LogEventID,
    Tbl.Col.value('(PropertyName)[1]', 'VARCHAR(50)'),
    Tbl.Col.value('(PropertyValue)[1]', 'NVARCHAR(MAX)')
  FROM
    @LogProperties.nodes('//LogProperty') Tbl(Col)
  -- Skip properties that were already migrated, so the script can be rerun safely
  WHERE NOT EXISTS (
    SELECT * FROM PortalCore_EventLogProperties
    WHERE LogEventID = @LogEventID
    AND PropertyName = Tbl.Col.value('(PropertyName)[1]', 'VARCHAR(50)')
  )
  FETCH NEXT FROM c
  INTO @LogProperties, @LogEventID
END
CLOSE c
DEALLOCATE c
GO
This migrates the LogProperties to their own table. Note that this can take a while depending on the number of records (I ran 30K records in about 45 minutes while typing this). Optionally you can limit the amount of work by adding “TOP 1000” or another limiting instruction to the SELECT statement in the cursor declaration. But now we have the data where we want it: ready for analysis.
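As an aside: this is not what I ran, but on SQL Server 2005 and later the same shredding can also be done set-based with CROSS APPLY, which avoids the cursor and is usually a lot quicker. A rough sketch, using the same table and column names as above:

-- Set-based alternative to the cursor (assumes the properties table is still empty)
INSERT INTO PortalCore_EventLogProperties (LogEventID, PropertyName, PropertyValue)
SELECT
  el.LogEventID,
  Tbl.Col.value('(PropertyName)[1]', 'VARCHAR(50)'),
  Tbl.Col.value('(PropertyValue)[1]', 'NVARCHAR(MAX)')
FROM PortalCore_EventLog el
-- Convert the NTEXT blob to XML, then shred it into one row per LogProperty node
CROSS APPLY (SELECT CONVERT(XML, el.LogProperties) AS Props) AS x
CROSS APPLY x.Props.nodes('//LogProperty') Tbl(Col)
GO

That aside, let’s run a few queries.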
SELECT * FROM
  (SELECT DISTINCT(elp.PropertyValue),
    (SELECT COUNT(*) FROM PortalCore_EventLogProperties elp1
     WHERE elp1.PropertyName = 'RawURL' AND elp1.PropertyValue = elp.PropertyValue) AS cnt
   FROM PortalCore_EventLogProperties elp
   WHERE elp.PropertyName = 'RawURL') tmp
ORDER BY cnt DESC
This will show you a list of requested URLs with the number of occurrences, in descending order. It may give you a hint as to where (i.e. on which page) the trouble lies. Now replace the value “RawURL” with “Message” and you’ll get the same list, but for the actual errors. The Message property is in effect the stack trace, so this is the best way forward in determining where an error is occurring. You can of course also export this to a text file so you can work through it:
"C:\Program Files\Microsoft SQL Server\100\Tools\Binn\bcp" "SELECT cnt, PropertyValue FROM (SELECT DISTINCT(elp.PropertyValue), (SELECT COUNT(*) FROM new_bring2mind_net.dbo.PortalCore_EventLogProperties elp1 WHERE elp1.PropertyName='Message' AND elp1.PropertyValue=elp.PropertyValue) AS cnt FROM new_bring2mind_net.dbo.PortalCore_EventLogProperties elp WHERE elp.PropertyName='Message') tmp ORDER BY cnt DESC" queryout D:\Websites\local.bring2mind.net\StackTraces.txt -U "test" -P "test" –c
Good luck debugging!