Now that I have a pretty good automated process for creating new builds, the next task was to figure out which performance counters I should be looking at, and why. Vladan Strigo has compiled the following list of things to look at when analysing an application. This information is a distilled look at the Improving .NET Application Performance and Scalability book from the Patterns & Practices group.
Vladan has also pointed out another set of articles on using the debugger to isolate issues in production code.
CPU
· Processor\% Processor Time
Threshold: The general figure for the threshold limit for processors is 85 percent.
Significance: This counter is the primary indicator of processor activity. High values may not necessarily be bad. However, if other processor-related counters, such as % Privileged Time or Processor Queue Length, are increasing linearly, high CPU utilization may be worth investigating.
· Processor\% Privileged Time
Threshold: A figure that is consistently over 75 percent indicates a bottleneck.
Significance: This counter indicates the percentage of time a thread runs in privileged mode. When your application calls operating system functions (for example, to perform file or network I/O or to allocate memory), these operating system functions are executed in privileged mode.
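Both thresholds above talk about values that are "consistently" high rather than single spikes. A minimal Python sketch for applying them to exported samples is shown here; it assumes you have already pulled the counter values (for example from a Performance Monitor log) into a list, and it interprets "consistently" as "in at least 80 percent of the samples", which is my own arbitrary choice and not part of the guidance.

```python
from typing import Sequence

# Thresholds quoted in the guidance above, in percent.
PROCESSOR_TIME_LIMIT = 85.0    # Processor\% Processor Time
PRIVILEGED_TIME_LIMIT = 75.0   # Processor\% Privileged Time


def consistently_above(samples: Sequence[float], limit: float,
                       min_fraction: float = 0.8) -> bool:
    """Return True if at least min_fraction of the samples exceed limit.

    The 0.8 default is an assumption standing in for "consistently";
    it is not a figure from the Patterns & Practices guidance.
    """
    if not samples:
        return False
    over = sum(1 for value in samples if value > limit)
    return over / len(samples) >= min_fraction


if __name__ == "__main__":
    # Hypothetical samples, one per logging interval.
    processor_time = [88.0, 92.5, 97.0, 99.0, 100.0]
    privileged_time = [20.0, 22.0, 25.0, 21.0, 30.0]
    print(consistently_above(processor_time, PROCESSOR_TIME_LIMIT))    # True
    print(consistently_above(privileged_time, PRIVILEGED_TIME_LIMIT))  # False
```

Counting samples over the limit, rather than averaging, keeps one short burst at 100 percent from hiding an otherwise quiet run (or vice versa).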
· System\Processor Queue Length
Threshold: An average value consistently higher than 2 indicates a bottleneck.
Significance: If there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck. You may get more throughput by reducing parallelism in those cases.
You can use this counter in conjunction with the Processor\% Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers. Therefore, in a multiprocessor computer, divide the Processor Queue Length (PQL) value by the number of processors servicing the workload.
If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than 2 per processor, you may have a processor bottleneck that could benefit from additional CPUs. Alternatively, you could reduce the number of threads and queue more work at the application level. This causes less context switching, which helps reduce CPU load. A common reason for a PQL of 2 or higher with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of processor time. In that case the processor is not the bottleneck; it is your threading logic that needs to be improved.
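The Processor Queue Length reasoning above boils down to a small decision procedure: normalise the queue by the number of processors, then look at CPU utilization. Here it is as a Python sketch. The guidance gives no number for "low CPU utilization", so the 50 percent cut-off below is an assumption, and readings between that cut-off and 90 percent are reported as inconclusive rather than forced into either bucket.

```python
def diagnose_processor_queue(avg_queue_length: float, avg_cpu_percent: float,
                             processors: int, low_cpu_percent: float = 50.0) -> str:
    """Classify an average Processor Queue Length reading.

    Returns one of "ok", "cpu-bound", "threading" or "inconclusive".
    low_cpu_percent is an assumed cut-off for "low CPU utilization".
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # Windows keeps a single processor queue, so normalise by CPU count.
    per_processor = avg_queue_length / processors
    if per_processor <= 2:
        return "ok"
    if avg_cpu_percent >= 90:
        # Busy CPUs plus a long queue: add CPUs, or use fewer threads
        # and queue work at the application level.
        return "cpu-bound"
    if avg_cpu_percent < low_cpu_percent:
        # Long queue but mostly idle CPUs: bursty demand, review threading logic.
        return "threading"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical 4-CPU server: queue of 10 (2.5 per CPU) at 95% CPU.
    print(diagnose_processor_queue(10, 95, 4))  # cpu-bound
```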
· System\Context Switches/sec
Threshold: As a general rule, context switching rates of less than 5,000 per second per processor are not worth worrying about. If context switching rates exceed 15,000 per second per processor, then there is a constraint.
Significance: Context switching happens when a higher priority thread preempts a lower priority thread that is currently running or when a high priority thread blocks. High levels of context switching can occur when many threads share the same priority level. This often indicates that there are too many threads competing for the processors on the system. If you do not see much processor utilization and you see very low levels of context switching, it could indicate that threads are blocked.
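The two context-switching figures are per processor, so the system-wide counter needs dividing by the CPU count before comparing. They also leave a band between 5,000 and 15,000 that the guidance does not classify; the sketch below labels that band "watch", which is my own naming rather than anything from the book.

```python
# Rule-of-thumb limits from the guidance, per processor per second.
LOW_SWITCH_RATE = 5_000
HIGH_SWITCH_RATE = 15_000


def classify_context_switches(switches_per_sec: float, processors: int) -> str:
    """Return "ok", "watch" or "constrained" for a system-wide context switch rate."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # System\Context Switches/sec is machine-wide, so normalise per CPU.
    per_processor = switches_per_sec / processors
    if per_processor < LOW_SWITCH_RATE:
        return "ok"           # not worth worrying about
    if per_processor > HIGH_SWITCH_RATE:
        return "constrained"  # the guidance calls this a constraint
    return "watch"            # grey zone the guidance leaves open


if __name__ == "__main__":
    # Hypothetical readings on a 4-CPU machine.
    print(classify_context_switches(16_000, 4))  # ok (4,000 per CPU)
    print(classify_context_switches(80_000, 4))  # constrained (20,000 per CPU)
```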
My CPU problems showed up mostly through two counters: my processor was peaking at 100% for the host's w3wp process (you can find out which process belongs to which application pool by running the iisapp command from the command line), and during that period the processor queue length was also high. Later, when I get to the managed code counters, I will point out a couple of others that were also important.
Memory
· Memory\Available MBytes
Threshold: A consistent value of less than 20 to 25 percent of installed RAM is an indication of insufficient memory.
Significance: This indicates the amount of physical memory available to processes running on the computer. Note that this counter displays the last observed value only. It is not an average.
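Because Available MBytes only reports the last observed value, it is safer to average a series of readings before comparing against installed RAM. A minimal Python sketch follows; it assumes the readings and the installed RAM are both in megabytes, and it defaults to the lower (20 percent) end of the quoted 20 to 25 percent range, which you may want to raise.

```python
from statistics import mean
from typing import Sequence


def memory_is_low(available_mbytes_samples: Sequence[float], installed_mb: float,
                  fraction: float = 0.20) -> bool:
    """True if the average Available MBytes is below fraction of installed RAM."""
    if not available_mbytes_samples:
        raise ValueError("need at least one sample")
    # The counter is a point-in-time value, so average several readings.
    return mean(available_mbytes_samples) < fraction * installed_mb


if __name__ == "__main__":
    # Hypothetical 2 GB server sampled three times.
    print(memory_is_low([300, 350, 320], 2048))  # True (about 323 MB < 409.6 MB)
```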
· Memory\Pages/sec
Threshold: Sustained values higher than five indicate a bottleneck.
Significance: This counter indicates the rate at which pages are read from or written to disk to resolve hard page faults. Multiply the values of the Physical Disk\Avg. Disk sec/Transfer and Memory\Pages/sec counters. If the product of these counters exceeds 0.1, paging is taking more than 10 percent of disk access time, which indicates that you need more RAM.
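The paging check above is plain multiplication. Roughly speaking, pages per second times seconds per disk transfer gives the seconds of disk time spent paging in each second, treating each page as one transfer (an approximation on my part). A small Python sketch:

```python
def paging_fraction(pages_per_sec: float, avg_disk_sec_per_transfer: float) -> float:
    """Approximate share of disk time spent on paging (the product described above)."""
    return pages_per_sec * avg_disk_sec_per_transfer


def needs_more_ram(pages_per_sec: float, avg_disk_sec_per_transfer: float) -> bool:
    # A product above 0.1 means paging takes more than 10% of disk access time.
    return paging_fraction(pages_per_sec, avg_disk_sec_per_transfer) > 0.1


if __name__ == "__main__":
    # Hypothetical readings: 20 pages/sec at 8 ms per transfer.
    print(needs_more_ram(20, 0.008))  # True (product is about 0.16)
```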
My counters here did not show any unusual signs, so I concluded that I didn't have memory problems.
Managed code
Memory
· Process\Private Bytes
Threshold: The threshold depends on your application and on settings in the Machine.config file. The default for ASP.NET is 60 percent of available physical RAM or 800 MB, whichever is lower. Note that .NET Framework 1.1 supports 1,800 MB as the upper bound instead of 800 MB if you add a /3GB switch to your Boot.ini file. This is because .NET Framework 1.1 is able to support a 3 GB virtual address space instead of the 2 GB supported by earlier versions.
Significance: This counter indicates the current number of bytes allocated to this process that cannot be shared with other processes. This counter is used for identifying memory leaks.
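The default Private Bytes limit described above is a "whichever is lower" rule, which is easy to get backwards. A Python sketch of the arithmetic: it assumes the 60 percent is applied to the machine's physical RAM (my reading of "available physical RAM") and that the /3GB flag only matters on .NET Framework 1.1, as the text says. It does not read any real Machine.config settings; it just reproduces the quoted defaults.

```python
def aspnet_default_memory_limit_mb(physical_ram_mb: float,
                                   three_gb_switch: bool = False) -> float:
    """Default ASP.NET Private Bytes limit as quoted in the text above.

    60 percent of physical RAM or a fixed cap, whichever is lower. The
    1,800 MB cap applies to .NET Framework 1.1 with /3GB in Boot.ini;
    otherwise the cap is 800 MB.
    """
    cap_mb = 1_800 if three_gb_switch else 800
    return min(0.6 * physical_ram_mb, cap_mb)


if __name__ == "__main__":
    print(aspnet_default_memory_limit_mb(4096))        # 800 (cap is lower)
    print(aspnet_default_memory_limit_mb(4096, True))  # 1800
```

On a small server the 60 percent term wins (1 GB of RAM gives roughly 614 MB), while on anything above about 1.3 GB the 800 MB cap is the limit you will actually hit.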
· .NET CLR Memory\% Time in GC
Threshold: This counter should average about 5 percent for most applications when the CPU is 70 percent busy, with occasional peaks. As the CPU load increases, so does the percentage of time spent performing garbage collection. Keep this in mind when you measure the CPU.
Significance: This counter indicates the percentage of elapsed time spent performing a garbage collection since the last garbage collection cycle. The most common cause of a high value is making too many allocations, which may be the case if you are allocating on a per-request basis for ASP.NET applications. You need to study the allocation profile for your application if this counter shows a higher value.
· .NET CLR Memory\# Bytes in all Heaps
Threshold: No specific value.
Significance: This counter is the sum of four other counters: Gen 0 Heap Size, Gen 1 Heap Size, Gen 2 Heap Size, and Large Object Heap Size. Its value will always be less than that of Process\Private Bytes, which also includes the native memory allocated for the process by the operating system. Subtracting # Bytes in all Heaps from Private Bytes gives the number of bytes allocated for unmanaged objects.
This counter reflects the memory usage by managed resources.
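The managed/unmanaged split described above can be computed directly from two counter readings. Here is a Python sketch; it assumes both values are in bytes and were sampled at the same moment, since readings taken at different times can produce slightly inconsistent results.

```python
def split_private_bytes(private_bytes: int, bytes_in_all_heaps: int) -> dict:
    """Split Process Private Bytes into managed and unmanaged estimates."""
    # Private Bytes minus # Bytes in all Heaps approximates unmanaged allocations.
    unmanaged = private_bytes - bytes_in_all_heaps
    managed_share = bytes_in_all_heaps / private_bytes if private_bytes else 0.0
    return {
        "managed": bytes_in_all_heaps,
        "unmanaged": unmanaged,
        "managed_share": managed_share,
    }


if __name__ == "__main__":
    MB = 1024 * 1024
    # Hypothetical w3wp readings: 500 MB private, 300 MB in managed heaps.
    print(split_private_bytes(500 * MB, 300 * MB)["unmanaged"] // MB)  # 200
```

Watching the unmanaged figure over time is a cheap way to tell whether growing Private Bytes comes from the managed heaps or from native allocations.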
· .NET CLR Memory\Large Object Heap Size
Threshold: No specific value.
Significance: The large object heap size shows the amount of memory consumed by objects whose size is greater than 85 KB. If the difference between # Bytes in All Heaps and Large Object Heap Size is small, most of the memory is being used up by large objects. The large object heap cannot be compacted after collection and may become heavily fragmented over a period of time. You should investigate your memory allocation profile if you see large numbers here.
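"The difference is small" is easier to track as a ratio: the share of managed heap memory that sits in the large object heap. A Python sketch of that ratio follows. The 0.5 cut-off for "most of the memory" is an assumption of mine, not a figure from the guidance.

```python
def large_object_share(bytes_in_all_heaps: int, large_object_heap_size: int) -> float:
    """Fraction of managed heap memory held in the large object heap."""
    if bytes_in_all_heaps <= 0:
        return 0.0
    return large_object_heap_size / bytes_in_all_heaps


def mostly_large_objects(bytes_in_all_heaps: int, large_object_heap_size: int,
                         threshold: float = 0.5) -> bool:
    # A small difference between the two counters means a share close to 1;
    # the 0.5 threshold is an assumed cut-off for "most of the memory".
    return large_object_share(bytes_in_all_heaps, large_object_heap_size) > threshold


if __name__ == "__main__":
    # Hypothetical readings in MB: 400 MB in all heaps, 300 MB of it in the LOH.
    print(large_object_share(400, 300))  # 0.75
```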
Although I didn't have memory problems, I monitored these counters to see where memory was being allocated (and how much) while my processor was misbehaving. None of them showed anything significant; there were changes on the heaps, but nothing groundbreaking :)
Exceptions
· .NET CLR Exceptions\# of Exceps Thrown / sec
Threshold: This counter value should be less than 5 percent of Requests/sec for the ASP.NET application. If more than 1 request in 20 throws an exception, you should pay closer attention to it.
Significance: This counter indicates the total number of exceptions generated per second in managed code. Exceptions are very costly and can severely degrade your application performance. You should investigate your code for application logic that uses exceptions for normal processing behavior. Response.Redirect, Server.Transfer, and Response.End all cause a ThreadAbortException in ASP.NET applications.
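The 5 percent rule compares two rates, so it can be checked with a single division. A Python sketch: it assumes you pair # of Exceps Thrown / sec with the ASP.NET application's Requests/sec from the same interval. Treating any exceptions during a zero-traffic interval as worth a look is my own choice. Keep in mind that the ThreadAbortExceptions raised by Response.Redirect, Server.Transfer, and Response.End count towards this rate too.

```python
def exceptions_need_attention(exceptions_per_sec: float, requests_per_sec: float,
                              limit: float = 0.05) -> bool:
    """True if exceptions exceed limit (default 5 percent, i.e. 1 in 20) of requests."""
    if requests_per_sec <= 0:
        # No traffic in this interval: any exceptions at all are worth a look.
        return exceptions_per_sec > 0
    return exceptions_per_sec / requests_per_sec > limit


if __name__ == "__main__":
    # Hypothetical interval: 100 requests/sec, 10 exceptions/sec.
    print(exceptions_need_attention(10, 100))  # True (10% > 5%)
```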
I monitored exceptions to see if they could be the cause of my problems (they weren't).
Contention
To measure contention, use the following counter:
· .NET CLR LocksAndThreads\Contention Rate / sec
Threshold: No specific value.
Significance: This counter displays the rate at which the runtime attempts to acquire a managed lock unsuccessfully. Sustained nonzero values may be cause for concern. You may want to run dedicated tests for a particular piece of code to identify the contention rate for that code path.
Threading
To measure threading, use the following counters:
· .NET CLR LocksAndThreads\# of current physical Threads
Threshold: No specific value.
Significance: This counter indicates the number of native operating system threads currently owned by the CLR that act as underlying threads for .NET thread objects. This gives you an idea of how many threads your application actually spawns.
This counter can be monitored along with System\Context Switches/sec. A high rate of context switches almost certainly means that you have spawned more threads than is optimal for your process. If you want to find out which threads are causing the context switches, you can analyze the Thread\Context Switches/sec counter for all threads in the process, then take a dump of the process and identify the actual threads by comparing the thread IDs from the test data with the information available in the dump.
· Thread\% Processor Time
Threshold: No specific value.
Significance: This counter shows which thread is actually taking the most processor time. If you see idle CPU and low throughput, threads could be waiting or deadlocked. You can take a stack dump of the process and compare the thread IDs from the test data with the dump information to identify threads that are waiting or blocked.
· Thread\Context Switches/sec
Threshold: No specific value.
Significance: The counter needs to be investigated when the System\Context Switches/sec counter shows a high value. The counter helps in identifying which threads are actually causing the high context switching rates.
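The per-thread workflow described in this section (collect a per-thread counter, find the worst offenders, then look those thread IDs up in a dump) starts with a simple ranking. A Python sketch: it assumes you have already built a mapping from each thread's ID (as reported by the Thread\ID Thread counter) to its Thread\Context Switches/sec or Thread\% Processor Time value. The thread IDs and values in the example are made up.

```python
from typing import Dict, List, Tuple


def top_threads(per_thread: Dict[int, float], n: int = 3) -> List[Tuple[int, float]]:
    """Return the n threads with the highest counter value, highest first."""
    # Keys are OS thread IDs, so the result can be matched against a process dump.
    return sorted(per_thread.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical Thread\Context Switches/sec readings keyed by thread ID.
    switches = {4120: 1200.0, 4188: 85.0, 3960: 9400.0, 4012: 310.0}
    print(top_threads(switches, 2))  # [(3960, 9400.0), (4120, 1200.0)]
```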
· Thread\Thread State
Threshold: No specific value. The counter reports the state of a particular thread at a given instant.
Significance: You need to monitor this counter when you fear that a particular thread is consuming most of the processor resources.
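Thread State is reported as a number, not a name. The mapping below follows the Windows counter's documented values (0 Initialized, 1 Ready, 2 Running, 3 Standby, 4 Terminated, 5 Wait, 6 Transition, 7 Unknown). Since each reading is only a snapshot, the Python sketch also counts how often a thread was Running across repeated samples. Using that fraction as a proxy for "consuming most of the processor" is my own assumption.

```python
from typing import Sequence

# Values reported by the Thread\Thread State counter.
THREAD_STATES = {
    0: "Initialized",
    1: "Ready",
    2: "Running",
    3: "Standby",
    4: "Terminated",
    5: "Wait",
    6: "Transition",
    7: "Unknown",
}
RUNNING = 2


def describe_thread_state(value: int) -> str:
    """Translate a numeric Thread State reading into its name."""
    return THREAD_STATES.get(value, "Unknown")


def running_fraction(state_samples: Sequence[int]) -> float:
    """Fraction of snapshots in which the thread was Running."""
    if not state_samples:
        return 0.0
    return sum(1 for s in state_samples if s == RUNNING) / len(state_samples)


if __name__ == "__main__":
    # Hypothetical snapshots of one thread's state.
    samples = [2, 2, 5, 2]
    print([describe_thread_state(s) for s in samples])
    print(running_fraction(samples))  # 0.75
```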
So the next step is to set up the performance sessions, web tests and load tests in VSTS to monitor these counters and see if we can isolate some code that could be optimised.
Cheers