Profiling .NET Applications: Part 1
Profiling is the process of collecting information about something so that you can come to a conclusion. You're probably aware of what is called racial profiling. The issue is that police or government officials gather information about a particular race to form or support an opinion that is held about that race. Many problems result from this approach, but one key to this analogy is that data collected can be used to support rather than form an opinion. Unlike racial profiling, software profiling is good and is encouraged. When profiling software, the developer collects data about the application (memory use, CPU usage, and so on) and forms an opinion as to why the application performs the way it does. Without the data, a developer is left to guess as to the causes of a particular problem. Probably worse yet is that the data might be incomplete and used to support a preconceived notion about the operation of the application.
After an application works, it is inevitable that it needs to be tuned. The application might require too much memory, be too slow, require too many resources, and so on. For all but the most trivial applications, programmers are notoriously wrong about the cause of program bottlenecks. Often, these wrong hunches cause the programmer to invest a great deal of effort in optimizing code that has little or no effect on the overall performance of the application. With the CLR taking over many of the tasks that traditionally have plagued programmers, it is even harder to determine the causes for a program's poor performance. With managed code, it is even more necessary to use proper tools to fine-tune an application.
This series of articles reviews some of the traditional tools that are available for profiling. These tools provide substantial information and should not be dismissed just because they are not .NET tools. You then explore performance counters, which provide an easy-to-use, extensible view into your .NET application. Finally, you learn about the interface that is provided so that you can build your own profile tool.
Traditional Tools Used in Profiling a .NET Application
The tools that existed pre-.NET still can be used to profile some of the characteristics of a .NET application. A .NET application still needs memory and it still needs the services of the CPU on which it is running. These tools provide information about the usage of generic resources such as these. Before you look at the specific tools (such as .NET performance counters and a custom profiler API) that are available for .NET applications, you need to review what is and has been available for any application.
Obtaining a Snapshot of Memory Usage with memsnap
This tool provides a snapshot of memory (and some resource) usage. It comes as part of Visual Studio .NET in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. If this utility is not in the directory specified, then you probably failed to install the Platform SDK (which is not installed by default). It is a command-line utility and it writes the snapshot to memsnap.log unless you specify a specific log file. Invoking memsnap with no arguments or -? gives this usage help output, as in Listing 1.
Listing 1memsnap Usage
memsnap [-t] [-g] [-?] [<logfile>] memsnap logs system memory usage to <logfile> <logfile> = memsnap.log by default -t Add tagging information (time (GMT), date, machinename) -g Add GDI and USER resource counts -? Gets this message memsnap Creates a memsnap.log file that contains a snap-shot of the memory usage. Each subsequent usage appends to the memsnap.log file.
If you have a managed application running like ThreadPoolTest.exe, the log file looks like Listing 2.
Listing 2memsnap Output
Process ID Proc.Name Wrkng.Set PagedPool NonPgdPl Pagefile Commit Handles Threads 00000000 (null) 16384 0 0 0 0 0 1 00000008 System 212992 0 0 24576 24576 114 38 . . . 0000B09C ThreadPoolTest. 10821632 33868 8332 4116480 4116480 119 4 0000B0A8 Memsnap.Exe 962560 13824 2016 434176 434176 20 1
If you start on instance of the DiningPhilosophers.exe application, look at memsnap:
Process ID Proc.Name Wrkng.Set PagedPool NonPgdPl Pagefile Commit Handles Threads 0000B09C ThreadPoolTest. 14708736 46736 12232 6963200 6963200 443 27
If you start three instances of the DiningPhilosophers.exe applications, you see the following:
Process ID Proc.Name Wrkng.Set PagedPool NonPgdPl Pagefile Commit Handles Threads 0000B09C ThreadPoolTest. 21106688 63760 15716 16670720 16670720 1391 38
Because 20 threads should exist per application, this just confirms the limit that the ThreadPool places on the number of threads that it allocates. If you stop all of the applications and return back to just the small ThreadPoolTest.exe form, you see the following:
Process ID Proc.Name Wrkng.Set PagedPool NonPgdPl Pagefile Commit Handles Threads 0000B09C ThreadPoolTest. 15237120 51916 9528 9191424 9191424 286 8
As you can see from the output, most of the threads have been returned to the pool, some of the memory has been reclaimed from the Pagefile, and the Working Set is somewhat smaller.
Obtaining Processor Usage Information with pstat
This function provides much of the same information as memsnap except that it also provides a dump of process-specific information including memory on standard output. This tool is located in the same place as memsnap (Microsoft Visual Studio.NET\Common7\Tools\Bin\WinNT). The output is substantial, so you might want to invoke pstat like this:
The first bit of information that this utility prints is basic system information, as shown in Listing 3.
Listing 3pstat Usage
Pstat version 0.3: memory: 261424 kb uptime: 0 1:10:09.603 PageFile: \??\D:\pagefile.sys Current Size: 393216 kb Total Used: 34464 kb Peak Used 35384 kb Memory: 261424K Avail: 48256K TotalWs: 245064K InRam Kernel: 3464K P:70456K Commit: 217564K/ 103000K Limit: 632712K Peak: 251296K Pool N:10816K P:89728K
This shows how much RAM is available, how long the computer has been up, basic information about the PageFile that you have for your system, and system-wide memory information. In addition, a section is available where process-specific information is printed.
User Time Kernel Time Ws Faults Commit Pri Hnd Thd Pid Name . . . 0:00:06.489 0:00:00.500 14860 9201 8948 8 282 6 45212 ThreadPoolTest.
This is the same information as memsnap, with the exception that here you can see how much time was spent in kernel and user mode. A more detailed output is available for each of the processes. For the ThreadPoolTest.exe process, the output looks like Listing 4.
Listing 4pstat Report of Thread Status for a Process
pid:b09c pri: 8 Hnd: 282 Pf: 9201 Ws: 14860K ThreadPoolTest. tid pri Ctx Swtch StrtAddr User Time Kernel Time State b098 10 1156 77E97CC6 0:00:00.871 0:00:00.050 Wait:UserRequest b0a0 8 1 77E87532 0:00:00.000 0:00:00.000 Wait:UserRequest b0a4 10 375 77E87532 0:00:00.110 0:00:00.050 Wait:UserRequest 79c 8 8 77E87532 0:00:00.000 0:00:00.000 Wait:LpcReceive 150 8 3118 77E87532 0:00:00.000 0:00:00.000 Wait:DelayExecution 20c 8 8 77E87532 0:00:00.010 0:00:00.000 Wait:UserRequest
Detailed information is available about each of the six threads that is running in the process. You can see a thread ID, a priority, the number of context switches, the start address of the thread, how much time the thread has spent in user mode and kernel mode, and what the current state of the thread is. If you were to start the DiningPhilosophers application, you would see detail on more threads. At the end of the output from pstat is a detailed table of the drivers and system services that are loaded and running on the system.
Profiling Memory Usage with vadump
Also located in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT, vadump reports memory usage information about a particular process. To use this tool, you first must obtain the process ID of the process. You might not know that process ID, so you will have to use pview or the Task Manager to get the hex ID of each process. You can activate the Task Manager from the Start Toolbar by right-clicking on the toolbar and selecting Task Manager. After the Task Manager comes up, you can select the Processes tab to view the processes and the associated IDs. pview is a program found in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. Start pview from a command line (assuming that you have the preceding path in your PATH environment variable). pview lists each process with its ID in parentheses. You then need to convert this hex version of the process ID to decimal. (You can use the calculator in Accessories to do this conversion.) Notice that Task Manager displays process IDs as decimal, so conversion is not necessary. The usage report for vadump is shown in Listing 5.
Listing 5vadump Usage
vadump -\? Usage: Dump the address space: vadump [-sv] -p decimal_process_id Dump the current workingset: vadump -o [-mpsv] [-l logfile] -p decimal_process_id Dump new additions to the workingset (Stop with ^C): vadump -w [-crv] [-l logfile] -p decimal_process_id -c Include code faults faulting PC summary -m Show all code symbols on page -o Workingset snapshot w/ summary -r Print info on individual faults -s Summary info only -t Include pagetable info in summary -w Track new working set additions -v Verbose
If you invoke vadump with all of the defaults, you get output that is split into three sections. The first section is an address map for the process. It looks like Listing 6.
Listing 6vadump Address Map
Address: 00000000 Size: 00010000 State Free Address: 00010000 Size: 00001000 State Committed Protect Read/Write Type Private Address: 00011000 Size: 0000F000 State Free Address: 00020000 Size: 00001000 State Committed Protect Read/Write Type Private Address: 00021000 Size: 0000F000 State Free Address: 00030000 Size: 000F4000 RegionSize: 100000 State Reserved Type Private Address: 00130000 Size: 00001000 State Committed Protect Read Only Type Mapped Address: 00131000 Size: 0000F000 State Free Address: 00140000 Size: 00061000 RegionSize: 100000 State Committed Protect Read/Write Type Private Address: 00240000 Size: 00006000 RegionSize: 10000 State Committed Protect Read/Write Type Private Address: 00250000 Size: 00001000 RegionSize: 10000 State Committed Protect Read/Write Type Mapped . . .
The next section describes how to obtain the memory that is associated with each DLL. Listing 7 shows a partial output of the vadump utility. Because you are using this tool to profile a .NET application, you can see how much memory each of the DLLs is responsible for. If the section under mscoree.dll includes some huge numbers, then you cannot fix this problem because the problem is with the CLR. Large memory usage in the main portion of the code (ThreadPoolTest) is something that you have a better chance of affecting.
Listing 7vadump DLL Memory Usage
Total Image Commitment 37752832 READONLY: 9662464 READWRITE: 1425408 WRITECOPY: 1576960 EXECUTEREAD: 25088000 Total ThreadPoolTest.exe Commitment 20480 READONLY: 12288 EXECUTEREAD: 8192 Total ntdll.dll Commitment 692224 READONLY: 196608 READWRITE: 16384 WRITECOPY: 4096 EXECUTEREAD: 475136 Total mscoree.dll Commitment 126976 READONLY: 16384 READWRITE: 8192 WRITECOPY: 4096 EXECUTEREAD: 98304
A summary report, as shown in Listing 8, offers important information about an application. Specifically, for this .NET application, you can see how much of a memory resource hog this application will be.
Listing 8vadump Summary
Dynamic Reserved Memory 49557504 PageFaults: 6720 PeakWorkingSetSize 14942208 WorkingSetSize 13570048 PeakPagedPoolUsage 57524 PagedPoolUsage 51696 PeakNonPagedPoolUsage 15984 NonPagedPoolUsage 13984 PagefileUsage 6762496 PeakPagefileUsage 8273920
If you add a flag to turn on verbose mode, you get an address map of where each of the DLLs associated with the process were loaded, as in Listing 9.
Listing 9vadump Verbose
vadump -v -p 3932 Symbols loaded: 00400000 : 00408000 ThreadPoolTest.exe Symbols loaded: 77f50000 : 77ff9000 ntdll.dll Symbols loaded: 60280000 : 6029f000 mscoree.dll Symbols loaded: 77dd0000 : 77e5a000 ADVAPI32.dll . . . Symbols loaded: 5f060000 : 5f246000 system.windows.forms.dll Symbols loaded: 5f250000 : 5f502000 system.windows.forms.dll Symbols loaded: 5e4b0000 : 5e61a000 system.dll Symbols loaded: 5e620000 : 5e7d4000 system.dll Symbols loaded: 61260000 : 612a8000 MSCORJIT.DLL Symbols loaded: 516f0000 : 51758000 diasymreader.dll Symbols loaded: 5ec00000 : 5ec70000 system.drawing.dll Symbols loaded: 5ec70000 : 5ed34000 system.drawing.dll Symbols loaded: 6b180000 : 6b321000 gdiplus.dll Symbols loaded: 5fa10000 : 5fb32000 system.xml.dll Symbols loaded: 77120000 : 771ab000 OLEAUT32.DLL . . . (regular output)
Listing 9 shows the output of vadump for the same ThreadPoolTest application. Notice that from the output of this tool, you can see not only what DLLs are loaded, but also where they are placed. You can immediately see that this is a managed (.NET) application. You can see DLLs such as mscoree.dll and system.windows.forms.dll loaded into the address space.
Like most other traditional tools, vadump mainly provides information about the process memory and how it is allocated and used.
Detailed Execution Profile of a .NET Application Using profile
profile is also located in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. From the output of profile, you can obtain a count of the number of functions called in each DLL that is loaded as part of the process. The output of profile is somewhat cryptic, but it can be useful for low-level analysis of a running process.
Invoking profile like this
leads to output like Listing 10.
Listing 10profile Output
78,ntdll.dll,Total 1,ntdll.dll,RtlMultiByteToUnicodeN (77f81ff0) 5,ntdll.dll,RtlEnterCriticalSection (77f821d6) 1,ntdll.dll,RtlConsoleMultiByteToUnicodeN (77f822c5) 1,ntdll.dll,allmul (77f827c0) . . . 6,ntdll.dll,RtlFreeHeap (77fcb633) 9,mscoree.dll,Total 2,mscoree.dll,GetHostConfigurationFile (602834be) 6,mscoree.dll,CoInitializeCor (6028390b) 1,mscoree.dll,StrongNameFreeBuffer (60286e6e) 38,KERNEL32.DLL,Total 3,KERNEL32.DLL,LocalAlloc (77e86568) 3,KERNEL32.DLL,LocalFree (77e865f3) 3,KERNEL32.DLL,InterlockedIncrement (77e86659) 1,KERNEL32.DLL,InterlockedDecrement (77e8666a) . . . 1,KERNEL32.DLL,IsProcessorFeaturePresent (77e8ddf1) 1,KERNEL32.DLL,SetEnvironmentVariableW (77e8dea0) 4,KERNEL32.DLL,GetEnvironmentVariableA (77e9fb75) 1,KERNEL32.DLL,FindVolumeMountPointClose (77ea0877) 2,RPCRT4.DLL,Total 1,RPCRT4.DLL,RpcBindingSetAuthInfoExW (77d5959c) 1,RPCRT4.DLL,I_RpcServerUseProtseqEp2W (77d5abe3)
As you can see, the output is difficult to understand. The first number is the number of times that the function in the specified DLL was called. The second field gives the name of the DLL. The third field is either the total number of calls associated with this DLL (in which case it is "Total"), or it is the name of the function within the DLL that was called. The last field is the address of the function in the DLL. The problem with this output is that it does not list managed method calls.
Monitoring Page Faults with pfmon
This tool is also located in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. Pfmon outputs information about the number of page faults that your process or application generates. One possible way to invoke this tool is as follows:
pfmon n traceswitch.exe
With the n switch, the tool outputs all of its information to a file pfmon.log. When this tool is used to monitor the page faults for a .NET application, traceswitch.exe, the log file looks like Listing 11.
Listing 11pfmon Output
SOFT: GlobalMemoryStatus : GlobalMemoryStatus SOFT: 602812e1 : 6029afb0 HARD: 602812e1 : 6029afb0 SOFT: ReleaseSemaphore+0x2b : ReleaseSemaphore+0x0000002B SOFT: SetFileTime+0x86 : SetFileTime+0x00000086 SOFT: VerifyVersionInfoW+0x3b : VerifyVersionInfoW+0x0000003B SOFT: StrongNameErrorInfo+0x10a8 : StrongNameErrorInfo+0x000010A7 SOFT: CorValidateImage : CorValidateImage SOFT: DllCanUnloadNow+0x319 : DllCanUnloadNow+0x00000318 HARD: CorValidateImage+0x6b : 004000a8 SOFT: RtlFindCharInUnicodeString+0x405 : 004032a0 HARD: RtlIsValidHandle+0xfe : 00402000 SOFT: RtlDosApplyFileIsolationRedirection_Ustr+0x1cb : RtlCaptureContext+0x0000FCA0 SOFT: GetDateFormatW+0x5ae : GetDateFormatW+0x000005AD SOFT: SetErrorMode+0x94 : SetErrorMode+0x00000093 SOFT: CorExeMain : CorExeMain SOFT: CreateConfigStream+0xc98 : CreateConfigStream+0x00000C98 SOFT: 60291262 : 60291262 SOFT: CreateActCtxW+0xa3a : CreateActCtxW+0x00000A3A SOFT: 60290f8a : 60290f8a SOFT: CreateConfigStream : CreateConfigStream SOFT: 602932b9 : 602932b8 SOFT: RegQueryInfoKeyW : AllocateLocallyUniqueId+0x00000229 SOFT: RegEnumValueW : CommandLineFromMsiDescriptor+0x0000057A SOFT: CommandLineFromMsiDescriptor+0x3f3 : CommandLineFromMsiDescriptor+0x000003F2 SOFT: RtlAddRefActivationContext+0x2c : 00410000 SOFT: wcsrchr+0xde : 60c5400c SOFT: RtlFindCharInUnicodeString+0x405 : 60c36424 SOFT: RtlFindCharInUnicodeString+0x456 : 60a61000 . . . PFMON: Total Faults 3241 (KM 366 UM 3241 Soft 2974, Hard 267, Code 960, Data 2281)
A page fault occurs when a thread addresses memory that is not in the main working-set. The page that contains the address is brought into the current working-set, and some pages might be removed from the current working set. The presence of many page faults indicates that the process is accessing a wide range of addresses that are probably not contiguous. If your process has many page faults, it is probably thrashing, a term indicating that a large portion of the time allocated to your application is spent paging memory to and from disk. When this occurs, performance decreases. A simple fix is to change your program to access memory in a more linear fashion.
Monitoring Process Timing, Memory, and Thread Count with pview
pview is located in Microsoft Visual Studio.NET\Common7\Tools\Bin\WinNT. pview presents a graphical display of processes on your system. It provides all of the information available from the previous command-line tools, but in a dialog-type user interface. When you invoke pview.exe, you get a dialog box that looks like Figure 1. Notice that this figure highlights information about a .NET application, ThreadPoolTest.exe.
Figure 1 pview process view.
You can highlight each of the threads listed for each process and get the start address of the thread, the current address location of the thread, the number of context switches, and the priority. If you highlight a process and select the Memory Detail button, you will get a dialog box that looks like Figure 2.
Figure 2 pview memory detail.
This dialog box contains much information, but notice that the DLLs that would have specific interest for a managed process are shown in the drop-down list. pview incorporates much of the same information that is obtained from vadump and pstat into a compact graphical form.
The Task Manager is always available by right-clicking on the toolbar. On Windows 2000 machines, three tabs exist: Applications, Processes, and Performance. With Windows XP, a fourth tab titled Networking has been added. If you need to monitor network activity, the Networking tab can be useful. It shows a graphical view of network utilization. If your application uses the network heavily, the Network tab offers a quick check to see if the bottleneck in your program is the network. Figure 3 shows what the Task Manager looks like.
Figure 3 Monitoring a .NET application with Task Manager.
The Processes tab shows information about each process running on your system. By default, the process identifier, CPU usage, CPU time, memory usage, and thread count are displayed. However, as can be seen in Figure 4, other types of information can also be displayed. Depending on the information you need to gather about your application, you might be interested in some of this information. For example, if you are concerned that thread pooling is not working, you can turn on the option to list the number of threads in a process and see if the thread count goes too high.
Figure 4 Task Manager process view options.
Monitoring System-Wide Information with perfmtr
perfmtr is located in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. perfmtr provides a console view of the processes running on your system. By default, invoking perfmtr like this:
The output is similar to Figure 5, which monitors the CPU usage for the system.
Figure 5 Monitoring system-wide CPU usage with perfmtr.
Other system-wide status information is available by entering the appropriate letter as indicated when the tool is first invoked. For example, "C" causes permtr to do a continuous dump of CPU usage, "V" provides a dump of virtual memory, and "H" causes file cache information to be monitored. See Figure 5 for all of the options, or invoke the tool on your own. Permtr provides system information, and it would only be useful in profiling an application if this information was used as a baseline.
Profiling the Profiler with exctrlst
This last tool leads us into the next section. exctrlst shows the performance counters that have been installed. This tool shows the tools that are available through the Performance Monitor. When you run this tool, you get a dialog box that looks like Figure 6.
Figure 6 Performance counter information.
With this tool, you can see which DLL is associated with each of the performance categories. You can use this information along with some of the tools outlined earlier to determine whether a significant bottleneck exists due to the profile process. If the performance counter caused the application to crash, you could either debug it (if it was your performance counter) or send the information to those responsible for the performance counter to help them diagnose the problem.
Although Exctrlst does not indicate the exact counters that are available, it does indicate that the Performance Monitor has been extended and additional counters are available. It also shows the counter range that is used.