Home > Articles > Programming > Windows Programming

Profiling .NET Applications: Part 1

  • Print
  • + Share This
Profiling is the process of collecting information about something so that you can come to a conclusion. In this series of three articles, Kevin Burton discusses profiling and its tools.
This first of three articles is excerpted from .NET Common Language Runtime Unleashed (Sams, 2002, ISBN: 0672321246).
This chapter is from the book

Profiling is the process of collecting information about something so that you can come to a conclusion. You're probably aware of what is called racial profiling. The issue is that police or government officials gather information about a particular race to form or support an opinion that is held about that race. Many problems result from this approach, but one key to this analogy is that data collected can be used to support rather than form an opinion. Unlike racial profiling, software profiling is good and is encouraged. When profiling software, the developer collects data about the application (memory use, CPU usage, and so on) and forms an opinion as to why the application performs the way it does. Without the data, a developer is left to guess as to the causes of a particular problem. Probably worse yet is that the data might be incomplete and used to support a preconceived notion about the operation of the application.

After an application works, it is inevitable that it needs to be tuned. The application might require too much memory, be too slow, require too many resources, and so on. For all but the most trivial applications, programmers are notoriously wrong about the cause of program bottlenecks. Often, these wrong hunches cause the programmer to invest a great deal of effort in optimizing code that has little or no effect on the overall performance of the application. With the CLR taking over many of the tasks that traditionally have plagued programmers, it is even harder to determine the causes for a program's poor performance. With managed code, it is even more necessary to use proper tools to fine-tune an application.

This series of articles reviews some of the traditional tools that are available for profiling. These tools provide substantial information and should not be dismissed just because they are not .NET tools. You then explore performance counters, which provide an easy-to-use, extensible view into your .NET application. Finally, you learn about the interface that is provided so that you can build your own profile tool.

Traditional Tools Used in Profiling a .NET Application

The tools that existed pre-.NET still can be used to profile some of the characteristics of a .NET application. A .NET application still needs memory and it still needs the services of the CPU on which it is running. These tools provide information about the usage of generic resources such as these. Before you look at the specific tools (such as .NET performance counters and a custom profiler API) that are available for .NET applications, you need to review what is and has been available for any application.

Obtaining a Snapshot of Memory Usage with memsnap

This tool provides a snapshot of memory (and some resource) usage. It comes as part of Visual Studio .NET in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. If this utility is not in the directory specified, then you probably failed to install the Platform SDK (which is not installed by default). It is a command-line utility and it writes the snapshot to memsnap.log unless you specify a specific log file. Invoking memsnap with no arguments or -? gives this usage help output, as in Listing 1.

Listing 1—memsnap Usage

memsnap [-t] [-g] [-?] [<logfile>]
memsnap logs system memory usage to <logfile>
<logfile> = memsnap.log by default
-t  Add tagging information (time (GMT), date, machinename)
-g  Add GDI and USER resource counts
-?  Gets this message
memsnap – Creates a memsnap.log file that contains a snap-shot of the memory
usage. Each subsequent usage appends to the memsnap.log file.

If you have a managed application running like ThreadPoolTest.exe, the log file looks like Listing 2.

Listing 2—memsnap Output

Process ID     Proc.Name Wrkng.Set PagedPool NonPgdPl Pagefile
     Commit  Handles  Threads
00000000       (null)   16384     0     0     0
        0     0     1
00000008       System  212992     0     0   24576
      24576    114    38
. . .
0000B09C   ThreadPoolTest. 10821632   33868   8332  4116480
     4116480    119     4
0000B0A8     Memsnap.Exe  962560   13824   2016  434176
     434176    20     1

If you start on instance of the DiningPhilosophers.exe application, look at memsnap:

Process ID     Proc.Name Wrkng.Set PagedPool NonPgdPl Pagefile
     Commit  Handles  Threads
0000B09C   ThreadPoolTest. 14708736   46736   12232  6963200
     6963200    443    27

If you start three instances of the DiningPhilosophers.exe applications, you see the following:

Process ID     Proc.Name Wrkng.Set PagedPool NonPgdPl Pagefile
     Commit  Handles  Threads
0000B09C   ThreadPoolTest. 21106688   63760   15716 16670720
    16670720   1391    38

Because 20 threads should exist per application, this just confirms the limit that the ThreadPool places on the number of threads that it allocates. If you stop all of the applications and return back to just the small ThreadPoolTest.exe form, you see the following:

Process ID     Proc.Name Wrkng.Set PagedPool NonPgdPl Pagefile
     Commit  Handles  Threads
0000B09C   ThreadPoolTest. 15237120   51916   9528  9191424
     9191424    286     8

As you can see from the output, most of the threads have been returned to the pool, some of the memory has been reclaimed from the Pagefile, and the Working Set is somewhat smaller.

Obtaining Processor Usage Information with pstat

This function provides much of the same information as memsnap except that it also provides a dump of process-specific information including memory on standard output. This tool is located in the same place as memsnap (Microsoft Visual Studio.NET\Common7\Tools\Bin\WinNT). The output is substantial, so you might want to invoke pstat like this:

pstat |more

or

pstat >pstat.log

The first bit of information that this utility prints is basic system information, as shown in Listing 3.

Listing 3—pstat Usage

Pstat version 0.3: memory: 261424 kb uptime: 0 1:10:09.603 

PageFile: \??\D:\pagefile.sys
  Current Size: 393216 kb Total Used: 34464 kb  Peak Used 35384 kb

 Memory: 261424K Avail: 48256K TotalWs: 245064K InRam Kernel: 3464K P:70456K
 Commit: 217564K/ 103000K Limit: 632712K Peak: 251296K Pool N:10816K P:89728K

This shows how much RAM is available, how long the computer has been up, basic information about the PageFile that you have for your system, and system-wide memory information. In addition, a section is available where process-specific information is printed.

  User Time  Kernel Time  Ws  Faults Commit Pri Hnd Thd Pid Name
. . .
 0:00:06.489  0:00:00.500 14860   9201  8948 8 282  6 45212
    ThreadPoolTest.

This is the same information as memsnap, with the exception that here you can see how much time was spent in kernel and user mode. A more detailed output is available for each of the processes. For the ThreadPoolTest.exe process, the output looks like Listing 4.

Listing 4—pstat Report of Thread Status for a Process

pid:b09c pri: 8 Hnd: 282 Pf:  9201 Ws: 14860K ThreadPoolTest.
 tid pri Ctx Swtch StrtAddr  User Time Kernel Time State
 b098 10   1156 77E97CC6 0:00:00.871 0:00:00.050 Wait:UserRequest
 b0a0  8     1 77E87532 0:00:00.000 0:00:00.000 Wait:UserRequest
 b0a4 10    375 77E87532 0:00:00.110 0:00:00.050 Wait:UserRequest
 79c  8     8 77E87532 0:00:00.000 0:00:00.000 Wait:LpcReceive
 150  8   3118 77E87532 0:00:00.000 0:00:00.000 Wait:DelayExecution
 20c  8     8 77E87532 0:00:00.010 0:00:00.000 Wait:UserRequest

Detailed information is available about each of the six threads that is running in the process. You can see a thread ID, a priority, the number of context switches, the start address of the thread, how much time the thread has spent in user mode and kernel mode, and what the current state of the thread is. If you were to start the DiningPhilosophers application, you would see detail on more threads. At the end of the output from pstat is a detailed table of the drivers and system services that are loaded and running on the system.

Profiling Memory Usage with vadump

Also located in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT, vadump reports memory usage information about a particular process. To use this tool, you first must obtain the process ID of the process. You might not know that process ID, so you will have to use pview or the Task Manager to get the hex ID of each process. You can activate the Task Manager from the Start Toolbar by right-clicking on the toolbar and selecting Task Manager. After the Task Manager comes up, you can select the Processes tab to view the processes and the associated IDs. pview is a program found in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. Start pview from a command line (assuming that you have the preceding path in your PATH environment variable). pview lists each process with its ID in parentheses. You then need to convert this hex version of the process ID to decimal. (You can use the calculator in Accessories to do this conversion.) Notice that Task Manager displays process IDs as decimal, so conversion is not necessary. The usage report for vadump is shown in Listing 5.

Listing 5—vadump Usage

vadump -\?
Usage:
 Dump the address space:
  vadump [-sv] -p decimal_process_id

 Dump the current workingset:
  vadump -o [-mpsv] [-l logfile] -p decimal_process_id

 Dump new additions to the workingset (Stop with ^C):
  vadump -w [-crv] [-l logfile] -p decimal_process_id

    -c Include code faults faulting PC summary
    -m Show all code symbols on page
    -o Workingset snapshot w/ summary
    -r Print info on individual faults
    -s Summary info only
    -t Include pagetable info in summary
    -w Track new working set additions
    -v Verbose

If you invoke vadump with all of the defaults, you get output that is split into three sections. The first section is an address map for the process. It looks like Listing 6.

Listing 6—vadump Address Map

Address: 00000000 Size: 00010000
  State Free

Address: 00010000 Size: 00001000
  State Committed
  Protect Read/Write
  Type Private

Address: 00011000 Size: 0000F000
  State Free

Address: 00020000 Size: 00001000
  State Committed
  Protect Read/Write
  Type Private

Address: 00021000 Size: 0000F000
  State Free

Address: 00030000 Size: 000F4000 RegionSize: 100000
  State Reserved
  Type Private

Address: 00130000 Size: 00001000
  State Committed
  Protect Read Only
  Type Mapped

Address: 00131000 Size: 0000F000
  State Free

Address: 00140000 Size: 00061000 RegionSize: 100000
  State Committed
  Protect Read/Write
  Type Private

Address: 00240000 Size: 00006000 RegionSize: 10000
  State Committed
  Protect Read/Write
  Type Private

Address: 00250000 Size: 00001000 RegionSize: 10000
  State Committed
  Protect Read/Write
  Type Mapped
. . .

The next section describes how to obtain the memory that is associated with each DLL. Listing 7 shows a partial output of the vadump utility. Because you are using this tool to profile a .NET application, you can see how much memory each of the DLLs is responsible for. If the section under mscoree.dll includes some huge numbers, then you cannot fix this problem because the problem is with the CLR. Large memory usage in the main portion of the code (ThreadPoolTest) is something that you have a better chance of affecting.

Listing 7—vadump DLL Memory Usage

Total Image Commitment 37752832
  READONLY:      9662464
  READWRITE:      1425408
  WRITECOPY:      1576960
  EXECUTEREAD:    25088000

Total ThreadPoolTest.exe Commitment  20480
  READONLY:       12288
  EXECUTEREAD:      8192

Total ntdll.dll Commitment  692224
  READONLY:       196608
  READWRITE:       16384
  WRITECOPY:       4096
  EXECUTEREAD:     475136

Total mscoree.dll Commitment  126976
  READONLY:       16384
  READWRITE:       8192
  WRITECOPY:       4096
  EXECUTEREAD:      98304

A summary report, as shown in Listing 8, offers important information about an application. Specifically, for this .NET application, you can see how much of a memory resource hog this application will be.

Listing 8—vadump Summary

Dynamic Reserved Memory 49557504

PageFaults:         6720
PeakWorkingSetSize   14942208
WorkingSetSize     13570048
PeakPagedPoolUsage     57524
PagedPoolUsage       51696
PeakNonPagedPoolUsage   15984
NonPagedPoolUsage     13984
PagefileUsage      6762496
PeakPagefileUsage    8273920

If you add a flag to turn on verbose mode, you get an address map of where each of the DLLs associated with the process were loaded, as in Listing 9.

Listing 9—vadump Verbose

vadump -v -p 3932
Symbols loaded: 00400000 : 00408000 ThreadPoolTest.exe
Symbols loaded: 77f50000 : 77ff9000 ntdll.dll
Symbols loaded: 60280000 : 6029f000 mscoree.dll
Symbols loaded: 77dd0000 : 77e5a000 ADVAPI32.dll
. . .
Symbols loaded: 5f060000 : 5f246000 system.windows.forms.dll
Symbols loaded: 5f250000 : 5f502000 system.windows.forms.dll
Symbols loaded: 5e4b0000 : 5e61a000 system.dll
Symbols loaded: 5e620000 : 5e7d4000 system.dll
Symbols loaded: 61260000 : 612a8000 MSCORJIT.DLL
Symbols loaded: 516f0000 : 51758000 diasymreader.dll
Symbols loaded: 5ec00000 : 5ec70000 system.drawing.dll
Symbols loaded: 5ec70000 : 5ed34000 system.drawing.dll
Symbols loaded: 6b180000 : 6b321000 gdiplus.dll
Symbols loaded: 5fa10000 : 5fb32000 system.xml.dll
Symbols loaded: 77120000 : 771ab000 OLEAUT32.DLL
. . .
(regular output)

Listing 9 shows the output of vadump for the same ThreadPoolTest application. Notice that from the output of this tool, you can see not only what DLLs are loaded, but also where they are placed. You can immediately see that this is a managed (.NET) application. You can see DLLs such as mscoree.dll and system.windows.forms.dll loaded into the address space.

Like most other traditional tools, vadump mainly provides information about the process memory and how it is allocated and used.

Detailed Execution Profile of a .NET Application Using profile

profile is also located in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. From the output of profile, you can obtain a count of the number of functions called in each DLL that is loaded as part of the process. The output of profile is somewhat cryptic, but it can be useful for low-level analysis of a running process.

Invoking profile like this

profile DiningPhilosophers.exe

leads to output like Listing 10.

Listing 10—profile Output

78,ntdll.dll,Total
1,ntdll.dll,RtlMultiByteToUnicodeN (77f81ff0)
5,ntdll.dll,RtlEnterCriticalSection (77f821d6)
1,ntdll.dll,RtlConsoleMultiByteToUnicodeN (77f822c5)
1,ntdll.dll,allmul (77f827c0)
. . .
6,ntdll.dll,RtlFreeHeap (77fcb633)
9,mscoree.dll,Total
2,mscoree.dll,GetHostConfigurationFile (602834be)
6,mscoree.dll,CoInitializeCor (6028390b)
1,mscoree.dll,StrongNameFreeBuffer (60286e6e)
38,KERNEL32.DLL,Total
3,KERNEL32.DLL,LocalAlloc (77e86568)
3,KERNEL32.DLL,LocalFree (77e865f3)
3,KERNEL32.DLL,InterlockedIncrement (77e86659)
1,KERNEL32.DLL,InterlockedDecrement (77e8666a)
. . .
1,KERNEL32.DLL,IsProcessorFeaturePresent (77e8ddf1)
1,KERNEL32.DLL,SetEnvironmentVariableW (77e8dea0)
4,KERNEL32.DLL,GetEnvironmentVariableA (77e9fb75)
1,KERNEL32.DLL,FindVolumeMountPointClose (77ea0877)
2,RPCRT4.DLL,Total
1,RPCRT4.DLL,RpcBindingSetAuthInfoExW (77d5959c)
1,RPCRT4.DLL,I_RpcServerUseProtseqEp2W (77d5abe3)

As you can see, the output is difficult to understand. The first number is the number of times that the function in the specified DLL was called. The second field gives the name of the DLL. The third field is either the total number of calls associated with this DLL (in which case it is "Total"), or it is the name of the function within the DLL that was called. The last field is the address of the function in the DLL. The problem with this output is that it does not list managed method calls.

Monitoring Page Faults with pfmon

This tool is also located in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. Pfmon outputs information about the number of page faults that your process or application generates. One possible way to invoke this tool is as follows:

pfmon –n traceswitch.exe

With the –n switch, the tool outputs all of its information to a file pfmon.log. When this tool is used to monitor the page faults for a .NET application, traceswitch.exe, the log file looks like Listing 11.

Listing 11—pfmon Output

SOFT: GlobalMemoryStatus : GlobalMemoryStatus
SOFT: 602812e1 : 6029afb0
HARD: 602812e1 : 6029afb0
SOFT: ReleaseSemaphore+0x2b : ReleaseSemaphore+0x0000002B
SOFT: SetFileTime+0x86 : SetFileTime+0x00000086
SOFT: VerifyVersionInfoW+0x3b : VerifyVersionInfoW+0x0000003B
SOFT: StrongNameErrorInfo+0x10a8 : StrongNameErrorInfo+0x000010A7
SOFT: CorValidateImage : CorValidateImage
SOFT: DllCanUnloadNow+0x319 : DllCanUnloadNow+0x00000318
HARD: CorValidateImage+0x6b : 004000a8
SOFT: RtlFindCharInUnicodeString+0x405 : 004032a0
HARD: RtlIsValidHandle+0xfe : 00402000
SOFT: RtlDosApplyFileIsolationRedirection_Ustr+0x1cb :
    RtlCaptureContext+0x0000FCA0
SOFT: GetDateFormatW+0x5ae : GetDateFormatW+0x000005AD
SOFT: SetErrorMode+0x94 : SetErrorMode+0x00000093
SOFT: CorExeMain : CorExeMain
SOFT: CreateConfigStream+0xc98 : CreateConfigStream+0x00000C98
SOFT: 60291262 : 60291262
SOFT: CreateActCtxW+0xa3a : CreateActCtxW+0x00000A3A
SOFT: 60290f8a : 60290f8a
SOFT: CreateConfigStream : CreateConfigStream
SOFT: 602932b9 : 602932b8
SOFT: RegQueryInfoKeyW : AllocateLocallyUniqueId+0x00000229
SOFT: RegEnumValueW : CommandLineFromMsiDescriptor+0x0000057A
SOFT: CommandLineFromMsiDescriptor+0x3f3 :
   CommandLineFromMsiDescriptor+0x000003F2
SOFT: RtlAddRefActivationContext+0x2c : 00410000
SOFT: wcsrchr+0xde : 60c5400c
SOFT: RtlFindCharInUnicodeString+0x405 : 60c36424
SOFT: RtlFindCharInUnicodeString+0x456 : 60a61000
. . .
PFMON: Total Faults 3241 (KM 366 UM 3241 Soft 2974, Hard 267, Code 960,
    Data 2281)

A page fault occurs when a thread addresses memory that is not in the main working-set. The page that contains the address is brought into the current working-set, and some pages might be removed from the current working set. The presence of many page faults indicates that the process is accessing a wide range of addresses that are probably not contiguous. If your process has many page faults, it is probably thrashing, a term indicating that a large portion of the time allocated to your application is spent paging memory to and from disk. When this occurs, performance decreases. A simple fix is to change your program to access memory in a more linear fashion.

Monitoring Process Timing, Memory, and Thread Count with pview

pview is located in Microsoft Visual Studio.NET\Common7\Tools\Bin\WinNT. pview presents a graphical display of processes on your system. It provides all of the information available from the previous command-line tools, but in a dialog-type user interface. When you invoke pview.exe, you get a dialog box that looks like Figure 1. Notice that this figure highlights information about a .NET application, ThreadPoolTest.exe.

Figure 1 pview process view.

You can highlight each of the threads listed for each process and get the start address of the thread, the current address location of the thread, the number of context switches, and the priority. If you highlight a process and select the Memory Detail button, you will get a dialog box that looks like Figure 2.

Figure 2 pview memory detail.

This dialog box contains much information, but notice that the DLLs that would have specific interest for a managed process are shown in the drop-down list. pview incorporates much of the same information that is obtained from vadump and pstat into a compact graphical form.

Task Manager

The Task Manager is always available by right-clicking on the toolbar. On Windows 2000 machines, three tabs exist: Applications, Processes, and Performance. With Windows XP, a fourth tab titled Networking has been added. If you need to monitor network activity, the Networking tab can be useful. It shows a graphical view of network utilization. If your application uses the network heavily, the Network tab offers a quick check to see if the bottleneck in your program is the network. Figure 3 shows what the Task Manager looks like.

Figure 3 Monitoring a .NET application with Task Manager.

The Processes tab shows information about each process running on your system. By default, the process identifier, CPU usage, CPU time, memory usage, and thread count are displayed. However, as can be seen in Figure 4, other types of information can also be displayed. Depending on the information you need to gather about your application, you might be interested in some of this information. For example, if you are concerned that thread pooling is not working, you can turn on the option to list the number of threads in a process and see if the thread count goes too high.

Figure 4 Task Manager process view options.

Monitoring System-Wide Information with perfmtr

perfmtr is located in Microsoft Visual Studio .NET\Common7\Tools\Bin\WinNT. perfmtr provides a console view of the processes running on your system. By default, invoking perfmtr like this:

perfmtr

The output is similar to Figure 5, which monitors the CPU usage for the system.

Figure 5 Monitoring system-wide CPU usage with perfmtr.

Other system-wide status information is available by entering the appropriate letter as indicated when the tool is first invoked. For example, "C" causes permtr to do a continuous dump of CPU usage, "V" provides a dump of virtual memory, and "H" causes file cache information to be monitored. See Figure 5 for all of the options, or invoke the tool on your own. Permtr provides system information, and it would only be useful in profiling an application if this information was used as a baseline.

Profiling the Profiler with exctrlst

This last tool leads us into the next section. exctrlst shows the performance counters that have been installed. This tool shows the tools that are available through the Performance Monitor. When you run this tool, you get a dialog box that looks like Figure 6.

Figure 6 Performance counter information.

With this tool, you can see which DLL is associated with each of the performance categories. You can use this information along with some of the tools outlined earlier to determine whether a significant bottleneck exists due to the profile process. If the performance counter caused the application to crash, you could either debug it (if it was your performance counter) or send the information to those responsible for the performance counter to help them diagnose the problem.

Although Exctrlst does not indicate the exact counters that are available, it does indicate that the Performance Monitor has been extended and additional counters are available. It also shows the counter range that is used.

  • + Share This
  • 🔖 Save To Your Account