Profiling .NET Applications: Part 3
Using Custom Profiling APIs
Like the debugging APIs that the .NET Framework supports, an interface is available that can be used to profile .NET applications. The primary documentation for the profiling interface is in the Tools Developers Guide in a file called Profiling.doc. The API consists of two COM interfaces: ICorProfilerCallback and ICorProfilerInfo. To build a profiler, the programmer must implement the methods that are part of ICorProfilerCallback. The profiler is implemented as an in-process COM DLL. This DLL implements the ICorProfilerCallback interface. ICorProfilerInfo is an interface that is implemented in the CLR and queried for during the initialization of the profiler.
The SDK ships with two sample profilers that illustrate what is involved in building a profiler for managed code. Each of the samples is included in the profiler directory because the projects have been modified so that you can use Visual Studio.NET to compile and run the profilers. A General Code Profiler can be found in the gcp_profiler directory, and a Hot Spots Tracker can be found in the hst_profiler directory. These projects have been copied into gcp and hst directories respectively, and they have been modified so that they can be compiled with Visual Studio.NET.
Each of the samples follows the same setup process. After the code has been compiled and the resulting DLL properly linked, all you need to do is run a simple batch file called EnableProfiler.bat from a Command Prompt window. This batch file registers the COM DLL and sets two environment variables:
Cor_Enable_ProfilingThis environment variable is set to a non-zero value to turn on profiling.
Cor_ProfilerThis environment variable is set to the CLSID or ProdID of the COM component that implements the ICorProfilerCallback interface.
Running any managed process under this environment will result in the CLR doing the equivalent of CoCreateInstance to instantiate the CoClass that implements the ICorProfilerCallback interface. After an instance of the interface has been created, the CLR calls the Initialize method of this interface. The implementation of the Initialize method is responsible for setting flags to indicate the events to receive. These flags and the implementation of the various methods in the ICorProfilerCallback are what differentiate one profiler from another. The Initialize method is also where the profiler should query for ICorProfilerInfo so that additional information can be obtained from the CLR about the profiling process. When the process is finished, the CLR will call the Shutdown method of ICorProfilerCallback.
General Code Profiler
The General Code Profiler (GCP) sets the flags for the events for which it is to listen for, as shown in Listing 1.
Listing 1General Code Profiler Event Flags
m_dwEventMask = (DWORD) ( COR_PRF_MONITOR_CLASS_LOADS | COR_PRF_MONITOR_MODULE_LOADS | COR_PRF_MONITOR_ASSEMBLY_LOADS | COR_PRF_MONITOR_APPDOMAIN_LOADS | COR_PRF_MONITOR_JIT_COMPILATION | COR_PRF_MONITOR_EXCEPTIONS | COR_PRF_MONITOR_THREADS | COR_PRF_MONITOR_CODE_TRANSITIONS | COR_PRF_MONITOR_ENTERLEAVE | COR_PRF_MONITOR_CACHE_SEARCHES | COR_PRF_DISABLE_INLINING );
Specifically, notice that the following are not defined:
COR_PRF_MONITOR_FUNCTION_UNLOADS
COR_PRF_MONITOR_GC
COR_PRF_MONITOR_OBJECT_ALLOCATED
COR_PRF_MONITOR_REMOTING
COR_PRF_MONITOR_CCW
COR_PRF_MONITOR_REMOTING_COOKIE
COR_PRF_MONITOR_REMOTING_ASYNC
COR_PRF_MONITOR_SUSPENDS
COR_PRF_MONITOR_CLR_EXCEPTIONS
COR_PRF_ENABLE_REJIT
COR_PRF_ENABLE_INPROC_DEBUGGING
COR_PRF_ENABLE_JIT_MAPS
COR_PRF_DISABLE_OPTIMIZATIONS
COR_PRF_ENABLE_OBJECT_ALLOCATED
As you can see from the previous list of options that the GCP does not use, many profiling options can be associated with a profiling application.
This sample profiler records the information as the events are called. Most functions follow the same format. Look at Listing 2, which illustrates the implementation of the method AppDomainCreationFinished (defined in ProfilerGCP.h, which is part of the source for this profiler).
Listing 2GCP AppDomainCreationFinished Callback Implementation
HRESULT ProfilerCallback::AppDomainCreationFinished(AppDomainID appDomainID, HRESULT hrStatus) { _ASSERT_( SUCCEEDED( hrStatus ) ); try { AddAppDomain( appDomainID ); } catch ( BaseException *exception ) { exception->ReportFailure(); delete exception; Failure(); } return S_OK; } // ProfilerCallback::AppDomainCreationFinished
The way that each of the callbacks records an event is to enter it into a table. The GCP has a series of tables in ProfilerHelper.h. The tables look like Listing 3.
Listing 3GCP Tables
// tables SList<ClassInfo *, ClassID> *m_pClassTable; SList<ThreadInfo *, ThreadID> *m_pThreadTable; SList<ModuleInfo *, ModuleID> *m_pModuleTable; SList<FunctionInfo *, FunctionID> *m_pFunctionTable; SList<AssemblyInfo *, AssemblyID> *m_pAssemblyTable; SList<AppDomainInfo *, AppDomainID> *m_pAppDomainTable;
When the profile session is complete and the Shutdown method has been called, GCP dumps the contents of the tables in a routine DumpTables(), as shown in Listing 4.
Listing 4DumpTables in GCP
void PrfInfo::DumpTables() { // dump thread table if ( m_pThreadTable != NULL ) { wprintf( L"*** THREAD TABLE ***\n" ); m_pThreadTable->Dump(); wprintf( L"\n" ); wprintf( L"\n" ); } // dump AppDomain table if ( m_pAppDomainTable != NULL ) { wprintf( L"*** APP_DOMAIN TABLE ***" ); m_pAppDomainTable->Dump(); wprintf( L"\n" ); wprintf( L"\n" ); } // dump Assembly table if ( m_pAssemblyTable != NULL ) { wprintf( L"*** ASSEMBLY TABLE ***\n" ); m_pAssemblyTable->Dump(); wprintf( L"\n" ); wprintf( L"\n" ); } // dump module table if ( m_pModuleTable != NULL ) { wprintf( L"*** MODULE TABLE ***\n" ); m_pModuleTable->Dump(); wprintf( L"\n" ); wprintf( L"\n" ); } // dump class table if ( m_pClassTable != NULL ) { wprintf( L"*** CLASS TABLE ***\n" ); m_pClassTable->Dump(); wprintf( L"\n" ); wprintf( L"\n" ); } // dump function table if ( m_pFunctionTable != NULL ) { wprintf( L"*** FUNCTION TABLE ***\n" ); m_pFunctionTable->Dump(); wprintf( L"\n" ); wprintf( L"\n" ); } } // PrfInfo::DumpTables
Earlier, it was said that you could invoke any managed process in the profile environment. That's not completely true. Here, due to the implementation using wprintf, only Console applications will properly display the output of the profiler. The output from a managed application looks like Listing 5.
Listing 5Output from the GCP
This is debugging information CLR Profiler General Code Profiler Sample AppDomain ID NOT FOUND in AppDomain Table *** THREAD TABLE *** THREAD ID: 0x0013eb88 HANDLE: 0x000000ac WIN32 ID: 0x00000890 THREAD ID: 0x0013ef18 HANDLE: 0x000000d4 WIN32 ID: 0x000007bc *** APP_DOMAIN TABLE *** APPDOMAIN ID: 0x60c3c568 NAME: mscorlib.dll PROCESS ID: 0x00000788 APPDOMAIN ID: 0x00139de0 NAME: DefaultDomain PROCESS ID: 0x00000788 APPDOMAIN ID: 0x60c3d9b8 NAME: EE Shared Assembly Repository PROCESS ID: 0x00000788 *** ASSEMBLY TABLE *** ASSEMBLY ID: 0x00142e78 NAME: mscorlib MODULE ID: 0x00143438 APPDOMAIN ID: 0x60c3d9b8 ASSEMBLY ID: 0x0014e838 NAME: TraceSwitch MODULE ID: 0x0014e978 APPDOMAIN ID: 0x00139de0 ASSEMBLY ID: 0x00151880 NAME: System MODULE ID: 0x00152200 APPDOMAIN ID: 0x00139de0 ASSEMBLY ID: 0x001644a8 NAME: System.Xml MODULE ID: 0x00157740 APPDOMAIN ID: 0x00139de0 ASSEMBLY ID: 0x0016b298 NAME: System.Web MODULE ID: 0x00169dd0 APPDOMAIN ID: 0x00139de0 *** MODULE TABLE *** . . . *** CLASS TABLE *** . . . CLASS ID: 0x003750a0 NAME: TraceSwitchTest.TraceSwitchMain TOKEN: 0x02000002 MODULE ID: 0x0014e978 . . . *** FUNCTION TABLE *** . . . FUNCTION ID: 0x00375078 NAME: TraceSwitchTest.TraceSwitchMain::WriteDebugOutput TOKEN: 0x06000001 CLASS ID: 0x003750a0 MODULE ID: 0x0014e978 CODE SIZE: 40 bytes START ADDRESS: 0x03641bf8 Dumping Enter-Leave event counters Enter Counter: 1 Left Counter: 1 TailCall Counter: 0 Unwind Counter: 0 . . .
This profiler builds a list based on each event called. Other than just confirming that certain classes and functions were instantiated and executed, this sample does not do much. In Listing 5, the profiler is used to profile some code, TraceSwitch. As you can see from the output, the method WriteDebugOutput was called once and exited once. In addition, the address of the function is recorded. None of the other profiling tools that have been introduced thus far can give you this kind of information about a managed process. Until profilers for managed code are commercially available, these profiling APIs might be your only choice.
Windows General Code Profiler
Many of the principles that were developed in the previous section were used to create a more user-friendly profiler by providing a graphical user interface for profiling. This profiler has been named WinGCP because it takes much of the information provided by the GCP profiler presented earlier and formats it into a more presentable Windows UI.
To use the profiling API, you have to profile the .NET application completely devoid of managed code. A user interface could have been developed using MFC, but it would have been a difficult task. A method was needed to completely separate the UI from the function. One way of accomplishing this is to separate the UI and the profiler with a socket connection.
Profiling proceeds as with the GCP described earlier. When the process that is being profiled finishes, the GCP dumps all of the information out to the console or file. With WinGCP, you can dump the information in an XML format to a memory stream, open up a socket, and output the data to the UI server. Listing 6 captures the important points about the data transfer from the client (the profiler) to the server (the UI).
Listing 6Output and Connection from WinGCP
std::wostringstream out; out << L"<?xml version=\"1.0\"?>" << std::endl; out << L"<profile>" << std::endl; // dump Thread table for(ConstantThreadIterator it = m_threadTable.begin(); it != m_threadTable.end(); ++it) { it->second.Dump(out); } // dump AppDomain table for(ConstantAppDomainIterator it = m_appDomainTable.begin(); it != m_appDomainTable.end(); ++it) { it->second.Dump(out); } . . . SOCKET socket = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0, 0); if (socket == INVALID_SOCKET) { return; } std::string server; char *lpserver = getenv("COR_PROFILER_SERVER"); if(lpserver == NULL) server = "localhost"; else server = lpserver; SOCKADDR_IN sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = inet_addr(server.c_str()); if(sin.sin_addr.s_addr == INADDR_NONE) { LPHOSTENT lpHost = ::gethostbyname(server.c_str()); if(lpHost != NULL) { sin.sin_addr.s_addr = ((LPIN_ADDR)lpHost->h_addr)->s_addr; } else { ::closesocket(socket); socket = INVALID_SOCKET; return; } } char *lpport = getenv("COR_PROFILER_SERVER_PORT"); int port = 8085; if(lpserver != NULL) port = atoi(lpport); sin.sin_port = htons((u_short)port); if (::WSAConnect(socket, (LPSOCKADDR) &sin, sizeof (SOCKADDR_IN), NULL, NULL, NULL, NULL) != 0) { ::closesocket(socket); socket = INVALID_SOCKET; return; } // Send the date long lTotalBytes; long nBytes = lTotalBytes = ::send(socket, (LPCSTR)out.str().c_str(), out.str().length() * sizeof(wchar_t), 0); while(lTotalBytes < out.str().length()*sizeof(wchar_t)) { nBytes = ::send(socket, (LPCSTR)out.str().c_str() + lTotalBytes, (out.str().length() * sizeof(wchar_t)) - lTotalBytes, 0); lTotalBytes += nBytes; } ::closesocket(socket); socket = INVALID_SOCKET;
When the managed process has exited, the information is transferred from the client (profiler) to the server (the UI). The first part of Listing 6 shows the beginning of formatting the output as XML. Following the dump of the profile statistics to an XML stream, the server and the server port are found using the environment variables COR_PROFILER_SERVER and COR_PROFILER_SERVER_PORT. Then a connection is created to the server, and the profile statistics are written out to the server.
Listing 7 shows how the server or user interface portion of WinGCP sets up and starts the client process.
Listing 7WinGCP Server Setting Up and Starting the Profile Client
ProcessStartInfo psi; if(arguments.Text.Length > 0) psi = new ProcessStartInfo(pathToFile.Text, arguments.Text); else psi = new ProcessStartInfo(pathToFile.Text); psi.EnvironmentVariables.Add("DBG_PRF_LOG", "1"); psi.EnvironmentVariables.Add("Cor_Enable_Profiling", "1"); psi.EnvironmentVariables.Add("COR_PROFILER", "{01568439-E2BA-4434-8ACC-816239E8B8B5}"); // psi.EnvironmentVariables.Add("COR_PROFILER_SERVER_PORT", "8085"); // psi.EnvironmentVariables.Add("COR_PROFILER_SERVER", "localhost"); psi.UseShellExecute = false; int pos = pathToFile.Text.LastIndexOf("\\"); psi.WorkingDirectory = pathToFile.Text.Substring(0, pos); Process p = Process.Start(psi);
Notice that this is the only part of the application that is dependent on the server and the client both residing on the same machine. With a little imagination, you could easily adapt this code using remoting so that the client profiler could run on virtually any machine. The server just listens for the results back from the client (see Listing 8).
Listing 8WinGCP Server Listening for Results
listeningClient = new TcpListener(8085); listeningClient.Start(); try { bool continueProcessing = true; while(continueProcessing) { Socket s = listeningClient.AcceptSocket(); NetworkStream ns = new NetworkStream(s); XmlTextReader channel = new XmlTextReader(ns); profileInfoInterface.ProcessData(channel); } } catch(Exception e) { // SocketException when listeningClient.Stop is called // during an AcceptSocket. // "A blocking operation was interrupted by a call to // WSACancelBlockingCall" Debug.WriteLine(e); } finally { listeningClient.Stop(); }
In this thread, the server listens for the data coming from the client, decodes the XML stream, and then displays the results to the user in the dialog box. By separating the presentation from the data, WinGCP enhances the original GCP visually. In addition, the original GCP can now be used to profile non-console applications (Windows apps).
The cracking of the XML message and the display of the results are not shown for the purpose of keeping this presentation as compact as possible. The complete source for this application is in the WinGCP directory.
Finding Frequently Accessed Code with the Hot Spots Tracker
As a final example of using the profiling API, this section will discuss the Hot Spots Tracker (HST) that is part of the Framework SDK. You can find the original source in \Program Files\Microsoft Visual Studio .NET\FrameworkSDK\Tool Developers Guide\Samples\profiler\hst_profiler or see the slightly modified version in the hst directory of the samples for this book. This profiling tool illustrates how changing some of the events to which the profiling tool listens can greatly affect the profiler implementation.
The HST supports even fewer flags than GCP did, as evidenced in Listing 9.
Listing 9Flags for the HST Profiler
m_dwEventMask = (DWORD)(COR_PRF_MONITOR_THREADS | COR_PRF_DISABLE_INLINING | COR_PRF_MONITOR_SUSPENDS | COR_PRF_MONITOR_ENTERLEAVE | COR_PRF_MONITOR_EXCEPTIONS | COR_PRF_MONITOR_CODE_TRANSITIONS);
In many ways, this profiler provides more useful profiling information than the GCP profiler. The HST tracks the time spent in each function. When a function is entered, the Enter method is called. The Enter method looks like Listing 10.
Listing 10HST Implementation for the Enter Method
void ProfilerCallback::Enter( FunctionID functionID ) { /////////////////////////////////////////////////////////////////////////// TimeTracker timer( static_cast<PrfInfo *>( g_pCallbackObject ) ); /////////////////////////////////////////////////////////////////////////// try { g_pCallbackObject->UpdateCallStack( functionID, PUSH ); } catch ( BaseException *exception ) { exception->ReportFailure(); delete exception; g_pCallbackObject->Failure(); } } // ProfilerCallback::Enter /* public */ void ProfilerCallback::Leave( FunctionID functionID ) { /////////////////////////////////////////////////////////////////////////// TimeTracker timer( static_cast<PrfInfo *>( g_pCallbackObject ) ); /////////////////////////////////////////////////////////////////////////// try { g_pCallbackObject->UpdateCallStack( functionID, POP ); } catch ( BaseException *exception ) { exception->ReportFailure(); delete exception; g_pCallbackObject->Failure(); } } // ProfilerCallback::Leave
The Enter callback method calls UpateCallStack with a PUSH, and the Leave callback method calls UpdateCallStack with a POP. The TimerTracker simply adds a time stamp to the class so that it can be calculated later how long this function was executed.
This profiler also has a DumpTables method, which is implemented in Listing 11.
Listing 11HST Implementation of DumpTables
void PrfInfo::DumpTables() { // // Dump the thread table if you actually performed profiling // if ( BASEHELPER::FetchEnvironment( LOG_ENVIRONMENT ) != 0xFF /* don't log anything */ ) { if ( (m_pThreadTable != NULL) && (m_dwEventMask != (DWORD)COR_PRF_MONITOR_NONE) ) { LOG_TO_FILE( ("Thread ID;Function;Times Called;Exclusive Time;Inclusive Time;Callee Time;Suspended Time;Profiler Time\n\n") ) m_pThreadTable->Dump(); } } } // PrfInfo::DumpTables
This function prints a header and calls the Dump method for the Thread table, which is reproduced in Listing 12.
Listing 12HST Implementation of Dump
void FunctionTimingInfo::Dump() { HRESULT hr; ULONG argCount = 0; BOOL bIsStatic = FALSE; WCHAR functionName[MAX_LENGTH]; WCHAR returnTypeStr[MAX_LENGTH]; WCHAR functionParameters[10 * MAX_LENGTH]; // // The intention is to dump the data into a semi-colon delimited list. The // data can then be imported into a spreadsheet and analyzed. The format // is essentially comprised of // // thread ID // function name and parameters // times called // exclusive time // inclusive time // callee time // suspended time // profiler time // hr = BASEHELPER::GetFunctionProperties( g_pPrfInfo->m_pProfilerInfo, m_id, &bIsStatic, &argCount, returnTypeStr, functionParameters, functionName ); if ( SUCCEEDED( hr ) ) { // // Dump thread ID, return type, function name, and function // parameters; note, function parameters are separated by a // '+' sign. // LOG_TO_FILE( ("0x%08x;", m_win32ThreadID) ) if ( bIsStatic == TRUE ) LOG_TO_FILE( ("static ") ) if ( returnTypeStr[0] != NULL ) LOG_TO_FILE( ("%S ", returnTypeStr) ) if ( functionName[0] != NULL ) LOG_TO_FILE( ("%S(", functionName) ) if ( argCount > 0 ) { WCHAR *parameter; WCHAR *separator = L"+"; // // Parse and dump parameters // parameter = wcstok( functionParameters, separator ); while ( parameter != NULL ) { LOG_TO_FILE( (" %S", parameter) ) parameter = wcstok( NULL, separator ); if ( parameter != NULL ) LOG_TO_FILE( (",") ) } // while } LOG_TO_FILE( (" );") ) // // Dump statistics // double exclusiveTime; // to compute exclusive time for the function, subtract the callee // time, suspended time, and profiler time from the inclusive time exclusiveTime = ( ((double)m_inclusiveTime.QuadPart / (double)g_frequency.QuadPart) - ( ((double)m_calleeTime.QuadPart / (double)g_frequency.QuadPart) + ((double)m_suspendedTime.QuadPart / (double)g_frequency.QuadPart) + ((double)m_profilerTime.QuadPart / (double)g_frequency.QuadPart) ) ); LOG_TO_FILE( ("%d;%f;%f;%f;%f;%f\n", m_timesCalled, exclusiveTime, ((double)m_inclusiveTime.QuadPart / (double)g_frequency.QuadPart), ((double)m_calleeTime.QuadPart / (double)g_frequency.QuadPart), ((double)m_suspendedTime.QuadPart / (double)g_frequency.QuadPart), ((double)m_profilerTime.QuadPart / (double)g_frequency.QuadPart)) ) } else LOG_TO_FILE( ("Unable to Retreive Information about the Function Name, Parameters and Return Type\n") ) } // FunctionTimingInfo::Dump
The HST profiler can be used with any managed process because the output is written to a file. The LOG_TO_FILE macro determines how the profile information is output.
#define LOG_TO_FILE( message ) BASEHELPER::LogToFile message;
This function opens the file output.log.
stream = ((count == 1) ? fopen( "output.log", "w" ) : fopen( "output.log", "a+" ));
When you use this profiler, you get output that looks like Listing 13.
Listing 13HST Output
Thread ID;Function;Times Called;Exclusive Time;Inclusive Time;Callee Time; Suspended Time;Profiler Time 0x000009c8;UNMANAGED FRAME( );1542;115.476832;162.039139;45.721695;0.000000; 0.840613 0x000009c8;System.ComponentModel.EventHandlerListSystem.ComponentModel. Component::get_Events( );87;0.000835;0.022671;0.001297;0.000000; 0.020539 0x000009c8;System.ComponentModel.ISite System.ComponentModel.Component:: get_Site( );30;0.000251;0.010837;0.000000;0.000000;0.010586 0x000009c8;void System.ComponentModel.Component::Dispose( );4;0.003515; 0.887392;0.882895;0.000000;0.000982 0x000009c8;void System.ComponentModel.Component::Dispose( bool );4;0.000172; 0.003737;0.001866;0.000000;0.001699 0x000009c8;bool System.ComponentModel.Component::get_DesignMode( );7;0.000073; 0.021668;0.000000;0.000000;0.021595 . . . 0x000009c8;void ThreadPoolTest.ThreadPoolTestForm::InitializeComponent( ); 1;0.017098;2.809657;2.792239;0.000000;0.000320 0x000009c8;static void ThreadPoolTest.ThreadPoolTestForm::Main( ); 1;0.006464;158.332578;158.325792;0.000000;0.000322 0x000009c8;void ThreadPoolTest.ThreadPoolTestForm::OnStart( Object ); 1;0.004145;0.306971;0.302464;0.000000;0.000361 . . . 0x000009c8;void System.Threading.Thread::.ctor( System.Threading.ThreadStart ); 1;0.000217;0.000777;0.000215;0.000000;0.000345 0x000009c8;void System.Threading.Thread::Start( ); 1;0.002435;0.015568;0.012785;0.000000;0.000348
Because this file is neatly delimited by semi-colons, you can import it into an Excel spreadsheet and sort based on columns. This was done in Figure 1.
Figure 1 Importing HST output data into an Excel spreadsheet.