Solaris™ Application Programming is a comprehensive guide to optimizing the performance of applications running in your Solaris environment. From the fundamentals of system performance to using analysis and optimization tools to their fullest, this wide-ranging resource shows developers and software architects how to get the most from Solaris systems and applications.
Whether you’re new to performance analysis and optimization or an experienced developer searching for the most efficient ways to solve performance issues, this practical guide gives you the background information, tips, and techniques for developing, optimizing, and debugging applications on Solaris.
The text begins with a detailed overview of the components that affect system performance. It then explains the many developer tools included with Solaris OS and the Sun Studio compiler, and takes you beyond the basics with practical, real-world examples. You will also learn how to use these tools to identify performance problems, interpret their output accurately, and choose the most efficient approach to correcting specific problems and achieving maximum system performance.
Coverage includes
Preface
Part I: Overview of the Processor
Chapter 1: The Generic Processor
1.1 Chapter Objectives
1.2 The Components of a Processor
1.3 Clock Speed
1.4 Out-of-Order Processors
1.5 Chip Multithreading
1.6 Execution Pipes
1.7 Caches
1.8 Interacting with the System
1.9 Virtual Memory
1.10 Indexing and Tagging of Memory
1.11 Instruction Set Architecture
Chapter 2: The SPARC Family
2.1 Chapter Objectives
2.2 The UltraSPARC Family
2.3 The SPARC Instruction Set
2.4 32-bit and 64-bit Code
2.5 The UltraSPARC III Family of Processors
2.6 UltraSPARC T1
2.7 UltraSPARC T2
2.8 SPARC64 VI
Chapter 3: The x64 Family of Processors
3.1 Chapter Objectives
3.2 The x64 Family of Processors
3.3 The x86 Processor: CISC and RISC
3.4 Byte Ordering
3.5 Instruction Template
3.6 Registers
3.7 Instruction Set Extensions and Floating Point
3.8 Memory Ordering
Part II: Developer Tools
Chapter 4: Informational Tools
4.1 Chapter Objectives
4.2 Tools That Report System Configuration
4.3 Tools That Report Current System Status
4.4 Process- and Processor-Specific Tools
4.5 Information about Applications
Chapter 5: Using the Compiler
5.1 Chapter Objectives
5.2 Three Sets of Compiler Options
5.3 Using -xtarget=generic on x86
5.4 Optimization
5.5 Generating Debug Information
5.6 Selecting the Target Machine Type for an Application
5.7 Code Layout Optimizations
5.8 General Compiler Optimizations
5.9 Pointer Aliasing in C and C++
5.10 Other C- and C++-Specific Compiler Optimizations
5.11 Fortran-Specific Compiler Optimizations
5.12 Compiler Pragmas
5.13 Using Pragmas in C for Finer Aliasing Control
5.14 Compatibility with GCC
Chapter 6: Floating-Point Optimization
6.1 Chapter Objectives
6.2 Floating-Point Optimization Flags
6.3 Floating-Point Multiply Accumulate Instructions
6.4 Integer Math
6.5 Floating-Point Parameter Passing with SPARC V8 Code
Chapter 7: Libraries and Linking
7.1 Introduction
7.2 Linking
7.3 Libraries of Interest
7.4 Library Calls
Chapter 8: Performance Profiling Tools
8.1 Introduction
8.2 The Sun Studio Performance Analyzer
8.3 Collecting Profiles
8.4 Compiling for the Performance Analyzer
8.5 Viewing Profiles Using the GUI
8.6 Caller-Callee Information
8.7 Using the Command-Line Tool for Performance Analysis
8.8 Interpreting Profiles
8.9 Interpreting Profiles from UltraSPARC III/IV Processors
8.10 Profiling Using Performance Counters
8.11 Interpreting Call Stacks
8.12 Generating Mapfiles
8.13 Generating Reports on Performance Using spot
8.14 Profiling Memory Access Patterns
8.15 er_kernel
8.16 Tail-Call Optimization and Debug
8.17 Gathering Profile Information Using gprof
8.18 Using tcov to Get Code Coverage Information
8.19 Using dtrace to Gather Profile and Coverage Information
8.20 Compiler Commentary
Chapter 9: Correctness and Debug
9.1 Introduction
9.2 Compile-Time Checking
9.3 Runtime Checking
9.4 Debugging Using dbx
9.5 Locating Optimization Bugs Using ATS
9.6 Debugging Using mdb
Part III: Optimization
Chapter 10: Performance Counter Metrics
10.1 Chapter Objectives
10.2 Reading the Performance Counters
10.3 UltraSPARC III and UltraSPARC IV Performance Counters
10.4 Performance Counters on the UltraSPARC IV and UltraSPARC IV+
10.5 Performance Counters on the UltraSPARC T1
10.6 UltraSPARC T2 Performance Counters
10.7 SPARC64 VI Performance Counters
10.8 Opteron Performance Counters
Chapter 11: Source Code Optimizations
11.1 Overview
11.2 Traditional Optimizations
11.3 Data Locality, Bandwidth, and Latency
11.4 Data Structures
11.5 Thrashing
11.6 Reads after Writes
11.7 Store Queue
11.8 If Statements
11.9 File-Handling in 32-bit Applications
Part IV: Threading and Throughput
Chapter 12: Multicore, Multiprocess, Multithread
12.1 Introduction
12.2 Processes, Threads, Processors, Cores, and CMT
12.3 Virtualization
12.4 Horizontal and Vertical Scaling
12.5 Parallelization
12.6 Scaling Using Multiple Processes
12.7 Multithreaded Applications
12.8 Parallelizing Applications Using OpenMP
12.9 Using OpenMP Directives to Parallelize Loops
12.10 Using the OpenMP API
12.11 Parallel Sections
12.12 Automatic Parallelization of Applications
12.13 Profiling Multithreaded Applications
12.14 Detecting Data Races in Multithreaded Applications
12.15 Debugging Multithreaded Code
12.16 Parallelizing a Serial Application
Part V: Concluding Remarks
Chapter 13: Performance Analysis
13.1 Introduction
13.2 Algorithms and Complexity
13.3 Tuning Serial Code
13.4 Exploring Parallelism
13.5 Optimizing for CMT Processors
Index