OpenCL Programming Guide

eBook (Watermarked)

  • Your Price: $38.39
  • List Price: $47.99
  • Includes EPUB, MOBI, and PDF
  • This eBook includes the following formats, accessible from your Account page after purchase:

    EPUB: The open industry format known for its reflowable content and usability on supported mobile devices.

    MOBI: The eBook format compatible with the Amazon Kindle and Amazon Kindle applications.

    PDF: The popular standard, used most often with the free Adobe® Reader® software.

    This eBook requires no passwords or activation to read. We customize your eBook by discreetly watermarking it with your name, making it uniquely yours.


  • Copyright 2011
  • Dimensions: 7" x 9-1/8"
  • Edition: 1st
  • ISBN-10: 0-13-284889-9
  • ISBN-13: 978-0-13-284889-3

Using the new OpenCL (Open Computing Language) standard, you can write applications that access all available programming resources: CPUs, GPUs, and other processors such as DSPs and the Cell/B.E. processor. Already implemented by Apple, AMD, Intel, IBM, NVIDIA, and other leaders, OpenCL has outstanding potential for PCs, servers, handheld/embedded devices, high performance computing, and even cloud systems. This is the first comprehensive, authoritative, and practical guide to OpenCL 1.1 specifically for working developers and software architects.

Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and OpenCL C programming language.

Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware. Coverage includes:

  • Understanding OpenCL’s architecture, concepts, terminology, goals, and rationale
  • Programming with OpenCL C and the runtime API
  • Using buffers, sub-buffers, images, samplers, and events
  • Sharing and synchronizing data with OpenGL and Microsoft’s Direct3D
  • Simplifying development with the C++ Wrapper API
  • Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
  • Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more
  • Source code for this book is available at https://code.google.com/p/opencl-book-samples/

Sample Content

Table of Contents

Figures

Tables

Listings

Foreword

Preface

Acknowledgments

About the Authors

Part I: The OpenCL 1.1 Language and API

Chapter 1: An Introduction to OpenCL

What Is OpenCL, or . . . Why You Need This Book

Our Many-Core Future: Heterogeneous Platforms

Software in a Many-Core World

Conceptual Foundations of OpenCL

OpenCL and Graphics

The Contents of OpenCL

The Embedded Profile

Learning OpenCL

Chapter 2: HelloWorld: An OpenCL Example

Building the Examples

HelloWorld Example

Checking for Errors in OpenCL

Chapter 3: Platforms, Contexts, and Devices

OpenCL Platforms

OpenCL Devices

OpenCL Contexts

Chapter 4: Programming with OpenCL C

Writing a Data-Parallel Kernel Using OpenCL C

Scalar Data Types

Vector Data Types

Other Data Types

Derived Types

Implicit Type Conversions

Explicit Casts

Explicit Conversions

Reinterpreting Data as Another Type

Vector Operators

Qualifiers

Keywords

Preprocessor Directives and Macros

Restrictions

Chapter 5: OpenCL C Built-In Functions

Work-Item Functions

Math Functions

Integer Functions

Common Functions

Geometric Functions

Relational Functions

Vector Data Load and Store Functions

Synchronization Functions

Async Copy and Prefetch Functions

Atomic Functions

Miscellaneous Vector Functions

Image Read and Write Functions

Chapter 6: Programs and Kernels

Program and Kernel Object Overview

Program Objects

Kernel Objects

Chapter 7: Buffers and Sub-Buffers

Memory Objects, Buffers, and Sub-Buffers Overview

Creating Buffers and Sub-Buffers

Querying Buffers and Sub-Buffers

Reading, Writing, and Copying Buffers and Sub-Buffers

Mapping Buffers and Sub-Buffers

Chapter 8: Images and Samplers

Image and Sampler Object Overview

Creating Image Objects

Creating Sampler Objects

OpenCL C Functions for Working with Images

Transferring Image Objects

Chapter 9: Events

Commands, Queues, and Events Overview

Events and Command-Queues

Event Objects

Generating Events on the Host

Events Impacting Execution on the Host

Using Events for Profiling

Events Inside Kernels

Events from Outside OpenCL

Chapter 10: Interoperability with OpenGL

OpenCL/OpenGL Sharing Overview

Querying for the OpenGL Sharing Extension

Initializing an OpenCL Context for OpenGL Interoperability

Creating OpenCL Buffers from OpenGL Buffers

Creating OpenCL Image Objects from OpenGL Textures

Querying Information about OpenGL Objects

Synchronization between OpenGL and OpenCL

Chapter 11: Interoperability with Direct3D

Direct3D/OpenCL Sharing Overview

Initializing an OpenCL Context for Direct3D Interoperability

Creating OpenCL Memory Objects from Direct3D Buffers and Textures

Acquiring and Releasing Direct3D Objects in OpenCL

Processing a Direct3D Texture in OpenCL

Processing D3D Vertex Data in OpenCL

Chapter 12: C++ Wrapper API

C++ Wrapper API Overview

C++ Wrapper API Exceptions

Vector Add Example Using the C++ Wrapper API

Chapter 13: OpenCL Embedded Profile

OpenCL Profile Overview

64-Bit Integers

Images

Built-In Atomic Functions

Mandated Minimum Single-Precision Floating-Point Capabilities

Determining the Profile Supported by a Device in an OpenCL C Program

Part II: OpenCL 1.1 Case Studies

Chapter 14: Image Histogram

Computing an Image Histogram

Parallelizing the Image Histogram

Additional Optimizations to the Parallel Image Histogram

Computing Histograms with Half-Float or Float Values for Each Channel

Chapter 15: Sobel Edge Detection Filter

What Is a Sobel Edge Detection Filter?

Implementing the Sobel Filter as an OpenCL Kernel

Chapter 16: Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm

Graph Data Structures

Kernels

Leveraging Multiple Compute Devices

Chapter 17: Cloth Simulation in the Bullet Physics SDK

An Introduction to Cloth Simulation

Simulating the Soft Body

Executing the Simulation on the CPU

Changes Necessary for Basic GPU Execution

Two-Layered Batching

Optimizing for SIMD Computation and Local Memory

Adding OpenGL Interoperation

Chapter 18: Simulating the Ocean with Fast Fourier Transform

An Overview of the Ocean Application

Phillips Spectrum Generation

An OpenCL Discrete Fourier Transform

A Closer Look at the FFT Kernel

A Closer Look at the Transpose Kernel

Chapter 19: Optical Flow

Optical Flow Problem Overview

Sub-Pixel Accuracy with Hardware Linear Interpolation

Application of the Texture Cache

Using Local Memory

Early Exit and Hardware Scheduling

Efficient Visualization with OpenGL Interop

Performance

Chapter 20: Using OpenCL with PyOpenCL

Introducing PyOpenCL

Running the PyImageFilter2D Example

PyImageFilter2D Code

Context and Command-Queue Creation

Loading to an Image Object

Creating and Building a Program

Setting Kernel Arguments and Executing a Kernel

Reading the Results

Chapter 21: Matrix Multiplication with OpenCL

The Basic Matrix Multiplication Algorithm

A Direct Translation into OpenCL

Increasing the Amount of Work per Kernel

Optimizing Memory Movement: Local Memory

Performance Results and Optimizing the Original CPU Code

Chapter 22: Sparse Matrix-Vector Multiplication

Sparse Matrix-Vector Multiplication (SpMV) Algorithm

Description of This Implementation

Tiled and Packetized Sparse Matrix Representation

Header Structure

Tiled and Packetized Sparse Matrix Design Considerations

Optional Team Information

Tested Hardware Devices and Results

Additional Areas of Optimization

Appendix: Summary of OpenCL 1.1

The OpenCL Platform Layer

The OpenCL Runtime

Buffer Objects

Program Objects

Kernel and Event Objects

Supported Data Types

Vector Component Addressing

Preprocessor Directives and Macros

Specify Type Attributes

Math Constants

Work-Item Built-In Functions

Integer Built-In Functions

Common Built-In Functions

Math Built-In Functions

Geometric Built-In Functions

Relational Built-In Functions

Vector Data Load/Store Functions

Atomic Functions

Async Copies and Prefetch Functions

Synchronization, Explicit Memory Fence

Miscellaneous Vector Built-In Functions

Image Read and Write Built-In Functions

Image Objects

Image Formats

Access Qualifiers

Sampler Objects

Sampler Declaration Fields

OpenCL Device Architecture Diagram

OpenCL/OpenGL Sharing APIs

OpenCL/Direct3D 10 Sharing APIs

Index

