OpenCL Programming Guide by Aaftab Munshi

OpenCL Programming Guide

by Aaftab Munshi, Benedict Gaster, Timothy G. Mattson

Paperback | July 13, 2011



Using the new OpenCL (Open Computing Language) standard, you can write applications that access all available programming resources: CPUs, GPUs, and other processors such as DSPs and the Cell/B.E. processor. Already implemented by Apple, AMD, Intel, IBM, NVIDIA, and other leaders, OpenCL has outstanding potential for PCs, servers, handheld/embedded devices, high-performance computing, and even cloud systems. This is the first comprehensive, authoritative, and practical guide to OpenCL 1.1 specifically for working developers and software architects.


Written by five leading OpenCL authorities, OpenCL Programming Guide covers the entire specification. It reviews key use cases, shows how OpenCL can express a wide range of parallel algorithms, and offers complete reference material on both the API and OpenCL C programming language.


Through complete case studies and downloadable code examples, the authors show how to write complex parallel programs that decompose workloads across many different devices. They also present all the essentials of OpenCL software performance optimization, including probing and adapting to hardware. Coverage includes:


  • Understanding OpenCL’s architecture, concepts, terminology, goals, and rationale
  • Programming with OpenCL C and the runtime API
  • Using buffers, sub-buffers, images, samplers, and events
  • Sharing and synchronizing data with OpenGL and Microsoft’s Direct3D
  • Simplifying development with the C++ Wrapper API
  • Using OpenCL Embedded Profiles to support devices ranging from cellphones to supercomputer nodes
  • Case studies dealing with physics simulation; image and signal processing, such as image histograms, edge detection filters, Fast Fourier Transforms, and optical flow; math libraries, such as matrix multiplication and high-performance sparse matrix multiplication; and more
  • Source code for this book is available at

Aaftab Munshi is the spec editor for the OpenGL ES 1.1, OpenGL ES 2.0, and OpenCL specifications and coauthor of the book OpenGL ES 2.0 Programming Guide (with Dan Ginsburg and Dave Shreiner, published by Addison-Wesley, 2008). He currently works at Apple.

Benedict R. Gaster is a software architect working on programming models fo...

Title: OpenCL Programming Guide
Format: Paperback
Dimensions: 648 pages, 9.1 × 7 × 1.5 in
Published: July 13, 2011
Publisher: Pearson Education
Language: English

The following ISBNs are associated with this title:

ISBN-10: 0321749642

ISBN-13: 9780321749642


Table of Contents

Figures xv

Tables xxi

Listings xxv

Foreword xxix

Preface xxxiii

Acknowledgments xli

About the Authors xliii


Part I: The OpenCL 1.1 Language and API 1


Chapter 1: An Introduction to OpenCL 3

What Is OpenCL, or . . . Why You Need This Book 3

Our Many-Core Future: Heterogeneous Platforms 4

Software in a Many-Core World 7

Conceptual Foundations of OpenCL 11

OpenCL and Graphics 29

The Contents of OpenCL 30

The Embedded Profile 35

Learning OpenCL 36


Chapter 2: HelloWorld: An OpenCL Example 39

Building the Examples 40

HelloWorld Example 45

Checking for Errors in OpenCL 57


Chapter 3: Platforms, Contexts, and Devices 63

OpenCL Platforms 63

OpenCL Devices 68

OpenCL Contexts 83


Chapter 4: Programming with OpenCL C 97

Writing a Data-Parallel Kernel Using OpenCL C 97

Scalar Data Types 99

Vector Data Types 102

Other Data Types 108

Derived Types 109

Implicit Type Conversions 110

Explicit Casts 116

Explicit Conversions 117

Reinterpreting Data as Another Type 121

Vector Operators 123

Qualifiers 133

Keywords 141

Preprocessor Directives and Macros 141

Restrictions 146


Chapter 5: OpenCL C Built-In Functions 149

Work-Item Functions 150

Math Functions 153

Integer Functions 168

Common Functions 172

Geometric Functions 175

Relational Functions 175

Vector Data Load and Store Functions 181

Synchronization Functions 190

Async Copy and Prefetch Functions 191

Atomic Functions 195

Miscellaneous Vector Functions 199

Image Read and Write Functions 201


Chapter 6: Programs and Kernels 217

Program and Kernel Object Overview 217

Program Objects 218

Kernel Objects 237


Chapter 7: Buffers and Sub-Buffers 247

Memory Objects, Buffers, and Sub-Buffers Overview 247

Creating Buffers and Sub-Buffers 249

Querying Buffers and Sub-Buffers 257

Reading, Writing, and Copying Buffers and Sub-Buffers 259

Mapping Buffers and Sub-Buffers 276


Chapter 8: Images and Samplers 281

Image and Sampler Object Overview 281

Creating Image Objects 283

Creating Sampler Objects 292

OpenCL C Functions for Working with Images 295

Transferring Image Objects 299


Chapter 9: Events 309

Commands, Queues, and Events Overview 309

Events and Command-Queues 311

Event Objects 317

Generating Events on the Host 321

Events Impacting Execution on the Host 322

Using Events for Profiling 327

Events Inside Kernels 332

Events from Outside OpenCL 333


Chapter 10: Interoperability with OpenGL 335

OpenCL/OpenGL Sharing Overview 335

Querying for the OpenGL Sharing Extension 336

Initializing an OpenCL Context for OpenGL Interoperability 338

Creating OpenCL Buffers from OpenGL Buffers 339

Creating OpenCL Image Objects from OpenGL Textures 344

Querying Information about OpenGL Objects 347

Synchronization between OpenGL and OpenCL 348


Chapter 11: Interoperability with Direct3D 353

Direct3D/OpenCL Sharing Overview 353

Initializing an OpenCL Context for Direct3D Interoperability 354

Creating OpenCL Memory Objects from Direct3D Buffers and Textures 357

Acquiring and Releasing Direct3D Objects in OpenCL 361

Processing a Direct3D Texture in OpenCL 363

Processing D3D Vertex Data in OpenCL 366


Chapter 12: C++ Wrapper API 369

C++ Wrapper API Overview 369

C++ Wrapper API Exceptions 371

Vector Add Example Using the C++ Wrapper API 374


Chapter 13: OpenCL Embedded Profile 383

OpenCL Profile Overview 383

64-Bit Integers 385

Images 386

Built-In Atomic Functions 387

Mandated Minimum Single-Precision Floating-Point Capabilities 387

Determining the Profile Supported by a Device in an OpenCL C Program 390


Part II: OpenCL 1.1 Case Studies 391


Chapter 14: Image Histogram 393

Computing an Image Histogram 393

Parallelizing the Image Histogram 395

Additional Optimizations to the Parallel Image Histogram 400

Computing Histograms with Half-Float or Float Values for Each Channel 403


Chapter 15: Sobel Edge Detection Filter 407

What Is a Sobel Edge Detection Filter? 407

Implementing the Sobel Filter as an OpenCL Kernel 407


Chapter 16: Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm 411

Graph Data Structures 412

Kernels 414

Leveraging Multiple Compute Devices 417


Chapter 17: Cloth Simulation in the Bullet Physics SDK 425

An Introduction to Cloth Simulation 425

Simulating the Soft Body 429

Executing the Simulation on the CPU 431

Changes Necessary for Basic GPU Execution 432

Two-Layered Batching 438

Optimizing for SIMD Computation and Local Memory 441

Adding OpenGL Interoperation 446


Chapter 18: Simulating the Ocean with Fast Fourier Transform 449

An Overview of the Ocean Application 450

Phillips Spectrum Generation 453

An OpenCL Discrete Fourier Transform 457

A Closer Look at the FFT Kernel 463

A Closer Look at the Transpose Kernel 467


Chapter 19: Optical Flow 469

Optical Flow Problem Overview 469

Sub-Pixel Accuracy with Hardware Linear Interpolation 480

Application of the Texture Cache 480

Using Local Memory 481

Early Exit and Hardware Scheduling 483

Efficient Visualization with OpenGL Interop 483

Performance 484


Chapter 20: Using OpenCL with PyOpenCL 487

Introducing PyOpenCL 487

Running the PyImageFilter2D Example 488

PyImageFilter2D Code 488

Context and Command-Queue Creation 492

Loading to an Image Object 493

Creating and Building a Program 494

Setting Kernel Arguments and Executing a Kernel 495

Reading the Results 496


Chapter 21: Matrix Multiplication with OpenCL 499

The Basic Matrix Multiplication Algorithm 499

A Direct Translation into OpenCL 501

Increasing the Amount of Work per Kernel 506

Optimizing Memory Movement: Local Memory 509

Performance Results and Optimizing the Original CPU Code 511


Chapter 22: Sparse Matrix-Vector Multiplication 515

Sparse Matrix-Vector Multiplication (SpMV) Algorithm 515

Description of This Implementation 518

Tiled and Packetized Sparse Matrix Representation 519

Header Structure 522

Tiled and Packetized Sparse Matrix Design Considerations 523

Optional Team Information 524

Tested Hardware Devices and Results 524

Additional Areas of Optimization 538


Appendix: Summary of OpenCL 1.1 541

The OpenCL Platform Layer 541

The OpenCL Runtime 543

Buffer Objects 544

Program Objects 546

Kernel and Event Objects 547

Supported Data Types 550

Vector Component Addressing 552

Preprocessor Directives and Macros 555

Specify Type Attributes 555

Math Constants 556

Work-Item Built-In Functions 557

Integer Built-In Functions 557

Common Built-In Functions 559

Math Built-In Functions 560

Geometric Built-In Functions 563

Relational Built-In Functions 564

Vector Data Load/Store Functions 567

Atomic Functions 568

Async Copies and Prefetch Functions 570

Synchronization, Explicit Memory Fence 570

Miscellaneous Vector Built-In Functions 571

Image Read and Write Built-In Functions 572

Image Objects 573

Image Formats 576

Access Qualifiers 576

Sampler Objects 576

Sampler Declaration Fields 577

OpenCL Device Architecture Diagram 577

OpenCL/OpenGL Sharing APIs 577

OpenCL/Direct3D 10 Sharing APIs 579


Index 581

Editorial Reviews

“Welcome to the new world of heterogeneous parallel programming with this authoritative and accessible guide to the complete OpenCL Programming Model.”

–Professor Pat Hanrahan, Stanford University