OpenCL in Action
How to accelerate graphics and computation
Matthew Scarpino
Manning, Shelter Island

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: orders@manning.com

©2012 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964

Development editor: Maria Townsley
Copyeditor: Andy Carroll
Proofreader: Maureen Spencer
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor

ISBN 9781617290176
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 16 15 14 13 12 11

brief contents

PART 1 FOUNDATIONS OF OPENCL PROGRAMMING
1 Introducing OpenCL
2 Host programming: fundamental data structures
3 Host programming: data transfer and partitioning
4 Kernel programming: data types and device memory
5 Kernel programming: operators and functions
6 Image processing
7 Events, profiling, and synchronization
8 Development with C++
9 Development with Java and Python
10 General coding principles

PART 2 CODING PRACTICAL ALGORITHMS IN OPENCL
11 Reduction and sorting
12 Matrices and QR decomposition
13 Sparse matrices
14 Signal processing and the fast Fourier transform

PART 3 ACCELERATING OPENGL WITH OPENCL
15 Combining OpenCL and OpenGL
16 Textures and renderbuffers

contents

preface
acknowledgments
about this book

PART 1 FOUNDATIONS OF OPENCL PROGRAMMING

1 Introducing OpenCL
1.1 The dawn of OpenCL
1.2 Why OpenCL?
    Portability ■ Standardized vector processing ■ Parallel programming
1.3 Analogy: OpenCL processing and a game of cards
1.4 A first look at an OpenCL application
1.5 The OpenCL standard and extensions
1.6 Frameworks and software development kits (SDKs)
1.7 Summary

2 Host programming: fundamental data structures
2.1 Primitive data types
2.2 Accessing platforms
    Creating platform structures ■ Obtaining platform information ■ Code example: testing platform extensions
2.3 Accessing installed devices
    Creating device structures ■ Obtaining device information ■ Code example: testing device extensions
2.4 Managing devices with contexts
    Creating contexts ■ Obtaining context information ■ Contexts and the reference count ■ Code example: checking a context’s reference count
2.5 Storing device code in programs
    Creating programs ■ Building programs ■ Obtaining program information ■ Code example: building a program from multiple source files
2.6 Packaging functions in kernels
    Creating kernels ■ Obtaining kernel information ■ Code example: obtaining kernel information
2.7 Collecting kernels in a command queue
    Creating command queues ■ Enqueuing kernel execution commands
2.8 Summary

3 Host programming: data transfer and partitioning
3.1 Setting kernel arguments
3.2 Buffer objects
    Allocating buffer objects ■ Creating subbuffer objects
3.3 Image objects
    Creating image objects ■ Obtaining information about image objects
3.4 Obtaining information about buffer objects
3.5 Memory object transfer commands
    Read/write data transfer ■ Mapping memory objects ■ Copying data between memory objects
3.6 Data partitioning
    Loops and work-items ■ Work sizes and offsets ■ A simple one-dimensional example ■ Work-groups and compute units
3.7 Summary

4 Kernel programming: data types and device memory
4.1 Introducing kernel coding
4.2 Scalar data types
    Accessing the double data type ■ Byte order
4.3 Floating-point computing
    The float data type ■ The double data type ■ The half data type ■ Checking IEEE-754 compliance
4.4 Vector data types
    Preferred vector widths ■ Initializing vectors ■ Reading and modifying vector components ■ Endianness and memory access
4.5 The OpenCL device model
    Device model analogy part 1: math students in school ■ Device model analogy part 2: work-items in a device ■ Address spaces in code ■ Memory alignment
4.6 Local and private kernel arguments
    Local arguments ■ Private arguments
4.7 Summary

5 Kernel programming: operators and functions
5.1 Operators
5.2 Work-item and work-group functions
    Dimensions and work-items ■ Work-groups ■ An example application
5.3 Data transfer operations
    Loading and storing data of the same type ■ Loading vectors from a scalar array ■ Storing vectors to a scalar array
5.4 Floating-point functions
    Arithmetic and rounding functions ■ Comparison functions ■ Exponential and logarithmic functions ■ Trigonometric functions ■ Miscellaneous floating-point functions

[...]

... OpenGL and OpenCL
    Creating the OpenCL context ■ Sharing data between OpenGL and OpenCL ■ Synchronizing access to shared data
15.2 Obtaining information
    Obtaining OpenGL object and texture information ■ Obtaining information about the OpenGL context
15.3 Basic interoperability example
    Initializing OpenGL operation ■ Initializing OpenCL operation ■ Creating data objects...
Author Online

Nobody’s perfect. If I failed to convey my subject material clearly or (gasp) made a mistake, feel free to add a comment through Manning’s Author Online system. You can find the Author Online forum for this book by going to www.manning.com/OpenCLinAction and clicking the Author Online link. Simple questions and concerns get rapid responses. In contrast, if you’re unhappy with line 402 of...

... development. Appendix C explains how to install and use the Minimalist GNU for Windows (MinGW), which provides a GNU-like environment for building executables on the Windows operating system. Lastly, appendix D discusses the specification for embedded OpenCL.

Obtaining and compiling the example code

In the end, it’s the code that matters. This book contains working code for over 60 OpenCL applications, and...

... at www.manning.com/OpenCLinAction or www.manning.com/scarpino2/. The download site provides a link pointing to an archive that contains code intended to be compiled with GNU-based build tools. This archive contains one folder for each chapter/appendix of the book, and each top-level folder has subfolders for example projects. For example, if you look in the Ch5/shuffle_test directory, you’ll find the...

...
9.3 PyOpenCL
    PyOpenCL installation and licensing ■ Overview of PyOpenCL development ■ Creating kernels with PyOpenCL ■ Setting arguments and executing kernels
9.4 Summary

10 General coding principles
10.1 Global size and local size
    Finding the maximum work-group size ■ Testing kernels and devices
10.2 Numerical reduction
    OpenCL reduction ■ Improving reduction...
it finds. The following listing shows what this host code looks like. Notice that the source code is written in the C programming language.

NOTE Error-checking routines have been omitted from this listing, but you’ll find them in the matvec.c file in this book’s example code.

Listing 1.1 Creating and distributing a matrix-vector multiplication kernel: matvec.c

#define PROGRAM_FILE "matvec.cl"
#define KERNEL_FUNC...

... Introducing OpenCL operation: hosts and kernels ■ Implementing an OpenCL application in code

In October 2010, a revolution took place in the world of high-performance computing. The Tianhe-1A, constructed by China’s National Supercomputing Center in Tianjin, came from total obscurity to seize the leading position among the world’s best performing supercomputers. With a maximum recorded computing speed...

... target device. In the later chapters, the focus shifts from learning how OpenCL works to putting OpenCL to use in processing vast amounts of data at high speed.

Audience

In writing this book, I’ve assumed that readers have never heard of OpenCL and know nothing about distributed computing or high-performance computing. I’ve done my best to present concepts like task-parallelism and SIMD (single instruction,...

... one of many working groups in the Khronos Group, a consortium of companies whose aim is to advance graphics and graphical media. Since its formation, the OpenCL Working Group has released two formal specifications: OpenCL version 1.0 was released in 2008, and OpenCL version 1.1 was released in 2010. OpenCL 2.0 is planned for 2012.

This section has explained why businesses think highly of OpenCL, but I wouldn’t...
Gilhooley for spreading the word about the book’s publication. Given OpenCL’s youth, the audience isn’t as easy to reach as the audience for Manning’s many Java books. But between setting up web articles, presentations, and conference attendance, Candace has done an exemplary job in marketing OpenCL in Action.

One of Manning’s greatest strengths is its reliance on constant feedback. During development and...