Chapter 12 WebCL: Enabling OpenCL Acceleration of Web Applications - 12.2 Programming with WebCL
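WebCL 1.0 is essentially OpenCL 1.2 exposed to JavaScript. It provides the same OpenCL API, grammar, and runtime. Because JavaScript is an object-oriented language, the same work takes fewer lines of code. WebCL is often discussed alongside similar technologies, WebGL in particular, but no familiarity with WebGL is needed here. As in OpenCL, a WebCL application has two sides:
- The host (e.g., the web browser), which controls and executes the JavaScript program
- The device (e.g., a GPU), which performs the computation: the OpenCL kernels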
// First check if the WebCL extension is installed at all
if (window.webcl == undefined) {
  alert("Unfortunately your system does not support WebCL. " +
        "Make sure that you have both the OpenCL driver " +
        "and the WebCL browser extension installed.");
}
// Get a list of available CL platforms, and another list of the
// available devices on each platform. If there are no platforms
// or no available devices on any platform, then we can conclude
// that WebCL is not available
try {
  var webcl = window.webcl;
  var platforms = webcl.getPlatforms();
  var devices = [];
  var numDevices = 0;
  for (var i = 0; i < platforms.length; ++i) {
    devices[i] = platforms[i].getDevices();
    numDevices += devices[i].length;
  }
  if (numDevices === 0) {
    throw new Error("No WebCL platform or device found");
  }
  alert("Excellent! Your system does support WebCL");
} catch (e) {
  alert("Unfortunately platform or device inquiry failed.");
}
// Setup WebCL context using the default device
var ctx = webcl.createContext();
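The availability check at the start of this section can be factored into a pure function and tested outside a browser. This is a minimal sketch. It assumes only that the object passed in exposes `getPlatforms()` and that each platform exposes `getDevices()`, as in the snippet above. The names `countWebCLDevices` and `isWebCLAvailable` are hypothetical.

```javascript
// Returns the total number of devices reported by a WebCL-like object,
// or 0 if the object is missing or the platform/device inquiry throws.
function countWebCLDevices(webclObj) {
  if (webclObj === undefined || webclObj === null) {
    return 0; // WebCL extension not installed
  }
  try {
    let total = 0;
    for (const platform of webclObj.getPlatforms()) {
      total += platform.getDevices().length;
    }
    return total;
  } catch (e) {
    return 0; // inquiry failed: treat WebCL as unavailable
  }
}

// WebCL is usable only if at least one device was found on some platform.
function isWebCLAvailable(webclObj) {
  return countWebCLDevices(webclObj) > 0;
}
```

In a page, this would be called as `isWebCLAvailable(window.webcl)`.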
Figure 12.1 WebCL objects
// Find an appropriate device
// (devices: array of WebCLDevice objects, e.g. from platform.getDevices())
for (var j = 0, jl = devices.length; j < jl; ++j) {
  var d = devices[j];
  var devExts = d.getInfo(webcl.DEVICE_EXTENSIONS);
  var devGMem = d.getInfo(webcl.DEVICE_GLOBAL_MEM_SIZE);
  var devLMem = d.getInfo(webcl.DEVICE_LOCAL_MEM_SIZE);
  var devCompUnits = d.getInfo(webcl.DEVICE_MAX_COMPUTE_UNITS);
  var devHasImage = d.getInfo(webcl.DEVICE_IMAGE_SUPPORT);
  // select the device that matches your requirements
  platform = ...
  device = ...
}
// assuming we found the best device, we can create the context
var context = webcl.createContext(platform, device);
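The selection step left open above (`platform = ...`, `device = ...`) depends on the application. Below is one possible policy, sketched under our own assumptions. It keeps only devices that support images and have at least a required amount of global memory, then prefers the one with the most compute units. The device information is represented as plain objects whose fields mirror the `getInfo()` queries in the loop. The field names, the function `pickDevice` and the policy itself are hypothetical.

```javascript
// infos[j] = { globalMem, localMem, computeUnits, hasImage } for device j,
// i.e. the values returned by the getInfo() queries in the loop above.
function pickDevice(infos, minGlobalMem) {
  let best = -1;
  for (let j = 0; j < infos.length; ++j) {
    const d = infos[j];
    if (!d.hasImage || d.globalMem < minGlobalMem) {
      continue; // device does not meet the requirements
    }
    if (best < 0 || d.computeUnits > infos[best].computeUnits) {
      best = j; // more compute units than the current candidate
    }
  }
  return best; // index of the chosen device, or -1 if none qualifies
}
```

The returned index can then be used to pick the matching WebCLDevice and its platform before `createContext()` is called.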
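The context is then used to create the other WebCL objects:
- Command queues
- Memory objects (buffers and images)
- Sampler objects, which describe how images are read in a kernel
- Program objects, which contain a set of kernel functions
- Kernel objects: functions declared with __kernel in the kernel source, which are the actual units of execution
- Event objects, which track the execution status of commands and are used to profile them
- Command synchronization objects, such as markers and barriers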
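First, we need to create a program object. Like WebGL 1.0, WebCL assumes that kernels are provided as source code, so an embedded compiler is required: the source is first loaded for the device and then compiled. Like any other compiler, the OpenCL compiler accepts a set of build options, such as -cl-fast-relaxed-math in the following code.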
var program;
try {
  // Create the compute program from the source strings
  program = context.createProgram(source);
  // Build the program executable with relaxed math flag
  program.build(device, "-cl-fast-relaxed-math");
} catch (err) {
  throw 'Error building program: ' + err + '\n' +
        program.getBuildInfo(device, webcl.PROGRAM_BUILD_LOG);
}
// Create the compute kernels from within the program
var kernel = program.createKernel("kernel_function_name");
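`createKernel()` needs the name of a `__kernel` function in the program source. When that source is embedded in the page as a string, a small helper can list the candidate names. The sketch below is a rough, regex-based approximation: it does not parse OpenCL C, so it will not see through macros and may match text inside comments. The helper `kernelNames` and the sample kernels `scale` and `offset` are hypothetical.

```javascript
// Returns the names of functions declared with __kernel (or kernel)
// in an OpenCL C source string. Regex-based, so only a rough sketch.
function kernelNames(source) {
  const re = /\b(?:__kernel|kernel)\s+void\s+(\w+)\s*\(/g;
  const names = [];
  let m;
  while ((m = re.exec(source)) !== null) {
    names.push(m[1]); // captured function name
  }
  return names;
}

// Hypothetical kernel source, e.g. passed to context.createProgram(source).
const source = `
__kernel void scale(__global float* data, float factor) {
  int i = get_global_id(0);
  data[i] *= factor;
}
__kernel void offset(__global float* data, float delta) {
  data[get_global_id(0)] += delta;
}`;
```

Each returned name is a valid argument for `program.createKernel(...)`.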
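Like ordinary functions, kernels usually take arguments. Arguments of the various primitive types are passed as JavaScript typed arrays [2] (see Table 12.1). For other kinds of arguments, WebCL objects are used:
- WebCLBuffer and WebCLImage, which wrap arrays and images in device memory
- WebCLSampler, which describes how an image is sampled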
Table 12.1 Mapping between kernel argument types (C) and the values passed to setArg()
Kernel argument type | setArg() value | setArg() typed array | Notes |
char, uchar | scalar | Int8Array, Uint8Array | 1 byte |
short, ushort | scalar | Int16Array, Uint16Array | 2 bytes |
int, uint | scalar | Int32Array, Uint32Array | 4 bytes |
long, ulong | scalar | Int32Array, Uint32Array (two 32-bit halves, see the notes below) | 8 bytes |
float | scalar | Float32Array | 4 bytes |
charN | vector | Int8Array for (u)charN | N = 2,3,4,8,16 |
shortN | vector | Int16Array for (u)shortN | N = 2,3,4,8,16 |
intN | vector | Int32Array for (u)intN | N = 2,3,4,8,16 |
floatN | vector | Float32Array for floatN and halfN | N = 2,3,4,8,16 |
doubleN | vector | Float64Array for doubleN | N = 2,3,4,8,16 |
char*, …, double* | WebCLBuffer | | |
image2d_t | WebCLImage | | |
sampler_t | WebCLSampler | | |
__local | Int32Array([size_in_bytes]) | | Size of the local memory allocated for the kernel, in bytes |
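Table 12.1 can be turned into a small lookup that builds the typed array `setArg()` expects for a scalar or vector argument. This sketch covers only the rows listed in the table. long/ulong are left out because they need the two-word encoding described in the notes below. For unsigned vector types it picks the unsigned array. The table lists the signed one, and both have the same element size. The names `argArray`, `ARRAY_FOR_TYPE` and `VECTOR_SIZES` are hypothetical.

```javascript
// Typed array constructor for each scalar base type, following Table 12.1.
const ARRAY_FOR_TYPE = new Map([
  ["char", Int8Array], ["uchar", Uint8Array],
  ["short", Int16Array], ["ushort", Uint16Array],
  ["int", Int32Array], ["uint", Uint32Array],
  ["float", Float32Array], ["double", Float64Array],
]);
const VECTOR_SIZES = [2, 3, 4, 8, 16];

// Builds the typed array for a kernel argument such as "int" or "float4".
function argArray(typeName, values) {
  const m = /^([a-z]+)(\d*)$/.exec(typeName);
  const Ctor = m ? ARRAY_FOR_TYPE.get(m[1]) : undefined;
  if (!Ctor) {
    throw new Error("unsupported kernel argument type: " + typeName);
  }
  const n = m[2] === "" ? 1 : Number(m[2]); // 1 for scalars, N for vectors
  if (n !== 1 && !VECTOR_SIZES.includes(n)) {
    throw new Error("invalid vector size: " + typeName);
  }
  if (values.length !== n) {
    throw new Error(typeName + " expects " + n + " value(s)");
  }
  return new Ctor(values);
}
```

For example, `kernel.setArg(2, argArray("float3", [1.0, 2.0, 3.0]))` builds the same `Float32Array` as the explicit call shown later in this section.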
// Create a 1D buffer (srcBuffer is an optional typed array)
var buffer = context.createBuffer(flags, sizeInBytes, srcBuffer);
// flags:
// webcl.MEM_READ_WRITE    Default. Memory object is read and written by the kernel
// webcl.MEM_WRITE_ONLY    Memory object is only written by the kernel
// webcl.MEM_READ_ONLY     Memory object is only read by the kernel
// webcl.MEM_USE_HOST_PTR  Use the memory of srcBuffer as the storage for the memory object. srcBuffer must be specified
// webcl.MEM_COPY_HOST_PTR Allocate memory for the memory object and copy the data from srcBuffer into it. srcBuffer must be specified
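`sizeInBytes` is a byte count, not an element count. When the buffer is initialized from a typed array, the simplest way to keep the two consistent is to derive the size from the array itself. A minimal sketch; `bytesFor` and `src` are hypothetical names.

```javascript
// Bytes needed for `count` elements of a typed array type.
function bytesFor(ArrayType, count) {
  return count * ArrayType.BYTES_PER_ELEMENT;
}

const src = new Float32Array(1024); // hypothetical input data
console.log(bytesFor(Float32Array, 1024));    // 4096
console.log(src.byteLength);                  // 4096, the same value
console.log(src.subarray(0, 256).byteLength); // 1024: a view covers only its own elements
```

With this, a read-only input buffer would be created as `context.createBuffer(webcl.MEM_READ_ONLY | webcl.MEM_COPY_HOST_PTR, src.byteLength, src)`.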
// Create a 32-bit RGBA WebCLImage object
// First, define the format of the image
var imageFormat = {
  // memory layout in which pixel data channels are stored in the image
  channelOrder: webcl.RGBA,
  // type of the channel data (8 bits x 4 channels = 32 bits per pixel)
  channelType: webcl.UNORM_INT8,
  // image size (imageWidth and imageHeight are defined by the application)
  width: imageWidth,
  height: imageHeight,
  // scan-line pitch in bytes. Must be 0 if imageBuffer is null;
  // 0 is also the default if rowPitch is not specified.
  rowPitch: imageWidth * 4
};
// Image on device
// imageBuffer is a typed array that contains the image data already allocated by the application
// imageBuffer.byteLength >= rowPitch * imageHeight. The size of each element in bytes must be a power of 2.
var image = context.createImage(webcl.MEM_READ_ONLY | webcl.MEM_USE_HOST_PTR, imageFormat, imageBuffer);
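The buffer constraints in the comments above can be checked before `createImage()` is called. This sketch assumes the 32-bit RGBA format, i.e. 4 bytes per pixel. It also assumes a tightly packed image unless a larger pitch is passed in. `BYTES_PER_PIXEL`, `minRowPitch`, `isPowerOfTwo` and `checkImageBuffer` are hypothetical names.

```javascript
const BYTES_PER_PIXEL = 4; // RGBA, 8 bits per channel

// Smallest legal row pitch: one tightly packed scan line.
function minRowPitch(width) {
  return width * BYTES_PER_PIXEL;
}

function isPowerOfTwo(n) {
  return n > 0 && (n & (n - 1)) === 0;
}

// Throws if imageBuffer cannot hold `height` rows of `rowPitch` bytes,
// or if its element size is not a power of 2. Returns the row pitch used.
function checkImageBuffer(imageBuffer, width, height, rowPitch = minRowPitch(width)) {
  if (rowPitch < minRowPitch(width)) {
    throw new Error("rowPitch is smaller than one scan line");
  }
  if (imageBuffer.byteLength < rowPitch * height) {
    throw new Error("imageBuffer is too small");
  }
  if (!isPowerOfTwo(imageBuffer.BYTES_PER_ELEMENT)) {
    throw new Error("element size must be a power of 2");
  }
  return rowPitch;
}
```

For a 640 x 480 image, the minimum pitch is 2560 bytes and the buffer needs at least 1,228,800 bytes.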
// Create a sampler object
var sampler = context.createSampler(normalizedCoords, addressingMode, filterMode);
// normalizedCoords indicates whether the image coordinates specified are normalized.
// addressingMode indicates how out-of-range image coordinates are handled when reading an image.
// This can be set to webcl.ADDRESS_MIRRORED_REPEAT, webcl.ADDRESS_REPEAT,
// webcl.ADDRESS_CLAMP_TO_EDGE, webcl.ADDRESS_CLAMP or webcl.ADDRESS_NONE.
// filterMode specifies the type of filter to apply when reading an image. This can be webcl.FILTER_NEAREST or webcl.FILTER_LINEAR
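To make the addressing modes concrete, the sketch below applies them to an integer texel index along one image axis of `n` texels. It is a simplified illustration, not the exact OpenCL rules. The specification defines REPEAT and MIRRORED_REPEAT on normalized coordinates. CLAMP returns the border color rather than a texel, which is modeled here as -1. The function `addressTexel` and the mode strings are hypothetical.

```javascript
// Maps texel index i on an axis of n texels to the texel actually read.
// Returns -1 when the read would yield the border color (CLAMP).
function addressTexel(mode, i, n) {
  switch (mode) {
    case "clamp_to_edge":
      return Math.min(Math.max(i, 0), n - 1); // stick to the nearest edge
    case "clamp":
      return i >= 0 && i < n ? i : -1;        // outside: border color
    case "repeat":
      return ((i % n) + n) % n;               // wrap around
    case "mirrored_repeat": {
      const j = ((i % (2 * n)) + 2 * n) % (2 * n); // position in a 2n period
      return j < n ? j : 2 * n - 1 - j;            // second half is reflected
    }
    case "none":
      return i; // the program guarantees coordinates are inside the image
    default:
      throw new Error("unknown addressing mode: " + mode);
  }
}
```

For example, with `n = 4` an index of 5 reads texel 3 with clamp-to-edge, texel 1 with repeat, and texel 2 with mirrored repeat.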
// kernel.setArg(idx, value) sets the value of kernel argument idx;
// value is a typed array, a memory object or a sampler
// Sets value of argument 0 to the integer value 5
kernel.setArg(0, new Int32Array([5]));
// Sets value of argument 1 to the float value 1.34
kernel.setArg(1, new Float32Array([1.34]));
// Sets value of argument 2 as a 3-float vector
// value should be a Float32Array with 3 floats
kernel.setArg(2, new Float32Array([1.0, 2.0, 3.0]));
// Argument 3 is a __local argument: allocate 4096 bytes of local memory
kernel.setArg(3, new Int32Array([4096]));
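The value passed for a `__local` argument is a size in bytes, so it is usually derived from the tile a work-group processes. The following is a minimal sketch. It assumes a kernel that caches a `tileWidth` x `tileHeight` tile of floats in local memory. It also assumes the device's local memory size (`devLMem` in the device loop earlier) is known. It ignores any other local memory the kernel uses. `localArgBytes` and the tile shape are hypothetical.

```javascript
// Bytes of local memory for a tile of floats, checked against the
// device's local memory size (webcl.DEVICE_LOCAL_MEM_SIZE).
function localArgBytes(tileWidth, tileHeight, deviceLocalMem) {
  const bytes = tileWidth * tileHeight * Float32Array.BYTES_PER_ELEMENT;
  if (bytes > deviceLocalMem) {
    throw new Error(bytes + " bytes exceed local memory of " + deviceLocalMem);
  }
  return new Int32Array([bytes]); // value for kernel.setArg(idx, ...)
}
```

A 16 x 16 tile of floats needs 1024 bytes. A 128 x 128 tile needs 65,536 bytes and would not fit in 32 KB of local memory.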
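When setting kernel arguments, keep the following in mind:
- A long is a 64-bit integer, which cannot be represented exactly as a JavaScript number (a double-precision float that holds integers exactly only up to 2^53). It must therefore be passed as two 32-bit integers: the low 32 bits in the first element of the array and the high 32 bits in the second.
- If a kernel argument is qualified with __constant, its size must not exceed webcl.DEVICE_MAX_CONSTANT_BUFFER_SIZE.
- OpenCL allows custom structures to be passed as arrays, but for portability WebCL does not yet support passing custom structures. An important reason is that the host and the device may use different byte orders (big-endian or little-endian). The developer would then have to marshal the data for each device's byte order, even when host and device are on the same machine.
- All WebCL API calls are thread-safe except kernel.setArg(). Even so, concurrent setArg() calls on different kernel objects are safe. The behavior is undefined only when multiple threads call setArg() on the same WebCLKernel object.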
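The first note can be implemented with BigInt, which current JavaScript engines provide (it did not exist when WebCL was specified). This sketch splits a signed or unsigned 64-bit value into the two-element Uint32Array described above, low word first. `splitInt64` and `joinInt64` are hypothetical helper names.

```javascript
// Splits a 64-bit integer into [low 32 bits, high 32 bits].
// Negative values are first converted to their two's-complement form.
function splitInt64(value) {
  const u = BigInt.asUintN(64, BigInt(value));
  return new Uint32Array([
    Number(u & 0xffffffffn),          // low word, first element
    Number((u >> 32n) & 0xffffffffn), // high word, second element
  ]);
}

// Inverse of splitInt64, interpreting the words as a signed long.
function joinInt64(words) {
  const u = (BigInt(words[1]) << 32n) | BigInt(words[0]);
  return BigInt.asIntN(64, u);
}
```

A long argument would then be set with `kernel.setArg(idx, splitInt64(x))`. Values above 2^53 survive the round trip because they never pass through a JavaScript number.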
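Commands are submitted to a device through a command queue. By default, the queue executes commands in order. Profiling and out-of-order execution are enabled with optional properties:
// Create an in-order command queue (default)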
var queue = context.createCommandQueue(device);
// Create an in-order command-queue with profiling of commands enabled
var queue = context.createCommandQueue(device, webcl.QUEUE_PROFILING_ENABLE);
// Create an out-of-order command-queue
var queue = context.createCommandQueue(device, webcl.QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE);
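Splitting work into pieces lets data transfers overlap with computation. For example, with a job split into two halves:
- The host-to-device transfer is split in two, so each part takes only half of the original copy time. The kernel is then executed on each half, again in about half the time. Finally, the data is transferred back to the host, once more in half the original time.
- The second transfer starts as soon as the first one has completed, much like a CPU pipeline, so the transfer of one half overlaps with the processing of the other.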
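The benefit of splitting the work can be estimated with a simple model. This sketch rests on three assumptions. First, upload, kernel execution and download each run on their own resource, so they can overlap (for example, through an out-of-order queue). Second, all three stages scale linearly with chunk size. Third, per-command overhead is negligible. Under those assumptions, the total time for `n` equal chunks is one chunk's time through all stages plus `n - 1` times the slowest per-chunk stage. The function names and the example timings are hypothetical.

```javascript
// Total time to upload, process and download all data in one piece.
function sequentialTime(tIn, tKernel, tOut) {
  return tIn + tKernel + tOut;
}

// Total time when the data is split into n equal chunks and the three
// stages of different chunks overlap as in a pipeline.
function pipelinedTime(tIn, tKernel, tOut, n) {
  const stages = [tIn / n, tKernel / n, tOut / n]; // per-chunk stage times
  const oneChunk = stages[0] + stages[1] + stages[2];
  return oneChunk + (n - 1) * Math.max(...stages); // slowest stage paces the rest
}

console.log(sequentialTime(10, 10, 10));   // 30
console.log(pipelinedTime(10, 10, 10, 2)); // 20
```

With two halves and equal stage times, the model gives 20 time units instead of 30. The gain shrinks when one stage dominates: for stage times of 2, 20 and 2, two halves only bring the total from 24 down to 22, because the kernel stage becomes the bottleneck.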
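Kernel execution is itself a command: it is added to the queue with WebCL's enqueueNDRange(kernel, offsets, globals, locals), where:
- kernel: the kernel object to execute.
- offsets: the offsets into the global index space. Passing null means offsets = [0, 0, 0].
- globals: the size of the problem the kernel solves, i.e., the number of work-items in each dimension.
- locals: the number of work-items in a work-group in each dimension. If null is passed, the implementation chooses it.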
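For example, to process an image of width × height pixels, globals can be set to [width, height] and locals to [16, 16]. Note that enqueueNDRange() fails if the work-group size given by locals (here 16 × 16 = 256 work-items) exceeds webcl.KERNEL_WORK_GROUP_SIZE, the maximum work-group size for this kernel on the device.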
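The sizes in this example can be computed and checked on the host before the call. The following is a minimal sketch. It assumes `maxWorkGroupSize` holds the value of webcl.KERNEL_WORK_GROUP_SIZE for the kernel and device. It also rounds each globals entry up to a multiple of the matching locals entry, which OpenCL 1.x requires. The kernel must then ignore work-items that fall outside the image. `ndRangeSizes` is a hypothetical helper name.

```javascript
// Returns { globals, locals } for an image of width x height pixels.
function ndRangeSizes(width, height, locals, maxWorkGroupSize) {
  const groupSize = locals.reduce((a, b) => a * b, 1); // work-items per group
  if (groupSize > maxWorkGroupSize) {
    throw new Error("work-group size " + groupSize +
                    " exceeds KERNEL_WORK_GROUP_SIZE " + maxWorkGroupSize);
  }
  // Round each dimension up to a multiple of the work-group size.
  const globals = [width, height].map(
    (g, d) => Math.ceil(g / locals[d]) * locals[d]);
  return { globals, locals };
}
```

For a 1000 x 600 image with 16 x 16 work-groups, this yields globals of [1008, 608]. The kernel then typically begins with a bounds check such as `if (x >= width || y >= height) return;`.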
