A Summary of Peculiarities in Various OpenCL Implementations

Apple, CPU

Allows only one work item per work group, with each work group mapped to one thread per CPU.
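Because of this restriction, host code that hard-codes a local size larger than 1 will typically fail on this device with CL_INVALID_WORK_GROUP_SIZE. Below is a minimal Python sketch, assuming the limit has already been queried from the device (in PyOpenCL, `device.max_work_group_size`). The helper `pick_local_size` is hypothetical. It searches downward from that limit for the largest size that evenly divides the global size, because OpenCL 1.x requires the global size to be a multiple of the local size. When the device reports a limit of 1, the helper always returns 1.

```python
def pick_local_size(global_size: int, max_wg_size: int) -> int:
    """Return the largest work group size <= max_wg_size that divides global_size.

    Hypothetical helper: max_wg_size is assumed to come from a device query
    such as device.max_work_group_size in PyOpenCL.
    """
    if global_size < 1 or max_wg_size < 1:
        raise ValueError("sizes must be positive")
    # Search downward from the device limit for an exact divisor.
    for size in range(min(global_size, max_wg_size), 0, -1):
        if global_size % size == 0:
            return size
    return 1  # not reached: 1 always divides global_size


# A device limited to one work item per group (as described above).
print(pick_local_size(1024, 1))    # 1
# A device allowing up to 256 work items per group.
print(pick_local_size(1024, 256))  # 256
print(pick_local_size(1000, 256))  # 250
```

Another option is to pass no local size at all (`None` in PyOpenCL) and let the implementation choose one.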

AMD, CPU

Unlike Apple's CPU implementation, AMD's allows multiple work items per work group on the CPU. This mapping does not appear to be particularly efficient, but the details are not yet known.

AMD, 4xxx-generation GPUs

If a kernel uses barrier(), its work group size cannot exceed 64 work items.
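The device-wide maximum alone therefore does not reliably indicate what a barrier-using kernel may request on these GPUs. Below is a minimal Python sketch that applies the 64-item rule from this note on top of the device limit. The function `effective_max_wg_size` and its two boolean flags are hypothetical. How the caller detects a 4xxx-generation GPU, and whether a kernel calls barrier(), is left open here. `device_max` is assumed to come from a device query such as `device.max_work_group_size` in PyOpenCL.

```python
# Limit taken from the note above: on AMD 4xxx-generation GPUs,
# kernels that call barrier() may use at most 64 work items per group.
AMD_4XXX_BARRIER_LIMIT = 64


def effective_max_wg_size(device_max: int, is_amd_4xxx_gpu: bool,
                          kernel_uses_barrier: bool) -> int:
    """Return the largest work group size that is safe to request.

    Hypothetical helper: the caller must determine both flags itself.
    """
    if is_amd_4xxx_gpu and kernel_uses_barrier:
        return min(device_max, AMD_4XXX_BARRIER_LIMIT)
    return device_max


print(effective_max_wg_size(256, is_amd_4xxx_gpu=True, kernel_uses_barrier=True))   # 64
print(effective_max_wg_size(256, is_amd_4xxx_gpu=True, kernel_uses_barrier=False))  # 256
print(effective_max_wg_size(512, is_amd_4xxx_gpu=False, kernel_uses_barrier=True))  # 512
```

The result can then serve as the upper bound when choosing a local size that divides the global size.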

Nvidia, GPU

The hardware can bind samplers to linear chunks of memory, which adds an extra layer of caching. OpenCL does not expose this functionality. (This matters less on Fermi-class chips, which have caches in all of their data paths.)

OpenCLOddities (last edited 2010-08-11 16:17:37 by AndreasKloeckner)