A Summary of Peculiarities in various OpenCL Implementations

Apple

Apple, Compiling outside Xcode

Change the linker flags (LDFLAGS) from "-lOpenCL" to "-framework OpenCL".
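For a build script that has to work both on Apple systems and elsewhere, the flag can be chosen when the build is configured. The following is a minimal sketch, assuming a Python-based build script; `opencl_link_flags` is a hypothetical helper name, not part of any OpenCL toolchain, and non-Apple platforms are assumed to use a Unix-style linker.

```python
import platform


def opencl_link_flags(system=None):
    """Return the linker flags for OpenCL on the given platform.

    `system` is a value as returned by platform.system(); it defaults to
    the machine running the script. (Hypothetical helper for illustration.)
    """
    if system is None:
        system = platform.system()
    if system == "Darwin":
        # Apple ships OpenCL as a framework, so "-lOpenCL" is replaced.
        return ["-framework", "OpenCL"]
    # Assumption: other platforms link against the OpenCL library as usual.
    return ["-lOpenCL"]
```

Calling `opencl_link_flags("Darwin")` returns `["-framework", "OpenCL"]`, and the result can be appended to whatever LDFLAGS the build already uses.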

Apple, CPU

Allows only one work item per work group (mapping to one thread per CPU).

AMD, CPU

Unlike Apple's CPU implementation, AMD's does allow multiple work items in a work group on the CPU. That mapping does not appear to be particularly efficient, but details aren't yet known.

AMD, 4xxx-generation GPUs

If barrier() is used, work group sizes cannot exceed 64 items.
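Because work group size limits differ between implementations, host code is usually safer clamping its work group size to a limit known at run time than hard-coding one. The sketch below assumes the device or kernel limit has already been obtained by the caller (for example from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`); `pick_local_size` and its parameters are hypothetical names, and the cap of 64 used in the usage note is simply the figure quoted above.

```python
def pick_local_size(global_size, device_limit, uses_barrier=False, barrier_cap=None):
    """Pick the largest work group size that divides global_size.

    device_limit: maximum work group size reported for the device or kernel.
    barrier_cap:  optional tighter limit that applies only when the kernel
                  calls barrier() (hypothetical parameter for illustration).
    """
    limit = device_limit
    if uses_barrier and barrier_cap is not None:
        limit = min(limit, barrier_cap)
    if global_size < 1 or limit < 1:
        raise ValueError("sizes must be positive")
    # OpenCL 1.x requires the global size to be a multiple of the local
    # size, so walk down from the limit to the nearest divisor.
    for size in range(min(limit, global_size), 0, -1):
        if global_size % size == 0:
            return size
```

With a global size of 1024 and a device limit of 256, this returns 256 for a kernel without barriers and 64 when called with `uses_barrier=True, barrier_cap=64`. With a device limit of 1 it always returns 1.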

Nvidia, GPU

The hardware is capable of binding samplers to linear chunks of memory to enable an extra layer of caching. This functionality is not available from OpenCL. (Note that this is less relevant on Fermi-class chips, which have more caches in all data paths.)

OpenCLOddities (last edited 2011-10-28 15:53:55 by AndreasKloeckner)