INTRO

PyOpenCL is a Python wrapper around native C/C++ functions (which originally are intended to JIT-compile, load and deploy individual OpenCL C kernels), so debugging those kernels while they run under a Python host app isn't all that straightforward.

Certain vendor-specific tools (by AMD, Intel, NVIDIA, ...) may or must be employed to leverage whatever platform-specific features and abilities they provide before you are able to suspend, inspect, modify and resume kernels on parallel HW such as GPUs.

Detours into the realm of C/C++ host apps (written merely to serve the purpose of kernel debugging) shall be spared, as shall trigger-happy shootout sessions peppering Python-deployed kernels with printf salvos just to get some sort of verbose runtime output out of our work items.

The brief explanation (/user story) below lists a few steps required to enjoy single-step source-level OpenCL kernel debugging, with kernels deployed by PyOpenCL onto the GPGPU of a modern budget graphics card.

To cut it short: You want to debug "python.exe .py" using an OpenCL-capable native debugger, for instance CodeXL.
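In other words, the debug target is the Python interpreter itself, with the host script as its argument. A minimal sketch for collecting those values from within the environment you actually use could look like this. The helper name `debug_target` and the script name `my_host_app.py` are made up for illustration, and which fields your debugger asks for is an assumption on my side:

```python
import os
import sys


def debug_target(script_path):
    """Collect what a native debugger needs to launch a Python host app."""
    script = os.path.abspath(script_path)
    return {
        "executable": sys.executable,            # python.exe of the active environment
        "arguments": '"%s"' % script,            # the host script is passed as argument
        "working_dir": os.path.dirname(script),  # directory the script lives in
    }


# "my_host_app.py" is a hypothetical script name used for illustration only.
for key, value in sorted(debug_target("my_host_app.py").items()):
    print("%-12s %s" % (key, value))
```

Running this with the interpreter PyCharm uses makes sure the debugger later starts the very same python.exe (and thus the same site-packages with PyOpenCL in it).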

Enjoy, correct, suggest, enhance, expand, ask via PyOpenCL's mailing list :-)

BR
Kai, Frankfurt, Germany sep-2015

P.S. Installation and binaries explained below.

P.P.S. For the configuration of Python/PyCharm and CodeXL see the ATTACHMENT (PDF) to this site (hit the hyperlink in the line above)!

SUBSTANCE

+ Motivation

Debugging OpenCL kernels in good ol' single-step source-level fashion while running a PyOpenCL (Python 2.7) based host application.

+ For reference, my humble gear as of Sep 2015 (I know, quite some antiques, partially at least :>)

  1. Windows 7 64Bit Home Premium
  2. Intel Core2 Duo 8200 @ 2.66 GHz
  3. 6 GB DDR2
  4. AMD R7 260X (Graphics Core Next (GCN) 1.1 based GPU, OpenCL 2.0 compliant, dubbed "Bonaire". Note: the R7 265 is faster, yet only OpenCL 1.2 compliant, for its "Pitcairn" GPU consists of the older GCN 1.0 architecture.) See https://en.wikipedia.org/wiki/AMD_Radeon_Rx_200_series for facts on the GCN revisions of both the R7 260X and the R7 265. (Note: the GCN info on the R7 265 at https://de.wikipedia.org/wiki/AMD-Radeon-HD-7000-Serie seems somewhat inaccurate.)

+ Python Env (the list below also reflects an exemplary install order for the usual round of tool suspects, which come into play just as everyone would expect)

  1. Enthought Canopy : canopy-1.5.1-win-64.msi (website, python 2.7)
  2. IDE : pycharm Community edition 4.0.3 (website)
  3. Numpy : numpy-MKL-1.9.1.win-amd64-py2.7.exe (e.g. gohlke)
  4. Boost : boost_python-1.55.win-amd64-py2.7.exe (e.g. gohlke)
  5. Prerequisites for PyOpenCL : [In PyCharm under File -> Settings -> interpreter -> list of packages -> hit "+"]
    • add pytools
    • add pytest
    • add pydecorator
  6. pyOpenCL Wheel : cmd-line, "pip install pyopencl-2015.1-cp27-none-win_amd64.whl" (e.g. gohlke)
  7. Presto, you are all set.
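To see whether the install order above really left you with a working setup, a quick sanity check along these lines may help. It is only a sketch: the helper names `describe_devices` and `main` are made up for illustration, and the helper takes the platform list as a parameter so it can be tried out without any OpenCL hardware around.

```python
def describe_devices(platforms):
    """Return one line of text per OpenCL device found on the given platforms."""
    lines = []
    for p_idx, platform in enumerate(platforms):
        for d_idx, device in enumerate(platform.get_devices()):
            lines.append("platform[%d] %s / device[%d] %s"
                         % (p_idx, platform.name.strip(),
                            d_idx, device.name.strip()))
    return lines


def main():
    """Print the PyOpenCL version and every platform/device it can see."""
    import pyopencl as cl  # fails right here if the wheel did not install properly

    print("pyopencl version: " + cl.VERSION_TEXT)
    for line in describe_devices(cl.get_platforms()):
        print(line)
```

Call `main()` from the PyCharm console (or at the bottom of a scratch script). If your GPU and CPU both show up in the printed list, the Python side is ready for the debugger setup further down.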

+ For the sake of completeness... some revisions of tools/utilities

Moving on to AMD's repository of "Packages" that need to be installed... 3 alternatives I've had the pleasure of testing (so far)...

1) GPU debugging single-step src-lvl works - (newer)

  1. AMD Catalyst 15.7.1
  2. OpenCL 2.0 AMD-APP (1800.8)
  3. CodeXL Standalone (1.8.9637)

2) GPU debugging single-step src-lvl works - (lil bit older)

  1. AMD Catalyst 14.12
  2. OpenCL 2.0 AMD-APP (1642.5)
  3. CodeXL Standalone (1.6.7249)

3) CPU based openCL debugging only.

  1. AMD Catalyst 14.12
  2. OpenCL 2.0 AMD-APP (1642.5)
  3. gDEBugger Standalone (6.2.64)
    Plus, for gDEBugger in particular, you have to add the extra lines below to your Python host code in order to allow gDEBugger to function properly:

    //## PYTHON ##
    import os
    os.environ['AMD_OCL_BUILD_OPTIONS_APPEND'] = "-g -O0"  # Add debug info (to the OpenCL compiler output) and ditch optimizations
    os.environ['CPU_MAX_COMPUTE_UNITS'] = '1'  # Set the count of compute units to one (moreover, this assumes/requires the program to execute on the CPU, not the GPU!)
    # ... subsequently, further down in your host app, make sure to select the device[idx] which corresponds to your host CPU (not the GPU), i.e. usually [0]
    //## PYTHON ##
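The "select the CPU device" step from the last comment above could be sketched as follows. This is an assumption-laden illustration, not the one true way: `pick_cpu_device` and `make_cpu_queue` are made-up helper names, and the sketch picks the device by its type instead of relying on index [0], since the device order may differ between machines.

```python
import os

# Must be set before the OpenCL runtime builds the first kernel,
# so keep these ahead of any context creation.
os.environ['AMD_OCL_BUILD_OPTIONS_APPEND'] = "-g -O0"
os.environ['CPU_MAX_COMPUTE_UNITS'] = '1'


def pick_cpu_device(platforms, cpu_type):
    """Return the first device whose type has the CPU bit set, or None."""
    for platform in platforms:
        for device in platform.get_devices():
            if device.type & cpu_type:
                return device
    return None


def make_cpu_queue():
    """Create a context and a command queue on the host CPU (needs pyopencl)."""
    import pyopencl as cl

    device = pick_cpu_device(cl.get_platforms(), cl.device_type.CPU)
    if device is None:
        raise RuntimeError("no OpenCL CPU device found")
    ctx = cl.Context(devices=[device])
    return ctx, cl.CommandQueue(ctx)
```

In the host app, `ctx, queue = make_cpu_queue()` would then replace whatever context creation you used for the GPU run.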

+ Limitations

Let's reflect on the current (->3Q2015<-) limitations and caveats when using CodeXL in combination with Python + PyOpenCL

  1. The CL build fails at the start of a debug session if printf (#pragma OPENCL EXTENSION cl_amd_printf: enable) is being used in the kernel. Comment out ("//") the corresponding lines.
  2. Viewing local memory is currently not supported (in the CodeXL watch window).
  3. The debugging experience (step, watch, run, ...) does not yet transition seamlessly between Python debugging and OpenCL debugging. Debugging is meant to be done Python- XOR OpenCL-wise at a time, i.e. you are supposed to first polish your Python app to a level which guarantees a proper stimuli pattern on API level, and only then continue on to OpenCL kernel-level debugging in CodeXL. Yet, imho, being able to debug the Python host code XOR the OpenCL kernels at a time is after all way better than printf-based kernel peppering, or no kernel-debug support at all. Seamlessly stepping out of Python lines of code right into OpenCL kernel code would be nice, yet is apparently not attainable to date.
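Regarding limitation 1, commenting out the printf lines by hand gets old quickly. A small helper like the sketch below could do it on the kernel source string right before the build. The name `comment_out_printf` is made up for illustration, and the approach is deliberately naive: it works line by line, so it assumes every printf call sits on a single line of its own.

```python
import re

# Matches the AMD printf pragma or a printf( call anywhere on a line.
_PRINTF_LINE = re.compile(r"cl_amd_printf|\bprintf\s*\(")


def comment_out_printf(kernel_source):
    """Prefix every line that uses printf (or its pragma) with '//'."""
    out = []
    for line in kernel_source.splitlines():
        already_commented = line.lstrip().startswith("//")
        if _PRINTF_LINE.search(line) and not already_commented:
            line = "//" + line
        out.append(line)
    return "\n".join(out)
```

The result can be handed to the program build in place of the original source whenever you start a CodeXL session. Mind that a printf which is the only statement of a brace-less `if` or `for` will change the meaning of the kernel once commented out, and printf calls spanning several lines are not handled at all.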

+ Beyond

Fiddled around with my Intel i5 & HD4000 Ivy Bridge based laptop. Essentially the same Python setup, but with Intel's APU drivers (they don't call it APU btw). Enthought Canopy + PyCharm + PyOpenCL + Intel's HD4000 driver do work in concert, yes. However, despite what Intel's website states about their "Intel Native Development Environment" (INDE), I couldn't get it to the point of actually doing kernel-level debugging in combination with Microsoft Visual Studio Professional on the ordinary C/C++ demos Intel provides (e.g. the "God Ray" demo). It loads correctly, reports successful plugin integration, shows those OpenCL debug settings dialogs, lets you place breakpoints inside kernels and registers them nicely in its BP list... but it seems capable of API-level debugging only: it plainly doesn't react to breakpoints inside kernels. Maybe it's a beginner's issue on my side with this framework. I didn't move on to Python debugging due to the unsuccessful first attempts in C... maybe someone has succeeded in this already and is willing to share his/her insights here... thx in advance