Cheat Sheet for various Linux/HPC Tools
Originally written for a class on High Performance Computing at NYU in the fall of 2010.
Notation
ALL CAPS |
Replace with actual value--i.e. if I say "FILE_NAME", put a file name, not FILE_NAME literally. |
[optional] |
things in brackets can be omitted |
gdb
Compiling for gdb
Compiler flag -g: add debug info. Optimization (-O) may reduce the usefulness of debug info, e.g. variables get optimized out and lines appear to execute out of order.
Starting gdb
gdb PROGRAM |
|
ulimit -c unlimited |
enable core dumps |
gdb PROGRAM core |
start from core dump |
gdb PROGRAM PROCESS_ID |
attach to running process |
Using gdb
r |
run program |
bt |
backtrace |
up/down/frame N |
go up/down in call stack |
n |
step over |
s |
step into |
fin |
run until the current subroutine returns |
Ctrl-X Ctrl-A |
switch to full-screen user interface |
b [FILE_NAME:]LINE_NUMBER [thread N] [if COND] |
set break point |
b FUNCTION_NAME [thread N] [if COND] |
|
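A minimal practice target for the commands above, as a sketch. The function names factorial and sum_of_squares are made up for illustration; any small program will do.

```c
#include <assert.h>

/* Recursive on purpose: break in here and type `bt` to see one
 * stack frame per pending call. */
long long factorial(int n)
{
    assert(n >= 0 && n <= 20);   /* 20! is the largest factorial that fits in 64 bits */
    if (n <= 1)
        return 1;
    return n * factorial(n - 1);
}

/* Loop target for a conditional breakpoint on the line inside the loop,
 * e.g. `b FILE_NAME:LINE_NUMBER if i == 3`. */
long long sum_of_squares(int n)
{
    long long total = 0;
    for (int i = 1; i <= n; i++)
        total += (long long)i * i;
    return total;
}
```

Call both from a main of your own and compile with -g. Then try b factorial if n == 1, r, bt, up, fin.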
gdb with OpenMP/threads
info threads |
show list of threads |
thread N |
switch threads |
Advance in lock-step:
define adv4
  thread 1
  n
  thread 2
  n
  thread 3
  n
  thread 4
  n
end
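A small OpenMP target to try the thread commands on, as a sketch. The name parallel_sum is hypothetical. Build with -g -fopenmp; without -fopenmp the pragma is ignored and the loop runs on one thread.

```c
#include <assert.h>

/* Sums 1..n with an OpenMP reduction. Set a breakpoint on the line
 * inside the loop, then use `info threads` and `thread N` in gdb. */
long long parallel_sum(int n)
{
    assert(n >= 0);
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Running with OMP_NUM_THREADS=4 gives four threads in the parallel region. gdb's own thread numbering may differ from 1 to 4, so check info threads before relying on a fixed lock-step macro.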
gdb for MPI
Insert this snippet into your program:
{
volatile int i = 0;  /* volatile: keeps the compiler from optimizing the wait loop away */
char hostname[256];
gethostname(hostname, sizeof(hostname));
printf("PID %d on %s ready for attach\n", getpid(), hostname);
fflush(stdout);
while (0 == i) sleep(5);
}
You might also need these headers:
#include <unistd.h>
#include <stdio.h>
Then run gdb PROGRAM PID, where PID comes from the program's output. You will probably catch the program inside the kernel call for sleep. Type fin until you are back up in the infinite sleep loop, then enter set var i = 7 to leave the loop. Then debug as usual. To run this snippet on only one misbehaving rank, put if (rank == N) in front of it.
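The per-rank variant can be wrapped in a helper, sketched here. The function name wait_for_debugger and the flag name attached are made up for illustration, and the rank is passed in as a plain int so the sketch does not need mpi.h.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 0 right away unless rank == target_rank. On the target rank,
 * prints the PID and blocks until a debugger sets `attached` to a
 * non-zero value, then returns 1. */
int wait_for_debugger(int rank, int target_rank)
{
    assert(rank >= 0);           /* MPI ranks are non-negative */
    if (rank != target_rank)
        return 0;

    volatile int attached = 0;   /* volatile so the loop re-reads it every pass */
    char hostname[256];
    gethostname(hostname, sizeof(hostname));
    hostname[sizeof(hostname) - 1] = '\0';   /* gethostname may not terminate on truncation */
    printf("PID %d (rank %d) on %s ready for attach\n",
           (int)getpid(), rank, hostname);
    fflush(stdout);
    while (!attached)
        sleep(5);
    return 1;
}
```

Call it right after MPI_Comm_rank, attach with gdb PROGRAM PID, type fin until you are in wait_for_debugger, then set var attached = 1.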
Valgrind
valgrind PROGRAM |
check for heap pointer bugs |
valgrind --leak-check=full PROGRAM |
memcheck with leak tracking |
valgrind --db-attach=yes PROGRAM |
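memcheck, ask whether to drop into gdb at each error site (option removed in Valgrind 3.11; newer versions use the built-in gdbserver instead, e.g. --vgdb-error=0) |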
valgrind --tool=cachegrind PROGRAM |
simulate cache behavior |
kcachegrind cachegrind.out.PID |
view per-function profile from cachegrind |
valgrind --tool=callgrind --cache-sim=yes PROGRAM |
gather cache info and call graph info |
kcachegrind callgrind.out.PID |
view per-function profile from callgrind |
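A sketch of a target for the cache simulation: the same sum computed with a cache-friendly and a cache-hostile traversal. All names (GRID_N, grid, fill_grid, sum_row_major, sum_col_major, check_sums_agree) are made up for illustration.

```c
#include <assert.h>

#define GRID_N 512   /* hypothetical size: 512 * 512 doubles = 2 MiB */

static double grid[GRID_N][GRID_N];

/* Fill every cell with 1.0 so both sums have a known, exact value. */
void fill_grid(void)
{
    for (int i = 0; i < GRID_N; i++)
        for (int j = 0; j < GRID_N; j++)
            grid[i][j] = 1.0;
}

/* Inner loop walks along a row: consecutive addresses. */
double sum_row_major(void)
{
    double total = 0.0;
    for (int i = 0; i < GRID_N; i++)
        for (int j = 0; j < GRID_N; j++)
            total += grid[i][j];
    return total;
}

/* Inner loop walks down a column: stride of GRID_N doubles. */
double sum_col_major(void)
{
    double total = 0.0;
    for (int j = 0; j < GRID_N; j++)
        for (int i = 0; i < GRID_N; i++)
            total += grid[i][j];
    return total;
}

/* Sanity check: with every cell equal to 1.0 both sums are exact,
 * so the traversal order must not change the result. */
void check_sums_agree(void)
{
    assert(sum_row_major() == sum_col_major());
}
```

Under valgrind --tool=cachegrind, sum_col_major should show noticeably more data-cache read misses than sum_row_major. Compile with -g and little or no optimization, since an optimizing compiler may reorder the loops.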
GProf
cc -pg -o my-program my-program.c |
Compile with instrumentation for gprof |
./my-program |
Run the program; it writes gmon.out on exit. |
gprof ./my-program gmon.out [FURTHER OPTIONS] |
Examine profiler output |
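A sketch of something worth profiling: two functions that return the same value at very different cost. The names count_pairs_slow and count_pairs_fast are made up for illustration.

```c
#include <assert.h>

/* Deliberately expensive: counts pairs (i, j) with i < j by brute force,
 * which takes O(n^2) steps. */
long long count_pairs_slow(int n)
{
    assert(n >= 0);
    long long count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            count++;
    return count;
}

/* Same value from the closed form n * (n - 1) / 2, in constant time. */
long long count_pairs_fast(int n)
{
    assert(n >= 0);
    return (long long)n * (n - 1) / 2;
}
```

Call both from a main of your own with a large n and compile with -pg and no optimization. The flat profile from gprof should then attribute nearly all of the time to count_pairs_slow.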
OProfile
Linux-only, must be root.
Running
opcontrol --list-events |
show processor events |
sudo opcontrol --no-vmlinux --event=DCU_LINES_IN:10000 --event=INST_RETIRED:1000000 |
set-up for cache miss ratio measurement |
sudo opcontrol --no-vmlinux --callgraph=3 --event=CPU_CLK_UNHALTED:1000000 --event=INST_RETIRED:1000000 |
set-up for instructions-per-clock measurement, with call graph |
sudo opcontrol --reset |
empty sample pool |
sudo opcontrol --start |
start sampling |
sudo opcontrol --shutdown |
stop sampling |
You may need to adjust the event names according to your processor.
Intel Core2 and earlier support two events at a time; i7 and later can do four.
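The number after the colon in --event=NAME:COUNT is the sampling interval, so each sample stands for roughly COUNT events. A sketch of the arithmetic that turns two sample counts into a ratio such as instructions per clock follows; the helper names estimated_events and event_ratio are made up for illustration.

```c
#include <assert.h>

/* Estimated number of events behind a sample count: one sample is
 * taken about every `count` events. */
double estimated_events(long long samples, long long count)
{
    assert(samples >= 0 && count > 0);
    return (double)samples * (double)count;
}

/* Ratio of two event totals, e.g. INST_RETIRED over CPU_CLK_UNHALTED
 * for instructions per clock. */
double event_ratio(long long num_samples, long long num_count,
                   long long den_samples, long long den_count)
{
    assert(den_samples > 0);
    return estimated_events(num_samples, num_count)
         / estimated_events(den_samples, den_count);
}
```

For example, 3000 INST_RETIRED samples against 2000 CPU_CLK_UNHALTED samples, both at an interval of 1000000, give event_ratio(3000, 1000000, 2000, 1000000), i.e. about 1.5 instructions per clock.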
Viewing Profile Data
opreport |
view full-system profile summary |
opreport -l image:PROGRAM |
per-function profile for PROGRAM |
opreport -l -c image:PROGRAM |
... with call graph info (if gathered, see above) |
opannotate --source image:PROGRAM |
annotate source code with sample counts |
opannotate --assembly image:PROGRAM |
annotate assembly code with sample counts |
Intel Optimization manual (see appendix B for event descriptions)
Nvidia GPU Compute Profiler
export COMPUTE_PROFILE=1 |
enable profiler |
export COMPUTE_PROFILE_CONFIG=SOMEFILE |
set profile event selection file |
where SOMEFILE contains event names, one per line.
