Cheat Sheet for various Linux/HPC Tools

Originally written for a class on High Performance Computing at NYU in the fall of 2010.

Notation

ALL CAPS

Replace with an actual value: for example, where this sheet says FILE_NAME, type a file name, not the literal text FILE_NAME.

[optional]

things in brackets can be omitted

gdb

Manual

Compiling for gdb

Compiler flag -g: add debug info. Optimization (-O) may reduce the usefulness of debug info, since variables can be optimized away and statements reordered.

Starting gdb

gdb PROGRAM

ulimit -c unlimited

enable core dumps

gdb PROGRAM core

start from core dump

gdb PROGRAM PROCESS_ID

attach to running process

Using gdb

r

run program

bt

backtrace

up/down/frame N

go up/down in call stack

n

step over

s

step into

fin

run until the current function returns

Ctrl-X Ctrl-A

switch to full-screen user interface

b [FILE_NAME:]LINE_NUMBER [thread N] [if COND]

set break point at a source line

b FUNCTION_NAME [thread N] [if COND]

set break point at the entry of a function
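A small function to practice the commands above on. This is only a sketch: the function name collatz_steps and the file name collatz.c are made up for illustration. Put it in a program with a main that calls it, compile with -g, and try a conditional break point such as b collatz_steps if n == 27, then bt, n, s and fin.

```c
/* Number of Collatz steps needed to reach 1, starting from n (n >= 1). */
unsigned collatz_steps(unsigned long n)
{
  unsigned steps = 0;
  while (n != 1) {
    /* halve even numbers, map odd numbers to 3n + 1 */
    n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
    ++steps;
  }
  return steps;
}
```

Stepping with n inside the loop and printing n (p n) after each step shows the sequence one value at a time.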

gdb with OpenMP/threads

info threads

show list of threads

thread N

switch threads

Advance in lock-step:

define adv4
  thread 1
  n
  thread 2
  n
  thread 3
  n
  thread 4
  n
end
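To have something to try info threads and thread N on, here is a minimal OpenMP sketch. The function name sum_squares is an assumption for illustration. Compile with -g and your compiler's OpenMP flag (-fopenmp for GCC); without that flag the pragma is ignored and the loop simply runs on one thread.

```c
/* Sum of squares 0^2 + 1^2 + ... + (n-1)^2, split across OpenMP threads. */
unsigned long long sum_squares(long n)
{
  unsigned long long total = 0;
  /* each thread accumulates a private partial sum, combined at the end */
  #pragma omp parallel for reduction(+:total)
  for (long i = 0; i < n; ++i)
    total += (unsigned long long)i * i;
  return total;
}
```

Set a break point on the line inside the loop (b FILE_NAME:LINE_NUMBER thread 2 restricts it to one thread), run with a large n, and use info threads to see where the other threads are.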

gdb for MPI

Insert this snippet into your program:

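{
  volatile int i = 0;  /* volatile, so the loop re-reads i after gdb changes it */
  char hostname[256];
  gethostname(hostname, sizeof(hostname));
  printf("PID %d on %s ready for attach\n", (int) getpid(), hostname);
  fflush(stdout);
  while (0 == i) sleep(5);  /* wait here until a debugger sets i to nonzero */
}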

You might also need these headers:

#include <unistd.h>
#include <stdio.h>

Then run gdb PROGRAM PID, where PID comes from the output of the program. You will probably catch the program inside the system call for sleep. Type fin until you are back in the frame that contains the sleep loop, then say set var i = 7 (any nonzero value works) to leave the loop. Then debug as usual. To run the snippet only on one misbehaving rank, put if (rank == N) in front of it.

Source, more info
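If you would rather not recompile to switch the wait on and off, the same idea can be wrapped in a function that only waits when an environment variable is set. This is a sketch, not part of the original recipe: the function name wait_for_gdb, the variable name attached and the environment variable WAIT_FOR_GDB are all made up here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Wait for a debugger, but only if the (hypothetical) environment
   variable WAIT_FOR_GDB is set. Returns 1 if it waited, 0 otherwise. */
int wait_for_gdb(void)
{
  volatile int attached = 0;  /* changed from gdb with: set var attached = 1 */
  char hostname[256];

  if (getenv("WAIT_FOR_GDB") == NULL)
    return 0;  /* normal run: do not block */

  gethostname(hostname, sizeof(hostname));
  hostname[sizeof(hostname) - 1] = '\0';  /* the name may be truncated */
  printf("PID %d on %s ready for attach\n", (int) getpid(), hostname);
  fflush(stdout);
  while (!attached)
    sleep(5);
  return 1;
}
```

Call it once near the start of the program, for instance right after MPI_Init. How the environment variable reaches the ranks depends on your MPI launcher.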

Valgrind

valgrind PROGRAM

run memcheck (the default tool): check for heap pointer bugs

valgrind --leak-check=full PROGRAM

memcheck with leak tracking

valgrind --db-attach=yes PROGRAM

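memcheck, ask whether to drop into gdb at each error site (--db-attach was removed in Valgrind 3.11; newer versions use --vgdb-error=0 and attach gdb through vgdb)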

valgrind --tool=cachegrind PROGRAM

simulate cache behavior

kcachegrind cachegrind.out.PID

view per-function profile out of cachegrind

valgrind --tool=callgrind --cache-sim=yes PROGRAM

gather cache info and call graph info

kcachegrind callgrind.out.PID

view per-function profile out of callgrind

Manual
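A test case for the cachegrind and callgrind commands above: two functions that add up the same array, one walking it in memory order and one jumping a whole row between accesses. This is a sketch for experimentation; the names GRID_N, grid, fill_grid, sum_rows and sum_cols are made up for illustration.

```c
#include <stddef.h>

#define GRID_N 1024

static double grid[GRID_N][GRID_N];  /* 8 MB, larger than most caches */

/* Set every entry to 1.0 so the expected sum is GRID_N * GRID_N. */
void fill_grid(void)
{
  for (size_t i = 0; i < GRID_N; ++i)
    for (size_t j = 0; j < GRID_N; ++j)
      grid[i][j] = 1.0;
}

/* Inner loop runs along a row: consecutive addresses. */
double sum_rows(void)
{
  double s = 0.0;
  for (size_t i = 0; i < GRID_N; ++i)
    for (size_t j = 0; j < GRID_N; ++j)
      s += grid[i][j];
  return s;
}

/* Inner loop runs down a column: stride of GRID_N doubles. */
double sum_cols(void)
{
  double s = 0.0;
  for (size_t j = 0; j < GRID_N; ++j)
    for (size_t i = 0; i < GRID_N; ++i)
      s += grid[i][j];
  return s;
}
```

Call fill_grid and then both sums from main, compile with -g, and compare the data cache miss counts that the two functions get in kcachegrind. At higher optimization levels the compiler may rearrange the loops, so low optimization is the safer choice for this experiment.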

GProf

cc -pg -o my-program my-program.c

Compile with instrumentation for gprof

./my-program

Run the program; it writes gmon.out when it exits.

gprof ./my-program gmon.out [FURTHER OPTIONS]

Examine profiler output

Manual
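Something to profile, as a sketch: two ways to compute the same Fibonacci number, one deliberately slow. The names fib_slow and fib_fast are made up for illustration.

```c
/* Naive recursive Fibonacci: exponentially many calls, deliberately slow. */
unsigned long fib_slow(unsigned n)
{
  return n < 2 ? n : fib_slow(n - 1) + fib_slow(n - 2);
}

/* Iterative Fibonacci: the same value in n loop iterations. */
unsigned long fib_fast(unsigned n)
{
  unsigned long a = 0, b = 1;
  for (unsigned i = 0; i < n; ++i) {
    unsigned long next = a + b;
    a = b;
    b = next;
  }
  return a;
}
```

Call both from main with an argument around 35, compile with -pg as above, and run. The flat profile from gprof should attribute nearly all of the time and calls to fib_slow.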

OProfile

Linux-only, must be root.

Running

opcontrol --list-events

show processor events

sudo opcontrol --no-vmlinux --event=DCU_LINES_IN:10000 --event=INST_RETIRED:1000000

set-up for cache miss ratio measurement

sudo opcontrol --no-vmlinux --callgraph=3 --event=CPU_CLK_UNHALTED:1000000 --event=INST_RETIRED:1000000

set-up for instructions-per-clock measurement, with call graph

sudo opcontrol --reset

empty sample pool

sudo opcontrol --start

start sampling

sudo opcontrol --shutdown

stop sampling and shut down the profiling daemon

You may need to adjust the event names according to your processor.

Intel Core2 and earlier support two events at a time; i7 and later support four.

Viewing Profile Data

opreport

view full-system profile summary

opreport -l image:PROGRAM

per-function profile for PROGRAM

opreport -l -c image:PROGRAM

... with call graph info (if gathered, see above)

opannotate --source image:PROGRAM

annotate source code with sample counts

opannotate --assembly image:PROGRAM

annotate assembly code with sample counts

Nvidia GPU Compute Profiler

export COMPUTE_PROFILE=1

enable profiler

export COMPUTE_PROFILE_CONFIG=SOMEFILE

set profile event selection file

where SOMEFILE contains event names, one per line.

compute-profiler-manual.txt