NCSA Home

Timing and Profiling on Cobalt, an Overview

  1. Timing Codes
  2. Profiling

1. Timing Codes


The types of time reported by the tools below are defined as follows:
  • user -- the amount of CPU time used by the user's program
  • sys (or system) -- the amount of CPU time used by the system in support of the user's program
  • cpu -- the total CPU time, i.e., user + sys
  • wall -- the wall clock time, i.e., elapsed real time

For a serial code, the cpu time and the wall clock time are typically about the same. The wall clock time grows larger when other user processes compete for the processor or when there is significant system activity, such as heavy disk I/O or swapping/paging.


1.1 time (/usr/bin/time)

The quickest way to time a code is to run it under the command /usr/bin/time, which reports the user time, system time and elapsed wall clock time. See the man page for time for more information, especially on formatting the output. Note that the csh and tcsh shells have a built-in command also called time, so give the full path /usr/bin/time to be sure you run the standalone command.

% /usr/bin/time ./a.out

Use the -p option to print the times in the POSIX portable format.

1.2 gprof

gprof is currently not functioning correctly with MPI (MPT) codes.

For a more detailed breakdown of the time spent in individual functions and subroutines, use the profiling tool gprof. First compile the source code with the profiling flags: -p -g for the Intel compiler or -pg for the GNU compiler. (For the Intel compiler, -g does not change the optimization level set by a -O flag.) Then run the code, which writes a gmon.out file, and analyze that file with gprof, which writes its results to stdout. The --flat-profile option restricts the output to the flat profile, a breakdown of the time spent in each function and subroutine:

% ifort -O -p -g myprog.f # or gcc -O -pg myprog.c
% ./a.out
% gprof --flat-profile a.out gmon.out

See the section on Profiling below for more information about using gprof.

For even easier timing and profiling without re-compiling, consider using psrun from PerfSuite.

2. Profiling

2.1 gprof

gprof is currently not functioning correctly with MPI (MPT) codes.

A quick way to get more detailed information on functions and routines is to use the profiling tool gprof. The first step is to compile the source code with the compiler flags for profiling: -p -g for the Intel compiler and -pg for the GNU compiler. For the Intel compiler, the -g flag does not change the optimization level set by the -O flag, if one is present. The second step is to run the code, which generates a gmon.out file. Finally, analyze the gmon.out file with gprof; the results are written to stdout.

% ifort -O -p -g myprog.f # or gcc -O -pg myprog.c
% ./a.out
% gprof a.out gmon.out

The flat profile gives a useful breakdown of the time spent in each function and subroutine. The call graph profile shows, for each function and subroutine, the exclusive time (spent in the routine itself) and the inclusive time (which also counts the routines it calls). See the man pages for the Intel and GNU compilers for information about the profiling flags, and the man page for gprof for its options.

An undocumented GMON environment variable is GMON_OUT_PREFIX. When it is set while profiling a threaded or MPI code, each process writes its own profile file named $GMON_OUT_PREFIX.pid, where pid is the process ID, instead of a single gmon.out. Each file can then be analyzed separately, or gprof -s can sum them into a single gmon.sum file that is examined as a whole:

% setenv GMON_OUT_PREFIX gmon.out # csh/tcsh syntax
% ./a.out
% gprof -s a.out $GMON_OUT_PREFIX.*
% gprof a.out gmon.sum

2.2 PerfSuite

The PerfSuite performance suite provides a profiling tool, psrun, that extends the functionality of the timing and profiling tools described above. For an introduction to PerfSuite, see the Linux Journal article "Measuring and Improving Application Performance with PerfSuite".

The simplest way to use psrun is with an existing executable:

Serial codes
% soft add +perfsuite
% resoft
% psrun ./a.out
% psprocess a.out.PID.xml # PID is the process ID when a.out was run

OpenMP and MPI codes
See the discussion on the PerfSuite page.

See the documentation for psprocess for information on analyzing the XML files generated by psrun.


Performance Engineering and Computational Methods Group (PECM)
High-End Computing Division