基准测试

命名

benchmark - benchmark the execution of a gm command

内容

  • 概要
  • 描述
  • 示例
  • 选项

概要

gm benchmark [ options ... ] command

描述

benchmark executes an arbitrary gm utility command (e.g. convert) for one or more loops, and/or a specified execution time, and reports many execution metrics. For builds using OpenMP, a mode is provided to execute the benchmark with an increasing number of threads and provide a report of speedup and multi-thread execution efficiency. If benchmark is used to execute a command without any additional benchmark options, then the command is run once.

示例

To obtain benchmark information for a single execution of a command:

    % gm benchmark convert input.ppm -gaussian 0x1 output.ppm
    Results: 32 threads 1 iter 6.20s user 0.41s total 2.439 iter/s 0.161 iter/cpu
To obtain benchmark information from 100 iterations of the command:

    % gm benchmark -iterations 100 convert input.ppm -gaussian 0x1 output.ppm
    Results: 32 threads 100 iter 625.40s user 31.74s total 3.151 iter/s 0.160 iter/cpu

To obtain benchmark information by iterating the command until a specified amount of time (in seconds) has been consumed:

    % gm benchmark -duration 30 convert input.ppm -gaussian 0x1 output.ppm
    Results: 32 threads 91 iter 587.33s user 30.30s total 3.003 iter/s 0.155 iter/cpu

To obtain a full performance report with an increasing number of threads (1-32 threads, stepping the number of threads by four each time):

  % gm benchmark -duration 3 -stepthreads 4 convert input.ppm -gaussian 0x2 output.ppm
  Results: 1 threads 1 iter 8.84s user 8.84s total 0.113 iter/s 0.113 iter/cpu 1.00 speedup 1.000 karp-flatt
  Results: 4 threads 2 iter 18.37s user 4.89s total 0.409 iter/s 0.109 iter/cpu 3.62 speedup 0.035 karp-flatt
  Results: 8 threads 3 iter 29.81s user 4.09s total 0.733 iter/s 0.101 iter/cpu 6.48 speedup 0.033 karp-flatt
  Results: 12 threads 3 iter 30.81s user 3.14s total 0.955 iter/s 0.097 iter/cpu 8.45 speedup 0.038 karp-flatt
  Results: 16 threads 3 iter 35.02s user 3.01s total 0.997 iter/s 0.086 iter/cpu 8.81 speedup 0.054 karp-flatt
  Results: 20 threads 4 iter 52.92s user 3.53s total 1.133 iter/s 0.076 iter/cpu 10.02 speedup 0.052 karp-flatt
  Results: 24 threads 4 iter 60.66s user 3.39s total 1.180 iter/s 0.066 iter/cpu 10.43 speedup 0.057 karp-flatt
  Results: 28 threads 4 iter 73.10s user 3.35s total 1.194 iter/s 0.055 iter/cpu 10.56 speedup 0.061 karp-flatt
  Results: 32 threads 4 iter 82.10s user 3.09s total 1.294 iter/s 0.049 iter/cpu 11.44 speedup 0.058 karp-flatt

Here is the interpretation of the output:

  • threads - number of threads used.
  • iter - number of command iterations executed.
  • user - total user time consumed.
  • total - total elapsed time consumed.
  • iter/s - number of command iterations per second.
  • iter/cpu - amount of CPU time consumed per iteration.
  • speedup - speedup compared with one thread.
  • karp-flatt - Karp-Flatt measure of speedup efficiency.

Please note that the reported "speedup" is based on the execution time of just one thread. A preliminary warm-up pass is used before timing the first loop in order to ensure that the CPU is brought out of power-saving modes and that system caches are warmed up. Most modern CPUs provide a "turbo" mode where the CPU clock speed is increased (e.g. by a factor of two) when only one or two cores are active. If the CPU grows excessively hot (due to insufficient cooling), then it may dial back its clock rates as a form of thermal management. These factors result in an under-reporting of speedup compared to if "turbo" mode was disabled and the CPU does not need to worry about thermal management. The powertop utility available under Linux and Solaris provides a way to observe CPU core clock rates while a benchmark is running.

选项

Options are processed from left to right and must appear before any argument.

-duration duration

duration to run benchmark

Specify the number of seconds to run the benchmark. The command is executed repeatedly until the specified amount of time has elapsed.

-help

Prints benchmark command help.

-iterations loops

number of command iterations

Specify the number of iterations to run the benchmark. The command is executed repeatedly until the specified number of iterations has been reached.

-rawcsv

Print results in CSV format

Print results in a comma-separated value (CSV) format which is easy to parse for plotting or importing into a spreadsheet or database. The values reported are threads, iterations, user_time, and elapsed_time.

-stepthreads step

execute a per-thread benchmark ramp

Execute a per-thread benchmark ramp, incrementing the number of threads at each step by the specified value. The maximum number of threads is taken from the standard OMP_NUM_THREADS environment variable.