STRESS-NG(1) | General Commands Manual | STRESS-NG(1) |
stress-ng - stress "next generation", a tool to load and stress a computer system
stress-ng [OPTION [ARG]] ...
stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces. stress-ng also has a wide range of CPU specific stress tests that exercise floating point, integer, bit manipulation and control flow.
stress-ng was originally intended to make a machine work hard and trip hardware issues such as thermal overruns as well as operating system bugs that only occur when a system is being thrashed hard. Use stress-ng with caution as some of the tests can make a system run hot on poorly designed hardware and also can cause excessive system thrashing which may be difficult to stop.
stress-ng can also measure test throughput rates; this can be useful to observe performance changes across different operating system releases or types of hardware. However, it has never been intended to be used as a precise benchmark test suite, so do NOT use it in this manner.
Running stress-ng with root privileges will adjust out of memory settings on Linux systems to make the stressors unkillable in low memory situations, so use this judiciously. With the appropriate privilege, stress-ng can also adjust the ionice class and ionice levels; again, this should be used with care.
One can specify the number of processes to invoke per type of stress test. Specifying a zero value will select the number of processors available as defined by sysconf(_SC_NPROCESSORS_CONF); if that can't be determined then the number of online CPUs is used. If the value is less than zero then the number of online CPUs is used.
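The instance-count rules above can be sketched in C as follows. This is only an illustration and not stress-ng's own code: resolve_instances() is a hypothetical helper, _SC_NPROCESSORS_ONLN is assumed to be the source of the online CPU count, and the final fallback to 1 is an assumption.

```c
#include <unistd.h>

/*
 * Hypothetical helper (not part of stress-ng): map a requested
 * instance count onto the number of processes to start.
 */
long resolve_instances(long requested)
{
    long n;

    if (requested > 0)
        return requested;       /* explicit count, use as given */

    if (requested == 0) {
        /* zero: number of configured processors, if known */
        n = sysconf(_SC_NPROCESSORS_CONF);
        if (n > 0)
            return n;
    }

    /* negative, or configured count unknown: online CPUs */
    n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 1;     /* assumption: never return less than 1 */
}
```

With this sketch, resolve_instances(4) returns 4, while resolve_instances(0) and resolve_instances(-1) return a CPU count that depends on the machine.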
General stress-ng control options:
Specifying a name followed by a question mark (for example --class vm?) will print out all the stressors in that specific class.
Column Heading | Explanation |
Inflight | number of I/O requests that have been issued to the device driver but have not yet completed |
Rd K/s | read rate in 1024 bytes per second |
Wr K/s | write rate in 1024 bytes per second |
Dscd K/s | discard rate in 1024 bytes per second |
Rd/s | reads per second |
Wr/s | writes per second |
Dscd/s | discards per second |
run sequential   # run stressors sequentially
verbose          # verbose output
metrics-brief    # show metrics at end of run
timeout 60s      # stop each stressor after 60 seconds
#
# vm stressor options:
#
vm 2             # 2 vm stressors
vm-bytes 128M    # 128MB available memory
vm-keep          # keep vm mapping
vm-populate      # populate memory
#
# memcpy stressor options:
#
memcpy 5         # 5 memcpy stressors
The job file introduces the run command that specifies how to run the stressors:
run sequential - run stressors sequentially
run parallel - run stressors together in parallel
Note that 'run parallel' is the default.
The following columns of information are output:
Column Heading | Explanation |
bogo ops | number of iterations of the stressor during the run. This is a metric of how much overall "work" has been achieved in bogo operations. Do not use this as a reliable measure of throughput for benchmarking. |
real time (secs) | average wall clock duration (in seconds) of the stressor. This is the total wall clock time of all the instances of that particular stressor divided by the number of these stressors being run. |
usr time (secs) | total user time (in seconds) consumed running all the instances of the stressor. |
sys time (secs) | total system time (in seconds) consumed running all the instances of the stressor. |
bogo ops/s (real time) | total bogo operations per second based on wall clock run time. The wall clock time reflects the apparent run time. The more processors one has on a system the more the work load can be distributed onto these and hence the wall clock time will reduce and the bogo ops rate will increase. This is essentially the "apparent" bogo ops rate of the system. |
bogo ops/s (usr+sys time) | total bogo operations per second based on cumulative user and system time. This is the real bogo ops rate of the system taking into consideration the actual execution time of the stressor across all the processors. Generally this will decrease as one adds more concurrent stressors due to contention on cache, memory, execution units, buses and I/O devices. |
CPU used per instance (%) | total percentage of CPU used divided by number of stressor instances. 100% is 1 full CPU. Some stressors run multiple threads so it is possible to have a figure greater than 100%. |
RSS Max (KB) | resident set size (RSS), the portion of memory (measured in Kilobytes) occupied by a process in main memory. |
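The difference between the two bogo ops rates is easiest to see with numbers. The following C sketch derives both rates from the raw columns. The function names are hypothetical, and the formula used for the CPU percentage per instance is an assumption inferred from the column description, not taken from the stress-ng source.

```c
/* Apparent rate: bogo ops divided by wall clock time. */
double bogo_rate_real(double bogo_ops, double real_secs)
{
    return (real_secs > 0.0) ? bogo_ops / real_secs : 0.0;
}

/* Rate per unit of CPU time: bogo ops divided by user + system time. */
double bogo_rate_cpu(double bogo_ops, double usr_secs, double sys_secs)
{
    double cpu_secs = usr_secs + sys_secs;

    return (cpu_secs > 0.0) ? bogo_ops / cpu_secs : 0.0;
}

/*
 * Assumed formula: CPU time as a percentage of wall clock time,
 * shared equally between the instances. 100% is one full CPU.
 */
double cpu_used_per_instance(double usr_secs, double sys_secs,
                             double real_secs, int instances)
{
    if (real_secs <= 0.0 || instances <= 0)
        return 0.0;
    return 100.0 * (usr_secs + sys_secs) / real_secs / instances;
}
```

For example, 4 instances that complete 12000 bogo ops in 60 seconds of wall clock time while consuming 200 seconds of user time and 40 seconds of system time give 200 bogo ops/s (real time), 50 bogo ops/s (usr+sys time) and 100% CPU used per instance.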
stress-ng --seq 5 --with cpu,hash,nop,vm --timeout 1m
Stressor specific options:
Method | Description |
inc | increment ioctl command by 1 |
random | use a random ioctl command |
random-inc | increment ioctl command by a random value |
random-stride | increment ioctl command number by 1 and decrement command type by 3 |
Method | Description |
all | iterate over all the below cacheline stress methods. |
adjacent | increment a specific byte in a cacheline and read the adjacent byte, check for corruption every 7 increments. |
atomicinc | atomically increment a specific byte in a cacheline and check for corruption every 7 increments. |
bits | write and read back shifted bit patterns into specific byte in a cacheline and check for corruption. |
copy | copy an adjacent byte to a specific byte in a cacheline. |
inc | increment and read back a specific byte in a cacheline and check for corruption every 7 increments. |
mix | perform a mix of increment, left and right rotates a specific byte in a cacheline and check for corruption. |
rdfwd64 | increment a specific byte in a cacheline and then read in forward direction an entire cacheline using 64 bit reads. |
rdints | increment a specific byte in a cacheline and then read naturally aligned integer values of size 8, 16, 32, 64 and 128 bits at that byte location. |
rdrev64 | increment a specific byte in a cacheline and then read in reverse direction an entire cacheline using 64 bit reads. |
rdwr | read and write the same 8 bit value into a specific byte in a cacheline and check for corruption. |
Note: This option only applies to the --cpu stressor option and not to all of the cpu class of stressors.
Method | Description |
all | iterate over all the below cpu stress methods |
ackermann | Ackermann function: compute A(3, 7), where: A(m, n) = n + 1 if m = 0; A(m - 1, 1) if m > 0 and n = 0; A(m - 1, A(m, n - 1)) if m > 0 and n > 0 |
apery | calculate Apery's constant ζ(3); the sum of 1/(n ↑ 3) to a precision of 1.0 × 10↑-14 |
bitops | various bit operations from bithack, namely: reverse bits, parity check, bit count, round to nearest power of 2 |
callfunc | recursively call 8 argument C function to a depth of 1024 calls and unwind |
cfloat | 1000 iterations of a mix of floating point complex operations |
cdouble | 1000 iterations of a mix of double floating point complex operations |
clongdouble | 1000 iterations of a mix of long double floating point complex operations |
collatz | compute the 1348 steps in the Collatz sequence starting from number 989345275647, where f(n) = n / 2 (for even n) and f(n) = 3n + 1 (for odd n). |
correlate | perform a 8192 × 512 correlation of random doubles |
crc16 | compute 1024 rounds of CCITT CRC16 on random data |
decimal32 | 1000 iterations of a mix of 32 bit decimal floating point operations (GCC only) |
decimal64 | 1000 iterations of a mix of 64 bit decimal floating point operations (GCC only) |
decimal128 | 1000 iterations of a mix of 128 bit decimal floating point operations (GCC only) |
dither | Floyd-Steinberg dithering of a 1024 × 768 random image from 8 bits down to 1 bit of depth |
div8 | 50,000 8 bit unsigned integer divisions |
div16 | 50,000 16 bit unsigned integer divisions |
div32 | 50,000 32 bit unsigned integer divisions |
div64 | 50,000 64 bit unsigned integer divisions |
div128 | 50,000 128 bit unsigned integer divisions |
double | 1000 iterations of a mix of double precision floating point operations |
euler | compute e using n = (1 + (1 ÷ n)) ↑ n |
explog | iterate on n = exp(log(n) ÷ 1.00002) |
factorial | find factorials from 1..150 using Stirling's and Ramanujan's approximations |
fibonacci | compute Fibonacci sequence of 0, 1, 1, 2, 3, 5, 8... |
fft | 4096 sample Fast Fourier Transform |
fletcher16 | 1024 rounds of a naïve implementation of a 16 bit Fletcher's checksum |
float | 1000 iterations of a mix of floating point operations |
float16 | 1000 iterations of a mix of 16 bit floating point operations |
float32 | 1000 iterations of a mix of 32 bit floating point operations |
float64 | 1000 iterations of a mix of 64 bit floating point operations |
float80 | 1000 iterations of a mix of 80 bit floating point operations |
float128 | 1000 iterations of a mix of 128 bit floating point operations |
floatconversion | perform 65536 iterations of floating point conversions between float, double and long double floating point variables. |
gamma | calculate the Euler-Mascheroni constant γ using the limiting difference between the harmonic series (1 + 1/2 + 1/3 + 1/4 + 1/5 ... + 1/n) and the natural logarithm ln(n), for n = 80000. |
gcd | compute GCD of integers |
gray | calculate binary to gray code and gray code back to binary for integers from 0 to 65535 |
hamming | compute Hamming H(8,4) codes on 262144 lots of 4 bit data. This turns 4 bit data into 8 bit Hamming code containing 4 parity bits. For data bits d1..d4, parity bits are computed as: p1 = d2 + d3 + d4; p2 = d1 + d3 + d4; p3 = d1 + d2 + d4; p4 = d1 + d2 + d3 |
hanoi | solve a 21 disc Towers of Hanoi stack using the recursive solution |
hyperbolic | compute sinh(θ) × cosh(θ) + sinh(2θ) + cosh(3θ) for float, double and long double hyperbolic sine and cosine functions where θ = 0 to 2π in 1500 steps |
idct | 8 × 8 IDCT (Inverse Discrete Cosine Transform). |
int8 | 1000 iterations of a mix of 8 bit integer operations. |
int16 | 1000 iterations of a mix of 16 bit integer operations. |
int32 | 1000 iterations of a mix of 32 bit integer operations. |
int64 | 1000 iterations of a mix of 64 bit integer operations. |
int128 | 1000 iterations of a mix of 128 bit integer operations (GCC only). |
int32float | 1000 iterations of a mix of 32 bit integer and floating point operations. |
int32double | 1000 iterations of a mix of 32 bit integer and double precision floating point operations. |
int32longdouble | 1000 iterations of a mix of 32 bit integer and long double precision floating point operations. |
int64float | 1000 iterations of a mix of 64 bit integer and floating point operations. |
int64double | 1000 iterations of a mix of 64 bit integer and double precision floating point operations. |
int64longdouble | 1000 iterations of a mix of 64 bit integer and long double precision floating point operations. |
int128float | 1000 iterations of a mix of 128 bit integer and floating point operations (GCC only). |
int128double | 1000 iterations of a mix of 128 bit integer and double precision floating point operations (GCC only). |
int128longdouble | 1000 iterations of a mix of 128 bit integer and long double precision floating point operations (GCC only). |
int128decimal32 | 1000 iterations of a mix of 128 bit integer and 32 bit decimal floating point operations (GCC only). |
int128decimal64 | 1000 iterations of a mix of 128 bit integer and 64 bit decimal floating point operations (GCC only). |
int128decimal128 | 1000 iterations of a mix of 128 bit integer and 128 bit decimal floating point operations (GCC only). |
intconversion | perform 65536 iterations of integer conversions between int16, int32 and int64 variables. |
ipv4checksum | compute 1024 rounds of the 16 bit ones' complement IPv4 checksum. |
jmp | Simple unoptimised compare >, <, == and jmp branching. |
lfsr32 | 16384 iterations of a 32 bit Galois linear feedback shift register using the polynomial x↑32 + x↑31 + x↑29 + x + 1. This generates a ring of 2↑32 - 1 unique values (all 32 bit values except for 0). |
ln2 | compute ln(2) based on series: 1 - 1/2 + 1/3 - 1/4 + 1/5 - 1/6 ... |
logmap | 16384 iterations computing chaotic double precision values using the logistic map Χn+1 = r × Χn × (1 - Χn) where r > ≈ 3.56994567 |
longdouble | 1000 iterations of a mix of long double precision floating point operations. |
loop | simple empty loop. |
matrixprod | matrix product of two 128 × 128 matrices of double floats. Testing on 64 bit x86 hardware shows that this provides a good mix of memory, cache and floating point operations and is probably the best CPU method to use to make a CPU run hot. |
nsqrt | compute sqrt() of long doubles using Newton-Raphson. |
omega | compute the omega constant defined by Ωe↑Ω = 1 using efficient iteration of Ωn+1 = (1 + Ωn) / (1 + e↑Ωn). |
parity | compute parity using various methods from the Stanford Bit Twiddling Hacks. Methods employed are: the naïve way, the naïve way with the Brian Kernighan bit counting optimisation, the multiply way, the parallel way, the lookup table ways (2 variations) and using the __builtin_parity function. |
phi | compute the Golden Ratio ϕ using series. |
pi | compute π using the Srinivasa Ramanujan fast convergence algorithm. |
prime | find the first 10000 prime numbers using a slightly optimised brute force naïve trial division search. |
psi | compute ψ (the reciprocal Fibonacci constant) using the sum of the reciprocals of the Fibonacci numbers. |
queens | compute all the solutions of the classic 8 queens problem for board sizes 1..11. |
rand | 16384 iterations of rand(), where rand is the MWC pseudo random number generator. The MWC random function concatenates two 16 bit multiply-with-carry generators: x(n) = 36969 × x(n - 1) + carry, y(n) = 18000 × y(n - 1) + carry mod 2 ↑ 16 and has period of around 2 ↑ 60. |
rand48 | 16384 iterations of drand48(3) and lrand48(3). |
rgb | convert RGB to YUV and back to RGB (CCIR 601). |
sieve | find the first 10000 prime numbers using the sieve of Eratosthenes. |
stats | calculate minimum, maximum, arithmetic mean, geometric mean, harmonic mean and standard deviation on 250 randomly generated positive double precision values. |
sqrt | compute sqrt(rand()), where rand is the MWC pseudo random number generator. |
trig | compute sin(θ) × cos(θ) + sin(2θ) + cos(3θ) for float, double and long double sine and cosine functions where θ = 0 to 2π in 1500 steps. |
union | perform integer arithmetic on a mix of bit fields in a C union. This exercises how well the compiler and CPU can perform integer bit field loads and stores. |
zeta | compute the Riemann Zeta function ζ(s) for s = 2.0..10.0 |
Note that some of these methods try to exercise the CPU with computations found in some real world use cases. However, the code has not been optimised on a per-architecture basis, so it may be sub-optimal compared to hand-optimised code used in some applications. The methods do try to represent the typical instruction mixes found in these use cases.
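To give a feel for the kind of computation behind the method names, the following C sketch shows textbook versions of two of them: counting Collatz steps and converting between binary and Gray code. These are minimal illustrations written for this page, with hypothetical function names, and are not the implementations used inside stress-ng.

```c
#include <stdint.h>

/* Number of Collatz steps needed to reach 1 from n (n must be >= 1). */
unsigned int collatz_steps(uint64_t n)
{
    unsigned int steps = 0;

    while (n > 1) {
        n = (n & 1) ? (3 * n + 1) : (n / 2);
        steps++;
    }
    return steps;
}

/* Binary to Gray code: adjacent values differ in exactly one bit. */
uint32_t bin_to_gray(uint32_t n)
{
    return n ^ (n >> 1);
}

/* Gray code back to binary using a prefix xor over all bit positions. */
uint32_t gray_to_bin(uint32_t g)
{
    g ^= g >> 16;
    g ^= g >> 8;
    g ^= g >> 4;
    g ^= g >> 2;
    g ^= g >> 1;
    return g;
}

/* Returns 1 if every value 0..65535 survives the round trip. */
int gray_roundtrip_ok(void)
{
    uint32_t i;

    for (i = 0; i <= 65535; i++) {
        if (gray_to_bin(bin_to_gray(i)) != i)
            return 0;
    }
    return 1;
}
```

As a small check, collatz_steps(27) is 111 and bin_to_gray(5) is 7.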
Method | Description |
clock_ns | sleep for the specified time using the clock_nanosleep(2) high resolution nanosleep and the CLOCK_REALTIME real time clock. |
itimer | wakeup a paused process with a CLOCK_REALTIME itimer signal. |
poll | delay for the specified time using a poll delay loop that checks for time changes using clock_gettime(2) on the CLOCK_REALTIME clock. |
posix_ns | sleep for the specified time using the POSIX nanosleep(2) high resolution nanosleep. |
pselect | sleep for the specified time using pselect(2) with null file descriptors. |
usleep | sleep to the nearest microsecond using usleep(2). |
Method | Description |
all | iterate over all the Eigen 2D matrix operations |
add-longdouble | addition of two matrices of long double floating point values |
add-double | addition of two matrices of double floating point values |
add-float | addition of two matrices of floating point values |
determinant-longdouble | determinant of matrix of long double floating point values |
determinant-double | determinant of matrix of double floating point values |
determinant-float | determinant of matrix of floating point values |
inverse-longdouble | inverse of matrix of long double floating point values |
inverse-double | inverse of matrix of double floating point values |
inverse-float | inverse of matrix of floating point values |
multiply-longdouble | multiplication of two matrices of long double floating point values |
multiply-double | multiplication of two matrices of double floating point values |
multiply-float | multiplication of two matrices of floating point values |
transpose-longdouble | transpose of matrix of long double floating point values |
transpose-double | transpose of matrix of double floating point values |
transpose-float | transpose of matrix of floating point values |
Option | Description |
probe | default option, probe the file system for valid allowed characters in a file name and use these |
posix | use characters as specified by The Open Group Base Specifications Issue 7, POSIX.1-2008, 3.278 Portable Filename Character Set |
ext | use characters allowed by the ext2, ext3, ext4 file systems, namely any 8 bit character apart from NUL and / |
a = a × b + c |
a = b × a + c |
a = b × c + a |
Method | Description |
all | iterate over all the following floating point methods: |
float128add | 128 bit floating point add |
float80add | 80 bit floating point add |
float64add | 64 bit floating point add |
float32add | 32 bit binary32 floating point add |
floatadd | floating point add |
doubleadd | double precision floating point add |
ldoubleadd | long double precision floating point add |
float128mul | 128 bit floating point multiply |
float80mul | 80 bit floating point multiply |
float64mul | 64 bit floating point multiply |
float32mul | 32 bit binary32 floating point multiply |
floatmul | floating point multiply |
doublemul | double precision floating point multiply |
ldoublemul | long double precision floating point multiply |
float128div | 128 bit floating point divide |
float80div | 80 bit floating point divide |
float64div | 64 bit floating point divide |
float32div | 32 bit binary32 floating point divide |
floatdiv | floating point divide |
doublediv | double precision floating point divide |
ldoublediv | long double precision floating point divide |
Note that some of these floating point methods may not be available on some systems.
Method | Description |
all | cycle through all the hashing methods |
adler32 | Mark Adler checksum, a modification of the Fletcher checksum |
coffin | xor and 5 bit rotate left hash |
coffin32 | xor and 5 bit rotate left hash with 32 bit fetch optimization |
crc32c | compute CRC32C (Castagnoli CRC32) integer hash |
djb2a | Dan Bernstein hash using the xor variant |
fnv1a | FNV-1a Fowler-Noll-Vo hash using the xor then multiply variant |
jenkin | Jenkins' integer hash |
kandr | Kernighan and Ritchie's multiply by 31 and add hash from "The C Programming Language", 2nd Edition |
knuth | Donald E. Knuth's hash from "The Art Of Computer Programming", Volume 3, chapter 6.4 |
loselose | Kernighan and Ritchie's simple hash from "The C Programming Language", 1st Edition |
mid5 | xor shift hash of the middle 5 characters of the string. Designed by Colin Ian King |
muladd32 | simple multiply and add hash using 32 bit math and xor folding of overflow |
muladd64 | simple multiply and add hash using 64 bit math and xor folding of overflow |
mulxror32 | 32 bit multiply, xor and rotate right. Mangles 32 bits where possible. Designed by Colin Ian King |
mulxror64 | 64 bit multiply, xor and rotate right. 64 Bit version of mulxror32 |
murmur3_32 | murmur3_32 hash, Austin Appleby's Murmur3 hash, 32 bit variant |
nhash | exim's nhash. |
pjw | a non-cryptographic hash function created by Peter J. Weinberger of AT&T Bell Labs, used in UNIX ELF object files |
sdbm | sdbm hash as used in the SDBM database and GNU awk |
sedgwick | simple hash from Robert Sedgewick's C programming book |
sobel | Justin Sobel's bitwise shift hash |
x17 | multiply by 17 and add. The multiplication can be optimized down to a fast left shift by 4 and add on some architectures |
xor | simple rotate shift and xor of values |
xorror32 | 32 bit exclusive-or with right rotate hash, a fast string hash, designed by Colin Ian King |
xorror64 | 64 bit version of xorror32 |
xxhash | the "Extremely fast" hash in non-streaming mode |
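Several of the hashes listed above are only a few lines of code each. The following C sketch shows commonly published 32 bit forms of FNV-1a and the xor variant of the Bernstein hash, plus the shift-and-add form of a multiply by 17. The function names, the seed values and the choice of 32 bit arithmetic are assumptions made for this illustration; the stress-ng implementations may differ in detail.

```c
#include <stdint.h>

/* FNV-1a, 32 bit: xor in each byte, then multiply by the FNV prime. */
uint32_t fnv1a_32(const char *s)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */

    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Bernstein hash, xor variant: h = (h * 33) xor byte. */
uint32_t djb2a(const char *s)
{
    uint32_t h = 5381u;

    while (*s)
        h = (h * 33u) ^ (uint8_t)*s++;
    return h;
}

/* Multiply by 17 expressed as a left shift by 4 plus an add. */
uint32_t mul17(uint32_t x)
{
    return (x << 4) + x;
}
```

For the single character string "a", this sketch gives 0xe40c292c for fnv1a_32() and 177604 for djb2a().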
Option | Description |
direct | try to minimize cache effects of the I/O. File I/O writes are performed directly from user space buffers and synchronous transfer is also attempted. To guarantee synchronous I/O, also use the sync option. |
dsync | ensure output has been transferred to underlying hardware and file metadata has been updated (using the O_DSYNC open flag). This is equivalent to each write(2) being followed by a call to fdatasync(2). See also the fdatasync option. |
fadv-dontneed | advise kernel to expect the data will not be accessed in the near future. |
fadv-noreuse | advise kernel to expect the data to be accessed only once. |
fadv-normal | advise kernel there is no explicit access pattern for the data. This is the default advice assumption. |
fadv-rnd | advise kernel to expect random access patterns for the data. |
fadv-seq | advise kernel to expect sequential access patterns for the data. |
fadv-willneed | advise kernel to expect the data to be accessed in the near future. |
fsync | flush all modified in-core data after each write to the output device using an explicit fsync(2) call. |
fdatasync | similar to fsync, but do not flush the modified metadata unless metadata is required for later data reads to be handled correctly. This uses an explicit fdatasync(2) call. |
iovec | use readv/writev multiple buffer I/Os rather than read/write. Instead of 1 read/write operation, the buffer is broken into an iovec of 16 buffers. |
noatime | do not update the file last access timestamp, this can reduce metadata writes. |
sync | ensure output has been transferred to underlying hardware (using the O_SYNC open flag). This is equivalent to each write(2) being followed by a call to fsync(2). See also the fsync option. |
rd-rnd | read data randomly. By default, written data is not read back, however, this option will force it to be read back randomly. |
rd-seq | read data sequentially. By default, written data is not read back, however, this option will force it to be read back sequentially. |
syncfs | write all buffered modifications of file metadata and data on the filesystem that contains the hdd worker files. |
utimes | force update of file timestamp which may increase metadata writes. |
wr-rnd | write data randomly. The wr-seq option cannot be used at the same time. |
wr-seq | write data sequentially. This is the default if no write modes are specified. |
Note that some of these options are mutually exclusive, for example, there can be only one method of writing or reading. Also, fadvise flags may be mutually exclusive, for example fadv-willneed cannot be used with fadv-dontneed.
Type | Description |
brown | brown noise, red and green values vary by a 3 bit value, blue values vary by a 2 bit value. |
flat | a single random colour for the entire image. |
gradient | linear gradient of the red, green and blue components across the width and height of the image. |
noise | random white noise for red, green, blue values. |
plasma | plasma field with smooth colour transitions and hard boundary edges. |
xstripes | a random colour for each horizontal line. |
By default, this will exercise all the matrix stress methods one by one. One can specify a specific matrix stress method with the --matrix-method option.
Method | Description |
all | iterate over all the below matrix stress methods |
add | add two N × N matrices |
copy | copy one N × N matrix to another |
div | divide an N × N matrix by a scalar |
frobenius | Frobenius product of two N × N matrices |
hadamard | Hadamard product of two N × N matrices |
identity | create an N × N identity matrix |
mean | arithmetic mean of two N × N matrices |
mult | multiply an N × N matrix by a scalar |
negate | negate an N × N matrix |
prod | product of two N × N matrices |
sub | subtract one N × N matrix from another N × N matrix |
square | multiply an N × N matrix by itself |
trans | transpose an N × N matrix |
zero | zero an N × N matrix |
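Two of the less familiar methods in the table, hadamard and frobenius, are both element-wise operations. The following C sketch uses the textbook definitions on matrices stored as flat row-major arrays. The function names and the storage layout are assumptions for this illustration and are not taken from the stress-ng source.

```c
#include <stddef.h>

/* Hadamard product: out[i][j] = a[i][j] * b[i][j] for n x n matrices. */
void hadamard(size_t n, const double *a, const double *b, double *out)
{
    size_t i;

    for (i = 0; i < n * n; i++)
        out[i] = a[i] * b[i];
}

/* Frobenius product: the sum of all the element-wise products. */
double frobenius(size_t n, const double *a, const double *b)
{
    size_t i;
    double sum = 0.0;

    for (i = 0; i < n * n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

For the 2 × 2 matrices {1, 2, 3, 4} and {5, 6, 7, 8}, the Hadamard product is {5, 12, 21, 32} and the Frobenius product is 70.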
By default, this will exercise all the 3D matrix stress methods one by one. One can specify a specific 3D matrix stress method with the --matrix-3d-method option.
Method | Description |
all | iterate over all the below matrix stress methods |
add | add two N × N × N matrices |
copy | copy one N × N × N matrix to another |
div | divide an N × N × N matrix by a scalar |
frobenius | Frobenius product of two N × N × N matrices |
hadamard | Hadamard product of two N × N × N matrices |
identity | create an N × N × N identity matrix |
mean | arithmetic mean of two N × N × N matrices |
mult | multiply an N × N × N matrix by a scalar |
negate | negate an N × N × N matrix |
sub | subtract one N × N × N matrix from another N × N × N matrix |
trans | transpose an N × N × N matrix |
zero | zero an N × N × N matrix |
Method | Description |
all | use libc, builtin and naïve methods |
libc | use libc memcpy and memmove functions, this is the default |
builtin | use the compiler built in optimized memcpy and memmove functions |
naive | use naïve byte by byte copying and memory moving built with default compiler optimization flags |
naive_o0 | use unoptimized naïve byte by byte copying and memory moving |
naive_o1 | use naïve byte by byte copying and memory moving built with -O1 optimization |
naive_o2 | use optimized naïve byte by byte copying and memory moving built with -O2 optimization and where possible use CPU specific optimizations |
naive_o3 | use optimized naïve byte by byte copying and memory moving built with -O3 optimization and where possible use CPU specific optimizations |
Method | Description |
all | iterate over all the below memthrash methods |
chunk1 | memset 1 byte chunks of random data into random locations |
chunk8 | memset 8 byte chunks of random data into random locations |
chunk64 | memset 64 byte chunks of random data into random locations |
chunk256 | memset 256 byte chunks of random data into random locations |
chunkpage | memset page size chunks of random data into random locations |
copy128 | copy 128 byte chunks from chunk N + 1 to chunk N with streaming reads and writes with 128 bit memory accesses where possible. |
flip | flip (invert) all bits in random locations |
flush | flush cache line in random locations |
lock | lock randomly chosen locations (Intel x86 and ARM CPUs only) |
matrix | treat memory as a 2 × 2 matrix and swap random elements |
memmove | copy all the data in buffer to the next memory location |
memset | memset the memory with random data |
memset64 | memset the memory with a random 64 bit value in 64 byte chunks using non-temporal stores if possible or normal stores as a fallback |
memsetstosd | memset the memory using x86 32 bit rep stosd instruction (x86 only) |
mfence | stores with write serialization |
numa | memory bind pages across numa nodes |
prefetch | prefetch data at random memory locations |
random | randomly run any of the memthrash methods except for 'random' and 'all' |
reverse | swap 8 bit values from start to end and work towards the middle |
spinread | spin loop read the same random location 2↑19 times |
spinwrite | spin loop write the same random location 2↑19 times |
swap | step through memory swapping bytes in steps of 65 and 129 byte strides |
swap64 | work through memory swapping adjacent 64 byte chunks |
swapfwdrev | swap 64 bit values from start to end and work towards the middle and then from end to start and work towards the middle. |
tlb | work through memory in sub-optimal strides of prime multiples of the cache line size with reads and then writes to cause Translation Lookaside Buffer (TLB) misses. |
Method | Description |
all | iterate over all the following misaligned methods |
int16rd | 8 × 16 bit integer reads |
int16wr | 8 × 16 bit integer writes |
int16inc | 8 × 16 bit integer increments |
int16atomic | 8 × 16 bit atomic integer increments |
int32rd | 4 × 32 bit integer reads |
int32wr | 4 × 32 bit integer writes |
int32wtnt | 4 × 32 bit non-temporal stores (x86 only) |
int32inc | 4 × 32 bit integer increments |
int32atomic | 4 × 32 bit atomic integer increments |
int64rd | 2 × 64 bit integer reads |
int64wr | 2 × 64 bit integer writes |
int64wtnt | 2 × 64 bit non-temporal stores (x86 only) |
int64inc | 2 × 64 bit integer increments |
int64atomic | 2 × 64 bit atomic integer increments |
int128rd | 1 × 128 bit integer reads |
int128wr | 1 × 128 bit integer writes |
int128inc | 1 × 128 bit integer increments |
int128atomic | 1 × 128 bit atomic integer increments |
Note that some of these options (128 bit integer and/or atomic operations) may not be available on some systems.
Note that since stress-ng 0.17.05 the --mmap-madvise, --mmap-mergeable, --mmap-mprotect, --mmap-slow-munmap and --mmap-write-check options should be used to enable the pre-0.17.05 mmap stressor behaviour.
Method | Description |
all | use all monte carlo computation methods |
e | compute Euler's number e |
exp | integrate exp(x ↑ 2) for x = 0..1 |
pi | compute π from the area of a circle |
sin | integrate sin(x) for x = 0..π |
sqrt | integrate sqrt(1 + x ↑ 4) for x = 0..1 |
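As an illustration of the pi method, the following C sketch estimates π from the fraction of random points in the unit square that fall inside a quarter circle. It is a minimal sketch, not the stress-ng code: the function names are hypothetical and the small 64 bit linear congruential generator is an assumption chosen only to keep the example self-contained and repeatable.

```c
#include <stdint.h>

static uint64_t lcg_state = 1;

/* Assumed helper: 64 bit LCG returning a double in the range [0, 1). */
static double lcg_uniform(void)
{
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(lcg_state >> 11) / 9007199254740992.0;   /* 2^53 */
}

/*
 * Estimate pi: the quarter circle of radius 1 has area pi / 4, so the
 * fraction of points with x*x + y*y <= 1 approximates pi / 4.
 */
double monte_carlo_pi(uint64_t samples, uint64_t seed)
{
    uint64_t i, inside = 0;

    if (samples == 0)
        return 0.0;

    lcg_state = seed;
    for (i = 0; i < samples; i++) {
        double x = lcg_uniform();
        double y = lcg_uniform();

        if (x * x + y * y <= 1.0)
            inside++;
    }
    return 4.0 * (double)inside / (double)samples;
}
```

The error of such an estimate shrinks only with the square root of the sample count, which is why Monte Carlo methods keep a CPU busy for a long time for modest accuracy.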
Method | Description |
all | use all the random number generators |
arc4 | use the libc cryptographically-secure pseudorandom arc4random(3) number generator. |
drand48 | use the libc linear congruential algorithm drand48(3) using 48-bit integer arithmetic. |
getrandom | use the getrandom(2) system call for random values. |
lcg | use a 32 bit Park-Miller Linear Congruential Generator, with a division optimization. |
pcg32 | use a 32 bit O'Neill Permuted Congruential Generator. |
mwc64 | use the 64 bit stress-ng Multiply With Carry random number generator. |
random | use the libc random(3) Non-linear Additive Feedback random number generator. |
xorshift | use a 32 bit Marsaglia shift-register random number generator. |
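Two of the generators in the table are short enough to show in full. The following C sketch gives the classic published forms of the Marsaglia 32 bit xorshift generator and the Park-Miller "minimal standard" generator. The function names are hypothetical, the multiplier 16807 is an assumption (stress-ng may use a different constant), and the plain modulo shown here does not include the division optimization mentioned above.

```c
#include <stdint.h>

/* Marsaglia xorshift, 32 bit, with the 13, 17, 5 shift triple.
 * The state must be non-zero. */
uint32_t xorshift32_next(uint32_t x)
{
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Park-Miller minimal standard: x = (16807 * x) mod (2^31 - 1).
 * The state must be in the range 1..2^31 - 2. */
uint32_t park_miller_next(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 16807u) % 2147483647u);
}
```

Starting from a state of 1, xorshift32_next() returns 270369 and park_miller_next() returns 16807.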
Method | Description |
apery | calculate Apery's constant ζ(3); the sum of 1/(n ↑ 3). |
cosine | compute cos(θ) for θ = 0 to 2π in 100 steps. |
euler | compute e using n = (1 + (1 ÷ n)) ↑ n. |
exp | compute 1000 exponentials. |
log | compute 1000 natural logarithms. |
omega | compute the omega constant defined by Ωe↑Ω = 1 using efficient iteration of Ωn+1 = (1 + Ωn) / (1 + e↑Ωn). |
phi | compute the Golden Ratio ϕ using series. |
sine | compute sin(θ) for θ = 0 to 2π in 100 steps. |
nsqrt | compute square root using Newton-Raphson. |
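The nsqrt method relies on Newton-Raphson iteration, which can be sketched in C as follows. This is a textbook version written for illustration only: the function name, the starting guess and the iteration limit of 64 are assumptions and not taken from the stress-ng source.

```c
/*
 * Newton-Raphson square root: repeatedly replace the estimate r
 * with the average of r and x / r until it stops changing.
 */
double newton_sqrt(double x)
{
    double r, prev;
    int i;

    if (x <= 0.0)
        return 0.0;             /* assumption: no negative inputs */

    r = (x > 1.0) ? x : 1.0;    /* starting guess above the root */
    for (i = 0; i < 64; i++) {
        prev = r;
        r = 0.5 * (r + x / r);
        if (r == prev)
            break;              /* converged to machine precision */
    }
    return r;
}
```

The number of correct digits roughly doubles on every iteration, so newton_sqrt(9.0) settles on 3.0 after only a handful of steps.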
Method | Description |
all | use cstate and random nanosecond durations. |
cstate | use cstate nanosecond durations. It is recommended to also use --nanosleep-threads 1 to exercise fewer concurrent nanosleeps to allow CPUs to drop into deep C states. |
random | use random nanosecond durations between 1 and 2↑18 nanoseconds. |
ns | use 1ns (nanosecond) nanosleeps |
us | use 1us (microsecond) nanosleeps |
ms | use 1ms (millisecond) nanosleeps |
Method | Description |
inc | use incrementing 32 bit opcode patterns from 0x00000000 to 0xffffffff inclusive. |
mixed | use a mix of incrementing 32 bit opcode patterns and random 32 bit opcode patterns that are also inverted, encoded with gray encoding and bit reversed. |
random | generate opcodes using random bytes from a mwc random generator. |
text | copies random chunks of code from the stress-ng text segment and randomly flips single bits in a random choice of 1/8th of the code. |
int stress_example(void)
{
        int i;

        for (i = 0; i < 10000; i++) {
                __asm__ __volatile__("nop");
        }
        return 0;       /* Success */
}
and compile the source into a shared library as, for example:
gcc -fpic -shared -o example.so example.c
and run it using:
stress-ng --plugin 1 --plugin-so ./example.so
Method | Description |
builtin | Use the __builtin_prefetch(3) function for prefetching. This is the default. |
builtinl0 | Use the __builtin_prefetch(3) function for prefetching, with a locality 0 hint. |
builtinl3 | Use the __builtin_prefetch(3) function for prefetching, with a locality 3 hint. |
dcbt | Use the ppc64 dcbt instruction to fetch data into the L1 cache (ppc64 only). |
dcbtst | Use the ppc64 dcbtst instruction to fetch data into the L1 cache (ppc64 only). |
prefetcht0 | Use the x86 prefetcht0 instruction to prefetch data into all levels of the cache hierarchy (x86 only). |
prefetcht1 | Use the x86 prefetcht1 instruction (temporal data with respect to first level cache) to prefetch data into level 2 cache and higher (x86 only). |
prefetcht2 | Use the x86 prefetcht2 instruction (temporal data with respect to second level cache) to prefetch data into level 2 cache and higher (x86 only). |
prefetchnta | Use the x86 prefetchnta instruction (non-temporal data with respect to all cache levels) into a location close to the processor, minimizing cache pollution (x86 only). |
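As a rough illustration of the builtin methods, the sketch below sums a buffer while issuing __builtin_prefetch() hints a fixed distance ahead of the current read. The function name and the prefetch distance of 8 words are assumptions for this example; the builtin is provided by gcc and clang.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: sum a buffer of 64 bit words, prefetching ahead
 * of the read position. The distance of 8 words (64 bytes) is assumed.
 */
static uint64_t sum_with_prefetch(const uint64_t *buf, size_t n)
{
	const size_t ahead = 8;
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i + ahead < n) {
			/* rw = 0 (read), locality = 3 (keep in all cache levels) */
			__builtin_prefetch(&buf[i + ahead], 0, 3);
		}
		sum += buf[i];
	}
	return sum;
}
```

Changing the third argument from 3 to 0 gives the locality 0 hint, which tells the compiler the data need not be kept in the cache after the access.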
Type | Description |
inherit | The process owning the mutex lock runs at the highest priority of any process waiting on the lock, to avoid priority inversion deadlock. |
none | The priority of the process owning the mutex lock is not affected by its mutex ownership. This may cause a high priority process to become unrunnable on a single thread system. |
protect | The priority of the process owning the mutex lock is given the priority of the mutex (in this stress test case, the maximum priority) during the lock ownership. |
Method | Description |
all | iterate over all the race-sched methods as listed below: |
next | move a process to the next CPU, wrap around to zero when maximum CPU is reached. |
prev | move a process to the previous CPU, wrap around to the maximum CPU when the first CPU is reached. |
rand | move a process to any randomly chosen CPU. |
randinc | move a process to the current CPU + a randomly chosen value 1..4, modulo the number of CPUs. |
syncnext | move synchronously all the race-sched stressor processes to the next CPU every second; this loads just 1 CPU at a time in a round-robin method. |
syncprev | move synchronously all the race-sched stressor processes to the previous CPU every second; this loads just 1 CPU at a time in a round-robin method. |
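The next, prev and randinc selections above reduce to modular arithmetic on the CPU number. The following sketch shows that arithmetic only; the function names are hypothetical and the code that actually moves a process (for example with sched_setaffinity(2) on Linux) is omitted.

```c
/*
 * Hypothetical helpers: cpu is in the range 0..ncpus-1 and ncpus > 0.
 */

/* next: wrap around to zero after the maximum CPU */
static int cpu_next(int cpu, int ncpus)
{
	return (cpu + 1) % ncpus;
}

/* prev: wrap around to the maximum CPU before the first CPU */
static int cpu_prev(int cpu, int ncpus)
{
	return (cpu + ncpus - 1) % ncpus;
}

/* randinc: step is the randomly chosen increment in the range 1..4 */
static int cpu_randinc(int cpu, int ncpus, int step)
{
	return (cpu + step) % ncpus;
}
```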
Method | Description |
all | iterate over all the rawdev stress methods as listed below: |
sweep | repeatedly read across the raw device from the 0th block to the end block in steps of the number of blocks on the device / 128 and back to the start again. |
wiggle | repeatedly read across the raw device in 128 even steps with each step reading 1024 blocks backwards from each step. |
ends | repeatedly read the first 128 and last 128 blocks of the raw device, alternating between the start and the end of the device. |
random | repeatedly read 256 random blocks |
burst | repeatedly read 256 sequential blocks starting from a random block on the raw device. |
Method | Description |
all | exercise with all the rotate stressor methods (see below): |
rol8 | 8 bit unsigned rotate left by 1 bit |
ror8 | 8 bit unsigned rotate right by 1 bit |
rol16 | 16 bit unsigned rotate left by 1 bit |
ror16 | 16 bit unsigned rotate right by 1 bit |
rol32 | 32 bit unsigned rotate left by 1 bit |
ror32 | 32 bit unsigned rotate right by 1 bit |
rol64 | 64 bit unsigned rotate left by 1 bit |
ror64 | 64 bit unsigned rotate right by 1 bit |
rol128 | 128 bit unsigned rotate left by 1 bit |
ror128 | 128 bit unsigned rotate right by 1 bit |
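For reference, a rotate by 1 bit can be written in portable C as a pair of shifts that compilers commonly turn into a single rotate instruction. The sketch below shows an 8 bit rotate left and a 32 bit rotate right; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: rotate an 8 bit value left by 1 bit */
static uint8_t rotl8_by1(uint8_t v)
{
	return (uint8_t)((v << 1) | (v >> 7));
}

/* Hypothetical helper: rotate a 32 bit value right by 1 bit */
static uint32_t rotr32_by1(uint32_t v)
{
	return (v >> 1) | (v << 31);
}
```

The other widths follow the same pattern with the second shift count set to the width minus 1.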
Method | Description |
all | exercise with all the sparsematrix stressor methods (see below): |
hash | use a hash table and allocate nodes on the heap for each unique value at a (x, y) matrix position. |
hashjudy | use a hash table for x coordinates and a Judy array for y coordinates for values at a (x, y) matrix position. |
judy | use a Judy array with a unique 1-to-1 mapping of (x, y) matrix position into the array. |
list | use a circular linked-list for sparse y positions each with circular linked-lists for sparse x positions for the (x, y) matrix coordinates. |
mmap | use a non-sparse mmap of the entire 2-d matrix space. Only (x, y) matrix positions that are referenced will get physically mapped. Note that large sparse matrices cannot be mmap'd due to virtual address limitations, and too many referenced pages can trigger the out of memory killer on Linux. |
qhash | use a hash table with pre-allocated nodes for each unique value. This is a quick hash table implementation, nodes are not allocated each time with calloc and are allocated from a pre-allocated pool leading to quicker hash table performance than the hash method. |
rb | use a red-black balanced tree using one tree node for each unique value at a (x, y) matrix position. |
splay | use a splay tree using one tree node for each unique value at a (x, y) matrix position. |
Operation | Description |
copy | c[i] = a[i] |
scale | b[i] = scalar * c[i] |
add | c[i] = a[i] + b[i] |
triad | a[i] = b[i] + (c[i] * scalar) |
Since this is loosely based on a variant of the STREAM benchmark code, DO NOT submit results based on this as it is intended in stress-ng just to stress memory and compute and is NOT intended for STREAM accurate tuned or non-tuned benchmarking whatsoever. Use the official STREAM benchmarking tool if you desire accurate and standardised STREAM benchmarks.
The stressor calculates the memory read rate, memory write rate and floating point operations rate. These will differ from the maximum theoretical read/write/compute rates because of loop overheads and the use of volatile pointers to ensure the compiler does not optimize out stores.
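The four operations in the table above can be sketched as simple loops over arrays of doubles. This is an illustrative sketch only: the function names are hypothetical and, unlike the stressor, it does not use volatile pointers, so an optimizing compiler may remove stores whose results are unused.

```c
#include <stddef.h>

/* copy: c[i] = a[i] */
static void stream_copy(double *c, const double *a, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		c[i] = a[i];
}

/* scale: b[i] = scalar * c[i] */
static void stream_scale(double *b, const double *c, double scalar, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		b[i] = scalar * c[i];
}

/* add: c[i] = a[i] + b[i] */
static void stream_add(double *c, const double *a, const double *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		c[i] = a[i] + b[i];
}

/* triad: a[i] = b[i] + (c[i] * scalar) */
static void stream_triad(double *a, const double *b, const double *c,
			 double scalar, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		a[i] = b[i] + (c[i] * scalar);
}
```

Copy and scale each perform one read and one write per element, while add and triad perform two reads and one write, which is why the read and write rates are reported separately.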
Method | Description |
mq | use a POSIX message queue with a 1 item size. Messages are passed between a sender and receiver process. |
pipe | single character messages are passed down a single character sized pipe between a sender and receiver process. |
sem-sysv | a SYSV semaphore is used to block/run two processes. |
Method | Description |
all | select all the available system calls |
fast10 | select the fastest 10% system call tests |
fast25 | select the fastest 25% system call tests |
fast50 | select the fastest 50% system call tests |
fast75 | select the fastest 75% system call tests |
fast90 | select the fastest 90% system call tests |
geomean1 | select tests that are less than or equal to the geometric mean of all the test times |
geomean2 | select tests that are less than or equal to 2 × the geometric mean of all the test times |
geomean3 | select tests that are less than or equal to 3 × the geometric mean of all the test times |
Option | Description |
all | use all the open options, namely direct, dsync, excl, noatime and sync |
direct | try to minimize cache effects of the I/O to and from this file, using the O_DIRECT open flag. |
dsync | ensure output has been transferred to underlying hardware and file metadata has been updated using the O_DSYNC open flag. |
excl | fail if file already exists (it should not). |
noatime | do not update the file last access time if the file is read. |
sync | ensure output has been transferred to underlying hardware using the O_SYNC open flag. |
Method | Description |
all | iterate through all of the following trigonometric functions |
cos | cosine (double precision) |
cosf | cosine (float precision) |
cosl | cosine (long double precision) |
sin | sine (double precision) |
sinf | sine (float precision) |
sinl | sine (long double precision) |
sincos | sine and cosine (double precision) |
sincosf | sine and cosine (float precision) |
sincosl | sine and cosine (long double precision) |
tan | tangent (double precision) |
tanf | tangent (float precision) |
tanl | tangent (long double precision) |
Method | Description |
all | iterate through all of the following vector methods |
floatv128add | addition of a vector of 128 single precision floating point values |
floatv64add | addition of a vector of 64 single precision floating point values |
floatv32add | addition of a vector of 32 single precision floating point values |
floatv16add | addition of a vector of 16 single precision floating point values |
floatv8add | addition of a vector of 8 single precision floating point values |
floatv128mul | multiplication of a vector of 128 single precision floating point values |
floatv64mul | multiplication of a vector of 64 single precision floating point values |
floatv32mul | multiplication of a vector of 32 single precision floating point values |
floatv16mul | multiplication of a vector of 16 single precision floating point values |
floatv8mul | multiplication of a vector of 8 single precision floating point values |
floatv128div | division of a vector of 128 single precision floating point values |
floatv64div | division of a vector of 64 single precision floating point values |
floatv32div | division of a vector of 32 single precision floating point values |
floatv16div | division of a vector of 16 single precision floating point values |
floatv8div | division of a vector of 8 single precision floating point values |
doublev128add | addition of a vector of 128 double precision floating point values |
doublev64add | addition of a vector of 64 double precision floating point values |
doublev32add | addition of a vector of 32 double precision floating point values |
doublev16add | addition of a vector of 16 double precision floating point values |
doublev8add | addition of a vector of 8 double precision floating point values |
doublev128mul | multiplication of a vector of 128 double precision floating point values |
doublev64mul | multiplication of a vector of 64 double precision floating point values |
doublev32mul | multiplication of a vector of 32 double precision floating point values |
doublev16mul | multiplication of a vector of 16 double precision floating point values |
doublev8mul | multiplication of a vector of 8 double precision floating point values |
doublev128div | division of a vector of 128 double precision floating point values |
doublev64div | division of a vector of 64 double precision floating point values |
doublev32div | division of a vector of 32 double precision floating point values |
doublev16div | division of a vector of 16 double precision floating point values |
doublev8div | division of a vector of 8 double precision floating point values |
doublev128neg | negation of a vector of 128 double precision floating point values |
doublev64neg | negation of a vector of 64 double precision floating point values |
doublev32neg | negation of a vector of 32 double precision floating point values |
doublev16neg | negation of a vector of 16 double precision floating point values |
doublev8neg | negation of a vector of 8 double precision floating point values |
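Vector operations of this kind can be expressed with the gcc and clang vector extensions, where ordinary arithmetic operators act on every element. The sketch below shows an 8 element single precision add and multiply; the type and function names are hypothetical and it is an assumption that the stressor is structured this way.

```c
/*
 * gcc/clang vector extension: a vector of 8 single precision floats.
 * The type name is hypothetical.
 */
typedef float floatv8_t __attribute__((vector_size(8 * sizeof(float))));

/* Element-wise addition, r[i] = a[i] + b[i] for all 8 elements */
static void floatv8_add(floatv8_t *r, const floatv8_t *a, const floatv8_t *b)
{
	*r = *a + *b;
}

/* Element-wise multiplication, r[i] = a[i] * b[i] for all 8 elements */
static void floatv8_mul(floatv8_t *r, const floatv8_t *a, const floatv8_t *b)
{
	*r = *a * *b;
}
```

Depending on the target and compiler flags, the compiler may lower these to SIMD instructions or to scalar operations when no suitable vector unit is available.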
Method | Description |
all | iterate through all of the following vector methods |
u8x64 | shuffle a vector of 64 unsigned 8 bit integers |
u16x32 | shuffle a vector of 32 unsigned 16 bit integers |
u32x16 | shuffle a vector of 16 unsigned 32 bit integers |
u64x8 | shuffle a vector of 8 unsigned 64 bit integers |
u128x4 | shuffle a vector of 4 unsigned 128 bit integers (when supported) |
1. Initialised. The anonymously memory mapped region is set to a known pattern.
2. Exercised. Memory is modified in a known predictable way. Some vm workers alter memory sequentially, some use small or large strides to step along memory.
3. Checked. The modified memory is checked to see if it matches the expected result.
The vm methods containing 'prime' in their name have a stride of the largest prime less than 2↑64, allowing them to step thoroughly through memory and touch all locations just once while also avoiding touching memory cells next to each other. This strategy exercises the cache and page non-locality.
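The reason a prime stride touches every location just once is that a stride sharing no common factor with the region size generates every offset before repeating. The following sketch demonstrates this; the function name is hypothetical, and the caller is assumed to pass a stride that is coprime with the length (for example 18446744073709551557, the largest prime below 2↑64, with any power of 2 length).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: increment every byte of buf exactly once by
 * stepping through it with a stride that is coprime with len.
 */
static void stride_touch_all(uint8_t *buf, size_t len, uint64_t stride)
{
	uint64_t offset = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		buf[offset]++;
		/* reduce the stride first so the sum cannot overflow */
		offset = (offset + stride % len) % len;
	}
}
```

With a 16 byte buffer the stride above reduces to 5, giving the visiting order 0, 5, 10, 15, 4 and so on, so consecutive writes land on non-adjacent bytes.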
Since the memory being exercised is virtually mapped, there is no guarantee of touching page addresses in any particular physical order. These workers should not be used to test that all the system's memory is working correctly either; use tools such as memtest86 instead.
The vm stress methods are intended to exercise memory in ways to possibly find memory issues and to try to force thermal errors.
Available vm stress methods are described as follows:
Method | Description |
all | iterate over all the vm stress methods as listed below. |
cache-lines | work through memory in 64 byte cache sized steps writing a single byte per cache line. Once the write is complete, the memory is read to verify the values are written correctly. |
cache-stripe | work through memory in 64 byte cache sized chunks, writing in ascending address order on even offsets and descending address order on odd offsets. |
checkboard | work through memory writing alternating zero/one bit values into memory in a mixed checkerboard pattern. Memory is swapped around to ensure every bit is read, bit flipped and re-written and then re-read for verification. |
flip | sequentially work through memory 8 times, each time just one bit in memory flipped (inverted). This will effectively invert each byte in 8 passes. |
fwdrev | write to even addressed bytes in a forward direction and odd addressed bytes in reverse direction. The contents are sanity checked once all the addresses have been written to. |
galpat-0 | galloping pattern zeros. This sets all bits to 0 and flips just 1 in 4096 bits to 1. It then checks to see if the 1s are pulled down to 0 by their neighbours or if the neighbours have been pulled up to 1. |
galpat-1 | galloping pattern ones. This sets all bits to 1 and flips just 1 in 4096 bits to 0. It then checks to see if the 0s are pulled up to 1 by their neighbours or if the neighbours have been pulled down to 0. |
gray | fill the memory with sequential gray codes (these only change 1 bit at a time between adjacent bytes) and then check if they are set correctly. |
grayflip | fill memory with adjacent bytes of gray code and inverted gray code pairs to change as many bits at a time between adjacent bytes and check if these are set correctly. |
incdec | work sequentially through memory twice, the first pass increments each byte by a specific value and the second pass decrements each byte back to the original start value. The increment/decrement value changes on each invocation of the stressor. |
inc-nybble | initialise memory to a set value (that changes on each invocation of the stressor) and then sequentially work through each byte incrementing the bottom 4 bits by 1 and the top 4 bits by 15. |
lfsr32 | fill memory with values generated from a 32 bit Galois linear feedback shift register using the polynomial x↑32 + x↑31 + x↑29 + x + 1. This generates a ring of 2↑32 - 1 unique values (all 32 bit values except for 0). |
rand-set | sequentially work through memory in 64 bit chunks setting bytes in the chunk to the same 8 bit random value. The random value changes on each chunk. Check that the values have not changed. |
rand-sum | sequentially set all memory to random values and then summate the number of bits that have changed from the original set values. |
read64 | sequentially read memory using 32 × 64 bit reads per bogo loop. Each loop equates to one bogo operation. This exercises raw memory reads. |
ror | fill memory with a random pattern and then sequentially rotate 64 bits of memory right by one bit, then check the final load/rotate/stored values. |
swap | fill memory in 64 byte chunks with random patterns. Then swap each 64 byte chunk with a randomly chosen chunk. Finally, reverse the swap to put the chunks back to their original place and check if the data is correct. This exercises adjacent and random memory load/stores. |
move-inv | sequentially fill memory 64 bits of memory at a time with random values, and then check if the memory is set correctly. Next, sequentially invert each 64 bit pattern and again check if the memory is set as expected. |
modulo-x | fill memory over 23 iterations. Each iteration starts one byte further along from the start of the memory and steps along in 23 byte strides. In each stride, the first byte is set to a random pattern and all other bytes are set to the inverse. Then it checks to see if the first byte contains the expected random pattern. This exercises cache store/reads as well as seeing if neighbouring cells influence each other. |
mscan | fill each bit in each byte with 1s then check these are set, fill each bit in each byte with 0s and check these are clear. |
prime-0 | iterate 8 times by stepping through memory in very large prime strides clearing just one bit at a time in every byte. Then check to see if all bits are set to zero. |
prime-1 | iterate 8 times by stepping through memory in very large prime strides setting just one bit at a time in every byte. Then check to see if all bits are set to one. |
prime-gray-0 | first step through memory in very large prime strides clearing just one bit (based on a gray code) in every byte. Next, repeat this but clear the other 7 bits. Then check to see if all bits are set to zero. |
prime-gray-1 | first step through memory in very large prime strides setting just one bit (based on a gray code) in every byte. Next, repeat this but set the other 7 bits. Then check to see if all bits are set to one. |
rowhammer | try to force memory corruption using the rowhammer memory stressor. This fetches two 32 bit integers from memory and forces a cache flush on the two addresses multiple times. This has been known to force bit flipping on some hardware, especially with lower frequency memory refresh cycles. |
walk-0d | for each byte in memory, walk through each data line setting them to low (and the others are set high) and check that the written value is as expected. This checks if any data lines are stuck. |
walk-1d | for each byte in memory, walk through each data line setting them to high (and the others are set low) and check that the written value is as expected. This checks if any data lines are stuck. |
walk-0a | in the given memory mapping, work through a range of specially chosen addresses working through address lines to see if any address lines are stuck low. This works best with physical memory addressing, however, exercising these virtual addresses has some value too. |
walk-1a | in the given memory mapping, work through a range of specially chosen addresses working through address lines to see if any address lines are stuck high. This works best with physical memory addressing, however, exercising these virtual addresses has some value too. |
write64 | sequentially write to memory using 32 × 64 bit writes per bogo loop. Each loop equates to one bogo operation. This exercises raw memory writes. Note that memory writes are not checked at the end of each test iteration. |
write64nt | sequentially write to memory using 32 × 64 bit non-temporal writes per bogo loop. Each loop equates to one bogo operation. This exercises cacheless raw memory writes and is only available on x86 sse2 capable systems built with gcc and clang compilers. Note that memory writes are not checked at the end of each test iteration. |
write1024v | sequentially write to memory using 1 × 1024 bit vector write per bogo loop (only available if the compiler supports vector types). Each loop equates to one bogo operation. This exercises raw memory writes. Note that memory writes are not checked at the end of each test iteration. |
wrrd128nt | write to memory in 128 bit chunks using non-temporal writes (bypassing the cache). Each chunk is written 4 times to hammer the memory. Then check to see if the data is correct using non-temporal reads if they are available or normal memory reads if not. Only available with processors that provide non-temporal 128 bit writes. |
zero-one | set all memory bits to zero and then check if any bits are not zero. Next, set all the memory bits to one and check if any bits are not one. |
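As an illustration of the lfsr32 method, one step of a 32 bit Galois linear feedback shift register for the polynomial x↑32 + x↑31 + x↑29 + x + 1 can be sketched as below. The function name is hypothetical, and the right-shifting form with tap mask 0xd0000001 is the conventional encoding of that polynomial rather than code copied from stress-ng.

```c
#include <stdint.h>

/*
 * Hypothetical helper: advance a 32 bit Galois LFSR by one step.
 * The state must be non-zero, since zero maps to zero.
 */
static uint32_t lfsr32_next(uint32_t lfsr)
{
	/* taps for x^32 + x^31 + x^29 + x + 1 (bits 31, 30, 28 and 0) */
	const uint32_t taps = 0xd0000001u;

	return (lfsr >> 1) ^ ((lfsr & 1u) ? taps : 0u);
}
```

Filling memory then amounts to storing successive states, and checking it repeats the same sequence from the same seed and compares each value.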
Available vm address stress methods are described as follows:
Method | Description |
all | iterate over all the vm address stress methods as listed below. |
bitposn | iteratively write to memory in powers of 2 strides of max_stride to 1 and then read check memory in powers of 2 strides 1 to max_stride where max_stride is half the size of the memory mapped region. All bit positions of the memory address space are bit flipped in the striding. |
dec | work through the address range backwards sequentially, byte by byte. |
decinv | like dec, but with all the relevant address bits inverted. |
flip | address memory using gray coded addresses and their inverse to flip as many address bits as possible per write/read operation. |
gray | work through memory with gray coded addresses so that each change of address just changes 1 bit compared to the previous address. |
grayinv | like gray, but with all the relevant address bits inverted, hence all bits change apart from 1 in the address range. |
inc | work through the address range forwards sequentially, byte by byte. |
incinv | like inc, but with all the relevant address bits inverted. |
pwr2 | work through memory addresses in steps of powers of two. |
pwr2inv | like pwr2, but with all the relevant address bits inverted. |
rev | work through the address range with the bits in the address range reversed. |
revinv | like rev, but with all the relevant address bits inverted. |
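The gray method relies on the property that consecutive gray codes differ in exactly one bit. The sketch below converts a sequential index into a gray coded offset and counts the differing bits between two offsets; both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert a sequential index to its gray code */
static uint64_t addr_to_gray(uint64_t n)
{
	return n ^ (n >> 1);
}

/* Hypothetical helper: count the bits that differ between a and b */
static unsigned int bits_differing(uint64_t a, uint64_t b)
{
	uint64_t diff = a ^ b;
	unsigned int count = 0;

	while (diff) {
		diff &= diff - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}
```

Stepping an index from 0 upwards and using addr_to_gray() of the index as the offset into the mapping therefore toggles a single address bit per access.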
Method | Description |
all | exercise all the following VNNI methods |
vpaddb512 | 8 bit vector addition using 512 bit vector operations on 64 × 8 bit integers, (x86 vpaddb) |
vpaddb256 | 8 bit vector addition using 256 bit vector operations on 32 × 8 bit integers, (x86 vpaddb) |
vpaddb128 | 8 bit vector addition using 128 bit vector operations on 16 × 8 bit integers, (x86 vpaddb) |
vpaddb | 8 bit vector addition using 8 bit sequential addition (may be vectorized by the compiler) |
vpdpbusd512 | 8 bit vector multiplication of unsigned and signed 8 bit values followed by 16 bit summation using 512 bit vector operations on 64 × 8 bit integers, (x86 vpdpbusd) |
vpdpbusd256 | 8 bit vector multiplication of unsigned and signed 8 bit values followed by 16 bit summation using 256 bit vector operations on 32 × 8 bit integers, (x86 vpdpbusd) |
vpdpbusd128 | 8 bit vector multiplication of unsigned and signed 8 bit values followed by 16 bit summation using 128 bit vector operations on 16 × 8 bit integers, (x86 vpdpbusd) |
vpdpbusd | 8 bit vector multiplication of unsigned and signed 8 bit values followed by 16 bit summation using sequential operations (may be vectorized by the compiler) |
vpdpwssd512 | 16 bit vector multiplication of unsigned and signed 16 bit values followed by 32 bit summation using 512 bit vector operations on 64 × 8 bit integers, (x86 vpdpwssd) |
vpdpwssd256 | 16 bit vector multiplication of unsigned and signed 16 bit values followed by 32 bit summation using 256 bit vector operations on 64 × 8 bit integers, (x86 vpdpwssd) |
vpdpwssd128 | 16 bit vector multiplication of unsigned and signed 16 bit values followed by 32 bit summation using 128 bit vector operations on 64 × 8 bit integers, (x86 vpdpwssd) |
vpdpwssd | 16 bit vector multiplication of unsigned and signed 16 bit values followed by 32 bit summation using sequential operations (may be vectorized by the compiler) |
Method | Description |
all | randomly select any one of all the following methods: |
fma | perform multiply-add operations; on modern processors these may be compiled into fused-multiply-add instructions. |
getpid | get the stressor's PID via getpid(2). |
time | get the current time via time(2). |
inc64 | increment a 64 bit integer. |
memmove | copy (move) a 1MB buffer using memmove(3). |
memread | read from a 1MB buffer using fast memory reads. |
memset | write to a 1MB buffer using memset(3). |
mwc64 | compute 64 bit random numbers using a mwc random generator. |
nop | waste cycles using no-op instructions. |
pause | stop execution using CPU pause/yield or memory barrier instructions where available. |
random | a random mix of all the workload methods, changing the workload method on every spin-loop. |
sqrt | perform double precision floating point sqrt(3) and hypot(3) math operations. |
Method | Description |
cluster | cluster 2/3 of the start times to try to start at a random time during the time slice, with the other 1/3 of start times evenly randomly distributed using a single random variable. The clustered start times cause a burst of items to be scheduled in a bunch with no delays between each clustered work item. |
even | evenly distribute scheduling start times across the workload slice |
poisson | generate scheduling events that occur individually at random moments, but which tend to occur at an average rate (known as a Poisson process). |
random1 | evenly randomly distribute scheduling start times using a single random variable. |
random2 | randomly distribute scheduling start times using a sum of two random variables, much like throwing 2 dice. |
random3 | randomly distribute scheduling start times using a sum of three random variables, much like throwing 3 dice. |
Value | Description |
1 | minimum memory usage. |
9 | maximum memory usage. |
Method | Description |
00ff | randomly distributed 0x00 and 0xFF values. |
ascii01 | randomly distributed ASCII 0 and 1 characters. |
asciidigits | randomly distributed ASCII digits in the range of 0 and 9. |
bcd | packed binary coded decimals, 0..99 packed into 2 4-bit nybbles. |
binary | 32 bit random numbers. |
brown | 8 bit brown noise (Brownian motion/Random Walk noise). |
double | double precision floating point numbers from sin(θ). |
fixed | data stream is the value 0x04030201 repeated. |
gcr | random values as 4 × 4 bit data turned into 4 × 5 bit group coded recording (GCR) patterns. Each 5 bit GCR value starts or ends with at most one zero bit so that concatenated GCR codes have no more than two zero bits in a row. |
gray | 16 bit gray codes generated from an incrementing counter. |
inc16 | 16 bit incrementing values starting from a random 16 bit value. |
latin | Random latin sentences from a sample of Lorem Ipsum text. |
lehmer | Fast random values generated using Lehmer's generator using a 128 bit multiply. |
lfsr32 | Values generated from a 32 bit Galois linear feedback shift register using the polynomial x↑32 + x↑31 + x↑29 + x + 1. This generates a ring of 2↑32 - 1 unique values (all 32 bit values except for 0). |
logmap | Values generated from a logistic map of the equation Χn+1 = r × Χn × (1 - Χn) where r > ≈ 3.56994567 to produce chaotic data. The values are scaled by a large arbitrary value and the lower 8 bits of this value are compressed. |
lrand48 | Uniformly distributed pseudo-random 32 bit values generated from lrand48(3). |
morse | Morse code generated from random latin sentences from a sample of Lorem Ipsum text. |
nybble | randomly distributed bytes in the range of 0x00 to 0x0f. |
objcode | object code selected from a random start point in the stress-ng text segment. |
parity | 7 bit binary data with 1 parity bit. |
pink | pink noise in the range 0..255 generated using the Gardner method with the McCartney selection tree optimization. Pink noise is where the power spectral density is inversely proportional to the frequency of the signal and hence is slightly compressible. |
random | segments of the data stream are created by randomly calling the different data generation methods. |
rarely1 | data that has a single 1 in every 32 bits, randomly located. |
rarely0 | data that has a single 0 in every 32 bits, randomly located. |
rdrand | generate random data using rdrand instruction (x86) or use 64 bit mwc pseudo-random number generator for non-x86 systems. |
ror32 | generate a 32 bit random value, rotate it right 0 to 7 places and store the rotated value for each of the rotations. |
text | random ASCII text. |
utf8 | random 8 bit data encoded to UTF-8. |
zero | all zeros, compresses very easily. |
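As a small example of one of the data generation methods, the bcd entry packs a value in the range 0..99 into a single byte with the tens digit in the upper nybble and the units digit in the lower nybble. The sketch below shows that packing; the function name is hypothetical and the nybble order is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack a value 0..99 into two 4 bit BCD nybbles,
 * tens digit in the upper nybble (assumed ordering).
 */
static uint8_t pack_bcd(unsigned int value)
{
	value %= 100;	/* keep the value in the 0..99 range */
	return (uint8_t)(((value / 10) << 4) | (value % 10));
}
```

Since each nybble only takes 10 of its 16 possible values, data of this kind is somewhat compressible.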
Value | Description |
0 | used for normal data (Z_DEFAULT_STRATEGY). |
1 | for data generated by a filter or predictor (Z_FILTERED). |
2 | forces huffman encoding (Z_HUFFMAN_ONLY). |
3 | limits match distances to one, that is, run-length encoding (Z_RLE). |
4 | prevents dynamic huffman codes (Z_FIXED). |
Value | Description |
0 | creates an endless deflate stream until stressor stops. |
n | creates a stream of n bytes over and over again. Each block will be closed with Z_STREAM_END. |
Value | Description |
-8-(-15) | raw deflate format. |
8-15 | zlib format. |
24-31 | gzip format. |
40-47 | inflate auto format detection using zlib deflate format. |
stress-ng --vm 8 --vm-bytes 80% -t 1h
stress-ng --cpu 4 --io 2 --vm 1 --vm-bytes 1G --timeout 60s
stress-ng --iomix 2 --iomix-bytes 10% -t 10m
stress-ng --with cpu,matrix,vecmath,fp --seq 8 -t 1m
stress-ng --with cpu,matrix,vecmath,fp --permute 5 -t 10s
stress-ng --cyclic 1 --cyclic-dist 2500 --cyclic-method clock_ns --cyclic-prio 100 --cyclic-sleep 10000 --hdd 0 -t 1m
stress-ng --cpu 8 --cpu-ops 800000
stress-ng --sequential 2 --timeout 2m --metrics
stress-ng --cpu 4 --cpu-method fft --cpu-ops 10000 --metrics-brief
stress-ng --cpu -1 --cpu-method all -t 1h --cpu-load 90
stress-ng --cpu 0 --cpu-method all -t 20m
stress-ng --all 4 --timeout 5m
stress-ng --random 64
stress-ng --cpu 64 --cpu-method all --verify -t 10m --metrics-brief
stress-ng --sequential -1 -t 10m
stress-ng --sequential 8 --class io -t 5m --times
stress-ng --all -1 --maximize --aggressive
stress-ng --random 32 -x numa,hdd,key
stress-ng --sequential 4 --class vm --exclude bigheap,brk,stack
stress-ng --taskset 0,2-3 --cpu 3
Status | Description |
0 | Success. |
1 | Error; incorrect user options or a fatal resource issue in the stress-ng stressor harness (for example, out of memory). |
2 | One or more stressors failed. |
3 | One or more stressors failed to initialise because of lack of resources, for example ENOMEM (no memory), ENOSPC (no space on file system) or a missing or unimplemented system call. |
4 | One or more stressors were not implemented on a specific architecture or operating system. |
5 | A stressor has been killed by an unexpected signal. |
6 | A stressor exited by exit(2) which was not expected and timing metrics could not be gathered. |
7 | The bogo ops metrics may be untrustworthy. This is most likely to occur when a stress test is terminated during the update of a bogo-ops counter such as when it has been OOM killed. A less likely reason is that the counter ready indicator has been corrupted. |
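A wrapper program that launches stress-ng may want to report these exit statuses in readable form. The following sketch maps each status in the table above to a short summary; the function name and the wording of the strings are hypothetical and are not part of stress-ng.

```c
/*
 * Hypothetical helper: map a stress-ng exit status to a short summary.
 * The status would normally come from WEXITSTATUS() after waitpid(2).
 */
static const char *stress_ng_status_str(int status)
{
	switch (status) {
	case 0:
		return "success";
	case 1:
		return "incorrect options or fatal harness resource issue";
	case 2:
		return "one or more stressors failed";
	case 3:
		return "one or more stressors lacked resources to initialise";
	case 4:
		return "one or more stressors not implemented";
	case 5:
		return "a stressor was killed by an unexpected signal";
	case 6:
		return "a stressor exited unexpectedly, no timing metrics";
	case 7:
		return "bogo ops metrics may be untrustworthy";
	default:
		return "unknown status";
	}
}
```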
File bug reports at: https://github.com/ColinIanKing/stress-ng/issues
cpuburn(1), perf(1), stress(1), taskset(1)
https://github.com/ColinIanKing/stress-ng/blob/master/README.md
stress-ng was written by Colin Ian King <colin.i.king@gmail.com> and is a clean room re-implementation and extension of the original stress tool by Amos Waterland. Thanks also to the many contributors to stress-ng. The README.md file in the source contains a full list of the contributors.
Sending a SIGALRM, SIGINT or SIGHUP to stress-ng causes it to terminate all the stressor processes and ensures temporary files and shared memory segments are removed cleanly.
Sending a SIGUSR2 to stress-ng will dump out the current load average and memory statistics.
Note that the stress-ng cpu, io, vm and hdd tests are different implementations of the original stress tests and hence may produce different stress characteristics.
The bogo operations metrics may change with each release because of bug fixes to the code, new features, compiler optimisations, changes in support libraries or system call performance.
Copyright © 2013-2021 Canonical Ltd, Copyright © 2021-2024 Colin Ian King.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
26 February 2024 |