GOTO-INSTRUMENT(1) | User Commands | GOTO-INSTRUMENT(1) |
goto-instrument - Perform analysis or instrumentation of goto binaries
goto-instrument reads a GOTO binary, performs a given program transformation, and then writes the resulting program as GOTO binary on disk.
Below we give two usage examples for the options. Consider the following test, for function get_celsius and with harness test_get_celsius:
#include <assert.h> #include <limits.h> #include <stdint.h> // hardware sensor for temperature in kelvin extern volatile uint16_t temperature; int get_celsius() { if (temperature > (1000 + 273)) { return INT_MIN; // value indicating error } return temperature - 273; } void test_get_celsius() { int t = get_celsius(); assert(t == INT_MIN || t <= 1000); assert(t == INT_MIN || t >= -273); }
Here the variable temperature corresponds to a hardware sensor. It returns the current temperature on each read. The get_celsius function converts the value in Kelvin to degrees Celsius, given the value is in the expected range. However, it has a bug where it reads temperature a second time after the check, which may yield a value for which the check would not succeed. Verifying this program as is with cbmc(1) would yield a verification success. We can use goto-instrument to make reads from temperature non-deterministic:
goto-cc -o get_celsius_test.gb get_celsius_test.c goto-instrument --nondet-volatile-variable temperature \ get_celsius_test.gb get_celsius_test-mod.gb cbmc --function test_get_celsius get_celsius_test-mod.gb
Here the final invocation of cbmc(1) correctly reports a verification failure.
#include <assert.h> #include <stdint.h> extern volatile uint32_t clock; typedef struct message { uint32_t timestamp; void *data; } message_t; void *read_data(); message_t get_message() { message_t msg; msg.timestamp = clock; msg.data = read_data(); return msg; } void test_get_message() { message_t msg1 = get_message(); message_t msg2 = get_message(); assert(msg1.timestamp <= msg2.timestamp); }
The harness verifies that get_message assigns non-decreasing time stamps to the returned messages. However, simply treating clock as non-deterministic would not suffice to prove this property. Thus, we can supply a model for reads from clock:
// model for reads of the variable clock uint32_t clock_read_model() { static uint32_t clock_value = 0; uint32_t increment; __CPROVER_assume(increment <= 100); clock_value += increment; return clock_value; }
The model is stateful in that it keeps the current clock value between invocations in the variable clock_value. On each invocation, it increments the clock by a non-deterministic value in the range 0 to 100. We can tell goto-instrument to use the model clock_read_model for reads from the variable clock as follows:
goto-cc -o get_message_test.gb get_message_test.c goto-instrument --nondet-volatile-model clock:clock_read_model \ get_message_test.gb get_message_test-mod.gb cbmc --function get_message_test get_message_test-mod.gb
Now the final invocation of cbmc(1) reports verification success.
// main.c int global; const int c_global; int function(int *param, const int *c_param);
Often we want to avoid overwriting internal symbols, i.e., those with an __ prefix, which is done using the pattern (?!__).
goto-cc main.c -o main.gb goto-instrument main.gb main-out.gb \ --generate-function-body-options 'havoc,params:(?!__).*,globals:(?!__).*' \ --generate-funtion-body function
This leads to a GOTO binary equivalent to the following C code:
// main-mod.c int function(int *param, const int *c_param) { *param = nondet_int(); global = nondet_int(); return nondet_int(); }
The parameters should that should be non-deterministically updated can be specified either by a regular expression (as above) or by a semicolon-separated list of their numbers. For example havoc,params:0;3;4 will assign non-deterministic values to the first, fourth, and fifth parameter.
Note that only parameters of pointer type can be havoced and goto-instrument will produce an error report if given a parameter number associated with a non-pointer parameter. Requesting to havoc a parameter with a number higher than the number of parameters a given function takes will also results in an error report.
// main.c int function(int *first, int *second, int *third); int main() { int a = 10; int b = 10; int c = 10; function(&a, &b, &c); function(&a, &b, &c); }
The user can specify different behavior for each call-site as follows:
goto-cc main.c -o main.gb goto-instrument main.gb main-mod.gb \ --generate-havocing-body 'function,1,params:0;2,2,params:1'
This results in a GOTO binary equivalent to:
// main-mod.c int function_1(int *first, int *second, int *third) { *first = nondet_int(); *third = nondet_int(); } int function_2(int *first, int *second, int *third) { *second = nondet_int(); } int main() { int a = 10; int b = 10; int c = 10; function_1(&a, &b, &c); function_2(&a, &b, &c); }
To facilitate the controlled replace, we will label the places in each function where function pointers are being called, to this pattern:
function-name.function_pointer_call.N
where N is refers to the N-th function call via a function pointer in function-name, i.e., the first call to a function pointer in a function will have N=1, the fifth N=5 etc. Alternatively, if the calls carry labels in the source code, we can also refer to a function pointer as
function-name.label
To implement this assumption that the first call to a function pointer in function call an only be a call to f or g, use
goto-instrument --restrict-function-pointer \ call.function_pointer_call.1/f,g in.gb out.gb
The resulting output (written to GOTO binary out.gb) looks similar to the original example, except now there will not be a call to h:
void call(fptr_t fptr) { int r; if (fptr == &f) { r = f(10); } else if (fptr == &g) { r = g(10); } else { // sanity check assert(false); assume(false); } return r; }
As another example imagine we have a simple virtual filesystem API and implementation like this:
typedef struct filesystem_t filesystem_t; struct filesystem_t { int (*open)(filesystem_t *filesystem, const char *file_name); }; int fs_open(filesystem_t *filesystem, const char *file_name) { filesystem->open(filesystem, file_name); } int nullfs_open(filesystem_t *filesystem, const char *file_name) { return -1; } filesystem_t nullfs_val = {.open = nullfs_open}; filesystem *const nullfs = &nullfs_val; filesystem_t *get_fs_impl() { // some fancy logic to determine // which filesystem we're getting - // in-memory, backed by a database, OS file system // - but in our case, we know that // it always ends up being nullfs // for the cases we care about return nullfs; } int main(void) { filesystem_t *fs = get_fs_impl(); assert(fs_open(fs, "hello.txt") != -1); }
In this case, the assumption is that in function main, fs can be nothing other than nullfs. But perhaps due to the logic being too complicated, symbolic execution ends up being unable to figure this out, so in the call to fs_open we end up branching on all functions matching the signature of filesystem_t::open, which could be quite a few functions within the program. Worst of all, if its address is ever taken in the program, as far as function pointer removal via --remove-function-pointers is concerned it could be fs_open itself due to it having a matching signature, leading to symbolic execution being forced to follow a potentially infinite recursion until its unwind limit.
In this case we can again restrict the function pointer to the value which we know it will have:
goto-instrument --restrict-function-pointer \ fs_open.function_pointer_call.1/nullfs_open in.gb out.gb
{ "function_call_site_name": ["function1", "function2", ...], ... }
If you pass in multiple files, or a mix of files and command line restrictions, the final restrictions will be a set union of all specified restrictions.
Note that if something goes wrong during type checking (i.e., making sure that all function pointer replacements refer to functions in the symbol table that have the correct type), the error message will refer to the command line option --restrict-function-pointer regardless of whether the restriction in question came from the command line or a file.
// needs_argv.c #include <assert.h> int main(int argc, char *argv[]) { if (argc >= 2) assert(argv[1] != 0); return 0; }
If cbmc(1) is run directly on this example, it will report a failing assertion for the lack of modeling of argv. To make the assertion succeed, as expected, use:
goto-cc needs_argv.c goto-instrument --model-argc-argv 2 a.out a.out cbmc a.out
int f(int x); int g(int x); int h(int x);
And we have a call site like this:
typedef int (*fptr_t)(int x); void call(fptr_t fptr) { int r = fptr(10); assert(r > 0); }
Function pointer removal will turn this into code similar to this:
void call(fptr_t fptr) { int r; if (fptr == &f) { r = f(10); } else if (fptr == &g) { r = g(10); } else if (fptr == &h) { r = h(10); } else { // sanity check assert(false); assume(false); } return r; }
Beware that there may be many functions matching a particular signature, and some of them may be costly to a subsequently run analysis. Consider using --restrict-function-pointer to manually specify this set of functions, or --value-set-fi-fp-removal.
All tools honor the TMPDIR environment variable when generating temporary files and directories.
If you encounter a problem please create an issue at https://github.com/diffblue/cbmc/issues
cbmc(1), goto-cc(1)
2008-2013, Daniel Kroening
June 2022 | goto-instrument-5.59.0 |