futhark-opencl - compile Futhark to OpenCL
futhark opencl [options...] <program.fut>
futhark opencl translates a Futhark program to C code
invoking OpenCL kernels, and either compiles that C code with a C compiler
to an executable binary program, or produces a .h and .c file
that can be linked with other code. The standard Futhark optimisation
pipeline is used.
futhark opencl uses -lOpenCL to link (-framework
OpenCL on macOS). If using --library, you will need to do the
same when linking the final binary.
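As a rough illustration of the linking requirement, the following Python sketch assembles (but does not run) a C compiler command line for a program built with --library. The helper names and the file names main.c and prog.c are hypothetical; only the -lOpenCL and -framework OpenCL flags come from the text above, and any other flags your build needs are not shown.

```python
import sys


def opencl_link_flags(platform=sys.platform):
    """Return the OpenCL linker flags described above for a platform."""
    # sys.platform is "darwin" on macOS.
    if platform == "darwin":
        return ["-framework", "OpenCL"]
    return ["-lOpenCL"]


def link_command(sources, output, platform=sys.platform, cc="cc"):
    """Build (without running) a command that links the generated C code."""
    # Libraries go after the sources so the linker can resolve their symbols.
    return [cc, "-o", output, *sources, *opencl_link_flags(platform)]


if __name__ == "__main__":
    # prog.c stands for the .c file produced with --library (assumed name);
    # main.c is a hypothetical caller of the generated code.
    print(" ".join(link_command(["main.c", "prog.c"], "main")))
```

Keeping the flags in one helper means the macOS special case is handled in a single place rather than in every build script.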
The GPU terminology used is derived from CUDA nomenclature (e.g.
"thread block" instead of "workgroup"), but OpenCL
nomenclature is also supported for compatibility.
Accepts the same options as futhark-c.
CC
The C compiler used to compile the program. Defaults to
cc if unset.
CFLAGS
Space-separated list of options passed to the C compiler.
Defaults to -O -std=c99 if unset.
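The fallback behaviour of these two variables can be modelled in a few lines of Python. This is a sketch of the documented defaults only, not Futhark's actual implementation; the function name compiler_settings is hypothetical.

```python
import os


def compiler_settings(env=os.environ):
    """Resolve CC and CFLAGS, falling back to the documented defaults."""
    cc = env.get("CC", "cc")
    # CFLAGS is a space-separated list, so split it into individual options.
    cflags = env.get("CFLAGS", "-O -std=c99").split()
    return cc, cflags


if __name__ == "__main__":
    cc, cflags = compiler_settings()
    print(cc, *cflags)
```

Note that setting CFLAGS replaces the default options rather than adding to them, so a custom value that still needs -std=c99 must repeat it.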
Generated executables accept the same options as those generated
by futhark-c. For the -t option, the time taken to perform
device setup or teardown, including writing the input or reading the result,
is not included in the measurement. In particular, this means that timing
starts after all kernels have been compiled and data has been copied to the
device buffers but before setting any kernel arguments. Timing stops after
the kernels are done running, but before data has been read from the buffers
or the buffers have been released.
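The measurement window described above can be pictured with the following Python sketch. The phase names and the timed_run helper are hypothetical stand-ins for the real OpenCL calls; the sketch only shows which phases fall inside the timed region.

```python
import time


def timed_run(steps, clock=time.perf_counter):
    """Run the phases in order and return the duration -t would cover.

    `steps` maps hypothetical phase names to zero-argument callables.
    """
    steps["compile_kernels"]()        # not timed
    steps["copy_input_to_device"]()   # not timed
    start = clock()                   # timing starts here
    steps["set_kernel_arguments"]()   # timed
    steps["run_kernels"]()            # timed, until the kernels are done
    stop = clock()                    # timing stops here
    steps["read_results"]()           # not timed
    steps["release_buffers"]()        # not timed
    return stop - start
```

A consequence is that the reported runtime can be much smaller than the wall-clock time of the whole process, especially for short-running programs where kernel compilation and data transfer dominate.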
The following additional options are accepted.
- --build-option=OPT
- Add an additional build option to the string passed to
clBuildProgram(). Refer to the OpenCL documentation for which
options are supported. Be careful: some options can easily produce
incorrect results.
- --default-thread-block-size=INT, --default-group-size=INT
- The default size of thread blocks that are launched. Capped to the
hardware limit if necessary.
- --default-num-thread-blocks=INT, --default-num-groups=INT
- The default number of thread blocks that are launched.
- --default-threshold=INT
- The default parallelism threshold used for comparisons when selecting
between code versions generated by incremental flattening. Intuitively,
the amount of parallelism needed to saturate the GPU.
- --default-tile-size=INT
- The default tile size used when performing two-dimensional tiling (the
thread block size will be the square of the tile size).
- -d, --device=NAME
- Use the first OpenCL device whose name contains the given string. The
special string #k, where k is an integer, can be used to
pick the k-th device, numbered from zero. If used in conjunction
with -p, only the devices from matching platforms are
considered.
- --dump-opencl=FILE
- Don't run the program, but instead dump the embedded OpenCL program to the
indicated file. Useful if you want to see what is actually being
executed.
- --dump-opencl-binary=FILE
- Don't run the program, but instead dump the compiled version of the
embedded OpenCL program to the indicated file. On NVIDIA platforms, this
will be PTX code.
- --load-opencl=FILE
- Instead of using the embedded OpenCL program, load it from the indicated
file.
- --load-opencl-binary=FILE
- Load an OpenCL binary from the indicated file.
- -p, --platform=NAME
- Use the first OpenCL platform whose name contains the given string. The
special string #k, where k is an integer, can be used to
pick the k-th platform, numbered from zero.
- --list-devices
- List all OpenCL devices and platforms available on the system.
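The name-matching rule used by the -d and -p options can be sketched in Python as follows. This is a model of the documented behaviour, not Futhark's implementation. It assumes that matching is a case-sensitive substring test and that, when -p is given, only the devices of the single platform selected by -p are considered; both are readings of the text rather than confirmed details. The helper names and the platform and device names in the example are made up.

```python
def pick(names, spec):
    """Return the index of the entry selected by `spec`, or None.

    "#k" selects the k-th entry, counting from zero; any other string
    selects the first entry whose name contains it.
    """
    if spec.startswith("#") and spec[1:].isdecimal():
        k = int(spec[1:])
        return k if k < len(names) else None
    for i, name in enumerate(names):
        if spec in name:  # assumed: case-sensitive substring match
            return i
    return None


def select_device(platforms, device_spec, platform_spec=None):
    """Pick a device from `platforms`, a list of (platform, [devices]).

    Returns (platform_name, device_name), or None if nothing matches.
    """
    if platform_spec is not None:
        i = pick([p for p, _ in platforms], platform_spec)
        if i is None:
            return None
        # Assumed reading: restrict the search to the selected platform.
        platforms = [platforms[i]]
    candidates = [(p, d) for p, devs in platforms for d in devs]
    j = pick([d for _, d in candidates], device_spec)
    return None if j is None else candidates[j]


if __name__ == "__main__":
    # Made-up system with two platforms.
    system = [
        ("Intel(R) OpenCL", ["Intel CPU", "Intel UHD Graphics"]),
        ("NVIDIA CUDA", ["GeForce RTX 3080", "GeForce GTX 1080"]),
    ]
    print(select_device(system, "#1", platform_spec="NVIDIA"))
```

Under these assumptions, the index in #k counts only the devices that remain after platform filtering, so -d '#1' names a different device depending on whether -p is also given.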
futhark-test, futhark-cuda, futhark-c
2013-2020, DIKU, University of Copenhagen