BLKIO(3) | Library Functions Manual | BLKIO(3) |
blkio - Block device I/O library
libblkio is a library for accessing data stored on block devices. Block devices offer persistent data storage and are addressable in fixed-size units called blocks. Block sizes of 4 KiB or 512 bytes are typical. Hard disk drives, solid state disks (SSDs), USB mass storage devices, and other types of hardware are block devices.
The focus of libblkio is on fast I/O for multi-threaded applications. Management of block devices, including partitioning and resizing, is outside the scope of the library.
Block devices have one or more queues for submitting I/O requests such as reads and writes. Block devices process I/O requests from their queues and produce a return code for each completed request indicating success or an error.
The application is responsible for thread-safety. No thread synchronization is necessary when a queue is only used from a single thread. Proper synchronization is required when sharing a queue between multiple threads.
libblkio can be used in blocking, event-driven, and polling modes depending on the architecture of the application and its performance requirements.
Blocking mode suspends the execution of the current thread until the request completes. This is the most natural way of writing programs that perform a sequence of I/O requests, but it cannot exploit request parallelism.
Event-driven mode provides a completion file descriptor that the application can monitor from its event loop. This allows multiple I/O requests to be in flight simultaneously and the application can respond to other events while waiting for completions.
Polling mode also supports multiple in-flight requests but the application continuously checks for completions, typically from a tight loop, in order to minimize latency.
libblkio contains drivers for several block I/O interfaces. This allows applications using libblkio to access different block devices through a single API.
A struct blkio instance is created from a specific driver such as "io_uring" as follows:
    struct blkio *b;
    int ret;

    ret = blkio_create("io_uring", &b);
    if (ret < 0) {
        fprintf(stderr, "%s: %s\n", strerror(-ret), blkio_get_error_msg());
        return;
    }
For a list of available drivers, see the DRIVERS section below.
Functions generally return 0 on success and a negative errno(3) value on failure. In the latter case, a per-thread error message is also set and can be obtained as a const char * by calling blkio_get_error_msg().
Note that these messages are not stable and may change in between backward-compatible libblkio releases. The same applies to returned errno values, unless a specific value is explicitly documented for a particular error condition.
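The return-value convention above can be illustrated with a small standalone sketch. The helper name format_result() is hypothetical and not part of libblkio; its detail argument merely stands in for the string that blkio_get_error_msg() would return.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libblkio): render a return value that
 * follows the "0 on success, negative errno on failure" convention.
 * `detail` stands in for the string returned by blkio_get_error_msg().
 */
int format_result(int ret, const char *detail, char *out, size_t out_len)
{
    if (ret >= 0)
        return snprintf(out, out_len, "ok (%d)", ret);

    /* Negate to recover the positive errno value expected by strerror(). */
    return snprintf(out, out_len, "%s: %s", strerror(-ret),
                    detail ? detail : "(no message)");
}
```

Because neither the messages nor most errno values are stable, output like this is best suited to logs and diagnostics rather than to program logic.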
Connection details for a block device are specified by setting properties on the blkio instance. The available properties depend on the driver. For example, the io_uring driver's "path" property is set to /dev/sdb to access a local disk:
    int ret = blkio_set_str(b, "path", "/dev/sdb");
    if (ret < 0) {
        fprintf(stderr, "%s: %s\n", strerror(-ret), blkio_get_error_msg());
        blkio_destroy(&b);
        return;
    }
Once the connection details have been specified the blkio instance can be connected to the block device with blkio_connect():
ret = blkio_connect(b);
After the blkio instance is connected, properties are available to configure its operation and query device characteristics such as the maximum number of queues. See PROPERTIES for details.
For example, the number of queues can be set as follows:
ret = blkio_set_int(b, "num-queues", 4);
Once configuration is complete the blkio instance is started with blkio_start():
ret = blkio_start(b);
Memory containing I/O data buffers must be "mapped" before submitting requests that touch the memory when the "needs-mem-regions" property is true. Otherwise mapping memory is optional but doing so may improve performance.
Memory regions are mapped globally for the blkio instance and are available to all queues. A memory region is represented as follows:
    struct blkio_mem_region
    {
        void *addr;
        uint64_t iova;
        size_t len;
        int64_t fd_offset;
        int fd;
        uint32_t flags;
    };
The addr field contains the starting address of the memory region. Requests transfer data between the block device and a subset of the memory region, including up to the entire memory region. Individual read/write requests or readv/writev request segments (iovecs) must not access more than one memory region. Multiple requests can access the same memory region simultaneously, although usually with non-overlapping areas.
The addr field must be a multiple of the "mem-region-alignment" property.
The iova field is reserved and must be zero.
The len field is the size of the memory region in bytes. The value must be a multiple of the "mem-region-alignment" property.
The fd field is the file descriptor for the memory region. Some drivers require that I/O data buffers are located in file-backed memory. This can be anonymous memory from memfd_create(2) rather than an actual file on disk. If the "needs-mem-region-fd" property is true then this field must be a valid file descriptor. If the property is false this field may be -1.
The fd_offset field is the byte offset from the start of the file given in fd.
The flags field is reserved and must be zero.
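As a rough illustration of the field constraints listed above, the following sketch checks a region description before it is handed to the library. The struct example_mem_region type and example_region_is_valid() are hypothetical names that mirror the documented fields. The alignment and needs_fd parameters are assumed to hold the values of the "mem-region-alignment" and "needs-mem-region-fd" properties.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the documented fields; the type name is hypothetical. */
struct example_mem_region {
    void *addr;
    uint64_t iova;
    size_t len;
    int64_t fd_offset;
    int fd;
    uint32_t flags;
};

/* Returns true if the region satisfies the constraints described above. */
bool example_region_is_valid(const struct example_mem_region *r,
                             uint64_t alignment, bool needs_fd)
{
    if (alignment == 0)
        return false;
    /* addr and len must both be multiples of the alignment. */
    if ((uintptr_t)r->addr % alignment != 0)
        return false;
    if (r->len % alignment != 0)
        return false;
    /* iova and flags are reserved and must be zero. */
    if (r->iova != 0 || r->flags != 0)
        return false;
    /* A valid fd is only mandatory when the driver asks for one. */
    if (needs_fd && r->fd < 0)
        return false;
    return true;
}
```

A check like this is purely a convenience for catching mistakes early; the library remains the authority on whether a region is acceptable.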
The application can either allocate I/O data buffers itself and describe them with struct blkio_mem_region or it can use blkio_alloc_mem_region() and blkio_free_mem_region() to allocate memory suitable for I/O data buffers:
    int blkio_alloc_mem_region(struct blkio *b, struct blkio_mem_region *region,
                               size_t len);
    void blkio_free_mem_region(struct blkio *b,
                               const struct blkio_mem_region *region);
The len argument is the number of bytes to allocate. These functions may only be called after the blkio instance has been started.
File descriptors for memory regions created with blkio_alloc_mem_region() are automatically closed across execve(2).
Memory regions can be mapped and unmapped after the blkio instance has been started using the blkio_map_mem_region() and blkio_unmap_mem_region() functions:
    int blkio_map_mem_region(struct blkio *b,
                             const struct blkio_mem_region *region);
    void blkio_unmap_mem_region(struct blkio *b,
                                const struct blkio_mem_region *region);
These functions must not be called while requests are in flight that access the affected memory region. Memory regions must not overlap. Memory regions must be unmapped/freed with exactly the same region field values that they were mapped/allocated with.
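Since memory regions must not overlap, an application that manages its own regions may want to check a candidate region against those it has already mapped. The sketch below shows one way to test two address ranges. The function name is hypothetical, and the code assumes non-zero lengths and ranges that do not wrap around the address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if [a, a + a_len) and [b, b + b_len)
 * share at least one byte. Assumes non-zero lengths and no wraparound.
 */
bool example_ranges_overlap(const void *a, size_t a_len,
                            const void *b, size_t b_len)
{
    uintptr_t a_start = (uintptr_t)a;
    uintptr_t b_start = (uintptr_t)b;

    /* Half-open ranges overlap iff each one starts before the other ends. */
    return a_start < b_start + b_len && b_start < a_start + a_len;
}
```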
blkio_map_mem_region() does not take ownership of region->fd. The caller may close region->fd after blkio_map_mem_region() returns.
blkio_map_mem_region() returns an error if called on a memory region that is already mapped against the given blkio. blkio_unmap_mem_region() has no effect when called on a memory region that is not mapped against the given blkio.
blkio_free_mem_region() must not be called on a memory region that was mapped but not unmapped.
For best performance applications should map memory regions once and reuse them instead of changing memory regions frequently.
The "max-mem-regions" property gives the maximum number of memory regions that can be mapped.
Memory regions are automatically unmapped when blkio_destroy() is called, and memory regions allocated using blkio_alloc_mem_region() are freed.
Once at least one memory region has been mapped, the queues are ready for request processing. The following example reads 4096 bytes from byte offset 0x10000:
    struct blkioq *q = blkio_get_queue(b, 0);

    blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

    struct blkio_completion completion;
    ret = blkioq_do_io(q, &completion, 1, 1, NULL);
    if (ret != 1) ...
    if (completion.ret != 0) ...
This is an example of blocking mode where blkioq_do_io() waits until the I/O request completes. See below for details on event-driven and polling modes.
The blkioq_do_io() function offers the following arguments:
    int blkioq_do_io(struct blkioq *q,
                     struct blkio_completion *completions,
                     int min_completions,
                     int max_completions,
                     struct timespec *timeout);
The completions argument is a pointer to an array that is filled in with completions when the function returns. When max_completions is 0 completions may be NULL. Completions are represented by struct blkio_completion:
    struct blkio_completion
    {
        void *user_data;
        const char *error_msg;
        int ret;
        /* reserved space */
    };
The user_data field is the same pointer passed to blkioq_read() in the example above. Applications that submit multiple requests can use user_data to correlate completions to previously submitted requests.
The ret field is the return code for the I/O request in negative errno representation. This field is 0 on success for most request types. For blkioq_report_zones(), ret is the number of zones filled in or a negative errno.
For some errors, the error_msg field points to a message describing what caused the request to fail. Note that this may be NULL even if ret is not 0, and is always NULL when ret is 0.
Note that these messages are not stable and may change in between backward-compatible libblkio releases. The same applies to the errno values returned through ret, unless a specific value is explicitly documented for a particular error condition.
struct blkio_completion also includes some reserved space which may be used to add more fields in the future in a backward-compatible manner.
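To illustrate how user_data can tie a completion back to its request, here is a minimal standalone sketch. It uses a local mirror of the documented completion fields and a hypothetical per-request bookkeeping struct; neither type name comes from libblkio.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of the documented completion fields (reserved space omitted). */
struct example_completion {
    void *user_data;
    const char *error_msg;
    int ret;
};

/* Hypothetical per-request bookkeeping owned by the application. */
struct example_request {
    int id;
    bool done;
    int result;
};

/*
 * Record the outcome in the request that user_data points to.
 * Returns that request, or NULL if no context was attached.
 */
struct example_request *
example_handle_completion(const struct example_completion *c)
{
    struct example_request *req = c->user_data;

    if (req == NULL)
        return NULL;
    req->done = true;
    req->result = c->ret; /* 0 on success, negative errno on failure */
    return req;
}
```

In this scheme the application would pass a pointer to its struct example_request as the user_data argument when submitting the request, and must keep that object alive until the completion has been returned.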
The remaining arguments of blkioq_do_io() are as follows:
The min_completions argument controls how many completions to wait for. A value greater than 0 causes the function to block until that number of completions has been reached. A value of 0 causes the function to submit I/O and return completions that have already occurred without waiting for more. If min_completions is greater than the number of currently outstanding requests, blkioq_do_io() fails with -EINVAL.
The max_completions argument is the maximum number of completions elements to fill in. This value must be greater than or equal to min_completions.
The timeout argument specifies the maximum amount of time to wait for completions. The function returns -ETIME if the timeout expires before a request completes. If timeout is NULL the function blocks indefinitely. When timeout is non-NULL the elapsed time is subtracted and the struct timespec is updated when the function returns regardless of success or failure.
The return value is the number of completions elements filled in. This value is within the inclusive range [min_completions, max_completions] on success or a negative errno on failure.
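The timeout update described above amounts to subtracting the elapsed time from the caller's struct timespec. The sketch below shows that arithmetic in isolation. The function name is hypothetical, both inputs are assumed to be normalized (0 <= tv_nsec < 1000000000), and clamping at zero is an assumption of this example rather than documented libblkio behavior.

```c
#include <time.h>

/*
 * Hypothetical helper: subtract `elapsed` from `*timeout` in place.
 * Assumes both values are normalized; clamps the result at zero.
 */
void example_subtract_elapsed(struct timespec *timeout,
                              const struct timespec *elapsed)
{
    time_t sec = timeout->tv_sec - elapsed->tv_sec;
    long nsec = timeout->tv_nsec - elapsed->tv_nsec;

    /* Borrow one second when the nanosecond part goes negative. */
    if (nsec < 0) {
        nsec += 1000000000L;
        sec -= 1;
    }
    /* More time elapsed than was available: nothing remains. */
    if (sec < 0) {
        sec = 0;
        nsec = 0;
    }
    timeout->tv_sec = sec;
    timeout->tv_nsec = nsec;
}
```

An application that calls blkioq_do_io() repeatedly with the same struct timespec therefore gets an overall deadline across the calls rather than a fresh timeout on each one.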
A blkioq_do_io_interruptible() variant is also available:
    int blkioq_do_io_interruptible(struct blkioq *q,
                                   struct blkio_completion *completions,
                                   int min_completions,
                                   int max_completions,
                                   struct timespec *timeout,
                                   const sigset_t *sig);
Unlike blkioq_do_io(), this function can be interrupted by signals and return -EINTR. The sig argument temporarily sets the signal mask of the process while waiting for completions, which allows the thread to be woken by a signal without race conditions. To ensure this function is interrupted when a signal is received, (1) the said signal must be in a blocked state when invoking the function (see sigprocmask(2)) and (2) a signal mask unblocking that signal must be given as the sig argument.
Completion processing can be integrated into the event loop of an application so that other activity can take place while I/O is in flight. Each queue has a completion file descriptor that is returned by the following function:
int blkioq_get_completion_fd(struct blkioq *q);
The returned file descriptor becomes readable when blkioq_do_io() needs to be called again. Spurious events can occur, causing the fd to become readable even if there are no new completions available.
The returned file descriptor has O_NONBLOCK set. The application may switch the file descriptor to blocking mode.
By default, the driver might not generate completion events for requests so it is necessary to explicitly enable the completion file descriptor before use:
void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable);
Changes made using this function also apply to requests that are already in flight but not yet completed. Note that even after calling this function with enable set to false, the driver may still generate completion events.
The application must read 8 bytes from the completion file descriptor to reset the event before calling blkioq_do_io(). The contents of the bytes are undefined and should not be interpreted by the application.
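Resetting the event with an 8-byte read can be sketched in isolation as follows. An eventfd(2) is used here purely as a stand-in so that the example runs on its own; whether the completion file descriptor actually is an eventfd is an assumption of this sketch. Both function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Stand-in for a completion fd (assumption: behaves like an eventfd). */
int example_make_standin_fd(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/*
 * Read and discard one 8-byte event from a non-blocking fd.
 * Returns 1 if an event was consumed, 0 if none was pending,
 * or a negative errno value on error.
 */
int example_consume_event(int fd)
{
    uint64_t ignored; /* contents are undefined; never interpret them */
    ssize_t n;

    do {
        n = read(fd, &ignored, sizeof(ignored));
    } while (n < 0 && errno == EINTR);

    if (n == (ssize_t)sizeof(ignored))
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n < 0 ? -errno : -EIO;
}
```

A return value of 0 is not an error here: it simply means there was nothing to reset.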
The following example demonstrates event-driven I/O:
    struct blkioq *q = blkio_get_queue(b, 0);
    int completion_fd = blkioq_get_completion_fd(q);
    char event_data[8];

    /* Switch to blocking mode for read(2) below */
    fcntl(completion_fd, F_SETFL,
          fcntl(completion_fd, F_GETFL, NULL) & ~O_NONBLOCK);

    /* Enable completion events */
    blkioq_set_completion_fd_enabled(q, true);

    blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

    /* Since min_completions = 0 we will submit but not wait */
    ret = blkioq_do_io(q, NULL, 0, 0, NULL);
    if (ret != 0) ...

    /* Wait for the next event on the completion file descriptor */
    struct blkio_completion completion;
    do {
        read(completion_fd, event_data, sizeof(event_data));
        ret = blkioq_do_io(q, &completion, 0, 1, NULL);
    } while (ret == 0);
    if (ret != 1) ...
    if (completion.ret != 0) ...
This example uses a blocking read(2) to wait and consume the next event on the completion file descriptor. Because spurious events can occur, it then checks if there actually is a completion available, retrying read(2) otherwise.
Normally completion_fd would be registered with an event loop so the application can perform other tasks while waiting.
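As a rough idea of what registering the descriptor with an event loop involves, the sketch below waits for readability using poll(2). The helper name is hypothetical. A real event loop would normally watch many descriptors at once, and retrying after EINTR as done here restarts the full timeout.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait until `fd` is readable.
 * Returns 1 if readable, 0 on timeout, or a negative errno value.
 * A timeout_ms of -1 waits indefinitely; 0 returns immediately.
 */
int example_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -errno;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Because spurious events are possible, a result of 1 only means that blkioq_do_io() should be called again, not that a completion is guaranteed to be available.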
Applications may save CPU cycles by suppressing completion file descriptor notifications while processing completions. This optimization avoids an unnecessary application event loop iteration and completion file descriptor read when additional completions arrive while the application is processing completions:
    static void process_completions(...)
    {
        int ret;

        /* Suppress completion fd notifications while we process completions */
        blkioq_set_completion_fd_enabled(q, false);

        do {
            struct blkio_completion completion;

            ret = blkioq_do_io(q, &completion, 0, 1, NULL);
            if (ret == 0) {
                blkioq_set_completion_fd_enabled(q, true);

                /* Re-check for completions to avoid race */
                ret = blkioq_do_io(q, &completion, 0, 1, NULL);
                if (ret == 1) {
                    blkioq_set_completion_fd_enabled(q, false);
                }
            }

            if (ret < 0) {
                ... /* error */
            }

            if (ret == 1) {
                ... /* process completion */
            }
        } while (ret == 1);
    }
Waiting for completions using blkioq_do_io() with min_completions > 0 can cause the current thread to be descheduled by the operating system's scheduler. The same is true when waiting for events on the completion file descriptor returned by blkioq_get_completion_fd(). Some applications require consistent low response times and therefore cannot risk being descheduled.
blkioq_do_io() may be called from a CPU polling loop with min_completions = 0 to check for completions:
    struct blkioq *q = blkio_get_queue(b, 0);

    blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

    /* Busy-wait for the completion */
    struct blkio_completion completion;
    do {
        ret = blkioq_do_io(q, &completion, 0, 1, NULL);
    } while (ret == 0);
    if (ret != 1) ...
    if (completion.ret != 0) ...
This approach is ideal for applications that need to poll several event sources simultaneously, or that need to intersperse polling with other application logic. Otherwise, driver-level polling (see below) may lead to further performance gains.
Poll queues differ from the "regular" queues presented above in that calling blkioq_do_io() with min_completions > 0 causes libblkio itself (or other lower layers) to poll for completions. This can be more efficient than repeatedly invoking blkioq_do_io() with min_completions = 0 on a "regular" queue. For instance, with the io_uring driver, poll queues cause the kernel itself to poll for completions, avoiding repeated context switching while polling.
A limitation of poll queues is that the CPU thread is occupied with a single poll queue and cannot detect other events in the meantime such as network I/O or application events. Applications wishing to poll multiple things simultaneously may prefer to use application-level polling (see above).
Poll queue support is contingent on the particular driver and driver configuration being used. To determine whether a given blkio supports poll queues, check the "supports-poll-queues" property:
    bool supports_poll_queues;

    ret = blkio_get_bool(b, "supports-poll-queues", &supports_poll_queues);
    if (ret != 0) ...

    if (!supports_poll_queues) {
        fprintf(stderr, "Poll queues not supported\n");
        return;
    }
It is possible for poll queues not to support flush, write zeroes, and discard requests, even if "regular" queues of the same blkio do. However, read, write, readv, and writev requests are always supported. There is currently no mechanism to check which types of requests are supported by poll queues.
To use poll queues, set the "num-poll-queues" property to a positive value before calling blkio_start(), then use blkio_get_poll_queue() to retrieve the poll queues. A single blkio can have both "regular" queues and poll queues:
    ...
    ret = blkio_connect(b);
    if (ret != 0) ...

    ret = blkio_set_int(b, "num-queues", 1);
    if (ret != 0) ...
    ret = blkio_set_int(b, "num-poll-queues", 1);
    if (ret != 0) ...

    ret = blkio_start(b);
    if (ret != 0) ...

    struct blkioq *q = blkio_get_queue(b, 0);
    struct blkioq *poll_q = blkio_get_poll_queue(b, 0);
It is possible to set property "num-queues" to 0 as long as "num-poll-queues" is positive.
Poll queues also differ from "regular" queues in that they do not have a completion fd. blkioq_get_completion_fd() returns -1 when called on a poll queue, and blkioq_set_completion_fd_enabled() has no effect. Further, blkioq_do_io_interruptible() is not currently supported on poll queues.
Note that you can still perform application-level polling on poll queues by repeatedly calling blkioq_do_io() with min_completions = 0, but this will lead to suboptimal performance.
Some drivers have support for adding queues on demand after the blkio instance is already started:
    int index = blkio_add_queue(b); /* or blkio_add_poll_queue() */
    if (index < 0) ...

    struct blkioq *q = blkio_get_queue(b, index); /* or blkio_get_poll_queue() */
The "can-add-queues" property determines whether this is supported. When it is, the blkio instance can be started with 0 queues.
In addition, all drivers allow explicitly removing queues, regardless of whether those queues were created by blkio_start() or blkio_add_queue() / blkio_add_poll_queue():
    assert(blkio_get_queue(b, 0) != NULL);
    assert(blkio_get_queue(b, 1) != NULL);

    /* blkio_remove_queue() will return 0, indicating success */
    assert(blkio_remove_queue(b, 0) == 0);

    /* Other queues' indices are not shifted, so q will be non-NULL and valid */
    struct blkioq *q = blkio_get_queue(b, 1);
    assert(q != NULL);

    /* blkio_remove_queue() will return -ENOENT, since queue 0 no longer exists */
    assert(blkio_remove_queue(b, 0) == -ENOENT);
Once a queue is removed, any struct blkioq * pointing to it becomes invalid.
The following types of I/O requests are available:
    void blkioq_read(struct blkioq *q, uint64_t start, void *buf, size_t len,
                     void *user_data, uint32_t flags);
    void blkioq_write(struct blkioq *q, uint64_t start, void *buf, size_t len,
                      void *user_data, uint32_t flags);
    void blkioq_readv(struct blkioq *q, uint64_t start, struct iovec *iovec,
                      int iovcnt, void *user_data, uint32_t flags);
    void blkioq_writev(struct blkioq *q, uint64_t start, struct iovec *iovec,
                       int iovcnt, void *user_data, uint32_t flags);
    void blkioq_write_zeroes(struct blkioq *q, uint64_t start, uint64_t len,
                             void *user_data, uint32_t flags);
    void blkioq_discard(struct blkioq *q, uint64_t start, uint64_t len,
                        void *user_data, uint32_t flags);
    void blkioq_flush(struct blkioq *q, void *user_data, uint32_t flags);
    void blkioq_report_zones(struct blkioq *q, uint64_t offset,
                             struct blkio_zone *zones, uint32_t nr_zones,
                             void *user_data, uint32_t flags);
    void blkioq_close_zone(struct blkioq *q, uint64_t offset, void *user_data,
                           uint32_t flags);
    void blkioq_finish_zone(struct blkioq *q, uint64_t offset, void *user_data,
                            uint32_t flags);
    void blkioq_open_zone(struct blkioq *q, uint64_t offset, void *user_data,
                          uint32_t flags);
    void blkioq_reset_zone(struct blkioq *q, uint64_t offset, void *user_data,
                           uint32_t flags);
    void blkioq_close_zone_all(struct blkioq *q, void *user_data, uint32_t flags);
    void blkioq_finish_zone_all(struct blkioq *q, void *user_data, uint32_t flags);
    void blkioq_open_zone_all(struct blkioq *q, void *user_data, uint32_t flags);
    void blkioq_reset_zone_all(struct blkioq *q, void *user_data, uint32_t flags);
The block device may see requests as soon as these functions are called, but blkioq_do_io() must be called to ensure requests are seen.
If property "needs-mem-regions" is true, I/O data buffers pointed to by buf and iovec must be within regions mapped using blkio_map_mem_region().
The application must not free the iovec elements until the request's completion is returned by blkioq_do_io().
All drivers are guaranteed to support at least blkioq_read(), blkioq_write(), blkioq_readv(), blkioq_writev(), and blkioq_flush(). When attempting to queue a request that the driver does not support, the request itself fails and its completion's ret field is -ENOTSUP.
blkioq_read() and blkioq_readv() read data from the block device at byte offset start. blkioq_write() and blkioq_writev() write data to the block device at byte offset start. The length of the I/O data buffer is len bytes and the total size of the iovec elements, respectively. start and the length of the I/O data buffer must be a multiple of the "request-alignment" property. I/O data buffer addresses and lengths, including buf and individual iovec elements, must be multiples of the "buf-alignment" property.
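The alignment rules for read and write requests can be summarized in a small predicate. This is only a sketch: the function name is hypothetical, and the two alignment parameters are assumed to hold the values of the "request-alignment" and "buf-alignment" properties.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: check a single-buffer read/write request against
 * the documented alignment rules.
 */
bool example_request_is_aligned(uint64_t start, const void *buf, size_t len,
                                uint64_t request_alignment,
                                uint64_t buf_alignment)
{
    if (request_alignment == 0 || buf_alignment == 0)
        return false;
    /* The device offset and total length follow "request-alignment". */
    if (start % request_alignment != 0 || len % request_alignment != 0)
        return false;
    /* The buffer address and length follow "buf-alignment". */
    if ((uintptr_t)buf % buf_alignment != 0 || len % buf_alignment != 0)
        return false;
    return true;
}
```

For readv/writev requests the same buffer check would apply to each iovec element, while the request-alignment check applies to the sum of the element lengths.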
blkioq_write_zeroes() causes zeros to be written to the specified region. When supported, this may be more efficient than using blkioq_write() with a zero-filled buffer.
blkioq_discard() causes data in the specified region to be discarded. Subsequent reads to the same region return unspecified data until it is written to again. Note that discarded data is not guaranteed to be erased and may still be returned by reads.
blkioq_flush() persists completed writes to the storage medium. Data is persistent once the flush request completes successfully. Applications that need to ensure that data persists across power failure or crash must submit flush requests at appropriate points.
blkioq_report_zones() allows the application to discover the zone organization of a zoned storage device. It writes the device zone information to the zones array, which must be provided by the application. It is currently implemented only for the nvme-io_uring driver. Report zones requests are described in more detail further below.
blkioq_close_zone() transitions the zone to the BLKIO_ZONE_STATE_CLOSED state.
blkioq_finish_zone() transitions the zone to the BLKIO_ZONE_STATE_FULL state. The write pointer of the zone is moved to the end of the zone. No more write operations can be submitted to the zone until blkioq_reset_zone() or blkioq_reset_zone_all() is performed.
blkioq_open_zone() transitions the zone to the BLKIO_ZONE_STATE_EXP_OPEN state.
blkioq_reset_zone() resets the zone's write pointer to the beginning of the zone. All data previously written to the zone is lost. The zone is now in the BLKIO_ZONE_STATE_EMPTY state.
The offset argument identifies the number of the zone to perform the management request on. It is represented as the byte offset from the beginning of the device and is used in the management requests that operate only on one zone.
blkioq_close_zone_all() transitions all zones that are in the BLKIO_ZONE_STATE_IMP_OPEN or BLKIO_ZONE_STATE_EXP_OPEN state to the BLKIO_ZONE_STATE_CLOSED state.
blkioq_finish_zone_all() transitions all zones that are in the BLKIO_ZONE_STATE_IMP_OPEN, BLKIO_ZONE_STATE_EXP_OPEN, or BLKIO_ZONE_STATE_CLOSED state to the BLKIO_ZONE_STATE_FULL state.
blkioq_open_zone_all() transitions all zones that are in the BLKIO_ZONE_STATE_CLOSED state to the BLKIO_ZONE_STATE_EXP_OPEN state.
blkioq_reset_zone_all() transitions all zones that are in the BLKIO_ZONE_STATE_IMP_OPEN, BLKIO_ZONE_STATE_EXP_OPEN, BLKIO_ZONE_STATE_CLOSED, or BLKIO_ZONE_STATE_FULL state to the BLKIO_ZONE_STATE_EMPTY state.
The user_data pointer is returned in the struct blkio_completion::user_data field by blkioq_do_io(). It allows applications to correlate a completion with its request.
No ordering guarantees are defined for requests that are in flight simultaneously. For example, a flush request is not guaranteed to persist in-flight write requests. Instead the application must wait for write requests that it wishes to persist to complete before calling blkioq_flush().
Similarly, there are no ordering guarantees between multiple queues of a block device. Multi-threaded applications that rely on an ordering between multiple queues must wait for the first request to complete on one queue, synchronize threads as needed, and then submit the second request on the other queue.
The following request flags are available:
The offset argument is the offset in bytes that determines the zone to start the report from. When it is not a multiple of the zone size, it is rounded down to the beginning of the nearest zone.
The zones argument is a pointer to an array of blkio_zone structs. The application must not free the zones data buffer's elements until the request's completion is returned by blkioq_do_io().
The nr_zones argument is the number of zones requested by the application and the length of the zones buffer.
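The rounding applied to the offset argument can be written out as plain arithmetic. The sketch below assumes that all zones have the same size and that zone_size is non-zero; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: byte offset of the start of the zone containing
 * `offset`, assuming uniformly sized zones and zone_size > 0.
 */
uint64_t example_zone_start(uint64_t offset, uint64_t zone_size)
{
    return offset - (offset % zone_size);
}
```

For instance, with 0x10000-byte zones an offset of 0x18000 falls in the zone that begins at 0x10000.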
Each zone is represented by struct blkio_zone:
    struct blkio_zone
    {
        uint64_t start;
        uint64_t len;
        uint64_t capacity;
        uint64_t write_pointer;
        uint8_t zone_type;
        uint8_t zone_state;
        uint8_t reset;
        /* reserved space */
    };
start, len, capacity and write_pointer are represented in bytes.
The start field is the byte offset where the zone begins. The start value is relative to the start of the device.
The len field is the size of the zone. It can be larger than the size of usable memory and includes the size of unusable blocks, if they are present.
The capacity field indicates the size of usable memory within the zone. It is always smaller than or equal to the zone size.
The write_pointer field is the zone write pointer position. It shows the amount of space within the zone that has been used. The write_pointer value is relative to the start of the device.
The zone_type is one amongst three zone types that are defined as follows:
All zone types accept random read operations.
The zone_state field contains the state of the zone variant which describes the usage of memory within the zone and resources of the device that this zone uses. The following zone states are defined:
The reset field is 1 when the application should perform a RESET ZONE command and 0 otherwise.
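Because start and write_pointer are both relative to the start of the device, the space used and the space still writable in a zone follow from simple subtraction. The sketch below uses a hypothetical local struct with only the byte-valued fields. It assumes the write pointer is meaningful for the zone in question, which may not hold for every zone type or state, and returns 0 when the values look inconsistent.

```c
#include <stdint.h>

/* Local mirror of the byte-valued zone fields; the type name is hypothetical. */
struct example_zone {
    uint64_t start;
    uint64_t len;
    uint64_t capacity;
    uint64_t write_pointer;
};

/* Bytes already written, measured from the beginning of the zone. */
uint64_t example_zone_bytes_written(const struct example_zone *z)
{
    if (z->write_pointer < z->start)
        return 0;
    return z->write_pointer - z->start;
}

/* Bytes that can still be written before the usable capacity is exhausted. */
uint64_t example_zone_bytes_writable(const struct example_zone *z)
{
    uint64_t end = z->start + z->capacity;

    if (z->write_pointer < z->start || z->write_pointer >= end)
        return 0;
    return end - z->write_pointer;
}
```

Note that the remaining space is bounded by capacity rather than len, since len may include unusable blocks.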
The configuration of blkio instances is done through property accesses. Each property has a name and a type (bool, int, str, uint64). Properties may be read-only (r), write-only (w), or read/write (rw).
Access to properties depends on the blkio instance state (created/connected/started). A property may be read/write in the connected state but read-only in the started state. This is written as "rw connected, r started".
The following properties APIs are available:
    int blkio_get_bool(struct blkio *b, const char *name, bool *value);
    int blkio_get_int(struct blkio *b, const char *name, int *value);
    int blkio_get_uint64(struct blkio *b, const char *name, uint64_t *value);
    int blkio_get_str(struct blkio *b, const char *name, char **value);

    int blkio_set_bool(struct blkio *b, const char *name, bool value);
    int blkio_set_int(struct blkio *b, const char *name, int value);
    int blkio_set_uint64(struct blkio *b, const char *name, uint64_t value);
    int blkio_set_str(struct blkio *b, const char *name, const char *value);
blkio_get_str() assigns to *value and the caller must use free(3) to deallocate the memory.
blkio_get_str() automatically converts to string representation if the property is not a str. blkio_set_str() automatically converts from string representation if the property is not a str. This can be used to easily fetch values from and store values to an application's text-based configuration file or command-line. Aside from this automatic conversion, the other property APIs fail with ENOTTY if the property does not have the right type.
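The string conversion makes it straightforward to drive property settings from text. As a sketch, the hypothetical helper below splits a "name=value" line, such as one read from a configuration file or command line, into the two strings that could then be passed to blkio_set_str(). The line format itself is an assumption of this example and is not defined by libblkio.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "name=value" in place by replacing the first
 * '=' with a terminator. Returns 0 on success, or -1 if there is no '='
 * or the name is empty. `line` must be writable.
 */
int example_split_property(char *line, char **name, char **value)
{
    char *eq = strchr(line, '=');

    if (eq == NULL || eq == line)
        return -1;
    *eq = '\0';
    *name = line;
    *value = eq + 1;
    return 0;
}
```

Given the line "num-queues=4", this yields the name "num-queues" and the value "4", and the library would convert the latter to the property's actual type.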
The following properties are common across all drivers. Driver-specific properties are documented in DRIVERS.
DEVICE AND QUEUES
MEMORY REGIONS
ALL REQUESTS
READ AND WRITE REQUESTS
WRITE ZEROES REQUESTS
DISCARD REQUESTS
The io_uring driver uses the Linux io_uring system call interface to perform I/O on files and block device nodes. Both regular files and block device nodes are supported.
Note that io_uring was introduced in Linux kernel version 5.1, and kernels may also be configured to disable io_uring. If io_uring is not available, blkio_create() fails with -ENOSYS when using this driver.
When performing I/O on regular files, write zeroes requests that extend past the end-of-file may or may not update the file size. This is left unspecified and the user must not rely on any particular behavior.
This driver supports poll queues only when using O_DIRECT on block devices or file systems that support polling. Its poll queues never support flush, write zeroes, or discard requests.
Driver-specific properties available after blkio_create()
If this property is set, properties "direct" and "read-only" have no effect and it is the user's responsibility to open the file with the desired flags. Further, during connect, those two properties are updated to reflect the file status flags of the given file descriptor.
If this property is set, property "fd" must not be set and will be updated on connect to reflect the opened file descriptor. Note that the file descriptor is owned by libblkio.
Driver-specific properties available after blkio_connect()
A larger value allows more requests to be in flight, but consumes more resources. Tuning this value can affect performance.
io_uring imposes a maximum on this number: 32768 as of mainline kernel 5.18, and 4096 prior to 5.4. If this maximum is exceeded, blkio_start() will fail with -EINVAL.
The nvme-io_uring driver submits NVMe commands directly to an NVMe namespace using io_uring passthrough, which is available since mainline Linux kernel 5.19.
The process must have the CAP_SYS_ADMIN capability to use this driver, and the NVMe namespace must use the NVM command set.
Driver-specific properties available after blkio_create()
If this property is set, property "fd" must not be set and will be updated on connect to reflect the opened file descriptor. Note that the file descriptor is owned by libblkio.
Driver-specific properties available after blkio_connect()
A larger value allows more requests to be in flight, but consumes more resources. Tuning this value can affect performance.
io_uring imposes a maximum on this number: 32768 as of mainline kernel 5.18, and 4096 prior to 5.4. If this maximum is exceeded, blkio_start() will fail with -EINVAL.
When this number is reached, the application must reset or finish a currently active zone in order to free resources for further operations. This number only affects the ability to write zones and not the ability to read.
When this number is reached, the application must close, finish, or reset a currently open zone in order to free resources for further operations. This number only affects the ability to write zones and not the ability to read.
The following virtio-blk drivers are provided:
These drivers always support poll queues, and their poll queues support all types of requests.
The following properties apply to all these drivers with some exceptions described in the property.
Driver-specific properties available after blkio_create()
Driver-specific properties available after blkio_connect()
pkg-config is the recommended way to build a program with libblkio:
$ cc -o app app.c `pkg-config blkio --cflags --libs`
Meson projects can use pkg-config as follows:
    blkio = dependency('blkio')
    executable('app', 'app.c', dependencies : [blkio])
Maybe. The API was designed with a synchronous control path. Functions like blkio_get_uint64() must return quickly. Operations on network storage can take an unbounded amount of time (in the absence of a timeout mechanism) and are not a good fit for synchronous APIs. A more complex asynchronous control path API could be added for applications wishing to use network storage drivers in the future.
Maybe. No attempt has been made to restrict the library to POSIX features only and most drivers are platform-specific. If there is demand for supporting other operating systems and developers willing to work on it then it may be possible.
Linux AIO could serve as a fallback on systems where io_uring is not available. However, io_submit(2) can block the process and this causes performance problems in event-driven applications that require that the event loop does not block. Unless Linux AIO is fixed it is unlikely that a proposal to add a driver will be accepted.
io_uring_setup(2), io_setup(2), aio(7)