string_copying(7) | Miscellaneous Information Manual | string_copying(7) |
stpcpy, strcpy, strcat, stpecpy, strtcpy, strlcpy, strlcat, stpncpy, strncpy, strncat - copying strings and character sequences
// Chain-copy a string.
char *stpcpy(char *restrict dst, const char *restrict src);

// Copy/catenate a string.
char *strcpy(char *restrict dst, const char *restrict src);
char *strcat(char *restrict dst, const char *restrict src);

// Chain-copy a string with truncation.
char *stpecpy(char *dst, char end[0], const char *restrict src);

// Copy/catenate a string with truncation.
ssize_t strtcpy(char dst[restrict .dsize], const char *restrict src, size_t dsize);
size_t strlcpy(char dst[restrict .dsize], const char *restrict src, size_t dsize);
size_t strlcat(char dst[restrict .dsize], const char *restrict src, size_t dsize);

// Fill a fixed-size buffer with characters from a string
// and pad with null bytes.
char *strncpy(char dst[restrict .dsize], const char *restrict src, size_t dsize);
char *stpncpy(char dst[restrict .dsize], const char *restrict src, size_t dsize);

// Chain-copy a null-padded character sequence into a character sequence.
mempcpy(dst, src, strnlen(src, NITEMS(src)));

// Chain-copy a null-padded character sequence into a string.
stpcpy(mempcpy(dst, src, strnlen(src, NITEMS(src))), "");

// Catenate a null-padded character sequence into a string.
char *strncat(char *restrict dst, const char src[restrict .ssize], size_t ssize);

// Chain-copy a known-length character sequence.
void *mempcpy(void dst[restrict .len], const void src[restrict .len], size_t len);

// Chain-copy a known-length character sequence into a string.
stpcpy(mempcpy(dst, src, len), "");
Originally, there was a distinction between functions that copy and those that catenate. However, newer functions that copy while allowing chaining cover both use cases with a single API. They are also algorithmically faster, since they don't need to search for the terminating null character of the existing string. On the other hand, functions that catenate are much simpler to use, so if performance is not important, it can make sense to use them to improve readability.
The pointer returned by functions that allow chaining is a byproduct of the copy operation, so it comes at no performance cost. Functions that return such a pointer, and thus can be chained, have names of the form *stp*(), since it's common to name the pointer just p.
Chain-copying functions that truncate should accept a pointer to the end of the destination buffer, and have names of the form *stpe*(). This avoids having to recalculate the remaining size after each call.
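To illustrate the end-pointer convention, the following is a minimal sketch of a stpecpy()-like function built on memccpy(3) rather than on strtcpy() (the reference implementation given later in this page uses strtcpy()). The names stpecpy_sketch() and join_path() are hypothetical, and end is declared as a plain char * instead of char end[0]. As in the reference implementation, truncation yields NULL and a NULL dst is passed through, so one check after the whole chain is enough.

```c
#define _GNU_SOURCE             /* exposes memccpy(3) on glibc */
#include <stddef.h>
#include <string.h>

/* Hypothetical memccpy(3)-based variant of stpecpy(): copy src into the
   buffer [dst, end).  Returns a pointer to the new terminating null byte,
   or NULL on truncation (the output is still null-terminated).  A NULL
   dst is passed through, so a chain can be checked once at the end. */
char *
stpecpy_sketch(char *dst, char *end, const char *restrict src)
{
    char  *p;

    if (dst == NULL || dst == end)
        return NULL;

    p = memccpy(dst, src, '\0', end - dst);
    if (p != NULL)
        return p - 1;           /* memccpy() points just past the null byte */

    end[-1] = '\0';             /* truncated: terminate inside the buffer */
    return NULL;
}

/* Hypothetical usage: build "dir/name" in a fixed-size buffer.
   end never changes, so no remaining size is recomputed. */
char *
join_path(char *buf, size_t size, const char *dir, const char *name)
{
    char  *end = buf + size;
    char  *p = buf;

    p = stpecpy_sketch(p, end, dir);
    p = stpecpy_sketch(p, end, "/");
    p = stpecpy_sketch(p, end, name);
    return p;                   /* NULL if anything was truncated */
}
```

With a 16-byte buffer, join_path(buf, 16, "usr", "local") produces "usr/local" and returns buf + 9; with an 8-byte buffer it returns NULL and leaves the truncated string "usr/loc".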
The first thing to note is that programmers should be careful with buffers, so that they always have the correct size and truncation is not necessary.
In most cases, truncation is not desired, and it is simpler to just do the copy. Simpler code is safer code. Programming against programming mistakes by adding more code just adds more points where mistakes can be made.
Nowadays, compilers can detect most programmer errors with features like compiler warnings, static analyzers, and _FORTIFY_SOURCE (see ftm(7)). Keeping the code simple helps these overflow-detection features be more precise.
When validating user input, code should normally not truncate, but instead fail and refuse to copy at all.
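A minimal sketch of failing instead of truncating, assuming a hypothetical helper copy_or_fail(). It measures src with strnlen(3), so at most dsize bytes of untrusted input are read, and it leaves dst untouched when the input does not fit.

```c
#define _GNU_SOURCE             /* exposes strnlen(3) on glibc */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst[dsize] only if it fits whole.
   Returns 0 on success, or -1 (with dst unchanged) if src would have to
   be truncated.  strnlen() reads at most dsize bytes of src. */
int
copy_or_fail(char *restrict dst, size_t dsize, const char *restrict src)
{
    size_t  slen = strnlen(src, dsize);

    if (slen == dsize)          /* no room for the terminating null byte */
        return -1;

    memcpy(dst, src, slen + 1); /* includes the terminating null byte */
    return 0;
}
```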
In some cases, however, it makes sense to truncate.
Functions that truncate:
For historic reasons, some standard APIs and file formats, such as utmpx(5) and tar(1), use null-padded character sequences in fixed-size buffers. To interface with them, specialized functions need to be used.
To copy bytes from strings into these buffers, use strncpy(3) or stpncpy(3).
To read a null-padded character sequence, use strnlen(src, NITEMS(src)), and then you can treat it as a known-length character sequence; or use strncat(3) directly.
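The following sketch shows both directions for a hypothetical struct rec whose name field plays the role of a field such as ut_user. NITEMS() is used throughout this page but not defined in it; the definition below is an assumption. Writing uses strncpy(3), which pads the rest of the field with null bytes; reading uses strnlen(3), because a name that fills the whole field has no terminating null byte.

```c
#define _GNU_SOURCE             /* exposes strnlen(3) on glibc */
#include <stddef.h>
#include <string.h>

/* Assumed helper: number of elements in an array. */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

struct rec {                    /* hypothetical record with a null-padded field */
    char  name[8];
};

/* Store src as a null-padded character sequence.
   Returns -1 (and writes nothing) if src is longer than the field. */
int
rec_set_name(struct rec *r, const char *src)
{
    if (strlen(src) > NITEMS(r->name))
        return -1;              /* would be truncated */

    strncpy(r->name, src, NITEMS(r->name));  /* pads with '\0' bytes */
    return 0;
}

/* Length of the stored sequence; it may fill the field with no '\0'. */
size_t
rec_name_len(const struct rec *r)
{
    return strnlen(r->name, NITEMS(r->name));
}
```

To print such a name, pass its length explicitly, for example printf("%.*s\n", (int) rec_name_len(&r), r.name).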
The simplest character sequence copying function is mempcpy(3). It requires always knowing the length of your character sequences, which can be stored alongside them in structures. This makes the code much faster: since you always know the lengths, you can do the minimal copies and length measurements. mempcpy(3) copies character sequences, so you need to explicitly set the terminating null character if you need a string.
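A minimal sketch of keeping lengths in a structure so that mempcpy(3) can be chained without any strlen(3) calls. The struct str type and join_strs() are hypothetical, and the caller is assumed to provide a large enough dst.

```c
#define _GNU_SOURCE             /* mempcpy(3) is a GNU extension */
#include <stddef.h>
#include <string.h>

struct str {                    /* hypothetical: a character sequence and its length */
    const char  *s;
    size_t      len;
};

/* Copy n sequences back to back into dst (assumed large enough),
   null-terminate the result, and return a pointer to that null byte. */
char *
join_strs(char *dst, const struct str *v, size_t n)
{
    char  *p = dst;

    for (size_t i = 0; i < n; i++)
        p = mempcpy(p, v[i].s, v[i].len);   /* lengths are known: no scans */
    return stpcpy(p, "");                   /* add the terminating null byte */
}
```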
In programs that make considerable use of strings or character sequences, and need the best performance, using overlapping character sequences can make a big difference. It allows holding subsequences of a larger character sequence without duplicating memory or spending time on a copy.
However, this is delicate, since it requires using character sequences. C library APIs use strings, so programs that use character sequences must take care to distinguish strings from character sequences.
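As an illustration, the hypothetical function split_words() below records each word of a string as a (pointer, length) view into the original string, without copying anything. The struct view type is an assumption. Each view is a character sequence, not a string: it is not null-terminated at its end, so passing its pointer to a string function such as puts(3) would print the rest of the original string.

```c
#include <stddef.h>
#include <string.h>

struct view {                   /* hypothetical: a non-owning slice of a larger buffer */
    const char  *p;
    size_t      len;            /* the slice is not null-terminated */
};

/* Split s on spaces into at most max views that point into s itself.
   Nothing is copied.  Returns the number of views stored in out. */
size_t
split_words(const char *s, struct view *out, size_t max)
{
    size_t  n = 0;

    while (*s != '\0' && n < max) {
        s += strspn(s, " ");            /* skip separators */
        size_t len = strcspn(s, " ");   /* length of the next word */
        if (len == 0)
            break;                      /* only separators were left */
        out[n++] = (struct view){ s, len };
        s += len;
    }
    return n;
}
```

Printing a view therefore needs an explicit length, as in printf("%.*s\n", (int) w.len, w.p).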
To copy a known-length character sequence, use mempcpy(3).
To copy a known-length character sequence into a string, use stpcpy(mempcpy(dst, src, len), "").
A string is also accepted as input, because mempcpy(3) asks for the length, and a string is composed of a character sequence of the same length plus a terminating null character.
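The idiom above, wrapped in a hypothetical helper copy_seq_to_str() as a sketch: it turns either a slice of a larger buffer or a whole string (measured with strlen(3)) into a new string. dst is assumed to have room for len + 1 bytes.

```c
#define _GNU_SOURCE             /* mempcpy(3) is a GNU extension */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src into dst and terminate
   them, making a string.  dst needs len + 1 bytes.  Returns a pointer
   to the terminating null byte, so calls can still be chained. */
char *
copy_seq_to_str(char *restrict dst, const char *restrict src, size_t len)
{
    return stpcpy(mempcpy(dst, src, len), "");
}
```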
Some functions only operate on strings. Those require that the input src is a string, and guarantee an output string (even when truncation occurs). Functions that catenate also require that dst holds a string before the call. List of functions:
Other functions require an input string, but create a character sequence as output. These functions have confusing names, and have a long history of misuse. List of functions:
Other functions operate on an input character sequence, and create an output string. Functions that catenate also require that dst holds a string before the call. strncat(3) has an even more misleading name than the functions above. List of functions:
Other functions operate on an input character sequence to create an output character sequence. List of functions:
Most of these functions don't set errno.
The Linux kernel has an internal function for copying strings, strscpy(9), which is identical to strtcpy(), except that it returns -E2BIG instead of -1 and it doesn't set errno.
Don't mix chain calls to truncating and non-truncating functions. It is conceptually wrong unless you know that the first part of a copy will always fit. In any case, the performance difference will probably be negligible, so it will probably be clearer to use consistent semantics: either truncating or non-truncating. Calling a non-truncating function after a truncating one is necessarily wrong: if the truncating call did truncate, the buffer is already full (and stpecpy() has returned NULL), so the next copy would overflow the buffer or dereference a null pointer.
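All catenation functions share the same performance problem: Shlemiel the painter. Each call must first scan dst from the beginning to find its terminating null character, so the cost of building a string by repeated catenation grows quadratically rather than linearly. As a mitigation, compilers are able to transform some calls to catenation functions into normal copy functions, since strlen(dst) is usually a byproduct of the previous copy.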
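A sketch contrasting the two approaches on a hypothetical list of words: both functions produce the same string, but join_cat() rescans dst on every strcat(3) call, while join_chain() continues from the pointer that stpcpy(3) returns. Both assume dst is large enough.

```c
#define _GNU_SOURCE             /* exposes stpcpy(3) on glibc */
#include <stddef.h>
#include <string.h>

/* Catenating: every strcat() first walks dst to find its end, so the
   total work grows quadratically with the number of words. */
void
join_cat(char *dst, const char *const words[], size_t n)
{
    strcpy(dst, "");
    for (size_t i = 0; i < n; i++)
        strcat(dst, words[i]);
}

/* Chaining: stpcpy() returns the new end, so each copy starts there
   and dst is never rescanned.  Returns a pointer to the final '\0'. */
char *
join_chain(char *dst, const char *const words[], size_t n)
{
    char  *p = stpcpy(dst, "");

    for (size_t i = 0; i < n; i++)
        p = stpcpy(p, words[i]);
    return p;
}
```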
strlcpy(3) and strlcat(3) need to read the entire src string, even if the destination buffer is small. This makes them vulnerable to Denial of Service (DoS) attacks if an attacker can control the length of the src string. And if not, they're still unnecessarily slow.
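To make the cost visible, here is a sketch of the strlcpy(3bsd) contract written as a hypothetical strlcpy_sketch() (not libc's implementation). Because the return value must be strlen(src), the whole source is read no matter how small dsize is. By contrast, the reference strtcpy() later in this page measures src with strnlen(src, dsize), so it reads at most dsize bytes.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the strlcpy(3bsd) contract, not libc's actual code: the
   return value must be strlen(src), so all of src is read even when
   only a few bytes fit in dst. */
size_t
strlcpy_sketch(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t  slen = strlen(src);         /* reads all of src, whatever dsize is */

    if (dsize != 0) {
        size_t  n = (slen < dsize - 1) ? slen : dsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return slen;                        /* truncation iff slen >= dsize */
}
```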
The following are examples of correct use of each of these functions.
stpcpy(3)
    p = buf;
    p = stpcpy(p, "Hello ");
    p = stpcpy(p, "world");
    p = stpcpy(p, "!");
    len = p - buf;
    puts(buf);

strcpy(3), strcat(3)
    strcpy(buf, "Hello ");
    strcat(buf, "world");
    strcat(buf, "!");
    len = strlen(buf);
    puts(buf);

stpecpy()
    end = buf + NITEMS(buf);
    p = buf;
    p = stpecpy(p, end, "Hello ");
    p = stpecpy(p, end, "world");
    p = stpecpy(p, end, "!");
    if (p == NULL) {
        len = NITEMS(buf) - 1;
        goto toolong;
    }
    len = p - buf;
    puts(buf);

strtcpy()
    len = strtcpy(buf, "Hello world!", NITEMS(buf));
    if (len == -1)
        goto toolong;
    puts(buf);

strlcpy(3bsd), strlcat(3bsd)
    if (strlcpy(buf, "Hello ", NITEMS(buf)) >= NITEMS(buf))
        goto toolong;
    if (strlcat(buf, "world", NITEMS(buf)) >= NITEMS(buf))
        goto toolong;
    len = strlcat(buf, "!", NITEMS(buf));
    if (len >= NITEMS(buf))
        goto toolong;
    puts(buf);
stpncpy(3)
    p = stpncpy(u->ut_user, "alx", NITEMS(u->ut_user));
    if (NITEMS(u->ut_user) < strlen("alx"))
        goto toolong;
    len = p - u->ut_user;
    fwrite(u->ut_user, 1, len, stdout);

strncpy(3)
    strncpy(u->ut_user, "alx", NITEMS(u->ut_user));
    if (NITEMS(u->ut_user) < strlen("alx"))
        goto toolong;
    len = strnlen(u->ut_user, NITEMS(u->ut_user));
    fwrite(u->ut_user, 1, len, stdout);
mempcpy(dst, src, strnlen(src, NITEMS(src)))
    char buf[NITEMS(u->ut_user)];
    p = buf;
    p = mempcpy(p, u->ut_user, strnlen(u->ut_user, NITEMS(u->ut_user)));
    len = p - buf;
    fwrite(buf, 1, len, stdout);

stpcpy(mempcpy(dst, src, strnlen(src, NITEMS(src))), "")
    char buf[NITEMS(u->ut_user) + 1];
    p = buf;
    p = mempcpy(p, u->ut_user, strnlen(u->ut_user, NITEMS(u->ut_user)));
    p = stpcpy(p, "");
    len = p - buf;
    puts(buf);

strncat(3)
    char buf[NITEMS(u->ut_user) + 1];
    strcpy(buf, "");
    strncat(buf, u->ut_user, NITEMS(u->ut_user));
    len = strlen(buf);
    puts(buf);
mempcpy(3)
    p = buf;
    p = mempcpy(p, "Hello ", 6);
    p = mempcpy(p, "world", 5);
    p = mempcpy(p, "!", 1);
    len = p - buf;
    fwrite(buf, 1, len, stdout);

stpcpy(mempcpy(dst, src, len), "")
    p = buf;
    p = mempcpy(p, "Hello ", 6);
    p = mempcpy(p, "world", 5);
    p = mempcpy(p, "!", 1);
    p = stpcpy(p, "");
    len = p - buf;
    puts(buf);
Here are reference implementations for functions not provided by libc.
/* This code is in the public domain. */

#define _GNU_SOURCE     /* for mempcpy(3) */
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

ssize_t strtcpy(char *restrict dst, const char *restrict src, size_t dsize);

char *
stpecpy(char *dst, char end[0], const char *restrict src)
{
    ssize_t  dlen;

    if (dst == NULL)
        return NULL;

    dlen = strtcpy(dst, src, end - dst);
    return (dlen == -1) ? NULL : dst + dlen;
}

ssize_t
strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    bool    trunc;
    size_t  dlen, slen;

    if (dsize == 0) {
        errno = ENOBUFS;
        return -1;
    }

    slen = strnlen(src, dsize);
    trunc = (slen == dsize);
    dlen = slen - trunc;

    stpcpy(mempcpy(dst, src, dlen), "");
    if (trunc)
        errno = E2BIG;
    return trunc ? -1 : (ssize_t) slen;
}
bzero(3), memcpy(3), memccpy(3), mempcpy(3), stpcpy(3), strlcpy(3bsd), strncat(3), stpncpy(3), string(3)
2023-12-17 | Linux man-pages 6.7 |