trurl(1) | trurl Manual | trurl(1) |
trurl - transpose URLs
trurl [options / URLs]
trurl parses, manipulates and outputs URLs and parts of URLs.
It uses the RFC 3986 definition of URLs and it uses libcurl's URL parser to do so, which includes a few "extensions". The URL support is limited to "hierarchical" URLs, the ones that use "://" separators after the scheme.
Typically you pass in one or more URLs and decide which parts of them you want output, possibly modifying the URLs as well.
trurl knows URLs and every URL consists of up to ten separate and independent "components". These components can be extracted, removed and updated with trurl and they are referred to by their respective names: scheme, user, password, options, host, port, path, query, fragment and zoneid.
Options start with one or two dashes. Many of the options require an additional value next to them.
Any other argument is interpreted as a URL argument, and is treated as if it was following a --url option.
The first argument that is exactly two dashes ("--") marks the end of options; any argument after the end of options is interpreted as a URL argument even if it starts with a dash.
For path, --append URL-encodes the new segment and appends it to the path, separated with a slash.
For query, --append URL-encodes the new segment and appends it to the query, separated with an ampersand (&). If the appended segment contains an equal sign ('='), the first one is kept verbatim and the text on each side of it is URL-encoded separately.
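The append rules above can be sketched in Python. This is only an illustration of the described behavior, not trurl's implementation. The function names are hypothetical, and the exact set of characters that gets percent-encoded, as well as the handling of a path that already ends in a slash, are assumptions.

```python
from urllib.parse import quote


def append_path(path: str, segment: str) -> str:
    """Append one URL-encoded segment to a path, separated by a slash."""
    encoded = quote(segment, safe="")
    # Assumption: do not add a second slash if the path already ends in one.
    base = path if path.endswith("/") else path + "/"
    return base + encoded


def append_query(query: str, segment: str) -> str:
    """Append one segment to a query, separated by an ampersand.

    The first '=' is kept verbatim; both sides are encoded separately.
    """
    key, sep, value = segment.partition("=")
    encoded = quote(key, safe="") + sep + quote(value, safe="")
    return encoded if not query else query + "&" + encoded
```

With these helpers, appending "you" to "/hello" gives "/hello/you", and appending "search=string" to "name=hello" gives "name=hello&search=string", matching the examples in this manual.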
According to RFC 3986, a space cannot legally be part of a URL. The --accept-space option makes a best-effort attempt to convert the provided string into a valid URL.
Note that trurl only knows default port numbers for URL schemes that are supported by libcurl.
Since trurl by default removes default port numbers from URLs with a known scheme, --default-port has practically no effect unless one of --get, --json, or --keep-port is also specified.
In a file read with --url-file, each line needs to be a single valid URL. trurl removes one carriage return character at the end of the line if present, trims off all trailing space and tab characters, and skips all lines that are empty after trimming.
The maximum line length supported in a file like this is 4094 bytes. Lines that exceed that length are skipped, and a warning is printed to stderr when they are encountered.
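The line handling described above can be sketched as follows. This is a hedged illustration in Python, not trurl's code: the function name and the warning text are hypothetical, and whether the 4094-byte limit counts the line terminator is an assumption (here it does not).

```python
import sys

MAX_LINE_BYTES = 4094  # limit stated in the text above


def clean_url_lines(raw_lines):
    """Yield cleaned URL strings from an iterable of bytes lines."""
    for raw in raw_lines:
        # Drop the newline that terminates the line, if any.
        line = raw[:-1] if raw.endswith(b"\n") else raw
        if len(line) > MAX_LINE_BYTES:
            # Over-long lines are skipped with a warning on stderr.
            print("warning: skipping long line", file=sys.stderr)
            continue
        if line.endswith(b"\r"):
            line = line[:-1]  # remove exactly one carriage return
        line = line.rstrip(b" \t")  # trim trailing spaces and tabs
        if not line:
            continue  # skip lines that are empty after trimming
        yield line.decode("utf-8", errors="replace")
```

Note that only a single carriage return is removed, so a line ending in two of them keeps one.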
The following component names are available (case sensitive): url, scheme, user, password, options, host, port, path, query, fragment and zoneid.
{component} will expand to nothing if the given component does not have a value.
Components are shown URL decoded by default. If you instead write the component prefixed with a colon like "{:path}", it gets output URL encoded.
You may also prefix components with default: and/or puny: or idn:, in any order.
If default: is specified, like "{default:url}" or "{default:port}", and the port is not explicitly specified in the URL, the scheme's default port will be output if it is known.
If puny: is specified, like "{puny:url}" or "{puny:host}", the "punycoded" version of the host name will be used in the output. This option is mutually exclusive with idn:.
If idn: is specified like "{idn:url}" or "{idn:host}", the International Domain Name version of the host name will be used in the output if it is provided as a correctly encoded punycode version. This option is mutually exclusive with puny:.
If --default-port is specified, all formats are expanded as if they used default:; and if --punycode is used, all formats are expanded as if they used puny:. Also note that "{url}" is affected by the --keep-port option.
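As a rough illustration of what the default:, puny: and idn: prefixes do, here is a small Python sketch. It is not how trurl or libcurl implement them. The function names and the short port table are hypothetical, and Python's built-in "idna" codec follows an older IDNA revision, so results for unusual names may differ from trurl's.

```python
# A few well-known default ports; the real list depends on libcurl.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "sftp": 22}


def port_with_default(scheme, port=None):
    """Return the explicit port, else the scheme's default, else nothing."""
    if port is not None:
        return str(port)
    known = DEFAULT_PORTS.get(scheme.lower())
    return str(known) if known is not None else ""


def to_puny(host: str) -> str:
    """Convert an international host name to its punycode (ASCII) form."""
    return host.encode("idna").decode("ascii")


def to_idn(host: str) -> str:
    """Convert a punycode host name back to its international form."""
    return host.encode("ascii").decode("idna")
```

For example, port_with_default("https") returns "443", the same value that the '{default:port}' example in this manual prints.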
Hosts provided as IPv6 numerical addresses will be provided within square brackets. Like "[fe80::20c:29ff:fe9c:409b]".
Hosts provided as IPv4 numerical addresses will be "normalized" and provided as four dot-separated decimal numbers when output.
You can access specific keys in the query string using the format {query:key}. The value of the first matching key is then output, using a case sensitive match. When extracting a URL decoded query key that contains %00, that octet is replaced with a single period '.' in the output.
You can access specific keys in the query string and output all values using the format {query-all:key}. This looks for 'key' case sensitively and outputs all values for that key, space-separated.
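The two lookups above can be sketched in Python. This is an illustration under stated assumptions, not trurl's implementation: the function names are hypothetical, keys are compared in their raw (still encoded) form, and the pair separator is assumed to be an ampersand unless another one is passed in.

```python
from urllib.parse import unquote_to_bytes


def _decode(text: str) -> str:
    """URL-decode a value, replacing any NUL octet (%00) with a period."""
    raw = unquote_to_bytes(text).replace(b"\x00", b".")
    return raw.decode("utf-8", errors="replace")


def query_values(query: str, key: str, separator: str = "&"):
    """Return the decoded values of every pair whose key matches exactly."""
    values = []
    for pair in query.split(separator):
        name, _, value = pair.partition("=")
        if name == key:  # case sensitive comparison
            values.append(_decode(value))
    return values


def query_first(query: str, key: str) -> str:
    """Sketch of {query:key}: the first matching value, or nothing."""
    values = query_values(query, key)
    return values[0] if values else ""


def query_all(query: str, key: str) -> str:
    """Sketch of {query-all:key}: all matching values, space-separated."""
    return " ".join(query_values(query, key))
```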
The "format" string supports the following backslash sequences:
\\ - backslash
\t - tab
\n - newline
\r - carriage return
\{ - an open curly brace that does not start a variable
\[ - an open bracket that does not start a variable
All other text in the format string will be shown as-is.
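To make the expansion rules concrete, here is a minimal Python sketch of a format expander. It is an assumption-laden illustration rather than trurl's parser: the function name is hypothetical, only the curly-brace variable form is handled, and the component prefixes (colon, default:, puny:, idn:) and query lookups are left out.

```python
def expand_format(fmt: str, components: dict) -> str:
    """Expand {component} variables and backslash sequences in fmt."""
    escapes = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r", "{": "{", "[": "["}
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\" and i + 1 < len(fmt) and fmt[i + 1] in escapes:
            # A recognized backslash sequence produces one character.
            out.append(escapes[fmt[i + 1]])
            i += 2
        elif ch == "{":
            end = fmt.find("}", i)
            if end == -1:
                # No closing brace: keep the rest of the text as-is.
                out.append(fmt[i:])
                break
            name = fmt[i + 1:end]
            # A component without a value expands to nothing.
            out.append(components.get(name, ""))
            i = end + 1
        else:
            out.append(ch)  # all other text is shown as-is
            i += 1
    return "".join(out)
```

Given the components of https://curl.se/we/are.html, the format "{scheme}\t{host}\n" would produce the scheme, a tab, the host and a newline.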
The following components can be set: url, scheme, user, password, options, host, port, path, query, fragment and zoneid.
If a simple "="-assignment is used, the data is URL encoded when applied. If ":=" is used, the data is assumed to already be URL encoded and will be stored as-is.
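The difference between the two assignment forms can be sketched like this. The Python below is a hedged illustration only: the function name is hypothetical, and which characters are left unencoded by "=" (here, Python's default of keeping '/') is an assumption, not a statement about trurl or libcurl.

```python
from urllib.parse import quote


def parse_set(argument: str):
    """Split a component assignment into (component, stored_value)."""
    name, sep, data = argument.partition("=")
    if not sep:
        raise ValueError("expected component=data or component:=data")
    if name.endswith(":"):
        # ":=" form: the data is assumed to be URL encoded already.
        return name[:-1], data
    # "=" form: the data is URL encoded when applied.
    return name, quote(data)
```

Both parse_set("fragment=100%") and parse_set("fragment:=100%25") end up storing "100%25" for the fragment.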
If no URL or --url-file argument is provided, trurl will try to create a URL using the components provided by the --set options. If not enough components are specified, this will fail.
Providing multiple URLs will make trurl act on all URLs in a serial fashion.
If a URL cannot be parsed for whatever reason, trurl simply moves on to the next provided URL - unless --verify is used.
For --trim, "what" is specified as a full word or as a word prefix (using a single trailing asterisk ('*')). trurl then removes the tuples from the query string that match the instruction.
To match a literal trailing asterisk instead of using a wildcard, escape it with a backslash in front of it. Like "\*".
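The matching rules for trimming query tuples can be sketched in Python as below. This is an illustration, not trurl's code: the function name is hypothetical, and the comparison is assumed to be case sensitive and made against the raw key of each pair.

```python
def trim_query(query: str, what: str, separator: str = "&") -> str:
    """Remove the query tuples whose key matches 'what'."""
    if what.endswith("\\*"):
        # Escaped asterisk: match a key that literally ends in '*'.
        prefix_match, pattern = False, what[:-2] + "*"
    elif what.endswith("*"):
        # Trailing asterisk: match every key that starts with the prefix.
        prefix_match, pattern = True, what[:-1]
    else:
        prefix_match, pattern = False, what
    kept = []
    for pair in query.split(separator):
        name = pair.partition("=")[0]
        hit = name.startswith(pattern) if prefix_match else name == pattern
        if not hit:
            kept.append(pair)
    return separator.join(kept)
```

Trimming "utm_*" from "search=hey&utm_source=tracker" leaves "search=hey", in line with the --trim example in this manual.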
The --json option outputs a JSON array with one or more objects. One for each URL.
Each URL JSON object contains a number of properties, a series of key/value pairs. The exact set depends on the given URL.
The key/values are extracted from the query, where they are separated by ampersands (&) or by the separator the user sets with --query-separator.
The query pairs are listed in their left-to-right order of appearance, but can be made alpha-sorted with --sort-query.
The "params" array is only present if the URL has a query.
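How the list of query pairs is derived can be sketched in Python. This is a minimal illustration under assumptions, not trurl's implementation: the function name is hypothetical, values are left in their encoded form, and "alpha-sorted" is taken to mean sorting by key.

```python
def query_params(query: str, separator: str = "&", sort: bool = False):
    """Split a query into a list of {"key": ..., "value": ...} objects."""
    pairs = []
    for item in query.split(separator):
        if not item:
            continue  # ignore empty pieces such as a trailing separator
        key, _, value = item.partition("=")
        pairs.append({"key": key, "value": value})
    if sort:
        # Assumption: sort by key; list.sort is stable for equal keys.
        pairs.sort(key=lambda pair: pair["key"])
    return pairs
```

For the query "q=answers&user=me" this yields the two key/value objects for "q" and "user", in that order.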
$ trurl --url https://curl.se --set host=example.com
https://example.com/

$ trurl --set host=example.com --set scheme=ftp
ftp://example.com/

$ trurl --url https://curl.se/we/are.html --redirect here.html
https://curl.se/we/here.html

$ trurl --url https://curl.se/we/../are.html --set port=8080
https://curl.se:8080/are.html

$ trurl --url https://curl.se/we/are.html --get '{path}'
/we/are.html

$ trurl --url https://curl.se/we/are.html --get '{default:port}'
443

$ trurl --url https://curl.se/hello --append path=you
https://curl.se/hello/you

$ trurl --url "https://curl.se?name=hello" --append query=search=string
https://curl.se/?name=hello&search=string

$ cat urllist.txt | trurl --url-file -
...

$ trurl "https://fake.host/search?q=answers&user=me#frag" --json
[
  {
    "url": "https://fake.host/search?q=answers&user=me#frag",
    "parts": {
      "scheme": "https",
      "host": "fake.host",
      "path": "/search",
      "query": "q=answers&user=me",
      "fragment": "frag"
    },
    "params": [
      {
        "key": "q",
        "value": "answers"
      },
      {
        "key": "user",
        "value": "me"
      }
    ]
  }
]

$ trurl "https://curl.se?search=hey&utm_source=tracker" --trim query="utm_*"
https://curl.se/?search=hey

$ trurl "https://example.com?a=home&here=now&thisthen" -g '{query:a}'
home

$ trurl "https://example.com?b=a&c=b&a=c" --sort-query
https://example.com/?a=c&b=a&c=b

$ trurl "https://curl.se?search=fool;page=5" --trim query="search" --query-separator ";"
https://curl.se/?page=5

$ trurl "https://curl.se/this has space/index.html" --accept-space
https://curl.se/this%20has%20space/index.html

$ trurl "https://curl.se/path/index.html" --iterate "scheme=http ftp sftp"
http://curl.se/path/index.html
ftp://curl.se/path/index.html
sftp://curl.se/path/index.html

https://curl.se/trurl
curl_url_set(3), curl_url_get(3)
April 27, 2023 | trurl |