trurl(1) | 0.16 | trurl(1) |
trurl - transpose URLs
trurl [options / URLs]
trurl parses, manipulates and outputs URLs and parts of URLs.
It uses the RFC 3986 definition of URLs and it uses libcurl's URL parser to do so, which includes a few "extensions". The URL support is limited to "hierarchical" URLs, the ones that use :// separators after the scheme.
Typically you pass in one or more URLs and decide which parts of them to output, possibly modifying the URLs as well.
trurl knows URLs and every URL consists of up to ten separate and independent components. These components can be extracted, removed and updated with trurl and they are referred to by their respective names: scheme, user, password, options, host, port, path, query, fragment and zoneid.
When provided a URL to work with, trurl "normalizes" it. It means that individual URL components are URL decoded then URL encoded back again and set in the URL.
Example:
$ trurl 'http://ex%61mple:80/%62ath/a/../b?%2e%FF#tes%74'
http://example/bath/b?.%ff#test
Options start with one or two dashes. Many of the options require an additional value next to them.
Any other argument is interpreted as a URL argument, and is treated as if it was following a --url option.
The first argument that is exactly two dashes (--) marks the end of options; any argument after it is interpreted as a URL argument even if it starts with a dash.
Long options can be provided either as --flag argument or as --flag=argument.
For path, --append URL encodes the new segment and appends it to the path, separated with a slash.
For query, --append URL encodes the new segment and appends it to the query, separated with an ampersand (&). If the appended segment contains an equal sign (=), the first occurrence is kept verbatim and both sides of it are URL encoded separately.
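The query rule can be sketched in Python. This is only an illustration, not trurl's implementation: `append_query` is a hypothetical helper, and the set of characters escaped by `urllib.parse.quote` is an assumption that may differ from what trurl escapes.

```python
from urllib.parse import quote


def append_query(query, segment, sep="&"):
    """Append one segment to a query string (hypothetical helper).

    Only the first '=' is kept verbatim; both sides of it are
    percent-encoded separately.
    """
    name, eq, value = segment.partition("=")
    encoded = quote(name, safe="") + eq + quote(value, safe="")
    return encoded if not query else query + sep + encoded


print(append_query("name=hello", "search=life"))  # name=hello&search=life
print(append_query("", "a&b=c=d"))                # a%26b=c%3Dd
```

The second call shows that a later equal sign in the value is encoded rather than treated as another delimiter.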
According to RFC 3986, a space cannot legally be part of a URL. With --accept-space, trurl makes a best-effort attempt to convert the provided string into a valid URL.
Note that trurl only knows default port numbers for URL schemes that are supported by libcurl.
Since, by default, trurl removes default port numbers from URLs with a known scheme, this option is pretty much ignored unless one of --get, --json, and --keep-port is not also specified.
Each line needs to be a single valid URL. trurl removes one carriage return character at the end of the line if present, trims off all the trailing space and tab characters, and skips all empty (after trimming) lines.
The maximum line length supported in a file like this is 4094 bytes. Lines that exceed that length are skipped, and a warning is printed to stderr when they are encountered.
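The line handling described above could look roughly like this in Python. It is a sketch under assumptions, not trurl's code: `usable_lines` is a hypothetical name, and whether the length limit is checked before or after trimming is not stated here, so the sketch checks it before trimming and excludes the newline.

```python
import sys

MAX_LINE = 4094  # maximum supported line length in bytes, per the text above


def usable_lines(lines):
    """Yield the lines that would be treated as URLs (hypothetical helper)."""
    for raw in lines:
        line = raw[:-1] if raw.endswith("\n") else raw
        # Assumption: the limit applies to the untrimmed line without newline.
        if len(line.encode("utf-8")) > MAX_LINE:
            print("warning: skipping too long line", file=sys.stderr)
            continue
        if line.endswith("\r"):
            line = line[:-1]        # remove one carriage return
        line = line.rstrip(" \t")   # trim trailing spaces and tabs
        if line:                    # skip lines that are empty after trimming
            yield line
```

Passing an open file object, or `sys.stdin` for the `-` case, as `lines` would yield the cleaned-up entries one by one.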
The following component names are available (case sensitive): url, scheme, user, password, options, host, port, path, query, fragment and zoneid.
{component} expands to nothing if the given component does not have a value.
Components are shown URL decoded by default.
URL decoding a component may produce output that is problematic to display. trurl shows a warning in such cases unless --quiet is used.
trurl supports a range of qualifiers, or prefixes, to the component that change how it is handled:
If url: is specified, like {url:path}, the component gets output URL encoded. As a shortcut, url: also works written as a single colon: {:path}.
If strict: is specified, like {strict:path}, URL decode problems are turned into errors. In this stricter mode, a URL decode problem makes trurl stop what it is doing and return with exit code 10.
If must: is specified, like {must:query}, it makes trurl return an error if the requested component does not exist in the URL. By default a missing component will just be shown blank.
If default: is specified, like {default:url} or {default:port}, and the port is not explicitly specified in the URL, the scheme's default port is output if it is known.
If puny: is specified, like {puny:url} or {puny:host}, the punycoded version of the hostname is used in the output. This option is mutually exclusive with idn:.
If idn: is specified like {idn:url} or {idn:host}, the International Domain Name version of the hostname is used in the output if it is provided as a correctly encoded punycode version. This option is mutually exclusive with puny:.
If --default-port is specified, all formats are expanded as if they used default:; and if --punycode is used, all formats are expanded as if they used puny:. Also note that {url} is affected by the --keep-port option.
Hosts provided as IPv6 numerical addresses are provided within square brackets. Like [fe80::20c:29ff:fe9c:409b].
Hosts provided as IPv4 numerical addresses are normalized and provided as four dot-separated decimal numbers when output.
You can access specific keys in the query string using the format {query:key}. Then the value of the first matching key is output using a case sensitive match. When extracting a URL decoded query key that contains %00, such octet is replaced with a single period . in the output.
You can access specific keys in the query string and output all values using the format {query-all:key}. This looks for key case sensitively and outputs all values for that key, space-separated.
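As a rough Python illustration of the two lookups: the helper names `query_all` and `query_first` are hypothetical, and comparing against the URL-decoded key name is an assumption of this sketch rather than documented behavior.

```python
from urllib.parse import unquote


def query_all(query, key, sep="&"):
    """Return every value for key, matched case sensitively (sketch)."""
    values = []
    for pair in query.split(sep):
        name, _, value = pair.partition("=")
        if unquote(name) == key:
            # A decoded %00 octet is shown as a single period.
            values.append(unquote(value).replace("\x00", "."))
    return values


def query_first(query, key, sep="&"):
    """Return the first matching value, or an empty string if none."""
    values = query_all(query, key, sep)
    return values[0] if values else ""


print(query_first("a=home&here=now&thisthen", "a"))  # home
print(" ".join(query_all("k=1&K=2&k=3", "k")))       # 1 3
```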
The format string supports the following backslash sequences:
\\ - backslash
\t - tab
\n - newline
\r - carriage return
\{ - an open curly brace that does not start a variable
\[ - an open bracket that does not start a variable
All other text in the format string is shown as-is.
Example:
$ trurl example.com --iterate=scheme="ftp https" --iterate=port="22 80"
ftp://example.com:22/
ftp://example.com:80/
https://example.com:22/
https://example.com:80/
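The output order in the example matches a plain Cartesian product over the space-separated values. A minimal Python sketch of that idea follows; the `iterate` helper and its fixed `scheme://host:port/` output shape are assumptions for illustration only.

```python
from itertools import product


def iterate(url_parts, **choices):
    """Yield one URL per combination of the space-separated values."""
    names = list(choices)
    value_lists = [choices[name].split() for name in names]
    for combo in product(*value_lists):
        parts = dict(url_parts)
        parts.update(zip(names, combo))
        yield "{scheme}://{host}:{port}/".format(**parts)


base = {"scheme": "http", "host": "example.com", "port": "80"}
for url in iterate(base, scheme="ftp https", port="22 80"):
    print(url)
```

The first named component varies slowest, which is why both ftp URLs come before the https ones.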
The URL components are provided URL decoded. Change that with --urlencode.
Example:
$ trurl https://example.com:443/ --keep-port
https://example.com:443/
Example:
$ trurl example.com --no-guess-scheme
trurl note: Bad scheme [example.com]
Example:
$ trurl http://åäö/ --punycode
http://xn--4cab6c/
what is specified as a full name of a name/value pair, or as a word prefix (using a single trailing asterisk (*)) which makes trurl remove the tuples from the query string that match the instruction.
To match a literal trailing asterisk instead of using a wildcard, escape it with a backslash in front of it. Like \*.
Example:
$ trurl "https://curl.se?b=name:a=age" --sort-query --query-separator ":"
https://curl.se/?a=age:b=name
Example:
$ trurl --url https://curl.se/we/are.html --redirect ../here.html
https://curl.se/here.html
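Resolving a relative reference against a base URL follows RFC 3986. Python's `urllib.parse.urljoin` performs the same kind of resolution, so it can be used to reason about what a redirect would produce. The `redirect` wrapper is a hypothetical name, and edge cases may differ from libcurl's parser.

```python
from urllib.parse import urljoin


def redirect(base, target):
    """Resolve target relative to base, as a browser would on redirect."""
    return urljoin(base, target)


print(redirect("https://curl.se/we/are.html", "../here.html"))
# https://curl.se/here.html
print(redirect("https://curl.se/we/are.html", "here.html"))
# https://curl.se/we/here.html
```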
data can take the form of either a single value or a key/value pair in the shape foo=bar. If replace is called on an item that is not in the list of queries, trurl ignores that item.
trurl URL encodes both sides of the = character in the given input data argument.
The following components can be set: url, scheme, user, password, options, host, port, path, query, fragment and zoneid.
If a simple =-assignment is used, the data is URL encoded when applied. If := is used, the data is assumed to already be URL encoded and stored as-is.
If ?= is used, the set is only performed if the component is not already set. It avoids overwriting any already set data.
You can also combine : and ? into ?:= if desired.
If no URL or --url-file argument is provided, trurl tries to create a URL using the components provided by the --set options. If not enough components are specified, this fails.
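The assignment operators (=, :=, ?= and ?:=) can be told apart by looking at what precedes the first equal sign. The Python sketch below shows one way to do that; `parse_set` and its return shape are hypothetical and say nothing about how trurl parses its arguments internally.

```python
COMPONENTS = {
    "url", "scheme", "user", "password", "options", "host",
    "port", "path", "query", "fragment", "zoneid",
}


def parse_set(arg):
    """Split 'component[?][:]=data' into its pieces (sketch).

    Returns (component, data, pre_encoded, only_if_unset).
    """
    name, eq, data = arg.partition("=")
    if not eq:
        raise ValueError("missing '=' in " + repr(arg))
    pre_encoded = name.endswith(":")      # ':=' stores the data as-is
    if pre_encoded:
        name = name[:-1]
    only_if_unset = name.endswith("?")    # '?=' never overwrites
    if only_if_unset:
        name = name[:-1]
    if name not in COMPONENTS:
        raise ValueError("unknown component " + repr(name))
    return name, data, pre_encoded, only_if_unset


print(parse_set("port?:=8080"))  # ('port', '8080', True, True)
```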
Trims data off a component. Currently this can only trim a query component.
what is specified as a full word or as a word prefix (using a single trailing asterisk (*)) which makes trurl remove the tuples from the query string that match the instruction.
To match a literal trailing asterisk instead of using a wildcard, escape it with a backslash in front of it. Like \*.
Providing multiple URLs makes trurl act on all URLs in a serial fashion.
If the URL cannot be parsed for whatever reason, trurl simply moves on to the next provided URL - unless --verify is used.
A URL cannot exist without a scheme, but unless --no-guess-scheme is used, trurl guesses which scheme was intended if none was provided.
Examples:
$ trurl https://odd/ -g '{scheme}'
https
$ trurl odd -g '{scheme}'
http
$ trurl odd -g '{scheme}' --no-guess-scheme
trurl note: Bad scheme [odd]
Example:
$ trurl https://user%3a%40:secret@odd/ -g '{user}'
user:@
Example:
$ trurl https://user:secr%65t@odd/ -g '{password}'
secret
$ trurl 'imap://user:pwd;giraffe@odd' -g '{options}'
giraffe
If the scheme is not IMAP, the giraffe part is instead considered part of the password:
$ trurl 'sftp://user:pwd;giraffe@odd' -g '{password}'
pwd;giraffe
We strongly advise users to %-encode ;, : and @ in URLs to reduce the risk of confusion.
trurl provides options for working with IDN hostnames either as IDN or in their punycode version.
Example, convert an IDN name to punycode in the output:
$ trurl http://åäö/ --punycode
http://xn--4cab6c/
Or the reverse, convert a punycode hostname into its IDN version:
$ trurl http://xn--4cab6c/ --as-idn
http://åäö/
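The same two conversions can be tried in Python with the built-in `idna` codec. This is only for experimenting with the mapping: the codec implements the older IDNA 2003 rules, so results for some names may differ from what trurl and libcurl produce. The helper names are hypothetical.

```python
def to_punycode(host):
    """Convert an IDN hostname to its punycode (ASCII) form."""
    return host.encode("idna").decode("ascii")


def to_idn(host):
    """Convert a punycode hostname back to its IDN form."""
    return host.encode("ascii").decode("idna")


print(to_punycode("åäö"))    # xn--4cab6c
print(to_idn("xn--4cab6c"))  # åäö
```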
If the URL's hostname starts with an open bracket ([) it is a numerical IPv6 address that also must end with a closing bracket (]). trurl normalizes IPv6 addresses.
Example:
$ trurl 'http://[2001:9b1:0:0:0:0:7b97:364b]/'
http://[2001:9b1::7b97:364b]/
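The normalization in the example is the usual compressed IPv6 text form, which Python's `ipaddress` module also produces. A small sketch, where `normalize_ipv6_host` is a hypothetical helper and zone IDs are deliberately not handled:

```python
import ipaddress


def normalize_ipv6_host(host):
    """Return a bracketed IPv6 host in compressed form (no zone ID support)."""
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError("IPv6 host must be enclosed in brackets")
    address = ipaddress.IPv6Address(host[1:-1])
    return "[" + address.compressed + "]"


print(normalize_ipv6_host("[2001:9b1:0:0:0:0:7b97:364b]"))
# [2001:9b1::7b97:364b]
```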
A numerical IPv4 address can be specified using one, two, three or four numbers separated with dots, and they can use decimal, octal or hexadecimal. trurl normalizes provided addresses and uses four dotted decimal numbers in its output.
Examples:
$ trurl http://646464646/
http://38.136.68.134/
$ trurl http://246.646/
http://246.0.2.134/
$ trurl http://246.46.646/
http://246.46.2.134/
$ trurl http://0x14.0xb3022/
http://20.11.48.34/
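The examples follow the classic rule where every number except the last fills one byte and the last number fills all remaining bytes. The Python sketch below implements that rule as an illustration; `normalize_ipv4` is a hypothetical helper and its error handling is an assumption.

```python
def parse_number(text):
    """Parse a decimal, octal (leading 0) or hexadecimal (0x) number."""
    if not text.isalnum():
        raise ValueError("bad number " + repr(text))
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def normalize_ipv4(host):
    """Normalize a 1-4 part numerical IPv4 address to dotted decimal."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected one to four numbers")
    numbers = [parse_number(p) for p in parts]
    leading, last = numbers[:-1], numbers[-1]
    remaining = 4 - len(leading)          # bytes covered by the last number
    if any(n > 255 for n in leading) or last >= 256 ** remaining:
        raise ValueError("number out of range")
    octets = leading + list(last.to_bytes(remaining, "big"))
    return ".".join(str(o) for o in octets)


print(normalize_ipv4("646464646"))     # 38.136.68.134
print(normalize_ipv4("0x14.0xb3022"))  # 20.11.48.34
```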
Example:
$ trurl 'http://[2001:9b1::f358:1ba4:7b97:364b%enp3s0]/' -g '{zoneid}'
enp3s0
trurl knows the default port number for many URL schemes, so it can show the port number for a URL even if none was explicitly used in the URL. With --default-port it can add the default port to a URL even when none is provided.
Example:
$ trurl http:/a --default-port
http://a:80/
Similarly, trurl normally hides the port number if the given number is the default.
Example:
$ trurl http:/a:80
http://a/
But a user can make trurl keep the port even if it is the default, with --keep-port.
Example:
$ trurl http:/a:80 --keep-port
http://a:80/
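The three port behaviors above can be summarized as a small decision function. This Python sketch is an interpretation of the examples, not trurl's logic: `port_to_show` is hypothetical, the port table is a tiny illustrative subset, and how the two options interact in other cases is assumed.

```python
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}  # illustrative subset


def port_to_show(scheme, port=None, default_port=False, keep_port=False):
    """Return the port to output, or None when no port is shown (sketch)."""
    known = DEFAULT_PORTS.get(scheme)
    if port is not None and port == known and not keep_port:
        port = None                 # hide a port that equals the default
    if port is None and default_port:
        port = known                # add the scheme's default if known
    return port


print(port_to_show("http", default_port=True))   # 80
print(port_to_show("http", 80))                  # None
print(port_to_show("http", 80, keep_port=True))  # 80
```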
Example:
$ trurl http://xn--4cab6c -g '[path]'
/
When setting the path, trurl will inject a leading slash if none is provided:
$ trurl http://hello -s path="pony"
http://hello/pony
$ trurl http://hello -s path="/pony"
http://hello/pony
If the input path contains dotdot or dot-slash sequences, they are normalized away.
Example:
$ trurl http://hej/one/../two/../three/./four
http://hej/three/four
You can append a new segment to an existing path with --append like this:
$ trurl http://twelve/three?hello --append path=four
http://twelve/three/four?hello
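Removing dot segments can be pictured as walking the path with a stack. The Python sketch below follows the spirit of RFC 3986 section 5.2.4 for absolute paths; `remove_dot_segments` is a hypothetical helper and is not taken from libcurl.

```python
def remove_dot_segments(path):
    """Normalize away '.' and '..' segments in an absolute path (sketch)."""
    segments = path.split("/")
    out = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                out.append("")      # keep the trailing slash
        elif segment == "..":
            if len(out) > 1:
                out.pop()           # drop the previous segment, never the root
            if is_last:
                out.append("")
        else:
            out.append(segment)
    return "/".join(out)


print(remove_dot_segments("/one/../two/../three/./four"))  # /three/four
```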
Example:
$ trurl http://horse?elephant -g '{query}'
elephant
Example, if you set the query with a leading question mark:
$ trurl http://horse?elephant -s "query=?elephant"
http://horse/?%3felephant
Query parts are often made up of a series of name=value pairs separated with ampersands (&), and trurl offers several ways to work with such.
Append a new name/value pair to a URL with --append:
$ trurl http://host?name=hello --append query=search=life
http://host/?name=hello&search=life
You can --replace the value of a specific existing name among the pairs:
$ trurl 'http://alpha?one=real&two=fake' --replace two=alsoreal
http://alpha/?one=real&two=alsoreal
If the name you want to replace might not exist in the URL, you can opt to replace or append the pair:
$ trurl 'http://alpha?one=real&two=fake' --replace-append three=alsoreal
http://alpha/?one=real&two=fake&three=alsoreal
To compare two URLs by their query name/value pairs, sorting the pairs first increases the chances of it working:
$ trurl "http://alpha/?one=real&two=fake&three=alsoreal" --sort-query
http://alpha/?one=real&three=alsoreal&two=fake
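Sorting the pairs makes two queries with the same content compare equal as strings. A minimal Python sketch of the idea follows. `sort_query` is a hypothetical helper that sorts by name only, and the exact ordering rules trurl applies, such as case handling, are not covered here.

```python
def sort_query(query, sep="&"):
    """Sort name=value pairs by name, keeping equal names in original order."""
    pairs = query.split(sep)
    return sep.join(sorted(pairs, key=lambda pair: pair.partition("=")[0]))


print(sort_query("one=real&two=fake&three=alsoreal"))
# one=real&three=alsoreal&two=fake
print(sort_query("b=name:a=age", sep=":"))
# a=age:b=name
```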
Remove name/value pairs from the URL by specifying an exact name or a wildcard pattern with --qtrim:
$ trurl 'https://example.com?a12=hej&a23=moo&b12=foo' --qtrim 'a*'
https://example.com/?b12=foo
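The matching rule (exact name, trailing-asterisk prefix, or escaped literal asterisk) can be sketched in Python as follows. `qtrim` here is a hypothetical function, and matching names case sensitively is an assumption of this sketch.

```python
def qtrim(query, what, sep="&"):
    """Remove pairs whose name matches what (sketch of the wildcard rule)."""
    if what.endswith("\\*"):
        name, prefix = what[:-2] + "*", False   # escaped: literal asterisk
    elif what.endswith("*"):
        name, prefix = what[:-1], True          # wildcard: match by prefix
    else:
        name, prefix = what, False              # exact name

    def matches(pair):
        key = pair.partition("=")[0]
        return key.startswith(name) if prefix else key == name

    return sep.join(pair for pair in query.split(sep) if not matches(pair))


print(qtrim("a12=hej&a23=moo&b12=foo", "a*"))  # b12=foo
```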
Example:
$ trurl http://horse#elephant -g '{fragment}'
elephant
Example, if you set the fragment with a leading hash sign:
$ trurl "http://horse#elephant" -s "fragment=#zebra"
http://horse/#%23zebra
The fragment part of a URL is for local purposes only. The data in there is never actually sent over the network when a URL is used for transfers.
Example:
$ trurl ftps://example.com:2021/p%61th -g '{url}'
ftps://example.com:2021/path
The --json option outputs a JSON array with one or more objects. One for each URL. Each URL JSON object contains a number of properties, a series of key/value pairs. The exact set present depends on the given URL.
The key/values are extracted from the query, where they are separated by ampersands (&) or by the separator the user sets with --query-separator.
The query pairs are listed in their order of appearance, left to right, but can be made alpha-sorted with --sort-query.
It is only present if the URL has a query.
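To make the shape of that output concrete, here is a Python sketch that builds a similar structure with the standard library. It is an approximation under assumptions: `describe` is a hypothetical helper, only a few of the possible parts are included, and the URL is not normalized the way trurl would normalize it.

```python
import json
from urllib.parse import unquote, urlsplit


def describe(url, sep="&"):
    """Build a dict resembling one URL object of the JSON output (sketch)."""
    split = urlsplit(url)
    parts = {"scheme": split.scheme, "host": split.hostname,
             "path": split.path or "/"}
    if split.query:
        parts["query"] = split.query
    if split.fragment:
        parts["fragment"] = split.fragment
    entry = {"url": url, "parts": parts}
    if split.query:
        # One object per pair, in order of appearance, URL decoded.
        entry["params"] = [
            {"key": unquote(key), "value": unquote(value)}
            for key, _, value in (p.partition("=") for p in split.query.split(sep))
        ]
    return entry


print(json.dumps([describe("https://fake.host/search?q=answers&user=me#frag")],
                 indent=2))
```

The "params" list is omitted entirely when the URL has no query.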
$ trurl --url https://curl.se --set host=example.com
https://example.com/
$ trurl --set host=example.com --set scheme=ftp
ftp://example.com/
$ trurl --url https://curl.se/we/are.html --redirect here.html
https://curl.se/we/here.html
$ trurl --url https://curl.se/we/../are.html --set port=8080
https://curl.se:8080/are.html
$ trurl --url https://curl.se/we/are.html --get '{path}'
/we/are.html
$ trurl --url https://curl.se/we/are.html --get '{default:port}'
443
$ trurl --url https://curl.se/hello --append path=you
https://curl.se/hello/you
$ trurl --url "https://curl.se?name=hello" --append query=search=string
https://curl.se/?name=hello&search=string
$ cat urllist.txt | trurl --url-file -
...
$ trurl "https://fake.host/search?q=answers&user=me#frag" --json
[
  {
    "url": "https://fake.host/search?q=answers&user=me#frag",
    "parts": {
      "scheme": "https",
      "host": "fake.host",
      "path": "/search",
      "query": "q=answers&user=me",
      "fragment": "frag"
    },
    "params": [
      {
        "key": "q",
        "value": "answers"
      },
      {
        "key": "user",
        "value": "me"
      }
    ]
  }
]
$ trurl "https://curl.se?search=hey&utm_source=tracker" --qtrim "utm_*"
https://curl.se/?search=hey
$ trurl "https://example.com?a=home&here=now&thisthen" -g '{query:a}'
home
$ trurl "https://example.com?b=a&c=b&a=c" --sort-query
https://example.com/?a=c&b=a&c=b
$ trurl "https://curl.se?search=fool;page=5" --qtrim "search" --query-separator ";"
https://curl.se/?page=5
$ trurl "https://curl.se/this has space/index.html" --accept-space
https://curl.se/this%20has%20space/index.html
$ trurl "https://curl.se/path/index.html" --iterate "scheme=http ftp sftp"
http://curl.se/path/index.html
ftp://curl.se/path/index.html
sftp://curl.se/path/index.html
trurl returns a non-zero exit code to indicate problems.
https://curl.se/trurl
curl(1), wcurl(1)
2024-09-19 | trurl |