CONFUSABLE_HOMOGLYPHS(1) | confusable_homoglyphs | CONFUSABLE_HOMOGLYPHS(1) |
confusable_homoglyphs - confusable_homoglyphs Documentation
This project has been adopted from the original confusable_homoglyphs by Victor Felder.
A homoglyph is "one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar" (Wikipedia: Homoglyph).
Unicode homoglyphs can be a nuisance on the web. Your most popular client, AlaskaJazz, might be upset to be impersonated by a trickster who deliberately chose the username ΑlaskaJazz.
You might also want to avoid people being tricked into entering their password on www.microsоft.com or www.faϲebook.com instead of www.microsoft.com or www.facebook.com. Here is a utility to play with these confusable homoglyphs.
Not all mixed-script strings have to be ruled out, though: you can exclude only those mixed-script strings that contain characters which might be confused with a character from Unicode blocks of your choosing.
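To see why such spoofed names are hard to catch by eye, the standard-library sketch below lists every non-ASCII character in a string together with its Unicode name. The helper reveal_non_ascii is hypothetical and not part of confusable_homoglyphs; it only exposes unusual characters and makes no judgement about whether they are confusable.

```python
import unicodedata


def reveal_non_ascii(string):
    """Return (index, character, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration; not part of confusable_homoglyphs.
    """
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(string)
        if ord(char) > 127
    ]


# Escapes make the look-alike characters explicit in the source code.
print(reveal_non_ascii("\u0391laskaJazz"))          # GREEK CAPITAL LETTER ALPHA at index 0
print(reveal_non_ascii("www.micros\u043eft.com"))   # CYRILLIC SMALL LETTER O at index 10
```

A legitimate name such as 'Abç' would also be reported here, which is why the library reasons about scripts and confusables instead of flagging all non-ASCII text.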
This library is compatible with Python 3.
Is the data up to date? Yep.
The Unicode block aliases and the name of each character are extracted from this file provided by the Unicode Consortium.
The matrix recording which characters can be confused with which other characters is built from this file provided by the Unicode Consortium.
This data is stored in two JSON files, categories.json and confusables.json. If you delete them, both will be recreated by downloading and parsing the two files mentioned above and saving the results as JSON again.
If available, install an appropriate package from your distribution:
Otherwise, you can install from PyPI:
At the command line:
$ pip install confusable_homoglyphs
Or, if you have virtualenvwrapper installed:
$ mkvirtualenv confusable_homoglyphs
$ pip install confusable_homoglyphs
To use confusable_homoglyphs in a project:
$ pip install confusable_homoglyphs
>>> import confusable_homoglyphs
To update the data files, you first need to install the “cli” bundle, then run the “update” command:
$ pip install 'confusable_homoglyphs[cli]'
$ confusable_homoglyphs update
>>> from confusable_homoglyphs import categories
>>> categories.alias('A')
'LATIN'
>>> categories.alias('τ')
'GREEK'
>>> categories.alias('-')
'COMMON'
>>> categories.aliases_categories('A')
('LATIN', 'L')
>>> categories.aliases_categories('τ')
('GREEK', 'L')
>>> categories.aliases_categories('-')
('COMMON', 'Pd')
>>> categories.category('A')
'L'
>>> categories.category('τ')
'L'
>>> categories.category('-')
'Pd'
>>> categories.unique_aliases('ABC')
{'LATIN'}
>>> categories.unique_aliases('ρAτ-')
{'GREEK', 'LATIN', 'COMMON'}
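For intuition, the sketch below approximates alias() and unique_aliases() using only the standard library. It is an assumption-laden stand-in, not the library's implementation: it takes the first word of a character's Unicode name as its script alias and maps every non-letter to COMMON, whereas confusable_homoglyphs reads the Unicode Consortium's script data. The names approx_alias and approx_unique_aliases are hypothetical.

```python
import unicodedata


def approx_alias(char):
    """Rough script alias: first word of the Unicode name, COMMON for non-letters.

    Hypothetical stand-in for categories.alias(); real script data can differ.
    """
    if not char.isalpha():
        return "COMMON"
    # e.g. 'GREEK SMALL LETTER TAU' -> 'GREEK'
    return unicodedata.name(char, "UNKNOWN").split()[0]


def approx_unique_aliases(string):
    """Set of approximate script aliases used in a string."""
    return {approx_alias(char) for char in string}


print(approx_alias("A"), approx_alias("τ"), approx_alias("-"))  # LATIN GREEK COMMON
print(sorted(approx_unique_aliases("ρAτ-")))                     # ['COMMON', 'GREEK', 'LATIN']
```

The name-based shortcut breaks down for some letters (for example, ordinal indicators whose names do not start with a script name), which is one reason the library relies on the official script data instead.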
If greedy=False, only the first confusable character found is returned, without looking at the rest of the string; greedy=True returns all of them.
preferred_aliases=[] accepts a list of Unicode block aliases to be treated as your ‘base’ Unicode blocks:
>>> from confusable_homoglyphs import confusables
>>> confusables.is_confusable('paρa', preferred_aliases=['latin'])[0]['character']
'ρ'
>>> confusables.is_confusable('paρa', preferred_aliases=['greek'])[0]['character']
'p'
>>> confusables.is_confusable('Abç', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('AlloΓ', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('ρττ', preferred_aliases=['greek'])
False
>>> confusables.is_confusable('ρτ.τ', preferred_aliases=['greek', 'common'])
False
>>> confusables.is_confusable('ρττp')
[{'homoglyphs': [{'c': 'p', 'n': 'LATIN SMALL LETTER P'}], 'alias': 'GREEK', 'character': 'ρ'}]
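The greedy and preferred_aliases semantics described above can be illustrated with a small, self-contained sketch. It is not the library's code: HOMOGLYPHS is a hypothetical three-entry table standing in for confusables.json, script aliases are approximated from Unicode names (non-letters count as COMMON), and sketch_is_confusable only mimics the shape of the output shown above.

```python
import unicodedata

# Hypothetical, tiny homoglyph table (the real data comes from confusables.json).
HOMOGLYPHS = {
    "ρ": ["p"],       # GREEK SMALL LETTER RHO looks like LATIN SMALL LETTER P
    "p": ["ρ"],
    "\u0391": ["A"],  # GREEK CAPITAL LETTER ALPHA looks like LATIN CAPITAL LETTER A
}


def approx_alias(char):
    """Rough script alias: first word of the Unicode name, COMMON for non-letters."""
    if not char.isalpha():
        return "COMMON"
    return unicodedata.name(char, "UNKNOWN").split()[0]


def sketch_is_confusable(string, greedy=False, preferred_aliases=()):
    preferred = {a.upper() for a in preferred_aliases}
    results = []
    seen = set()
    for char in string:
        if char in seen:
            continue
        seen.add(char)
        char_alias = approx_alias(char)
        if char_alias in preferred:
            continue  # characters from a 'base' script are trusted
        homoglyphs = HOMOGLYPHS.get(char, [])
        if preferred and not any(approx_alias(h) in preferred for h in homoglyphs):
            continue  # only report look-alikes of characters from a preferred script
        if homoglyphs:
            result = {
                "character": char,
                "alias": char_alias,
                "homoglyphs": [{"c": h, "n": unicodedata.name(h)} for h in homoglyphs],
            }
            if not greedy:
                return [result]  # stop at the first hit
            results.append(result)
    return results or False
```

With greedy=True the loop keeps collecting hits instead of returning early, so sketch_is_confusable('ρp', greedy=True) reports both characters, while the default call reports only 'ρ'.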
For preferred_aliases examples, see the is_confusable docstring.
>>> bool(confusables.is_dangerous('Allo'))
False
>>> bool(confusables.is_dangerous('AlloΓ', preferred_aliases=['latin']))
False
>>> bool(confusables.is_dangerous('Alloρ'))
True
>>> bool(confusables.is_dangerous('AlaskaJazz'))
False
>>> bool(confusables.is_dangerous('ΑlaskaJazz'))
True
For example, the string 'B. C' is not considered mixed-script by default: it contains characters from Latin and Common, but Common is excluded by default.
>>> confusables.is_mixed_script('Abç')
False
>>> confusables.is_mixed_script('ρτ.τ')
False
>>> confusables.is_mixed_script('ρτ.τ', allowed_aliases=[])
True
>>> confusables.is_mixed_script('Alloτ')
True
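The is_mixed_script() and is_dangerous() examples above can be approximated with the standard library alone. This is a sketch under stated assumptions, not the library's implementation: scripts are guessed from the first word of each character's Unicode name (non-letters count as COMMON), HOMOGLYPHS is a hypothetical three-entry stand-in for confusables.json, and "dangerous" is taken to mean mixed-script and confusable at the same time, which is the reading consistent with the is_dangerous() examples above. All sketch_* names are hypothetical.

```python
import unicodedata

# Hypothetical homoglyph table; the real data lives in confusables.json.
HOMOGLYPHS = {
    "\u0391": ["A"],  # GREEK CAPITAL LETTER ALPHA vs LATIN CAPITAL LETTER A
    "ρ": ["p"],       # GREEK SMALL LETTER RHO vs LATIN SMALL LETTER P
    "p": ["ρ"],
}


def approx_alias(char):
    """Rough script alias: first word of the Unicode name, COMMON for non-letters."""
    if not char.isalpha():
        return "COMMON"
    return unicodedata.name(char, "UNKNOWN").split()[0]


def sketch_is_mixed_script(string, allowed_aliases=("COMMON",)):
    """True if the string uses more than one script outside allowed_aliases."""
    allowed = {a.upper() for a in allowed_aliases}
    scripts = {approx_alias(char) for char in string} - allowed
    return len(scripts) > 1


def sketch_has_confusable(string, preferred_aliases=()):
    """True if a character outside the preferred scripts has a homoglyph
    (only homoglyphs from a preferred script count when any are given)."""
    preferred = {a.upper() for a in preferred_aliases}
    for char in string:
        if approx_alias(char) in preferred:
            continue
        homoglyphs = HOMOGLYPHS.get(char, [])
        if preferred:
            homoglyphs = [h for h in homoglyphs if approx_alias(h) in preferred]
        if homoglyphs:
            return True
    return False


def sketch_is_dangerous(string, preferred_aliases=()):
    # Assumption: dangerous means mixed-script AND containing a confusable character.
    return sketch_is_mixed_script(string) and sketch_has_confusable(string, preferred_aliases)
```

Combining the checks with "and" rather than "or" matters: 'AlloΓ' is mixed-script, yet with preferred_aliases=['latin'] it is not reported as dangerous because Γ has no Latin look-alike in the data.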
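The data directory (where categories.json and confusables.json are stored) is the package directory by default, or the directory named by the CONFUSABLE_DATA environment variable if it is set.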
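The lookup rule for the data directory can be sketched as follows. The helpers data_dir and missing_data_files are hypothetical illustrations of the rule stated above, not functions exported by confusable_homoglyphs; the package directory is passed in explicitly rather than discovered.

```python
import os

# The two data files named earlier in this document.
DATA_FILES = ("categories.json", "confusables.json")


def data_dir(package_dir):
    """Return $CONFUSABLE_DATA if it is set, otherwise the package directory.

    Hypothetical helper; the library resolves this location internally.
    """
    return os.environ.get("CONFUSABLE_DATA", package_dir)


def missing_data_files(directory):
    """List which of the two JSON data files are absent from a directory."""
    return [
        name
        for name in DATA_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

For example, after export CONFUSABLE_DATA=/srv/confusables, data_dir() would point at /srv/confusables, and missing_data_files() on a fresh directory would list both files, which are then candidates for regeneration with the update command.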
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Report bugs at https://todo.sr.ht/~valhalla/confusable_homoglyphs
If you are reporting a bug, please include:
Look through the sourcehut tickets for bugs. Anything tagged with “bug” is open to whoever wants to implement it.
Look through the sourcehut tickets for features. Anything tagged with “feature” is open to whoever wants to implement it.
confusable_homoglyphs could always use more documentation, whether as part of the official confusable_homoglyphs docs, in docstrings, or even on the web in blog posts, articles, and such.
The best way to send feedback is to file an issue at https://todo.sr.ht/~valhalla/confusable_homoglyphs.
If you are proposing a feature:
Ready to contribute? Here’s how to set up confusable_homoglyphs for local development.
$ git clone https://git.sr.ht/~valhalla/confusable_homoglyphs
$ mkvirtualenv confusable_homoglyphs
$ cd confusable_homoglyphs/
$ python setup.py develop
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
$ flake8 confusable_homoglyphs tests
$ python setup.py test
$ tox
To get flake8 and tox, just pip install them into your virtualenv.
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git send-email \
    --to="~valhalla/confusable_homoglyphs-devel@lists.sr.ht" \
    HEAD^
See https://git-send-email.io/ for details on how to install and configure git-send-email.
Before you submit a patch, check that it meets these guidelines:
Initial release.
Courtesy of Ryan P Kilby, via https://github.com/vhf/confusable_homoglyphs/pull/6 :
Victor Felder
2024, Victor Felder
January 30, 2024 | 3.3.1 |