User Guide

Dependencies

icupy >=0.11.0 (pre-built packages are available)
Note
icupy requirements:
- ICU4C (ICU - International Components for Unicode) - latest version recommended
- C++17 compatible compiler (see supported compilers)
- CMake >= 3.7

Installation

Configuring environment variables for icupy (ICU):
- Windows:
  - Set the ICU_ROOT environment variable to the root of the ICU installation (the default is “C:\icu”). For example, if ICU is installed in “C:\icu4c”:
    Command Prompt
    set ICU_ROOT=C:\icu4c
    PowerShell
    $env:ICU_ROOT = "C:\icu4c"
  - To verify settings using icuinfo:
    Command Prompt (64 bit)
    %ICU_ROOT%\bin64\icuinfo
    PowerShell (64 bit)
    & $env:ICU_ROOT\bin64\icuinfo
- Linux/POSIX:
  - If ICU is installed in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if ICU is installed in “/usr/local”:
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
  - To verify settings using pkg-config:
    $ pkg-config --cflags --libs icu-uc
    -I/usr/local/include -L/usr/local/lib -licuuc -licudata
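
The checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `icuinfo_path` and `pkg_config_flags` are hypothetical and are not part of icupy or urlstd. It assumes the 64-bit Windows layout and the default root shown above.

```python
import ntpath
import os
import shutil
import subprocess


def icuinfo_path(env=None):
    """Build the expected 64-bit icuinfo path from ICU_ROOT (Windows layout)."""
    env = os.environ if env is None else env
    # Fall back to the default root mentioned in this guide.
    root = env.get("ICU_ROOT", r"C:\icu")
    # ntpath joins with backslashes regardless of the host platform.
    return ntpath.join(root, "bin64", "icuinfo")


def pkg_config_flags(package="icu-uc"):
    """Return the pkg-config flags for ICU, or None if they cannot be found."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--cflags", "--libs", package],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None
```

A `None` result from `pkg_config_flags()` suggests that PKG_CONFIG_PATH does not yet point at the ICU installation.
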
Installing from PyPI:
```
pip install urlstd
```

Basic Usage

To parse a string into a URL:

>>> from urlstd.parse import URL
>>> URL('http://user:pass@foo:21/bar;par?b#c')
<URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>
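
For readers coming from urllib.parse, the attribute names in the repr above can be related to the fields of `urllib.parse.urlsplit()`. This is a rough sketch only: the helper `whatwg_style_fields` is hypothetical, it is not how urlstd works, and it skips the normalization a real URL Standard parser performs (default-port removal, host and path canonicalization, and so on).

```python
from urllib.parse import urlsplit


def whatwg_style_fields(url):
    """Approximate URL-object style fields from a urllib SplitResult."""
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    port = "" if parts.port is None else str(parts.port)
    return {
        "protocol": parts.scheme + ":",           # scheme plus trailing colon
        "username": parts.username or "",
        "password": parts.password or "",
        "host": f"{hostname}:{port}" if port else hostname,
        "hostname": hostname,
        "port": port,                             # kept as a string
        "pathname": parts.path,
        "search": "?" + parts.query if parts.query else "",
        "hash": "#" + parts.fragment if parts.fragment else "",
    }


fields = whatwg_style_fields("http://user:pass@foo:21/bar;par?b#c")
```

For this particular input, the resulting values agree with the attributes shown in the repr above.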

To parse a string into a URL using a base URL:

>>> url = URL('?ﬃ&🌈', base='http://example.org')
>>> url
<URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
>>> url.search
'?%EF%AC%83&%F0%9F%8C%88'
>>> params = url.search_params
>>> params
URLSearchParams([('ﬃ', ''), ('🌈', '')])
>>> params.sort()
>>> params
URLSearchParams([('🌈', ''), ('ﬃ', '')])
>>> url.search
'?%F0%9F%8C%88=&%EF%AC%83='
>>> str(url)
'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
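
The order after `sort()` may look surprising, because U+1F308 (🌈) is a larger code point than U+FB03 (ﬃ). The URL Standard sorts parameter names by UTF-16 code units, and the emoji's leading surrogate (0xD83C) is smaller than 0xFB03. The sketch below reproduces that ordering and the percent-encoded names with the standard library only; `utf16_code_units` is a hypothetical helper, not a urlstd function.

```python
from urllib.parse import quote


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


LIGATURE = "\ufb03"      # the "ffi" ligature, U+FB03
RAINBOW = "\U0001f308"   # the rainbow emoji, U+1F308

names = [LIGATURE, RAINBOW]

# Python compares strings by code point, so the ligature sorts first.
by_code_point = sorted(names)

# Comparing UTF-16 code units puts the emoji first (0xD83C < 0xFB03).
by_code_unit = sorted(names, key=utf16_code_units)

# UTF-8 percent-encoding of each name, in the sorted order.
encoded = [quote(name) for name in by_code_unit]
```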

To validate a URL string:

>>> from urlstd.parse import URL, URLValidator, ValidityState
>>> URL.can_parse('https://user:password@example.org/')
True
>>> URLValidator.is_valid('https://user:password@example.org/')
False
>>> validity = ValidityState()
>>> URLValidator.is_valid('https://user:password@example.org/', validity=validity)
False
>>> validity.valid
False
>>> validity.validation_errors
1
>>> validity.descriptions[0]
"invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"

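The session above shows that a URL can be parseable without being valid: credentials are accepted by the parser but reported as a validation error. As a loose illustration of that distinction, the sketch below detects credentials with `urllib.parse.urlsplit()`. The helper `has_credentials` is hypothetical and is not how urlstd performs validation.

```python
from urllib.parse import urlsplit


def has_credentials(url):
    """Return True if the URL carries a username or password."""
    parts = urlsplit(url)
    return parts.username is not None or parts.password is not None


url = "https://user:password@example.org/"
flagged = has_credentials(url)   # credentials present
at_index = url.index("@")        # index of the credentials delimiter
```

In this input the "@" delimiter sits at index 21, which coincides with the position given in the error description above.
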
urlstd.parse.urlparse() is an alternative to urllib.parse.urlparse(). To parse a string into a urllib.parse.ParseResult using a base URL:

>>> import html
>>> from urllib.parse import unquote
>>> from urlstd.parse import urlparse
>>> pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
>>> pr
ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
>>> unquote(pr.query)
'aÿb'
>>> pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
>>> pr
ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
>>> unquote(pr.query, encoding='windows-1251')
'a&#255;b'
>>> html.unescape('a&#255;b')
'aÿb'
>>> pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
>>> pr
ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
>>> unquote(pr.query, encoding='windows-1252')
'aÿb'
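
The three query strings above differ because "ÿ" (U+00FF) is encoded differently in each case: as two bytes in UTF-8, as the single byte 0xFF in windows-1252, and not at all in windows-1251, where it falls back to the numeric character reference `&#255;` before percent-encoding. The sketch below imitates this with the standard library; `encode_query` is a hypothetical helper, and `quote(..., safe="")` escapes more characters than the URL Standard's query percent-encode set, so results are only expected to match for simple inputs like this one.

```python
from urllib.parse import quote


def encode_query(text, encoding):
    """Encode text in the given encoding, then percent-encode the bytes."""
    # Unencodable characters become numeric character references (&#NNN;).
    raw = text.encode(encoding, errors="xmlcharrefreplace")
    return quote(raw, safe="")


text = "a\u00ffb"  # "a", U+00FF (y with diaeresis), "b"

utf8_query = encode_query(text, "utf-8")
cp1251_query = encode_query(text, "windows-1251")
cp1252_query = encode_query(text, "windows-1252")
```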

Logging

urlstd uses the standard library logging module to report validation errors. Change the log level of the urlstd logger if needed:

import logging

logging.getLogger('urlstd').setLevel(logging.ERROR)
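
To see the effect of the level change, the following sketch attaches a small collecting handler to the logger named `urlstd` and emits two sample records itself. `ListHandler` and the sample messages are hypothetical; the levels at which urlstd actually logs are not covered here.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in a list (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


logger = logging.getLogger("urlstd")
handler = ListHandler()
logger.addHandler(handler)

# Raise the threshold so that only ERROR and above pass through.
logger.setLevel(logging.ERROR)
logger.warning("sample warning")   # dropped: below ERROR
logger.error("sample error")       # kept

logger.removeHandler(handler)
```

After this runs, `handler.messages` holds only the ERROR record.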