User Guide

Dependencies

Installation

  1. Configuring environment variables for icupy (ICU):

    • Windows:

      • Set the ICU_ROOT environment variable to the root of the ICU installation (the default is “C:\icu”). For example, if ICU is installed in “C:\icu4c”, run the first command in Command Prompt or the second in PowerShell:

        set ICU_ROOT=C:\icu4c
        
        $env:ICU_ROOT = "C:\icu4c"
        
      • To verify the settings with icuinfo (Command Prompt first, then PowerShell):

        %ICU_ROOT%\bin64\icuinfo
        
        & $env:ICU_ROOT\bin64\icuinfo
        
    • Linux/POSIX:

      • If ICU is installed in a non-standard location, set the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables. For example, if ICU is installed under “/usr/local”:

        export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
        export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
        
      • To verify the settings with pkg-config:

        $ pkg-config --cflags --libs icu-uc
        -I/usr/local/include -L/usr/local/lib -licuuc -licudata
        
  2. Installing from PyPI:

    pip install urlstd
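
The environment settings from step 1 can also be checked from Python. The following is a minimal sketch using only the standard library; the function name find_icu_hint is hypothetical and is not part of urlstd or icupy. It assumes ICU_ROOT on Windows and a pkg-config module named icu-uc elsewhere, as in the commands above.

```python
import os
import shutil
import subprocess
import sys


def find_icu_hint():
    """Return a short hint about the configured ICU, or None if none is found.

    Hypothetical helper for illustration only.
    """
    if sys.platform == "win32":
        # The guide states that ICU_ROOT defaults to C:\icu when it is unset.
        root = os.environ.get("ICU_ROOT", r"C:\icu")
        return root if os.path.isdir(root) else None

    # On Linux/POSIX, ask pkg-config for the version of the icu-uc module.
    pkg_config = shutil.which("pkg-config")
    if pkg_config is None:
        return None
    result = subprocess.run(
        [pkg_config, "--modversion", "icu-uc"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None


hint = find_icu_hint()
print("ICU found:" if hint else "ICU not found", hint or "")
```

A None result suggests that the environment variables described above still need to be set before installing.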
    

Basic Usage

To parse a string into a URL:

>>> from urlstd.parse import URL
>>> URL('http://user:pass@foo:21/bar;par?b#c')
<URL(href='http://user:pass@foo:21/bar;par?b#c', origin='http://foo:21', protocol='http:', username='user', password='pass', host='foo:21', hostname='foo', port='21', pathname='/bar;par', search='?b', hash='#c')>

To parse a string into a URL using a base URL:

>>> url = URL('?ffi&🌈', base='http://example.org')
>>> url
<URL(href='http://example.org/?%EF%AC%83&%F0%9F%8C%88', origin='http://example.org', protocol='http:', username='', password='', host='example.org', hostname='example.org', port='', pathname='/', search='?%EF%AC%83&%F0%9F%8C%88', hash='')>
>>> url.search
'?%EF%AC%83&%F0%9F%8C%88'
>>> params = url.search_params
>>> params
URLSearchParams([('ffi', ''), ('🌈', '')])
>>> params.sort()
>>> params
URLSearchParams([('🌈', ''), ('ffi', '')])
>>> url.search
'?%F0%9F%8C%88=&%EF%AC%83='
>>> str(url)
'http://example.org/?%F0%9F%8C%88=&%EF%AC%83='
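
The order after sort() may look surprising, because U+FB03 (ﬃ) is a lower code point than U+1F308 (🌈). The WHATWG URL Standard sorts search parameters by UTF-16 code units, and the output above is consistent with that rule. The following standard-library sketch shows the difference between the two orderings; it does not call urlstd, and the variable names are illustrative.

```python
# The two parameter names from the example above, with empty values.
pairs = [("\ufb03", ""), ("\U0001F308", "")]  # 'ﬃ' and '🌈'

# Python compares str objects by code point: U+FB03 < U+1F308.
by_code_point = sorted(pairs, key=lambda pair: pair[0])

# Comparing big-endian UTF-16 bytes is equivalent to comparing code units.
# '🌈' is encoded as the surrogate pair D83C DF08, and 0xD83C < 0xFB03.
by_code_unit = sorted(pairs, key=lambda pair: pair[0].encode("utf-16-be"))

print([name for name, _ in by_code_point])  # ['ﬃ', '🌈']
print([name for name, _ in by_code_unit])   # ['🌈', 'ﬃ']
```

Because sorted() is stable, pairs with equal names keep their relative order, which the standard also requires.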

To validate a URL string:

>>> from urlstd.parse import URL, URLValidator, ValidityState
>>> URL.can_parse('https://user:password@example.org/')
True
>>> URLValidator.is_valid('https://user:password@example.org/')
False
>>> validity = ValidityState()
>>> URLValidator.is_valid('https://user:password@example.org/', validity=validity)
False
>>> validity.valid
False
>>> validity.validation_errors
1
>>> validity.descriptions[0]
"invalid-credentials: input includes credentials: 'https://user:password@example.org/' at position 21"

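The example above shows that a URL can be parseable and still be reported as invalid, here because it embeds credentials. As a rough standard-library approximation of that single check (not urlstd's validator, and with a hypothetical function name), the presence of credentials can be detected like this:

```python
from urllib.parse import urlsplit


def has_credentials(url: str) -> bool:
    """Return True if the URL's authority contains a username or password.

    Hypothetical helper; it covers only the credentials check.
    """
    parts = urlsplit(url)
    return bool(parts.username) or bool(parts.password)


print(has_credentials("https://user:password@example.org/"))  # True
print(has_credentials("https://example.org/"))                # False
```

This sketch reports nothing about other validation errors, so it is not a substitute for URLValidator.is_valid().
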
urlstd.parse.urlparse() is an alternative to urllib.parse.urlparse(). To parse a string into a urllib.parse.ParseResult using a base URL:

>>> import html
>>> from urllib.parse import unquote
>>> from urlstd.parse import urlparse
>>> pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='utf-8')
>>> pr
ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%C3%BFb', fragment='')
>>> unquote(pr.query)
'aÿb'
>>> pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1251')
>>> pr
ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%26%23255%3Bb', fragment='')
>>> unquote(pr.query, encoding='windows-1251')
'a&#255;b'
>>> html.unescape('a&#255;b')
'aÿb'
>>> pr = urlparse('?aÿb', base='http://example.org/foo/', encoding='windows-1252')
>>> pr
ParseResult(scheme='http', netloc='example.org', path='/foo/', params='', query='a%FFb', fragment='')
>>> unquote(pr.query, encoding='windows-1252')
'aÿb'
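
The three query strings above differ because 'ÿ' (U+00FF) is encoded differently in each character encoding, and windows-1251 cannot represent it at all, so it is replaced by the numeric character reference &#255; before percent-encoding. The following sketch reproduces the same strings with the standard library only. The helper name encode_query is hypothetical, and urllib.parse.quote() does not use exactly the same percent-encode set as the URL Standard, so treat this as an illustration for this input rather than as urlstd's implementation.

```python
from urllib.parse import quote


def encode_query(text: str, encoding: str) -> str:
    """Percent-encode text after converting it to the given encoding.

    Characters the encoding cannot represent become &#NNN; references
    (errors="xmlcharrefreplace") before percent-encoding is applied.
    """
    return quote(text, safe="", encoding=encoding, errors="xmlcharrefreplace")


for enc in ("utf-8", "windows-1251", "windows-1252"):
    print(enc, encode_query("aÿb", enc))
# utf-8 a%C3%BFb
# windows-1251 a%26%23255%3Bb
# windows-1252 a%FFb
```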

Logging

urlstd uses the standard library logging module to report validation errors. Change the log level of the urlstd logger if needed:

import logging
logging.getLogger('urlstd').setLevel(logging.ERROR)
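
To inspect messages rather than only filter them, a handler can be attached to the same logger. The following is a minimal sketch using only the standard library. The ListHandler class is hypothetical, and the two log calls simulate records, since the level and wording urlstd itself uses are not stated here.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list for later inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


logger = logging.getLogger("urlstd")
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

# Simulated records standing in for whatever the library would emit.
logger.info("hypothetical validation message")  # below ERROR, filtered out
logger.error("hypothetical failure message")    # passes the ERROR threshold

print(handler.messages)  # ['ERROR:urlstd:hypothetical failure message']
```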