jesopo | 5ef2b7af27 | 'str.split' -> 's.split' | 2019-09-09 14:53:11 +01:00
jesopo | 1df82c1cb2 | still default to iso-latin-1 if no on-page or in-header content-type is present | 2019-09-09 14:48:26 +01:00
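Commit 1df82c1cb2 keeps ISO-8859-1 as the last resort when neither the page nor the response headers declare a charset. The following is a minimal sketch of that kind of best-effort detection. It is not the code in `utils.http`: the names `find_encoding`, `decode_body` and `FALLBACK_ENCODING` are hypothetical, the precedence (header before `<meta>`) is an assumption, and a regex stands in for the BeautifulSoup-based `<meta>` lookup.

```python
import codecs
import re
from typing import Optional

# Hypothetical sketch. Matches <meta charset="..."> as well as
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset=["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE)
FALLBACK_ENCODING = "iso-8859-1"  # a.k.a. latin-1


def _known(name: str) -> bool:
    # codecs.lookup() raises LookupError for names Python can't decode
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def find_encoding(data: bytes, content_type: Optional[str]) -> str:
    # 1. charset parameter on the Content-Type header (assumed to win)
    if content_type:
        for param in content_type.split(";")[1:]:
            key, _, value = param.strip().partition("=")
            value = value.strip().strip("\"'")
            if key.strip().lower() == "charset" and value and _known(value):
                return value
    # 2. on-page declaration; the pattern only ever matches inside <meta> tags
    match = _META_CHARSET.search(data)
    if match:
        name = match.group(1).decode("ascii")
        if _known(name):
            return name
    # 3. nothing usable declared anywhere
    return FALLBACK_ENCODING


def decode_body(data: bytes, content_type: Optional[str]) -> str:
    # errors="replace" keeps a mislabelled page from raising UnicodeDecodeError
    return data.decode(find_encoding(data, content_type), errors="replace")
```

Unknown charset names are skipped rather than trusted, so a page declaring a bogus encoding still decodes via the fallback.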
jesopo | 0a67659637 | only look for <meta>-related tags when there are meta tags | 2019-09-09 14:39:19 +01:00
jesopo | 0a1077c5cd | add explicit None return for _find_encoding (mypy) | 2019-09-09 14:25:01 +01:00
jesopo | ff9c82bf67 | change utils.http.request to best-effort detect on-page encoding. closes #113 | 2019-09-09 14:11:18 +01:00
jesopo | 397cfa8e7e | correctly qualify DeadlineExceededException namespace | 2019-09-03 14:54:59 +01:00
jesopo | b7b2f31c1c | use utils.deadline() in utils.http.request, not raw sigalrm | 2019-09-02 15:50:21 +01:00
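Commit b7b2f31c1c wraps the request in `utils.deadline()` instead of handling SIGALRM directly. The real signature of `utils.deadline()` isn't shown in this log; the sketch below assumes a context manager that takes a timeout in seconds and raises `DeadlineExceededException` (a name taken from 397cfa8e7e). The handler takes two arguments, as d3231e3282 notes for `signal.signal` callbacks. Because it relies on SIGALRM, it only works on Unix and in the main thread.

```python
import contextlib
import signal
from typing import Iterator


class DeadlineExceededException(Exception):
    """Raised when the wrapped block runs past its deadline."""


@contextlib.contextmanager
def deadline(seconds: float = 10.0) -> Iterator[None]:
    # Hypothetical stand-in for utils.deadline(); Unix-only, main thread only.
    def _on_alarm(signum, frame):  # signal handlers take two arguments
        raise DeadlineExceededException(f"exceeded {seconds}s deadline")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    # setitimer accepts fractional seconds, unlike signal.alarm()
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

A caller would write `with deadline(5):` around the blocking HTTP call; the `finally` clause guarantees the timer is cleared even when the request finishes early or raises.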
jesopo | 9cc1ee98eb | Pass the content of a webpage to HTTPParsingException | 2019-09-02 13:27:44 +01:00
jesopo | 408b89aeb7 | use \S+ for url regex (for non-ascii chars), use url_sanitize to catch <> | 2019-09-02 13:25:48 +01:00
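Commit 408b89aeb7 widens the URL pattern to `\S+` so non-ASCII characters survive, and relies on sanitising afterwards because `\S+` also swallows a closing `>` from links written as `<https://...>`. A sketch of that combination, assuming a simple `https?://\S+` pattern; the exact `REGEX_URL` used by the project is not shown here, and `find_urls` is a hypothetical helper.

```python
import re
from typing import List

# Hypothetical pattern: \S+ (rather than an ASCII-only character class)
# lets characters such as "ä" or "例" through.
REGEX_URL = re.compile(r"https?://\S+", re.IGNORECASE)


def url_sanitise(url: str) -> str:
    # links are often wrapped as <https://...>; \S+ keeps the trailing ">"
    if url.startswith("<"):
        url = url[1:]
    if url.endswith(">"):
        url = url[:-1]
    return url


def find_urls(message: str) -> List[str]:
    return [url_sanitise(m.group(0)) for m in REGEX_URL.finditer(message)]
```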
jesopo | 20042edfd9 | Allow bypass of content-type check in utils.http.request | 2019-08-05 15:41:02 +01:00
jesopo | d093027431 | not all HTTP responses have content-type | 2019-08-02 17:33:16 +01:00
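Commits 20042edfd9 and d093027431 make the content-type check optional and tolerate responses that send no `Content-Type` at all. A minimal sketch under stated assumptions: the keyword name `check` and the `SOUPABLE_TYPES` allow-list are hypothetical, and treating a missing header as "nothing to check" is a guess at the intended behaviour.

```python
from typing import Mapping, Optional


class HTTPWrongContentTypeException(Exception):
    pass


# Hypothetical allow-list of types worth handing to an HTML/XML parser.
SOUPABLE_TYPES = {"text/html", "application/xhtml+xml", "text/xml", "application/xml"}


def check_content_type(headers: Mapping[str, str], check: bool = True) -> Optional[str]:
    """Return the response's MIME type, or None if it sent no Content-Type."""
    # header names are case-insensitive, so compare them lowercased
    raw = next((value for key, value in headers.items()
                if key.lower() == "content-type"), None)
    if raw is None:
        return None  # assumption: a missing header is not an error
    mime = raw.split(";", 1)[0].strip().lower()
    if check and mime not in SOUPABLE_TYPES:
        raise HTTPWrongContentTypeException(f"unsupported content type {mime!r}")
    return mime
```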
jesopo | c19c6c0e14 | asyncio.gather -> asyncio.wait (with timeout) | 2019-07-08 14:50:11 +01:00
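Commit c19c6c0e14 moves `request_many()` from `asyncio.gather` to `asyncio.wait` with a timeout, and a1438abf66 closes the event loop afterwards. The sketch below shows that shape, assuming a caller-supplied coroutine `fetch(url)` in place of the real HTTP client; the actual signature and return type of `request_many()` are not shown in this log.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


def request_many(urls: List[str],
                 fetch: Callable[[str], Awaitable[str]],
                 timeout: float = 5.0) -> Dict[str, str]:
    """Fetch all URLs concurrently; drop any that fail or miss the timeout."""
    if not urls:
        return {}  # asyncio.wait() raises ValueError on an empty set

    async def _run() -> Dict[str, str]:
        tasks = {asyncio.ensure_future(fetch(url)): url for url in urls}
        # unlike gather(), wait() hands back (done, pending) once the
        # timeout expires instead of blocking on the slowest request
        done, pending = await asyncio.wait(list(tasks), timeout=timeout)
        for task in pending:
            task.cancel()
        if pending:
            # let the cancellations finish before the loop is closed
            await asyncio.gather(*pending, return_exceptions=True)
        return {tasks[t]: t.result() for t in done
                if not t.cancelled() and t.exception() is None}

    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(_run())
    finally:
        loop.close()  # close the loop once we're done with it
```

Calling `t.exception()` on each finished task also marks failed requests as handled, so one broken URL neither aborts the batch nor triggers "exception was never retrieved" warnings.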
jesopo | 469c725675 | tell asyncio.gather which loop to use | 2019-07-08 14:41:12 +01:00
jesopo | a1438abf66 | close event loop when we're done with it (request_many()) | 2019-07-08 13:59:48 +01:00
jesopo | 81c7af8ab5 | Don't try/except async http exceptions | 2019-07-08 13:51:02 +01:00
jesopo | ee0ec0eca1 | switch request_many() to use asyncio.gather | 2019-07-08 13:46:27 +01:00
jesopo | b62ba469d7 | catch async exceptions in utils.http.request_many() | 2019-07-08 13:18:59 +01:00
jesopo | 078681eddf | add missing scheme in utils.http.sanitise_url, use in rss.py | 2019-07-08 12:54:06 +01:00
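Commit 078681eddf makes URL sanitising add a scheme when one is missing, so that feed addresses such as `example.com/feed.xml` become fetchable. A best-effort sketch, assuming `http` as the default scheme and treating the absence of `://` as "no scheme"; both choices are assumptions, not the project's documented behaviour.

```python
def url_sanitise(url: str, default_scheme: str = "http") -> str:
    """Best-effort tidy-up: trim whitespace and add a scheme if none is given."""
    url = url.strip()
    # "example.com/feed" and protocol-relative "//example.com/feed" carry no scheme
    if "://" not in url:
        url = f"{default_scheme}://{url.lstrip('/')}"
    return url
```

Checking for `://` rather than calling `urllib.parse.urlsplit(url).scheme` avoids misreading `host:port/path` as a URL whose scheme is the hostname.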
jesopo | ecb8364d0d | switch to using asyncio's event loop | 2019-07-08 12:45:10 +01:00
jesopo | 15e143fcff | implement utils.http.request_many as a tornado ioloop yield | 2019-07-08 11:43:09 +01:00
jesopo | 637067c62c | url_validate() -> url_sanitise() | 2019-07-02 14:15:49 +01:00
jesopo | 534854127b | Add utils.http.url_validate() for best-effort url tidying | 2019-07-02 14:10:18 +01:00
jesopo | f9eb017466 | message arg for HTTPWrongContentTypeException/HTTPParsingException | 2019-06-28 23:01:21 +01:00
jesopo | 97810db8df | Give descriptions to utils.http.HTTPException subclasses | 2019-06-27 18:28:08 +01:00
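Commits 97810db8df and f9eb017466 give each `HTTPException` subclass a description and an optional message, and 9cc1ee98eb attaches the page content to `HTTPParsingException`. A minimal sketch of such a hierarchy; the class names come from this log, while the description strings, the `description` attribute and the `data` keyword are hypothetical.

```python
from typing import Optional


class HTTPException(Exception):
    # Default human-readable description; subclasses override it.
    description = "HTTP request failed"

    def __init__(self, message: Optional[str] = None):
        self.message = message
        # fall back to the class description when no message is given
        super().__init__(message or self.description)


class HTTPWrongContentTypeException(HTTPException):
    description = "Resource has an unsupported content type"


class HTTPParsingException(HTTPException):
    description = "Failed to parse HTTP response"

    def __init__(self, message: Optional[str] = None, data: Optional[str] = None):
        super().__init__(message)
        self.data = data  # raw page content, kept for debugging
```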
jesopo | 16d331dd43 | add allow_redirects kwarg to utils.http.request() | 2019-06-26 17:53:16 +01:00
jesopo | a802e66dcf | Defer decoding http payload bytestring until after checking ContentType | 2019-06-04 13:47:03 +01:00
jesopo | 0be9046669 | Pass str object to BeautifulSoup, not bytes. closes #56 | 2019-05-28 10:22:35 +01:00
Patrick Nappa | 2c344c9ddd | forgot the beautiful % | 2019-05-03 13:50:51 +10:00
Patrick Nappa | 471c11e229 | ensure that non-url characters not separated by whitespace aren't consumed | 2019-05-03 13:43:08 +10:00
jesopo | bdcb4b5db2 | Add missing ":" | 2019-04-25 17:50:41 +01:00
jesopo | 1240b154cb | Support interfaces that don't have AF_INET and/or AF_INET6 | 2019-04-25 17:48:51 +01:00
jesopo | 7643a962bd | Refuse to get the title for any url that points locally | 2019-04-25 15:58:58 +01:00
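Commit 7643a962bd refuses to fetch titles for URLs that point at the local machine or network, and 1240b154cb suggests the project compares against the addresses of local interfaces. The sketch below swaps in a different technique: it resolves the hostname with `socket.getaddrinfo` and classifies each address with the standard-library `ipaddress` properties. Function names are hypothetical, and refusing unresolvable hosts is an assumption.

```python
import ipaddress
import socket
import urllib.parse
from typing import List


def resolve(hostname: str) -> List[str]:
    # every IPv4/IPv6 address the name resolves to
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_local_address(address: str) -> bool:
    # strip an IPv6 zone index such as "fe80::1%eth0"
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    return (ip.is_loopback or ip.is_private or ip.is_link_local
            or ip.is_unspecified or ip.is_reserved)


def url_is_local(url: str) -> bool:
    hostname = urllib.parse.urlsplit(url).hostname
    if not hostname:
        return True  # refuse anything we can't attribute to a host
    try:
        addresses = resolve(hostname)
    except socket.gaierror:
        return True  # assumption: treat unresolvable hosts as unsafe
    # a single local address is enough to refuse the URL
    return any(is_local_address(a) for a in addresses)
```

Checking every resolved address, not just the first, closes the gap where a hostname has both a public and a private record.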
jesopo | dffee4d223 | Move REGEX_URL out of isgd.py and title.py into utils.http | 2019-04-24 15:46:54 +01:00
jesopo | 197ae2e053 | Raise a specific exception in utils.http.request for "wrong content type" | 2019-02-28 23:28:45 +00:00
jesopo | 846b881e52 | Throw ValueError when utils.http.request tries to soup non-html/xml data | 2019-02-27 15:16:08 +00:00
jesopo | cfaf6864fc | Don't try to parse non-html/xml stuff with BeautifulSoup | 2019-02-26 11:18:50 +00:00
jesopo | 2d3bb2b5e8 | Typo in utils.http.request, 'response_heders' -> 'response_headers' | 2018-12-11 22:31:14 +00:00
jesopo | 5b59740043 | Pass a dict to utils.CaseInsensitiveDict, not a MutableMapping | 2018-12-11 22:30:57 +00:00
jesopo | d373edfaae | Add missing utils import in utils.http | 2018-12-11 22:30:05 +00:00
jesopo | 793d234a0b | 'utils.http.get_url' -> 'utils.http.request', return a Response object from utils.http.request | 2018-12-11 22:26:38 +00:00
jesopo | b543e31cd2 | Fix/refactor issues brought up by type hint linting | 2018-10-30 17:49:35 +00:00
jesopo | e07553c362 | Add type/return hints throughout src/ and, in doing so, fix some cyclical references. | 2018-10-30 14:58:48 +00:00
jesopo | d3231e3282 | signal.signal timer callback takes 2 args | 2018-10-25 14:09:19 +01:00
jesopo | c655668bbe | Add fallback_encoding to utils.http.get_url, in case a page has no implicit encoding | 2018-10-10 23:49:59 +01:00
jesopo | f286f3bf48 | .decode data prior to json.loads in utils.http.get_url | 2018-10-10 15:25:08 +01:00
jesopo | 951c315cec | Fix syntax error for throwing a timeout when signal.alarm fires | 2018-10-10 15:07:04 +01:00
jesopo | 015fa8ddff | .decode plaintext returns from utils.http.get_url | 2018-10-10 15:06:30 +01:00
jesopo | 5b9ffe013d | Use signal.alarm to deadline utils.http.get_url and throw useful exceptions | 2018-10-10 14:25:44 +01:00
jesopo | be75f72356 | Set a max size of 100mb for utils.http.get_url | 2018-10-10 14:05:15 +01:00
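Commit be75f72356 caps downloads at 100mb. A sketch of enforcing such a cap while streaming, so an oversized body is rejected before it is fully buffered. Whether "100mb" means 100 MiB or 10^8 bytes is an assumption, and `HTTPTooLargeException` and `read_limited` are hypothetical names.

```python
from typing import Iterable

# "100mb" per the commit message; 100 MiB here is an assumption
MAX_SIZE = 100 * 1024 * 1024


class HTTPTooLargeException(Exception):  # hypothetical name
    pass


def read_limited(chunks: Iterable[bytes], max_size: int = MAX_SIZE) -> bytes:
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        # stop as soon as the limit is crossed instead of reading everything
        if len(buffer) > max_size:
            raise HTTPTooLargeException(f"response exceeded {max_size} bytes")
    return bytes(buffer)
```

With `requests`, the chunks would come from `requests.get(url, stream=True).iter_content(chunk_size=4096)`; without `stream=True` the whole body is downloaded before any limit can apply.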
jesopo | 68f5626189 | Change utils.http to use requests | 2018-10-10 13:41:58 +01:00
jesopo | c28a41ad21 | Remove debug print in src.utils.http | 2018-10-09 22:39:34 +01:00
jesopo | f69a1ce7c1 | Return response code from utils.http.get_url when code=True and soup=True | 2018-10-09 22:16:04 +01:00
jesopo | 383767c7fb | Support post_data in utils.http.get_url | 2018-10-08 12:43:31 +01:00
jesopo | 69d58eede2 | Move src/Utils.py into src/utils/, splitting functionality out into modules of related functionality | 2018-10-03 13:22:37 +01:00