46 commits

Author SHA1 Message Date
jesopo
408b89aeb7 use \S+ for url regex (for non-ascii chars), use url_sanitize to catch <> 2019-09-02 13:25:48 +01:00
jesopo
20042edfd9 Allow bypass of content-type check in utils.http.request 2019-08-05 15:41:02 +01:00
jesopo
d093027431 not all HTTP responses have content-type 2019-08-02 17:33:16 +01:00
jesopo
c19c6c0e14 asyncio.gather -> asyncio.wait (with timeout) 2019-07-08 14:50:11 +01:00
jesopo
469c725675 tell asyncio.gather which loop to use 2019-07-08 14:41:12 +01:00
jesopo
a1438abf66 close event loop when we're done with it (request_many()) 2019-07-08 13:59:48 +01:00
jesopo
81c7af8ab5 Don't try/except async http exceptions 2019-07-08 13:51:02 +01:00
jesopo
ee0ec0eca1 switch request_many() to use asyncio.gather 2019-07-08 13:46:27 +01:00
jesopo
b62ba469d7 catch async exceptions in utils.http.request_many() 2019-07-08 13:18:59 +01:00
jesopo
078681eddf add missing schema in utils.http.sanitise_url, use in rss.py 2019-07-08 12:54:06 +01:00
jesopo
ecb8364d0d switch to using asyncio's event loop 2019-07-08 12:45:10 +01:00
jesopo
15e143fcff implement utils.http.request_many as a tonado ioloop yield 2019-07-08 11:43:09 +01:00
jesopo
637067c62c url_validate() -> url_sanitise() 2019-07-02 14:15:49 +01:00
jesopo
534854127b Add utils.http.url_validate() for best-effort url tidying 2019-07-02 14:10:18 +01:00
jesopo
f9eb017466 message arg for HTTPWrongContentTypeException/HTTPParsingException 2019-06-28 23:01:21 +01:00
jesopo
97810db8df Give descriptions to utils.http.HTTPException subclasses 2019-06-27 18:28:08 +01:00
jesopo
16d331dd43 add allow_redirects kwarg to utils.http.request() 2019-06-26 17:53:16 +01:00
jesopo
a802e66dcf Defer decoding http payload bytestring until after checking ContentType 2019-06-04 13:47:03 +01:00
jesopo
0be9046669 Pass str object to BeautifulSoup, not bytes. closes #56 2019-05-28 10:22:35 +01:00
Patrick Nappa
2c344c9ddd forgot the beautiful % 2019-05-03 13:50:51 +10:00
Patrick Nappa
471c11e229 ensure that non-url characters not separated by whitespace aren't consumed 2019-05-03 13:43:08 +10:00
jesopo
bdcb4b5db2 Add missing ":" 2019-04-25 17:50:41 +01:00
jesopo
1240b154cb Support interfaces that don't have AF_INET and/or AF_INET6 2019-04-25 17:48:51 +01:00
jesopo
7643a962bd Refuse to get the title for any url that points locall 2019-04-25 15:58:58 +01:00
jesopo
dffee4d223 Move REGEX_URL out of isgd.py and title.py in to utils.http 2019-04-24 15:46:54 +01:00
jesopo
197ae2e053 Raise a specific exception in utils.http.request for "wrong content type" 2019-02-28 23:28:45 +00:00
jesopo
846b881e52 Throw ValueError when utils.http.request tries to soup non-html/xml data 2019-02-27 15:16:08 +00:00
jesopo
cfaf6864fc Don't try to parse non-html/xml stuff with BeautifulSoup 2019-02-26 11:18:50 +00:00
jesopo
2d3bb2b5e8 Typo in utils.http.request, 'response_heders' -> 'response_headers' 2018-12-11 22:31:14 +00:00
jesopo
5b59740043 Pass a dict to utils.CaseInsensitiveDict, not a MutableMapping 2018-12-11 22:30:57 +00:00
jesopo
d373edfaae Add missing utils import in utils.http 2018-12-11 22:30:05 +00:00
jesopo
793d234a0b 'utils.http.get_url' -> 'utils.http.request', return a Response object from utils.http.request 2018-12-11 22:26:38 +00:00
jesopo
b543e31cd2 Fix/refactor issues brought up by type hint linting 2018-10-30 17:49:35 +00:00
jesopo
e07553c362 Add type/return hints throughout src/ and, in doing so, fix some cyclical references. 2018-10-30 14:58:48 +00:00
jesopo
d3231e3282 signal.signal timer callback takes 2 args 2018-10-25 14:09:19 +01:00
jesopo
c655668bbe Add fallback_encoding to utils.http.get_url, in case a page has no implicit encoding 2018-10-10 23:49:59 +01:00
jesopo
f286f3bf48 .decode data prior to json.loads in utils.http.get_url 2018-10-10 15:25:08 +01:00
jesopo
951c315cec Fix syntax error for throwing a timeout when signal.alarm fires 2018-10-10 15:07:04 +01:00
jesopo
015fa8ddff .decode plaintext returns from utils.http.get_url 2018-10-10 15:06:30 +01:00
jesopo
5b9ffe013d Use signal.alarm to Deadline utils.http.get_url and throw useful exceptions 2018-10-10 14:25:44 +01:00
jesopo
be75f72356 Set a max size of 100mb for utils.http.get_url 2018-10-10 14:05:15 +01:00
jesopo
68f5626189 Change utils.http to use requests 2018-10-10 13:41:58 +01:00
jesopo
c28a41ad21 Remove debug print in src.utils.http 2018-10-09 22:39:34 +01:00
jesopo
f69a1ce7c1 Return response code from utils.http.get_url when code=True and soup=True 2018-10-09 22:16:04 +01:00
jesopo
383767c7fb Support post_data in utils.http.get_url 2018-10-08 12:43:31 +01:00
jesopo
69d58eede2 Move src/Utils.py in to src/utils/, splitting functionality out in to modules of related functionality 2018-10-03 13:22:37 +01:00
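Several commits above (ee0ec0eca1, c19c6c0e14, a1438abf66) move `utils.http.request_many()` from `asyncio.gather` to `asyncio.wait` with a timeout, and close the event loop when finished. The snippet below is a minimal sketch of that pattern, not the project's actual code. `fake_fetch` is a hypothetical stand-in for an HTTP request, and the `jobs` and `timeout` parameters are assumptions made for illustration.

```python
import asyncio
import typing


# Hypothetical stand-in for one HTTP fetch: it only sleeps for `delay`
# seconds and then returns a fake body, so no network access is needed.
async def fake_fetch(url: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return "body of %s" % url


def request_many(jobs: typing.Dict[str, float],
                 timeout: float = 5.0) -> typing.Dict[str, str]:
    # asyncio.wait raises ValueError on an empty set, so bail out early.
    if not jobs:
        return {}

    # A dedicated event loop lets synchronous callers use this function.
    loop = asyncio.new_event_loop()
    try:
        # Map each task back to the URL it is fetching.
        tasks = {loop.create_task(fake_fetch(url, delay)): url
                 for url, delay in jobs.items()}

        # asyncio.wait returns (done, pending) when the timeout expires,
        # rather than raising, so finished results are kept.
        done, pending = loop.run_until_complete(
            asyncio.wait(set(tasks), timeout=timeout))

        # Cancel whatever is still running and let it finish unwinding.
        for task in pending:
            task.cancel()
        if pending:
            loop.run_until_complete(asyncio.wait(pending))

        # Keep only tasks that completed without raising.
        results = {}
        for task in done:
            if task.exception() is None:
                results[tasks[task]] = task.result()
        return results
    finally:
        # Close the loop once we are done with it.
        loop.close()


print(request_many({"a": 0.01, "b": 0.01, "slow": 2.0}, timeout=0.5))
```

With `timeout=0.5`, the `"slow"` job is cancelled and left out, and only the two fast results are returned. Wrapping `asyncio.gather` in `asyncio.wait_for` would instead raise `TimeoutError` and discard the results that had already completed.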