66 commits

Author SHA1 Message Date
jesopo
98545a9fb4 only decode content-types in DECODE_CONTENT_TYPES 2019-09-17 16:12:03 +01:00
jesopo
8ca0d30fef Response.__init__() needs encoding now 2019-09-17 14:11:12 +01:00
jesopo
b7dd78ef1a restore 5 second (instead of default 10) deadline for http.request 2019-09-17 13:44:14 +01:00
jesopo
94c3ff962b use utils.deadline_process() in utils.http._request() so background threads can call _request() 2019-09-17 13:41:11 +01:00
jesopo
47735421b8 add json_body arg to Request to json-encode body, only return from body if not null 2019-09-16 10:57:18 +01:00
jesopo
77f50187c5 allow Requests to specify a useragent 2019-09-12 10:41:50 +01:00
jesopo
9d6a3982ed add a helper utils.http.Client static object 2019-09-11 17:53:49 +01:00
jesopo
51dc26d113 add proxy to Request objects 2019-09-11 17:53:37 +01:00
jesopo
4a97c9eb0d refactor utils.http.requests to support a Request object 2019-09-11 17:44:07 +01:00
jesopo
8f8cf92ae2 automatically decode certain http content types 2019-09-11 15:28:13 +01:00
jesopo
a9b106c6be Don't try to .decode non-html things, default iso-lat-1 for non-html too 2019-09-09 16:17:26 +01:00
jesopo
b83f5d9e30 add flag to disable encoding detection 2019-09-09 14:59:08 +01:00
jesopo
5ef2b7af27 'str.split' -> 's.split' 2019-09-09 14:53:11 +01:00
jesopo
1df82c1cb2 still default to iso-latin-1 if no on-page or in-header content-type is present 2019-09-09 14:48:26 +01:00
jesopo
0a67659637 only look for <meta>-related tags when there are meta tags 2019-09-09 14:39:19 +01:00
jesopo
0a1077c5cd add explicit None return for _find_encoding (mypy) 2019-09-09 14:25:01 +01:00
jesopo
ff9c82bf67 change utils.http.request to best-effort detect on-page encoding. closes #113 2019-09-09 14:11:18 +01:00
jesopo
397cfa8e7e correctly qualify DeadlineExceededException namespace 2019-09-03 14:54:59 +01:00
jesopo
b7b2f31c1c use utils.deadline() in utils.http.request, not raw sigalrm 2019-09-02 15:50:21 +01:00
jesopo
9cc1ee98eb Pass the content of a webpage to HTTPParsingException 2019-09-02 13:27:44 +01:00
jesopo
408b89aeb7 use \S+ for url regex (for non-ascii chars), use url_sanitize to catch <> 2019-09-02 13:25:48 +01:00
jesopo
20042edfd9 Allow bypass of content-type check in utils.http.request 2019-08-05 15:41:02 +01:00
jesopo
d093027431 not all HTTP responses have content-type 2019-08-02 17:33:16 +01:00
jesopo
c19c6c0e14 asyncio.gather -> asyncio.wait (with timeout) 2019-07-08 14:50:11 +01:00
jesopo
469c725675 tell asyncio.gather which loop to use 2019-07-08 14:41:12 +01:00
jesopo
a1438abf66 close event loop when we're done with it (request_many()) 2019-07-08 13:59:48 +01:00
jesopo
81c7af8ab5 Don't try/except async http exceptions 2019-07-08 13:51:02 +01:00
jesopo
ee0ec0eca1 switch request_many() to use asyncio.gather 2019-07-08 13:46:27 +01:00
jesopo
b62ba469d7 catch async exceptions in utils.http.request_many() 2019-07-08 13:18:59 +01:00
jesopo
078681eddf add missing schema in utils.http.sanitise_url, use in rss.py 2019-07-08 12:54:06 +01:00
jesopo
ecb8364d0d switch to using asyncio's event loop 2019-07-08 12:45:10 +01:00
jesopo
15e143fcff implement utils.http.request_many as a tonado ioloop yield 2019-07-08 11:43:09 +01:00
jesopo
637067c62c url_validate() -> url_sanitise() 2019-07-02 14:15:49 +01:00
jesopo
534854127b Add utils.http.url_validate() for best-effort url tidying 2019-07-02 14:10:18 +01:00
jesopo
f9eb017466 message arg for HTTPWrongContentTypeException/HTTPParsingException 2019-06-28 23:01:21 +01:00
jesopo
97810db8df Give descriptions to utils.http.HTTPException subclasses 2019-06-27 18:28:08 +01:00
jesopo
16d331dd43 add allow_redirects kwarg to utils.http.request() 2019-06-26 17:53:16 +01:00
jesopo
a802e66dcf Defer decoding http payload bytestring until after checking ContentType 2019-06-04 13:47:03 +01:00
jesopo
0be9046669 Pass str object to BeautifulSoup, not bytes. closes #56 2019-05-28 10:22:35 +01:00
Patrick Nappa
2c344c9ddd forgot the beautiful % 2019-05-03 13:50:51 +10:00
Patrick Nappa
471c11e229 ensure that non-url characters not separated by whitespace aren't consumed 2019-05-03 13:43:08 +10:00
jesopo
bdcb4b5db2 Add missing ":" 2019-04-25 17:50:41 +01:00
jesopo
1240b154cb Support interfaces that don't have AF_INET and/or AF_INET6 2019-04-25 17:48:51 +01:00
jesopo
7643a962bd Refuse to get the title for any url that points locall 2019-04-25 15:58:58 +01:00
jesopo
dffee4d223 Move REGEX_URL out of isgd.py and title.py in to utils.http 2019-04-24 15:46:54 +01:00
jesopo
197ae2e053 Raise a specific exception in utils.http.request for "wrong content type" 2019-02-28 23:28:45 +00:00
jesopo
846b881e52 Throw ValueError when utils.http.request tries to soup non-html/xml data 2019-02-27 15:16:08 +00:00
jesopo
cfaf6864fc Don't try to parse non-html/xml stuff with BeautifulSoup 2019-02-26 11:18:50 +00:00
jesopo
2d3bb2b5e8 Typo in utils.http.request, 'response_heders' -> 'response_headers' 2018-12-11 22:31:14 +00:00
jesopo
5b59740043 Pass a dict to utils.CaseInsensitiveDict, not a MutableMapping 2018-12-11 22:30:57 +00:00