jesopo
|
8ca0d30fef
|
Response.__init__() needs encoding now
|
2019-09-17 14:11:12 +01:00 |
|
jesopo
|
b7dd78ef1a
|
restore 5 second (instead of default 10) deadline for http.request
|
2019-09-17 13:44:14 +01:00 |
|
jesopo
|
94c3ff962b
|
use utils.deadline_process() in utils.http._request() so background threads can call _request()
|
2019-09-17 13:41:11 +01:00 |
|
jesopo
|
47735421b8
|
add json_body arg to Request to json-encode body, only return from body if not null
|
2019-09-16 10:57:18 +01:00 |
|
jesopo
|
77f50187c5
|
allow Requests to specify a useragent
|
2019-09-12 10:41:50 +01:00 |
|
jesopo
|
9d6a3982ed
|
add a helper utils.http.Client static object
|
2019-09-11 17:53:49 +01:00 |
|
jesopo
|
51dc26d113
|
add proxy to Request objects
|
2019-09-11 17:53:37 +01:00 |
|
jesopo
|
4a97c9eb0d
|
refactor utils.http.requests to support a Request object
|
2019-09-11 17:44:07 +01:00 |
|
jesopo
|
8f8cf92ae2
|
automatically decode certain http content types
|
2019-09-11 15:28:13 +01:00 |
|
jesopo
|
a9b106c6be
|
Don't try to .decode non-html things, default iso-lat-1 for non-html too
|
2019-09-09 16:17:26 +01:00 |
|
jesopo
|
b83f5d9e30
|
add flag to disable encoding detection
|
2019-09-09 14:59:08 +01:00 |
|
jesopo
|
5ef2b7af27
|
'str.split' -> 's.split'
|
2019-09-09 14:53:11 +01:00 |
|
jesopo
|
1df82c1cb2
|
still default to iso-latin-1 if no on-page or in-header content-type is present
|
2019-09-09 14:48:26 +01:00 |
|
jesopo
|
0a67659637
|
only look for <meta>-related tags when there are meta tags
|
2019-09-09 14:39:19 +01:00 |
|
jesopo
|
0a1077c5cd
|
add explicit None return for _find_encoding (mypy)
|
2019-09-09 14:25:01 +01:00 |
|
jesopo
|
ff9c82bf67
|
change utils.http.request to best-effort detect on-page encoding
closes #113
|
2019-09-09 14:11:18 +01:00 |
|
jesopo
|
397cfa8e7e
|
correctly qualify DeadlineExceededException namespace
|
2019-09-03 14:54:59 +01:00 |
|
jesopo
|
b7b2f31c1c
|
use utils.deadline() in utils.http.request, not raw sigalrm
|
2019-09-02 15:50:21 +01:00 |
|
jesopo
|
9cc1ee98eb
|
Pass the content of a webpage to HTTPParsingException
|
2019-09-02 13:27:44 +01:00 |
|
jesopo
|
408b89aeb7
|
use \S+ for url regex (for non-ascii chars), use url_sanitize to catch <>
|
2019-09-02 13:25:48 +01:00 |
|
jesopo
|
20042edfd9
|
Allow bypass of content-type check in utils.http.request
|
2019-08-05 15:41:02 +01:00 |
|
jesopo
|
d093027431
|
not all HTTP responses have content-type
|
2019-08-02 17:33:16 +01:00 |
|
jesopo
|
c19c6c0e14
|
asyncio.gather -> asyncio.wait (with timeout)
|
2019-07-08 14:50:11 +01:00 |
|
jesopo
|
469c725675
|
tell asyncio.gather which loop to use
|
2019-07-08 14:41:12 +01:00 |
|
jesopo
|
a1438abf66
|
close event loop when we're done with it (request_many())
|
2019-07-08 13:59:48 +01:00 |
|
jesopo
|
81c7af8ab5
|
Don't try/except async http exceptions
|
2019-07-08 13:51:02 +01:00 |
|
jesopo
|
ee0ec0eca1
|
switch request_many() to use asyncio.gather
|
2019-07-08 13:46:27 +01:00 |
|
jesopo
|
b62ba469d7
|
catch async exceptions in utils.http.request_many()
|
2019-07-08 13:18:59 +01:00 |
|
jesopo
|
078681eddf
|
add missing scheme in utils.http.sanitise_url, use in rss.py
|
2019-07-08 12:54:06 +01:00 |
|
jesopo
|
ecb8364d0d
|
switch to using asyncio's event loop
|
2019-07-08 12:45:10 +01:00 |
|
jesopo
|
15e143fcff
|
implement utils.http.request_many as a tornado ioloop yield
|
2019-07-08 11:43:09 +01:00 |
|
jesopo
|
637067c62c
|
url_validate() -> url_sanitise()
|
2019-07-02 14:15:49 +01:00 |
|
jesopo
|
534854127b
|
Add utils.http.url_validate() for best-effort url tidying
|
2019-07-02 14:10:18 +01:00 |
|
jesopo
|
f9eb017466
|
message arg for HTTPWrongContentTypeException/HTTPParsingException
|
2019-06-28 23:01:21 +01:00 |
|
jesopo
|
97810db8df
|
Give descriptions to utils.http.HTTPException subclasses
|
2019-06-27 18:28:08 +01:00 |
|
jesopo
|
16d331dd43
|
add allow_redirects kwarg to utils.http.request()
|
2019-06-26 17:53:16 +01:00 |
|
jesopo
|
a802e66dcf
|
Defer decoding http payload bytestring until after checking ContentType
|
2019-06-04 13:47:03 +01:00 |
|
jesopo
|
0be9046669
|
Pass str object to BeautifulSoup, not bytes. closes #56
|
2019-05-28 10:22:35 +01:00 |
|
Patrick Nappa
|
2c344c9ddd
|
forgot the beautiful %
|
2019-05-03 13:50:51 +10:00 |
|
Patrick Nappa
|
471c11e229
|
ensure that non-url characters not separated by whitespace aren't consumed
|
2019-05-03 13:43:08 +10:00 |
|
jesopo
|
bdcb4b5db2
|
Add missing ":"
|
2019-04-25 17:50:41 +01:00 |
|
jesopo
|
1240b154cb
|
Support interfaces that don't have AF_INET and/or AF_INET6
|
2019-04-25 17:48:51 +01:00 |
|
jesopo
|
7643a962bd
|
Refuse to get the title for any url that points locally
|
2019-04-25 15:58:58 +01:00 |
|
jesopo
|
dffee4d223
|
Move REGEX_URL out of isgd.py and title.py in to utils.http
|
2019-04-24 15:46:54 +01:00 |
|
jesopo
|
197ae2e053
|
Raise a specific exception in utils.http.request for "wrong content type"
|
2019-02-28 23:28:45 +00:00 |
|
jesopo
|
846b881e52
|
Throw ValueError when utils.http.request tries to soup non-html/xml data
|
2019-02-27 15:16:08 +00:00 |
|
jesopo
|
cfaf6864fc
|
Don't try to parse non-html/xml stuff with BeautifulSoup
|
2019-02-26 11:18:50 +00:00 |
|
jesopo
|
2d3bb2b5e8
|
Typo in utils.http.request, 'response_heders' -> 'response_headers'
|
2018-12-11 22:31:14 +00:00 |
|
jesopo
|
5b59740043
|
Pass a dict to utils.CaseInsensitiveDict, not a MutableMapping
|
2018-12-11 22:30:57 +00:00 |
|
jesopo
|
d373edfaae
|
Add missing utils import in utils.http
|
2018-12-11 22:30:05 +00:00 |
|