jesopo
b889a9f841
add utils.http.Session object, to preserve cookies across requests
2019-12-03 13:00:43 +00:00
jesopo
c79bd6d0ba
utils.http.Response.decode() should default to detected encoding
2019-11-28 07:35:16 +00:00
jesopo
e4a5bd01e9
explicitly use "lxml" for finding page encoding
2019-11-26 14:34:48 +00:00
jesopo
8e9da0d681
_find_encoding takes bytes and soupifies now
2019-11-26 13:58:37 +00:00
jesopo
c898bc4be1
utils.http.request_many() shouldn't decode data for Response
2019-11-26 13:54:17 +00:00
jesopo
2d21dfa229
utils.http.Response.data should always be bytes - add .decode and .soup
2019-11-26 13:42:01 +00:00
jesopo
ed775ddbe3
remove parser from utils.http.Request, add Request.soup()
2019-11-26 11:35:56 +00:00
jesopo
6a6e789ec9
add cookies and .json() to utils.http.Response objects
2019-11-25 18:17:30 +00:00
jesopo
ab8bc65cc9
change utils.http.Request to be a dataclass
2019-11-25 13:42:10 +00:00
jesopo
4d30263315
give bitbot a unique User-Agent
closes #206
2019-11-20 14:42:34 +00:00
Valentin Lorentz
fbf8cd1a16
Fix type errors detected by 'mypy --ignore-missing-imports src'.
2019-10-30 22:26:59 +01:00
jesopo
f64131a10f
support utf8 hostnames by punycode (idna) encoding
2019-10-18 10:58:24 +01:00
jesopo
9ab817ca58
parse out content_type in Response ctor
2019-10-05 22:56:56 +01:00
jesopo
b2473a4ac4
parse content-type out in utils.http.request, put it on Response object
2019-10-04 13:07:09 +01:00
jesopo
f306213cb8
'is_localhost()' -> 'host_permitted()'
2019-09-30 15:15:20 +01:00
jesopo
b9c64b7cf1
use ipaddress is_loopback etc to do better forbidden ranges
closes #87
2019-09-30 15:12:01 +01:00
jesopo
2f49fb99e9
assume http fallback_encoding by content-type (utf8 for json)
2019-09-25 15:32:09 +01:00
jesopo
72649a90c2
only BeautifulSoup for finding encoding when it's a html-ish type
2019-09-20 13:38:00 +01:00
jesopo
e34259f967
log call was replaced with Exception but [] on args remained
2019-09-19 15:30:27 +01:00
jesopo
88a69aaa66
give Requests, use them in utils.http.request_many()
2019-09-19 14:54:44 +01:00
jesopo
d8e3a1c7ee
utils.http.request_() has no self, let alone self.log
2019-09-19 14:02:48 +01:00
jesopo
b69c9146b2
should be using pair_start/pair_end throughout for
2019-09-19 13:51:27 +01:00
jesopo
cd0d39ee5e
also show "bad" data in HTTPParsingException when a message is provided
2019-09-18 14:20:59 +01:00
jesopo
312f8906ae
show "bad" data in HTTPParsingException message
2019-09-18 10:52:05 +01:00
jesopo
ee6360be22
don't check already-read data when checking for too-large requests
this check was here because the first read will return empty if it was an
invalid byte sequence for e.g. gzip because we needed to receive more data. the
second read will always return data (not decoded) so regardless of what the
already-read data is, the second read is the only criteria we need.
2019-09-17 17:33:23 +01:00
jesopo
1ac7f2697e
log which URL caused an error in request_many
2019-09-17 17:09:19 +01:00
jesopo
98545a9fb4
only decode content-types in DECODE_CONTENT_TYPES
2019-09-17 16:12:03 +01:00
jesopo
8ca0d30fef
Response.__init__() needs encoding now
2019-09-17 14:11:12 +01:00
jesopo
b7dd78ef1a
restore 5 second (instead of default 10) deadline for http.request
2019-09-17 13:44:14 +01:00
jesopo
94c3ff962b
use utils.deadline_process() in utils.http._request() so background threads can call _request()
2019-09-17 13:41:11 +01:00
jesopo
47735421b8
add json_body arg to Request to json-encode body, only return from body if not null
2019-09-16 10:57:18 +01:00
jesopo
77f50187c5
allow Requests to specify a useragent
2019-09-12 10:41:50 +01:00
jesopo
9d6a3982ed
add a helper utils.http.Client static object
2019-09-11 17:53:49 +01:00
jesopo
51dc26d113
add proxy to Request objects
2019-09-11 17:53:37 +01:00
jesopo
4a97c9eb0d
refactor utils.http.requests to support a Request object
2019-09-11 17:44:07 +01:00
jesopo
8f8cf92ae2
automatically decode certain http content types
2019-09-11 15:28:13 +01:00
jesopo
a9b106c6be
Don't try to .decode non-html things, default iso-lat-1 for non-html too
2019-09-09 16:17:26 +01:00
jesopo
b83f5d9e30
add flag to disable encoding detection
2019-09-09 14:59:08 +01:00
jesopo
5ef2b7af27
'str.split' -> 's.split'
2019-09-09 14:53:11 +01:00
jesopo
1df82c1cb2
still default to iso-latin-1 if no on-page or in-header content-type is present
2019-09-09 14:48:26 +01:00
jesopo
0a67659637
only look for <meta>-related tags when there are meta tags
2019-09-09 14:39:19 +01:00
jesopo
0a1077c5cd
add explicit None return for _find_encoding (mypy)
2019-09-09 14:25:01 +01:00
jesopo
ff9c82bf67
change utils.http.request to best-effort detect on-page encoding
closes #113
2019-09-09 14:11:18 +01:00
jesopo
397cfa8e7e
correctly qualify DeadlineExceededException namespace
2019-09-03 14:54:59 +01:00
jesopo
b7b2f31c1c
use utils.deadline() in utils.http.request, not raw sigalrm
2019-09-02 15:50:21 +01:00
jesopo
9cc1ee98eb
Pass the content of a webpage to HTTPParsingException
2019-09-02 13:27:44 +01:00
jesopo
408b89aeb7
use \S+ for url regex (for non-ascii chars), use url_sanitize to catch <>
2019-09-02 13:25:48 +01:00
jesopo
20042edfd9
Allow bypass of content-type check in utils.http.request
2019-08-05 15:41:02 +01:00
jesopo
d093027431
not all HTTP responses have content-type
2019-08-02 17:33:16 +01:00
jesopo
c19c6c0e14
asyncio.gather -> asyncio.wait (with timeout)
2019-07-08 14:50:11 +01:00