Commit graph

18 commits

Author SHA1 Message Date
jesopo
b8518d745f Move all logic related to score reasons until after we've checked if we have reasons (mixed_unicode.py) 2018-11-20 20:43:26 +00:00
jesopo
993403f213 .items -> .items() 2018-11-20 20:41:51 +00:00
jesopo
957b5413dc Use collections.Counter instead of itertools.groupby to group together all instances, not just consecutive instances (mixed_unicode.py) 2018-11-20 20:30:48 +00:00
jesopo
80dd3bb5e1 Don't count Unknown towards additional scripts count (mixed_unicode.py) 2018-11-20 14:17:46 +00:00
jesopo
0915dbd3fa 'AdditonalScript' -> 'AdditionalScript', 'score_reasons' -> 'reasons' 2018-11-20 13:50:07 +00:00
jesopo
537e2eebc4 Show reasons for score points (mixed_unicode.py) 2018-11-20 13:47:38 +00:00
jesopo
b98bf65a86 Add a point to a message's score for each additional script they use (mixed_unicode.py) 2018-11-20 13:24:28 +00:00
jesopo
e31d9750ed (for the moment) remove percentage-ising scores (mixed_unicode.py) 2018-11-20 13:23:11 +00:00
jesopo
5ea34b261f TRACE log score with 2 decimal places (mixed_unicode.py) 2018-11-20 13:14:35 +00:00
jesopo
3dccc9f4e0 Keep track of different scripts in a message, round score to 2 decimal places (mixed_unicode.py) 2018-11-20 13:13:11 +00:00
jesopo
c59a5600a8 Score mixed unicode as a percentage (mixed_unicode.py) 2018-11-20 13:08:47 +00:00
jesopo
727ade4022 Only TRACE log when score is more than 0 (mixed_unicode.py) 2018-11-20 12:42:45 +00:00
Evelyn
563bc59208 Mixed unicode: Add Cherokee and Tai Le blocks 2018-11-20 12:29:03 +00:00
Evelyn
22939dd0a9 Mixed unicode: Ranges expressed in hex, with comments 2018-11-20 12:18:56 +00:00
Evelyn
e70ec91a7a Add Coptic range to mixed unicode module 2018-11-20 12:07:54 +00:00
jesopo
52d5b5da49 Detect full-width characters (mixed_unicode.py) 2018-11-20 12:00:47 +00:00
jesopo
e54db0858c Detect Armenian script (mixed_unicode.py) 2018-11-20 11:44:13 +00:00
jesopo
b19ce0be2f Add first version of modules/mixed_unicode.py, designed to detect when we get a message that mixes scripts (Latin, Cyrillic, Greek, etc.) that might be spam 2018-11-20 11:38:30 +00:00
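Commit 957b5413dc swaps itertools.groupby for collections.Counter because groupby only merges adjacent equal items. The sketch below illustrates that difference on a hypothetical list of per-character script labels. The helper names `group_consecutive` and `count_all` are illustrative only; the module's actual code is not shown in this log.

```python
import itertools
from collections import Counter

def group_consecutive(scripts):
    """itertools.groupby only merges *adjacent* equal items."""
    return [(name, len(list(run))) for name, run in itertools.groupby(scripts)]

def count_all(scripts):
    """Counter tallies every occurrence, wherever it appears."""
    return Counter(scripts)

# Hypothetical script labels for a four-character message
scripts = ["Latin", "Cyrillic", "Latin", "Latin"]
print(group_consecutive(scripts))  # [('Latin', 1), ('Cyrillic', 1), ('Latin', 2)]
print(count_all(scripts))          # Counter({'Latin': 3, 'Cyrillic': 1})
```

With groupby, "Latin" appears twice as separate groups, so any per-script tally would be split. Counter gives one total per script.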
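Several of these commits together describe a scoring approach. Scripts are looked up from hex-coded ranges (22939dd0a9, 563bc59208, e70ec91a7a, e54db0858c, 52d5b5da49). The message earns one point per additional script (b98bf65a86), "Unknown" is ignored (80dd3bb5e1), and reasons are only reported once some exist (537e2eebc4, b8518d745f). The following is a minimal sketch under stated assumptions, not the module's implementation. `SCRIPT_RANGES`, `script_of`, `score_message` and the `"AdditionalScript:<name>"` reason format are hypothetical. The range table is a small illustrative subset of Unicode blocks, and treating the most common script as the "base" script is an assumption.

```python
from collections import Counter

# Hypothetical subset of Unicode ranges, written in hex (cf. 22939dd0a9).
# The real module's table is not shown in this log.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),      # A-Z
    (0x0061, 0x007A, "Latin"),      # a-z
    (0x0370, 0x03FF, "Greek"),      # Greek and Coptic block
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0530, 0x058F, "Armenian"),
    (0x13A0, 0x13FF, "Cherokee"),
    (0x1950, 0x197F, "TaiLe"),
    (0x2C80, 0x2CFF, "Coptic"),
    (0xFF01, 0xFF5E, "FullWidth"),  # full-width forms of ASCII ! .. ~
]

def script_of(char):
    """Return the script name for a character, or "Unknown" if unlisted."""
    point = ord(char)
    for start, end, name in SCRIPT_RANGES:
        if start <= point <= end:
            return name
    return "Unknown"

def score_message(message):
    """Score one point per script beyond the most common one (assumption)."""
    counts = Counter(script_of(char) for char in message)
    # Spaces, digits and punctuation fall under "Unknown"; don't count them
    counts.pop("Unknown", None)
    ordered = [name for name, _ in counts.most_common()]
    reasons = ["AdditionalScript:%s" % name for name in ordered[1:]]
    return len(reasons), reasons

score, reasons = score_message("Hello w\u043erld")  # U+043E is Cyrillic "о"
if reasons:  # only format reasons once we know there are some
    print("score %.2f (%s)" % (score, ", ".join(reasons)))
# prints: score 1.00 (AdditionalScript:Cyrillic)
```

The `%.2f` format mirrors the "2 decimal places" logging mentioned in 5ea34b261f. Scoring as a percentage (c59a5600a8) is left out, since e31d9750ed removed it.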