Simplify handling of encoding #69

Delgan · 2019-01-18T13:09:28Z

Taking advantage of the fact that we no longer need to support Python 2.7, I think we can greatly simplify how we manage string encoding.

This avoids the hacky ProxyBufferStreamWrapper class, the dubious to_unicode() and to_bytes() functions, and the conditional statement in write_stream().

Basically, we just format the exception as a unicode string and let the sys.stderr stream handle the encoding; there is no need to deal with .buffer bytes.
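As a rough illustration of the idea (not the code from this pull request), the sketch below writes a str to a text stream and lets the stream encode it. The names write_formatted and fake_terminal are hypothetical; fake_terminal only simulates a terminal with a given encoding by wrapping a byte buffer, using the backslashreplace error handler that sys.stderr uses by default.

```python
import io


def write_formatted(text, stream):
    """Write an already-formatted traceback (a str) to a text stream.

    No manual encoding and no access to stream.buffer: the stream's own
    encoding and error handler decide what bytes are produced.
    """
    stream.write(text)
    stream.flush()


def fake_terminal(encoding):
    """Hypothetical stand-in for sys.stderr on a terminal with `encoding`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(
        raw, encoding=encoding, errors="backslashreplace", newline="\n"
    )
    return raw, stream


ascii_raw, ascii_stream = fake_terminal("ascii")
utf8_raw, utf8_stream = fake_terminal("utf-8")

write_formatted('"天"\n', ascii_stream)
write_formatted('"天"\n', utf8_stream)

print(ascii_raw.getvalue())  # escaped form on the ASCII stream
print(utf8_raw.getvalue())   # UTF-8 bytes on the UTF-8 stream
```

The same call produces different bytes depending only on the stream it is given, which is the point of leaving the encoding to the stream.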

You may notice one side effect: non-ASCII characters like "天" are no longer displayed as is on ASCII terminals. I think this is actually the correct behavior.

One thing I don't understand about the current implementation is that, on the one hand, we test whether └ can be encoded and fall back to -> on error, while on the other hand we manage to print 天 in all cases. This results in tracebacks that may look like -> "天". It's paradoxical: either we can display non-ASCII characters or we can't. I suspect this is behind the problems some users have encountered.
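For reference, the kind of check described here can be sketched as follows. This is an assumption about the general approach, not the project's actual code; pick_marker is a hypothetical name.

```python
def pick_marker(encoding):
    """Return the box-drawing marker if `encoding` can represent it,
    otherwise fall back to a plain ASCII arrow."""
    try:
        "└".encode(encoding)
    except UnicodeEncodeError:
        return "->"
    return "└"


print(pick_marker("ascii"))
print(pick_marker("utf-8"))
```

If the marker is chosen this way while the value next to it is printed unconditionally, the two halves of the line follow different rules, which is the inconsistency pointed out above.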

By writing unencoded unicode to sys.stderr, unprintable characters are automatically escaped by the stream's error handler (backslashreplace by default for sys.stderr), so the output is -> "\u5929" on ASCII terminals and └ "天" otherwise.
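To see what a given terminal encoding would end up showing, one can round-trip the text through that encoding with the backslashreplace error handler. The helper name as_displayed is hypothetical and only serves this illustration.

```python
def as_displayed(text, encoding, errors="backslashreplace"):
    """Return `text` as it would appear once written to a stream using
    `encoding` and the given error handler."""
    return text.encode(encoding, errors).decode(encoding)


print(as_displayed('-> "天"', "ascii"))
print(as_displayed('└ "天"', "utf-8"))
```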

Also, I replaced locale.getpreferredencoding() with STREAM.encoding: we are writing to STREAM (sys.stderr), so why not use its declared encoding? Using locale.getpreferredencoding() has produced mojibake for some users, so maybe this will fix it.
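A minimal sketch of that choice, assuming a fallback to the locale only when the stream declares no encoding (the fallback and the name output_encoding are my assumptions, not part of this pull request):

```python
import io
import locale
import sys


def output_encoding(stream=None):
    """Prefer the encoding declared by the stream we write to; fall back
    to the locale's preferred encoding if the stream has none."""
    if stream is None:
        stream = sys.stderr
    return getattr(stream, "encoding", None) or locale.getpreferredencoding(False)


# A stream that declares ASCII reports ASCII, whatever the locale says.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
print(output_encoding(ascii_stream))

# io.StringIO declares no encoding, so the locale is used instead.
print(output_encoding(io.StringIO()))
```

The two values can differ, for example when stderr is redirected or PYTHONIOENCODING is set, which is why encoding for the locale and then writing to the stream can produce mojibake.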

I ran some tests on both Linux and Windows, comparing exception formatting between the standard handler and better_exceptions across locale and IO encoding settings. The handling of non-ASCII characters is now identical to that of the default exception handler, so I think this should reduce encoding-related problems.

This pull request is made for the python3_only branch.

Delgan · 2019-01-19T10:17:13Z

@Qix- I realized there is actually a problem with this solution.

Given this code running on an ASCII terminal:

a = "天"
"天" * a

Without better_exceptions:

Traceback (most recent call last):
  File "a.py", line 6, in <module>
    "\u5929" * a
TypeError: can't multiply sequence by non-int of type 'str'

With better_exceptions/master:

Traceback (most recent call last):
  File "a.py", line 6, in <module>
    '天' * a
            -> '天'
TypeError: can't multiply sequence by non-int of type 'str'

With better_exceptions/simplify_encoding:

Traceback (most recent call last):
  File "a.py", line 6, in <module>
    '\u5929' * a
            -> '\u5929'
TypeError: can't multiply sequence by non-int of type 'str'

The column at which -> starts is computed incorrectly because it is calculated from the unescaped source string. I have a solution but can't really fix it here because of other problems with source formatting. I made a branch based on this one that fixes source formatting, where this can be fixed easily.
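One possible way to account for the escaping, shown only as a sketch and not as the fix made in the follow-up branch: measure the column on the escaped prefix of the source line instead of the raw one. The names as_displayed and arrow_column are hypothetical, and the sketch ignores the display width of wide characters.

```python
def as_displayed(text, encoding, errors="backslashreplace"):
    """Text as it appears after passing through the stream's encoding."""
    return text.encode(encoding, errors).decode(encoding)


def arrow_column(source_line, col, encoding):
    """Translate a column in the raw source line into the column of the
    same character once the line has been escaped for `encoding`."""
    return len(as_displayed(source_line[:col], encoding))


source = "'天' * a"
col = source.index("a")  # column of the variable in the raw source

for encoding in ("ascii", "utf-8"):
    shown = as_displayed(source, encoding)
    value = as_displayed(repr("天"), encoding)
    print(shown)
    print(" " * arrow_column(source, col, encoding) + "-> " + value)
```

On an ASCII stream each escaped character grows from one column to six, so the arrow has to shift by the same amount to stay under the variable it annotates.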

Simplify handling of encoding

091e236

Qix- mentioned this pull request Sep 28, 2019

Use stream encoding instead of locale preferred encoding #88

Merged
