Hard-code UTF-8 encoding for the input file #56
Merged
willmcgugan merged 1 commit into main on Jun 21, 2022
olivierphi commented on Jun 14, 2022
src/rich_cli/__main__.py
  return (sys.stdin.read(), None)

- with open(path, "rt") as resource_file:
+ with open(path, "rt", encoding="utf8") as resource_file:
Author
It fixes the described issue, but... Is doing this really a good idea? I'm not sure, to be honest 😅
Member
The annoying truth is that you can't be sure what the encoding will be. Can you also add errors="replace" on the off chance there are UTF-8 encoding errors?
At some point we might use chardet to give us extra confidence.
Author
TIL: 🎓 the Python open function can indeed accept an errors arg that specifies how encoding and decoding errors are to be handled - which is really handy! 🙂
Done in 53cb0e9
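For context, here is a small sketch of what errors="replace" does when the file contains bytes that are not valid UTF-8. The helper name `read_lenient` and the sample file are made up for illustration; only the `open(..., encoding="utf8", errors="replace")` call mirrors the change discussed above.

```python
import os
import tempfile


def read_lenient(path):
    """Hypothetical helper: read a file as UTF-8, replacing undecodable bytes."""
    with open(path, "rt", encoding="utf8", errors="replace") as resource_file:
        return resource_file.read()


# Write a file that is NOT valid UTF-8: 0xE9 is "é" in Latin-1 / cp1252.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as raw_file:
        raw_file.write(b"caf\xe9 ok\n")
    text = read_lenient(sample)

# The invalid byte becomes U+FFFD (the replacement character)
# instead of raising UnicodeDecodeError.
print(repr(text))
```

Without errors="replace", the same read would raise UnicodeDecodeError, so the trade-off is a possibly lossy render instead of a crash.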
willmcgugan approved these changes on Jun 21, 2022
This is a first draft to fix the encoding errors described in #55 - i.e. we're on Windows and the input file is encoded in UTF-8.
The role of PYTHONIOENCODING, as described in the issue
The bug report mentions that even when the PYTHONIOENCODING env var is set, the preferred encoding determined by Python on Windows is still a Windows-specific encoding.

However, this seems to be pretty much the expected behaviour, as this env var only affects the encoding of stdin/stdout/stderr and not the default encoding used when opening files?
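One way to check this is to run a child interpreter with and without PYTHONIOENCODING and compare what it reports. This is only a sketch: the helper name `probe_encodings` is hypothetical, and it assumes `locale.getpreferredencoding(False)` is a fair stand-in for the encoding `open()` picks when none is given.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: report the stdout encoding and
# the locale's preferred encoding (what open() defaults to).
PROBE = (
    "import locale, sys; "
    "print(sys.stdout.encoding); "
    "print(locale.getpreferredencoding(False))"
)


def probe_encodings(extra_env):
    """Hypothetical helper: return (stdout_encoding, preferred_encoding)."""
    # Start from a clean slate so inherited settings do not skew the result.
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("PYTHONIOENCODING", "PYTHONUTF8")
    }
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    stdout_encoding, preferred_encoding = result.stdout.split()
    return stdout_encoding, preferred_encoding


if __name__ == "__main__":
    print("baseline:        ", probe_encodings({}))
    print("PYTHONIOENCODING:", probe_encodings({"PYTHONIOENCODING": "latin-1"}))
```

With the variable set, only the first value should change; the preferred encoding stays whatever the platform reports.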
Potential breaking changes brought by this PR
Of course, assuming that the input file is always encoded in UTF-8, like we do with this PR, could break some existing usages of Rich-CLI. Especially on Windows, where UTF-8 is still not the default encoding if I'm not wrong?
Not sure what would be the safest way to handle that issue? 🤔
Potential ways to handle that better
Maybe we could try to open the file using the system's default encoding first, and then, only if that failed, fall back to UTF-8?
e.g. something like this: (pseudo-code)
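A minimal sketch of that idea, assuming a hypothetical helper named `read_text_with_fallback` (it is not part of Rich-CLI). The `default` parameter is only there so the behaviour can be exercised on any platform.

```python
import locale
from pathlib import Path


def read_text_with_fallback(path, default=None, fallback="utf-8"):
    """Hypothetical helper: try the platform encoding, then fall back to UTF-8."""
    if default is None:
        # The encoding open() would use when none is specified.
        default = locale.getpreferredencoding(False)
    try:
        return Path(path).read_text(encoding=default)
    except UnicodeDecodeError:
        # Last resort: decode as UTF-8 and replace anything still undecodable.
        return Path(path).read_text(encoding=fallback, errors="replace")
```

One caveat with this strategy: single-byte code pages such as cp1252 accept almost every byte value, so a UTF-8 file will often decode "successfully" as mojibake and the fallback never triggers.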
As pointed out by @darrenburns, it is now possible to use the PYTHONUTF8 env var, which seems to work:
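A quick way to see the effect, sketched with a hypothetical helper named `utf8_mode_report`: it starts a child interpreter with PYTHONUTF8=1 (Python 3.7+) and returns the UTF-8 mode flag along with the preferred encoding.

```python
import os
import subprocess
import sys

# Code run in the child interpreter.
PROBE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)


def utf8_mode_report():
    """Hypothetical helper: return (utf8_mode_flag, preferred_encoding)."""
    env = dict(os.environ, PYTHONUTF8="1")
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    flag, encoding = result.stdout.split()
    return int(flag), encoding


if __name__ == "__main__":
    print(utf8_mode_report())
```

In UTF-8 mode the preferred encoding is reported as UTF-8 regardless of the Windows code page, which is why `open()` without an explicit encoding starts working for UTF-8 input.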

Add a flag and/or an env var specific to Rich-CLI to let the user tell Rich-CLI which encoding we should use to open the input file?
Maybe that would be the most flexible option, combined with a "before raising an exception, fall back to UTF-8 if the default encoding didn't work" strategy? What do you think @willmcgugan @darrenburns? 🙂
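To make the flag / env var idea concrete, here is a rough sketch of how the encoding could be resolved. Everything named here is an assumption: Rich-CLI has no `RICH_CLI_ENCODING` variable, and `resolve_encoding` is a made-up helper whose `cli_value` stands in for whatever a future flag would pass.

```python
import codecs
import os

# Hypothetical env var name, used for illustration only.
ENV_VAR = "RICH_CLI_ENCODING"


def resolve_encoding(cli_value=None, environ=None, default="utf-8"):
    """Hypothetical helper: pick an encoding from flag, env var, or default."""
    if environ is None:
        environ = os.environ
    # Precedence: explicit flag, then env var, then the default.
    name = cli_value or environ.get(ENV_VAR) or default
    # Fail early with LookupError if the name is not a known codec.
    codecs.lookup(name)
    return name
```

The result could then be passed straight to `open(path, "rt", encoding=..., errors="replace")`, so users with non-UTF-8 files have an explicit escape hatch.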
Before / After
Before this fix:
After this fix:
fixes #55