Save strings internally as WTF-8 #184
Open
+60
−20
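RFC 8259 does not require strings to be valid Unicode strings. Its grammar allows any \uXXXX escape, including unpaired surrogates such as \uDEAD. In effect, a JSON string can hold an arbitrary sequence of 16-bit code units, which makes it possible to store arbitrary binary data in it. This commit removes the requirement that strings be valid UTF-8.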
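A decoder that accepts such strings has to decide what to do with the 16-bit values of consecutive \uXXXX escapes. The sketch below is a minimal Python illustration, not this PR's implementation, and the function name `pair_surrogates` is hypothetical. It combines well-formed surrogate pairs into one code point and keeps unpaired surrogates instead of rejecting them:

```python
def pair_surrogates(units: list[int]) -> list[int]:
    """Turn 16-bit code units (e.g. from consecutive \\uXXXX escapes) into code points.

    A lead surrogate (U+D800..U+DBFF) immediately followed by a trail surrogate
    (U+DC00..U+DFFF) is combined into one supplementary code point. Any other
    surrogate is unpaired and is kept as-is instead of raising an error.
    """
    points: list[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # Standard UTF-16 pair decoding: 10 bits from each half, offset by 0x10000.
            points.append(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            # BMP character or unpaired surrogate: keep the value unchanged.
            points.append(u)
            i += 1
    return points


# "\uD83D\uDE00" is a valid pair (U+1F600); "\uDEAD" is an unpaired surrogate.
print([hex(cp) for cp in pair_surrogates([0xD83D, 0xDE00])])  # ['0x1f600']
print([hex(cp) for cp in pair_surrogates([0xDEAD, 0x0041])])   # ['0xdead', '0x41']
```

A strict UTF-8 decoder would stop at the `0xDEAD` case; this one simply passes it through, which is what allows every RFC 8259 string to be represented.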
WTF-8 (Wobbly Transformation Format, 8-bit) is a superset of UTF-8 that can also encode surrogate code points that are not part of a pair. It represents, in a way compatible with UTF-8, text from systems such as JavaScript and Windows that use UTF-16 internally but don’t enforce the well-formedness invariant that surrogates must be paired.
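As a rough illustration of the encoding (a sketch, not the code in this PR; `wtf8_encode` is a hypothetical name), WTF-8 uses the ordinary UTF-8 byte patterns for every code point from U+0000 to U+10FFFF, surrogates included. The one extra rule assumed here is that a lead surrogate must not be directly followed by a trail surrogate, because such a pair has to be joined into a single supplementary code point first:

```python
def wtf8_encode(code_points: list[int]) -> bytes:
    """Encode code points as WTF-8.

    Unpaired surrogates get the same 3-byte pattern UTF-8 would use for any
    other BMP code point. A lead surrogate directly followed by a trail
    surrogate is rejected: that pair must be combined before encoding.
    """
    out = bytearray()
    prev_was_lead = False
    for cp in code_points:
        if not 0 <= cp <= 0x10FFFF:
            raise ValueError(f"code point out of range: {cp:#x}")
        if prev_was_lead and 0xDC00 <= cp <= 0xDFFF:
            raise ValueError("surrogate pair must be combined into one code point")
        prev_was_lead = 0xD800 <= cp <= 0xDBFF

        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:  # includes U+D800..U+DFFF
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)


print(wtf8_encode([0xDEAD]))  # b'\xed\xba\xad'
```

For input without surrogates the output is byte-for-byte identical to UTF-8, which is why existing valid strings are unaffected by the switch.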
WTF-8 strings are not compatible with the current tests: the test suite uses Python code that only works with valid UTF-8 strings. The test system needs to be upgraded, or replaced with one that has full JSON support.
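The incompatibility can be reproduced with Python's standard `json` module. `json.loads` accepts an unpaired surrogate escape and keeps it in the resulting `str`, but strict UTF-8 encoding of that string then fails. One possible direction for the tests, sketched below as an assumption rather than a decided fix, is to produce WTF-8 bytes with the `surrogatepass` error handler; this matches WTF-8 only while no lead surrogate is directly followed by a trail surrogate, so the hypothetical helper `to_wtf8` checks that first:

```python
import json


def is_strict_utf8_encodable(s: str) -> bool:
    """True if s can be encoded as strict UTF-8, i.e. contains no surrogates."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def to_wtf8(s: str) -> bytes:
    """Encode s as WTF-8 via the 'surrogatepass' error handler.

    'surrogatepass' writes each surrogate as its own 3-byte sequence, which is
    WTF-8 only if no lead surrogate is immediately followed by a trail one.
    """
    for a, b in zip(s, s[1:]):
        if "\ud800" <= a <= "\udbff" and "\udc00" <= b <= "\udfff":
            raise ValueError("unjoined surrogate pair; not valid WTF-8 input")
    return s.encode("utf-8", "surrogatepass")


# json.loads keeps an unpaired surrogate escape as a lone surrogate in the str.
lone = json.loads('"\\udead"')
# json.loads joins an escaped surrogate pair into one supplementary character.
pair = json.loads('"\\ud83d\\ude00"')

print(is_strict_utf8_encodable(lone))         # False
print(to_wtf8(lone))                          # b'\xed\xba\xad'
print(to_wtf8(pair) == pair.encode("utf-8"))  # True
```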