Msgpack can't differentiate between raw binary data and text strings #121

@rasky

It looks like the msgpack spec does not differentiate between raw binary buffers and text strings. This causes problems in every high-level language wrapper, because most high-level languages have distinct data types for text strings and binary buffers.

For instance, the Objective-C wrapper is currently broken: it tries to decode all raw bytes into high-level strings (through UTF-8 decoding), because using a text string (NSString) is the only way to populate an NSDictionary (map). This fails because some binary buffers are not valid UTF-8 and cannot be decoded as strings.

The same happens with Python 2/3: when you serialize and deserialize a unicode string, you always get a binary string back, which breaks simple code:

>>> import msgpack
>>> a = { u"東京": True }
>>> mp = msgpack.dumps(a)
>>> b = msgpack.loads(mp)
>>> a == b
False
>>> b[u"東京"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: u'\u6771\u4eac'
>>> b
{'\xe6\x9d\xb1\xe4\xba\xac': True}

As you can see, deserializing gives you a different object that does not behave like the original, because the text strings inside it are not decoded from UTF-8.
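To make the ambiguity concrete at the wire level, here is a minimal sketch in Python 3. It assumes the "fix raw" layout, where the header byte is `0xa0 | length` for payloads under 32 bytes. The helper names `pack_fixraw` and `unpack_fixraw` are made up for illustration and are not part of any msgpack binding.

```python
def pack_fixraw(value):
    """Encode text or bytes as a "fix raw" item (payload under 32 bytes)."""
    payload = value.encode("utf-8") if isinstance(value, str) else bytes(value)
    if len(payload) >= 32:
        raise ValueError("sketch only covers fix raw (under 32 bytes)")
    # Header: high bits 101, low 5 bits carry the payload length.
    return bytes([0xA0 | len(payload)]) + payload


def unpack_fixraw(data):
    """Decode a fix raw item; the header has a length but no text/binary flag."""
    header = data[0]
    if header & 0xE0 != 0xA0:
        raise ValueError("not a fix raw item")
    length = header & 0x1F
    return data[1:1 + length]


text = "東京"
blob = text.encode("utf-8")

# Text and binary input produce byte-for-byte identical output.
assert pack_fixraw(text) == pack_fixraw(blob)

# The decoder can therefore only hand back bytes, never the original text type.
decoded = unpack_fixraw(pack_fixraw(text))
```

Because nothing in the encoded item records which type went in, any choice the decoder makes (always bytes, or always text) is wrong for one of the two inputs.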

Most wrappers have an option to enable automatic UTF-8 decoding of all raw bytes, but that is wrong because it applies to ALL raw bytes, while a single MessagePack message might contain a mixture of text strings and binary data. That is not at all uncommon.
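A small Python 3 sketch of why a blanket decode option fails on mixed content. The function `decode_all_as_utf8` is a hypothetical stand-in for such an option, not the API of any particular wrapper, and the two sample values are invented for the example.

```python
def decode_all_as_utf8(raw_items):
    """Mimic a blanket "decode every raw item as UTF-8" option."""
    return [item.decode("utf-8") for item in raw_items]


# What a decoder sees after unpacking: two raw items, indistinguishable by type.
name = "東京".encode("utf-8")       # originally a text string
thumbnail = b"\x89PNG\r\n\x1a\n"    # originally binary data (PNG signature)

try:
    decode_all_as_utf8([name, thumbnail])
    outcome = "decoded"
except UnicodeDecodeError:
    # 0x89 is not a valid UTF-8 start byte, so the binary item is rejected.
    outcome = "failed"
```

The text item alone decodes fine; it is the presence of the binary item in the same message that makes the all-or-nothing option unusable.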

As I said, this problem can be found in almost all high-level MessagePack bindings, because most high-level languages have different data types for text strings and binary buffers.

I think the only real solution to this problem is to enhance the msgpack spec so that it explicitly differentiates between text strings and binary buffers. Is this something that the msgpack authors are willing to discuss?
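As a rough illustration of what "explicitly differentiate" could mean, here is a toy Python 3 sketch with one tag for text and another for binary. The tag values, the one-byte length field, and the names `pack_tagged` and `unpack_tagged` are all assumptions invented for this example. They are not a proposal for the actual byte layout and are not taken from any spec.

```python
# Hypothetical one-byte tags, chosen only for this sketch.
TEXT_TAG = 0x01
BINARY_TAG = 0x02


def pack_tagged(value):
    """Encode text or bytes as: tag byte, length byte, payload."""
    if isinstance(value, str):
        tag, payload = TEXT_TAG, value.encode("utf-8")
    else:
        tag, payload = BINARY_TAG, bytes(value)
    if len(payload) > 0xFF:
        raise ValueError("sketch only covers payloads up to 255 bytes")
    return bytes([tag, len(payload)]) + payload


def unpack_tagged(data):
    """Decode one tagged item, restoring the type that was packed."""
    tag, length = data[0], data[1]
    payload = data[2:2 + length]
    if tag == TEXT_TAG:
        return payload.decode("utf-8")
    if tag == BINARY_TAG:
        return payload
    raise ValueError("unknown tag")
```

With distinct tags, `unpack_tagged(pack_tagged("東京"))` gives back a text string and `unpack_tagged(pack_tagged(b"\x89PNG"))` gives back bytes, so a mixed message round-trips without a global decoding option.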

I am willing to implement whichever solution you decide is best and submit a pull request.

Thanks!
