Python3 and strings

asmodehn · August 7, 2017, 6:04am

Hi everyone,

I am currently writing some python2/python3 libraries to work with ROS messages, and I am in need of some information.

How should the ‘string’ message field be treated in python3?
There is no info about that in http://wiki.ros.org/msg , but in python3 we need to specify an encoding whenever we convert a string into bytes and vice versa…

Is there any information about this that I missed somewhere? Thanks!

kartikmohta · August 7, 2017, 6:53pm

That page does mention:

unicode strings are currently not supported as a ROS data type. utf-8 should be used to be compatible with ROS string serialization. In python 2, this encoding is automatic for unicode objects, but decoding must be done manually. In python 3, both encoding and decoding are automatic.

asmodehn · August 8, 2017, 2:06am

It also says :

Primitive Type: string
Serialization: ascii string (4)
C++: std::string
Python: str

and

Primitive Type: uint8[]
Serialization: uint32 length prefix
C++: std::vector
Python: bytes

Also in python3 both encoding and decoding are automatic, based on the platform you are running on, provided you use the right type (bytes or str).

If two communicating nodes run on platforms that use different encodings, then a message we intended to send as a string will probably arrive garbled.

On the other hand, if we do not send a string with an encoding, then we are sending bytes, just like for a uint8[] field.

Is string the same as uint8[] ? (and Python type should be bytes)
OR should ROS enforce some unicode encoding for string ? (and Python type can be str)

In any case, it seems the wiki page should list Python2 and Python3 separately to avoid confusion…
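
A minimal Python 3 sketch of the garbling concern above: the same bytes decoded with a codec other than the one used to encode them. The codec pair (UTF-8 on the sender, Latin-1 on the receiver) is only an assumption chosen for illustration, not something either platform is known to use.

```python
# Text containing one non-ASCII character (e with acute accent).
text = "caf\u00e9"

# The sending node turns the text into bytes using UTF-8.
wire = text.encode("utf-8")

# A receiver that assumes a different codec gets different text back.
garbled = wire.decode("latin-1")

# A receiver that uses the matching codec recovers the original.
intact = wire.decode("utf-8")

print(repr(wire))
print(garbled == text)
print(intact == text)
```

Pure ASCII text would survive both decodes unchanged, which is why the problem only shows up with non-ASCII characters.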

kartikmohta · August 8, 2017, 3:07am

From the generated python code for a msg, when serializing the message into a buffer to send it, ROS encodes the string field as a utf-8 string (x is a string field in the ROS msg):

_x = self.x
if python3 or type(_x) == unicode:
  _x = _x.encode('utf-8')

And similarly, when deserializing the received buffer, it is converted into a Python str with utf-8 encoding:

if python3:
  self.x = str[start:end].decode('utf-8')
else:
  self.x = str[start:end]

So on the user side, you just need to make sure that the encoding for the string you’re sending is utf-8.

With that in mind, that block from the msg wiki page seems sufficient to me:

unicode strings are currently not supported as a ROS data type. utf-8 should be used to be compatible with ROS string serialization. In python 2, this encoding is automatic for unicode objects, but decoding must be done manually. In python 3, both encoding and decoding are automatic.
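
As a rough illustration of the encode/decode steps quoted above, here is a self-contained Python 3 sketch of a string round trip. It assumes the usual uint32 little-endian length prefix in front of the string bytes; `serialize_string` and `deserialize_string` are hypothetical helper names, not functions from the generated message code.

```python
import struct


def serialize_string(value):
    """Hypothetical helper: UTF-8 encode the text and prepend a uint32 length."""
    data = value.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def deserialize_string(buf, start=0):
    """Hypothetical helper: read the length prefix, then decode the bytes as UTF-8.

    Returns the decoded text and the offset just past the string.
    """
    (length,) = struct.unpack_from("<I", buf, start)
    start += 4
    end = start + length
    return buf[start:end].decode("utf-8"), end


original = "h\u00e9llo"          # 5 characters, 6 bytes once UTF-8 encoded
wire = serialize_string(original)
text, consumed = deserialize_string(wire)

print(len(wire), consumed, text == original)
```

Note that the length prefix counts bytes, not characters, so it differs from `len(original)` as soon as the text contains non-ASCII characters.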

asmodehn · August 8, 2017, 6:07am

Interestingly, from this code, I understand the exact opposite of

unicode strings are currently not supported

We are obviously using the unicode codec UTF-8 to encode and decode it, and the matching python type is a unicode string. So looking at this code, I would say:
“A string field in a ROS message is a unicode string, and will be encoded/decoded using UTF-8 for serialization/deserialization.”

And in that case the wiki should state :

Primitive Type: string
Serialization: utf-8 string (4)
C++: std::string
Python3: str
Python2: unicode

On the other hand, if this is not true and the ROS serialization is only supporting ASCII, then the python matching type should be bytes, and the wiki should say :

Primitive Type: string
Serialization: ascii string (4)
C++: std::string
Python3: bytes
Python2: str

and the serialization code needs to be fixed ( no need to encode/decode, unicode is not supported ).

kartikmohta · August 8, 2017, 7:08am

Yes you’re right, that statement doesn’t seem to be correct.

As per your recommendation, I think mentioning utf-8 string as the serialization type would be fine (though not sure if that is the right thing with the C++ client library), but it would be better to use/recommend the str type for Python2 since there is no automatic decoding into a unicode string for Python 2. So it would just be type str for both Python 2 and 3.

asmodehn · August 11, 2017, 7:07am

But str in Python3 is unicode in Python2, and having different ways to serialize data between different versions of python will break a few things in many places (“why is my message garbled on this node and not that one?”).
We could do that, but it would require a “big warning” everywhere we mention this topic…

=> I could not find any REP specification regarding the message serialization, and how to match the types of the supported languages and integrate deserialization with it. It seems it’s something we need to drive implementation (especially given ROS supports multiple languages) and prevent “incomplete features” as much as possible.

The current serialization code breaks :

when we pass a bytes in python 3 (no encode method) fix attempt here
when we pass a unicode in python2 (the receiving end loses the encoding)
when we pass a str in python3 (a receiving python2 end loses the encoding)

=> we need a solution (design fix) that integrates properly for all supported languages…
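
The failure cases listed above can be reproduced in a small Python 3 sketch. `encode_field` is a hypothetical stand-in for the Python 3 branch of the generated code quoted earlier in the thread, and the Python 2 receiver is only simulated by skipping the decode step.

```python
def encode_field(value):
    # Hypothetical stand-in for the generated code's Python 3 branch:
    # encode() is called unconditionally on the field value.
    return value.encode("utf-8")


# Case 1: a bytes value in Python 3 has no encode() method.
try:
    encode_field(b"abc")
    bytes_error = None
except AttributeError as exc:
    bytes_error = exc

# Cases 2 and 3: the sender encodes text, but a Python 2 receiver keeps
# the raw slice of the buffer without decoding it.
sent = "\u00e9"
wire = encode_field(sent)
received_without_decode = wire                 # simulated Python 2 receiver
received_with_decode = wire.decode("utf-8")    # Python 3 receiver

print(type(bytes_error).__name__)
print(len(sent), len(received_without_decode))
print(received_with_decode == sent)
```

The undecoded value is two bytes long while the original text is one character, which is the "lost encoding" a Python 2 subscriber would have to undo by hand.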

kartikmohta · August 11, 2017, 9:32pm

I think fully supporting unicode strings would require a lot of effort, more so on the C++ client libraries.
Sadly not a solution, but for now the recommendation of just sticking to ascii strings would prevent the issues mentioned in the second and third bullets from affecting user code.
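
One way user code could follow the ASCII-only recommendation is to check a value before putting it in a string field. This is a Python 3 sketch; `is_ascii` is a hypothetical helper name and not part of any ROS client library.

```python
def is_ascii(text):
    """Hypothetical helper: True if text can be encoded as pure ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


plain = "hello world"
accented = "h\u00e9llo"

print(is_ascii(plain), is_ascii(accented))

# For ASCII-only text the UTF-8 and ASCII encodings produce identical bytes,
# so the receiving side sees the same data whether or not it decodes.
same_bytes = plain.encode("utf-8") == plain.encode("ascii")
print(same_bytes)
```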

asmodehn · August 14, 2017, 5:08am

Agreed. That means that the advised/documented python3 type should be bytes…
I went ahead and worked on an update on the wiki, to try to remove the confusion when talking about py2/py3.

kartikmohta · August 14, 2017, 5:33am

Well, you can make both Python 2 & 3 string msg types be bytes with a note that bytes is the same as str in Python2.

$ python2
Python 2.7.13 (default, Jul 21 2017, 03:24:34) 
>>> a = str('123')
>>> type(a)
<type 'str'>
>>> a = bytes('123')
>>> type(a)
<type 'str'>
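
For contrast, a short Python 3 sketch of the same experiment, where str and bytes are distinct types and converting between them needs an explicit encoding:

```python
a = str("123")
b = bytes("123", "ascii")   # Python 3 requires an encoding argument here

print(type(a) is str, type(b) is bytes)
print(a == b)               # text and bytes never compare equal in Python 3

# Calling bytes() on text without an encoding is an error in Python 3.
try:
    bytes("123")
    missing_encoding_error = None
except TypeError as exc:
    missing_encoding_error = exc

print(type(missing_encoding_error).__name__)
```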
