[Discourse.ros.org] [Next Generation ROS] Error using python3 nodes on ARM64 & CoreDX


Saurabh Bansal via ros-users


I have everything for ROS2 beta3 compiled and running on aarch64 with CoreDX. Most things work fine. However, whenever I try to create a subscription in a Python node, I get this error:
`Error in /usr/bin/python3: free(): invalid pointer: 0x...`
I see the error when running the demo_nodes_py listener. The same code executes fine on x64 platforms. Does our Python extension rely upon platform-specific code for some of its pointer handling or struct copies? Of course, aarch64 should also be using 64-bit pointers. What would be different between FastRTPS (which works) and CoreDX in the Python extensions? What can I do? Where should I look?

PS, FastRTPS still spews a lot of this when I run the listener: [RTPS_HISTORY Error] Change payload size of '8808' bytes is larger than the history payload size of '5000' bytes and cannot be resized. -> Function add_change





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/1) or reply to this email to respond.


If you do not want to receive messages from ros-users please use the unsubscribe link below. If you use the one above, you will stop all of ros-users from receiving updates.
______________________________________________________________________________
ros-users mailing list
[hidden email]
http://lists.ros.org/mailman/listinfo/ros-users
Unsubscribe: <http://lists.ros.org/mailman//options/ros-users>

Saurabh Bansal via ros-users


- Do you happen to have a backtrace ?
- Does it fail when you create a subscription or when you try to deserialize a message?
- Do you have the same problem with message types that are of known size (like uint64t) ?


Can you reproduce it with a pure C communication ? (running [test_messages_c](https://github.com/ros2/system_tests/blob/master/test_communication/test/test_messages_c.cpp) from test_communication for example)
I would expect the problem to be the same but that would at least allow you to rule out anything Python related.

My guess is that it may be something in the CoreDX C typesupport. On aarch64 we currently test only Fast-RTPS, which uses the introspection mechanism. It's possible that the alignment (and thus the pointers you access and free) is not correct when it's built for aarch64.

[quote="BrannonKing, post:1, topic:3057"]
PS, FastRTPS still spews a lot of this when I run the listener: [RTPS_HISTORY Error] Change payload size of '8808' bytes is larger than the history payload size of '5000' bytes and cannot be resized. -> Function add_change
[/quote]

Can you please provide more details on the related thread (https://discourse.ros.org/t/payload-size-error-with-fastrtps/2815)?





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/2) or reply to this email to respond.


Saurabh Bansal via ros-users


>Do you happen to have a backtrace?

I can't seem to create one. No core dump is produced even with `ulimit -c unlimited`. The error must be caught (but subscription not created).

>Does it fail when you create a subscription or when you try to deserialize a message?

Definitely when creating the subscription.

>Do you have the same problem with message types that are of known size (like uint64t) ?

Yes, I do.

>Can you reproduce it with a pure C communication?

No, I cannot.

> running test_messages_c...

All the rclcpp tests from `test_communication` pass and all the rclpy tests fail with the same error:
`` [test_subscriber] *** Error in `/usr/bin/python3': free(): invalid pointer: 0x0000007f9753f768 *** ``

I question the `PyMem_Free` call here (and in similar contexts throughout the file): https://github.com/ros2/rclpy/blob/ec2220239d089f7b5d03a6ed1f6a3136e7c4d049/rclpy/src/rclpy/_rclpy.c#L1183
I'm no Python extension expert, but surely it's not a best-practice to destroy objects passed in by the user. That would be the user/caller's responsibility. What if the same QoS object was passed to multiple subscription calls but the first one destroyed it? In looking around online, it seems that the right approach is to call `Py_XDECREF` on each object created via `PyArg_ParseTuple` (every parameter with type "O", other types having their individual cleanup instructions).





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/3) or reply to this email to respond.


Saurabh Bansal via ros-users


Thanks for the detailed answer.

[quote="BrannonKing, post:3, topic:3057"]
I'm no Python extension expert, but surely it's not a best-practice to destroy objects passed in by the user. That would be the user/caller's responsibility. What if the same QoS object was passed to multiple subscription calls but the first one destroyed it?
[/quote]

That is a very good point. We could either modify the behavior and make it clear that users should destroy such objects themselves or document the fact that this is freed by the function. Though I tried creating a bunch of subscriptions using the same qos_profile object (custom or default) and couldn't reproduce the error described in this thread.

To go back to the problem at hand, if the `PyMem_Free` is the issue, it doesn't explain why the first subscription created would crash because at that point the object has not been destroyed yet. And if this was only a logic issue in the memory management in the Python stack, I would expect the error to be consistent regardless of the rmw implementation used...
So I do still think that's something inside rmw_coredx not behaving as expected.

Could you confirm if the error disappears if you modify rclpy to not free the qos_profile on subscription creation ?

[quote="BrannonKing, post:3, topic:3057"]
I can't seem to create one. No core dump is produced even with `ulimit -c unlimited`. The error must be caught (but subscription not created).
[/quote]

gdb should allow you to trace the code through the C extensions.
`gdb --args python3 <PATH_TO_YOUR_PYTHON_LISTENER>`
The backtrace would be very useful to track down where the problem comes from.

[quote="BrannonKing, post:1, topic:3057"]
I have everything for ROS2 beta3 compiled and running on aarch64 with CoreDX. Most things work fine.
[/quote]
Just to confirm: you have exactly the beta3 code without modification except that you are using your own version of rmw_coredx?





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/4) or reply to this email to respond.


Saurabh Bansal via ros-users


>Could you confirm if the error disappears if you modify rclpy to not free the qos_profile on subscription creation?

I confirmed that the error persists even without destroying the qos_profile.

>you have exactly the beta3 code without modification except that you are using your own version of rmw_coredx?
 
Correct. I did manage to get a not-so-helpful stack trace from gdb but I don't think it is right yet since it's from publisher.py. I'll keep working on that.





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/5) or reply to this email to respond.


Saurabh Bansal via ros-users


I've been attempting to get a stack trace of the crash through manual means. It crashes here:

1. https://github.com/ros2/rosidl/blob/48eb1bc0ee2902feaad47eb55c19c10a2c322410/rosidl_generator_c/src/message_type_support.c#L28
2. from https://github.com/asirobots/rmw_coredx/blob/5aad0cca594edfeee7e8967327ac7f0fc6348a56/rmw_coredx_cpp/src/functions.cpp#L60
3. from rmw_create_subscription in that same file.

What is `func` set to there? Or where would I put my next `printf`?





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/6) or reply to this email to respond.


Saurabh Bansal via ros-users


The level of "remote" debugging is getting pretty difficult. If you can provide a set of copy-n-paste-able steps to reproduce the problem I can take a look at the actual failure in gdb.





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/7) or reply to this email to respond.


Saurabh Bansal via ros-users


Sure, although I'm not sure these steps are easier than helping me understand message_type_support.c. Steps to be run on aarch64 (ARM Cortex-A53 or Cortex-A57):
1. Take the ros2.repos file from https://github.com/asirobots/ros2 in the release-beta3-asi branch.
2. Remove rmw_opensplice from the repos file so as to not confuse rclpy (which seems to load opensplice message libraries at random when running with CoreDX as the middleware -- feel free to debug that too).
3. Use vcs tool to process the repos file.
4. Modify src/ros2/tinyxml_vendor/tinyxml_cmakelists.txt to include set(CMAKE_POSITION_INDEPENDENT_CODE ON), which might only be necessary for GCC7 (which I have been using for cortex57 support; I'm not sure if the error happens with GCC5).
5. export CFLAGS="-march=native", export CXXFLAGS="-march=native"
6. export RMW_IMPLEMENTATION=rmw_coredx_cpp
7. compile: src/ament/ament_tools/scripts/ament.py build --skip-packages ros1_bridge --cmake-args -DCMAKE_BUILD_TYPE=Release (although you probably don't want a Release build for tracking this)
8. Get your CoreDX environment variables established (run their env script and then run its output). You may need a copy of CoreDX from here: http://www.twinoakscomputing.com/lic_eval/coredx-4.0.20-Linux_2.6_aarch64_gcc49-Evaluation.tgz . TwinOaks will give you an evaluation license if needed.
9. Source in the install and execute ros2 run demo_nodes_py listener -- it will give an error and not work correctly.





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/8) or reply to this email to respond.


Saurabh Bansal via ros-users


[quote="BrannonKing, post:6, topic:3057"]
What is `func` set to there?
[/quote]

The struct containing `func` is being declared [here](https://github.com/ros2/rosidl/blob/34fdfe6b3b8b8de0c41afd079ae8f361747b3c4a/rosidl_generator_c/include/rosidl_generator_c/message_type_support_struct.h#L35).

For the CoreDX C typesupport it is initialized [here](https://github.com/asirobots/rmw_coredx/blob/5aad0cca594edfeee7e8967327ac7f0fc6348a56/rosidl_typesupport_coredx_c/resource/msg__type_support_c.cpp.em#L566).

The function basically provides the typesupport for a specific message, which enables "generically" written code to use and interact with messages that were not available at compile time but were defined later.

[quote="BrannonKing, post:6, topic:3057"]
Or where would I put my next `printf`?
[/quote]

I would suggest building in `Debug` so that the `assert`s are checked. Additionally, printing the value of `handle`, its members, and `identifier` might provide valuable information (within [message_type_support.c](https://github.com/ros2/rosidl/blob/48eb1bc0ee2902feaad47eb55c19c10a2c322410/rosidl_generator_c/src/message_type_support.c#L21-L29)).
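
To make the role of `func` and the suggested prints concrete, here is a minimal sketch of the dispatch pattern. It is not the actual rosidl or rmw_coredx code: every name prefixed with `sketch_` is hypothetical, and the struct layout (an identifier string, an opaque data pointer, and a lookup function pointer) is only assumed to resemble the real one.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for a message type support struct (illustrative only).
struct sketch_type_support_t;
using sketch_handle_function =
  const sketch_type_support_t * (*)(const sketch_type_support_t *, const char *);

struct sketch_type_support_t
{
  const char * typesupport_identifier;  // which typesupport this handle belongs to
  const void * data;                    // opaque, typesupport-specific payload
  sketch_handle_function func;          // lookup function supplied by the typesupport
};

// A simple per-typesupport lookup: return the handle if the requested
// identifier matches its own, otherwise nullptr.
const sketch_type_support_t * sketch_identifier_lookup(
  const sketch_type_support_t * handle, const char * identifier)
{
  if (std::strcmp(handle->typesupport_identifier, identifier) == 0) {
    return handle;
  }
  return nullptr;
}

// Generic entry point: check the handle, print what was received, then
// delegate to whatever lookup function the handle carries.
const sketch_type_support_t * sketch_get_handle(
  const sketch_type_support_t * handle, const char * identifier)
{
  assert(handle);
  assert(identifier);
  assert(handle->func);  // only checked when NDEBUG is not defined (Debug builds)
  std::fprintf(
    stderr, "handle=%p own_id=%s data=%p requested_id=%s\n",
    static_cast<const void *>(handle), handle->typesupport_identifier,
    handle->data, identifier);
  return handle->func(handle, identifier);
}
```

With a handle initialized as `{"sketch_typesupport_a", nullptr, sketch_identifier_lookup}`, calling `sketch_get_handle` with the same identifier returns the handle and a different identifier returns `nullptr`. In a Release build the `assert`s compile away, so a bad `handle` or `func` would crash at the call instead of being reported.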

[quote="dirk-thomas, post:7, topic:3057"]
a set of copy-n-paste-able steps
[/quote]

I was expecting a list of command line invocations which I can easily run by copy-and-pasting them to a console. I managed to follow your text instructions (even though bullets 7 and 8 are in the wrong order). In the future it would be great to provide ready-to-run commands (e.g. instead of letting people edit files it is more convenient to share the already edited version through Gist) since it reduces the effort to help you and makes the whole process less ambiguous.

Anyway my build finished and when I run the talker and listener I see the following error message twice:

```
dds_thread_set_stacksize(): pthread_attr_setstacksize() error(22):Invalid argument
```

Besides that, the talker is printing `Publishing: "Hello World: N"` and the listener `I heard: [Hello World: N]`, so that seems to work just fine for me. Note: I built `Debug` since I expected the need to debug asserts / segfaults / etc.





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/9) or reply to this email to respond.


Saurabh Bansal via ros-users


Wow, Dirk. Kudos! That's good support. I thought no one would work through those steps. I worked on this issue a little more today. First, I discovered that I don't get the error when I compile rosidl_interfaces in debug mode. When in release mode, it is crashing in this line `std::istringstream ss(value)` from the `split` method in type_support_dispatch.cpp (the c one, not the cpp one). It must be a bug in whatever local version of the standard library I'm running. Is it glibc that I would check the version on? I'm out of time for the day, but I will try a different split implementation tomorrow.





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/10) or reply to this email to respond.


Saurabh Bansal via ros-users


I replaced the split implementation in `src/ros2/rosidl_typesupport/rosidl_typesupport_c/src/type_support_dispatch.cpp`. It then crashed for me in very similar code found in `src/ros2/rmw_implementation/rmw_implementation/src/functions.cpp`.

After that everything was working for me. It appears to be a bug in aarch64's libstdc++.so.6.0.24 -- at least that's the version that I have on my ARM64 machines; it came with GCC 7.2. I was unable to make a simple program that reproduced the problem. However, I see that there are some pending stringstream fixes for the next GCC version, e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81338 .

For my fix I incorporated this split implementation: https://stackoverflow.com/a/1493195/1208289
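
For readers who want to try the same workaround, a split that avoids `std::istringstream` can be sketched roughly as below. This is an illustrative version based on `std::string::find`, not the exact code from the linked answer or from my patch; the name `split_no_stream` is made up for this example.

```cpp
#include <string>
#include <vector>

// Split `value` on `delimiter` without using any stream classes.
// An empty input yields an empty vector; a trailing delimiter yields a
// trailing empty token (unlike a std::getline-based loop, which drops it).
std::vector<std::string> split_no_stream(const std::string & value, char delimiter)
{
  std::vector<std::string> parts;
  if (value.empty()) {
    return parts;
  }
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type end = value.find(delimiter, start);
    if (end == std::string::npos) {
      // No further delimiter: the rest of the string is the last token.
      parts.push_back(value.substr(start));
      break;
    }
    parts.push_back(value.substr(start, end - start));
    start = end + 1;  // continue just past the delimiter
  }
  return parts;
}
```

For example, `split_no_stream("alpha;beta;gamma", ';')` returns the three tokens `alpha`, `beta`, and `gamma`. If the calling code depends on trailing empty tokens being dropped, that case would need to be handled separately.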





---
[Visit Topic](https://discourse.ros.org/t/error-using-python3-nodes-on-arm64-coredx/3057/11) or reply to this email to respond.

