As users of ROS 2, and probably anyone else who’s considered trying it out, are aware, ROS 2 doesn’t yet have an implementation for actions.
There have been some sporadic efforts to rectify this situation. We are now at the point where several of us are actively designing and implementing ROS 2 actions, and Open Robotics has scheduled a few developers to work on actions for the Crystal release.
There is a pull request here for an actions design document, but it needs a lot of work. There is also an experimental implementation done by @mkhansen of Intel to hash out some ideas and support their navstack2 work.
Some of our goals are:
Make actions a first-class citizen in ROS 2, the same as topics and services, rather than a separate library.
Implement actions in rcl so that all other client libraries (C++, Python, Ada, …) can get actions with exactly the same behaviour without needing to reimplement them.
Allow introspection and interaction with actions from the command line via a ros2 action command. As a side effect, make sure actions don’t pollute the output of ros2 topic or ros2 service anymore.
Improve the actions API over ROS 1, where such improvements are needed.
Improve the actions state machine if necessary.
Take advantage of any DDS features that can improve actions, whatever they may be.
What we need to know from you is how well actions work for you in ROS 1. What do you like and what do you hate? What would you change and what do you think must stay the same?
And of course please feel free to comment on or add to the design document or try out implementation ideas!
I’d like to add to @gbiggs’s post that this is a very time-sensitive request for input. This design needs to be solidified in the next 1-2 weeks in order to be done in time for the Crystal release, so if you have input, please take the opportunity to speak up now. Thanks!
Also, if you want to volunteer to help with the implementation, chime in here too. We may be able to do some divide-and-conquer. I was planning to help in the rclcpp layer for example.
We have a few people scheduled to dive into actions in the coming month. @dirk-thomas and @sloretz are good points of contact for coordinating efforts.
Actions are one of the most important parts of ROS1 but they always felt a bit like an afterthought since they were implemented on top of topics and created a kind of namespace pollution, e.g. visualization tools like rqt_graph optionally group the goal/feedback/status/cancel topics to hide this mess. On the other hand, for debugging purposes it is very handy to be able to inspect the action communication between action clients and servers.
Here is my design feedback:
It should be clearly detectable by clients that an action server node crashed and was restarted. I implemented a work-around for this in our own action client libs (see in .net [1] and lua [2]) by detecting non-monotonically increasing sequence numbers of messages received on the status topic. Topic disconnect callbacks could unfortunately not be used in ROS1 for quick detection of disconnects since they were executed only after a pretty long timeout period. Maybe in ROS2 this can be done in a clean way.
During load tests of an action client implementation I observed a rare situation in which a goal was submitted but never reflected by the server. The scenario is somewhat artificial, but it can happen that an action server crashes directly after a goal is submitted and then restarts, yet the client’s non-monotonic sequence check (as described above) does not trigger because the async status topic subscription misses one of the first status messages. The client then sees only increasing sequence numbers while the newly created action server never mentions the goal. In my .net impl I added a check for this situation by counting status messages that are missing a goal waiting for ack, see [3]. If I remember correctly, the caller_id of the status connect callback could not be used to detect the action server restart. This may also be a problem of weak server-process identity.
In general the initial handshake of ROS1 actionlib and the client/server identity handling is too brittle, e.g. very strange situations can be created if two or more action servers are started under the same name (topics).
It would be great if a new connection to an action server could be established within milliseconds. In ROS1, due to the async pub/sub mechanism and the dependency on the first status message (server identity), the maximum connection frequency is pretty low. This is even noted in comments in the actionlib impl, see for example wait_for_server().
From the server implementation perspective, I remember making the mistake of calling setAborted on a goal handle before accepting it, which did not work because setAborted must only be called in the PREEMPTING or ACTIVE state. I am not sure whether this complexity is really necessary, or whether calling setAborted in the PENDING state needs to be an error; setCanceled, for example, seems to be valid for most states.
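The two restart checks described in the first feedback items above (sequence numbers going backwards, and status messages that keep omitting a goal awaiting acknowledgement) could be sketched roughly as follows. This is a minimal pure-Python illustration and not the code from the .net or lua libraries; the class name, method names, and the `max_missing` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RestartDetector:
    """Client-side heuristic for spotting an action server restart.

    Hypothetical helper for illustration; not part of actionlib.
    """
    max_missing: int = 3              # assumed threshold, tune per application
    last_seq: Optional[int] = None    # sequence number of the previous status message
    pending_goal: Optional[str] = None
    missing_count: int = 0

    def goal_sent(self, goal_id: str) -> None:
        """Remember a goal that the server has not acknowledged yet."""
        self.pending_goal = goal_id
        self.missing_count = 0

    def on_status(self, seq: int, goal_ids: Set[str]) -> bool:
        """Process one status message; return True if a restart is suspected."""
        # Check 1: a live server is expected to publish strictly increasing
        # sequence numbers, so a repeat or a drop suggests a new process.
        went_backwards = self.last_seq is not None and seq <= self.last_seq
        self.last_seq = seq

        # Check 2: count status messages that do not mention the pending goal.
        goal_lost = False
        if self.pending_goal is not None:
            if self.pending_goal in goal_ids:
                self.pending_goal = None   # goal acknowledged by the server
                self.missing_count = 0
            else:
                self.missing_count += 1
                goal_lost = self.missing_count >= self.max_missing

        return went_backwards or goal_lost
```

The second check covers the case where the client never observes a lower sequence number: after `goal_sent("g1")`, a run of `max_missing` status messages without `"g1"` is treated as a lost goal. A first-class server identity in ROS 2 would make both heuristics unnecessary.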
Actions are one of the most important parts of ROS1 but they always felt a bit like an afterthought
As one of the original developers of actionlib, I can most definitely tell you that actionlib was an afterthought. Actionlib came much later (6-12 months-ish) than the development of topics & services. Eitan Marder-Eppstein and I weren’t sure whether actionlib would be useful beyond just move_base & the pr2_calibration stack, and there was a broader design decision to not force a complex, seemingly extraneous feature like actions to be part of the core ROS APIs, thus increasing the complexity of porting ROS to a new language. In hindsight, actions turned out to be pretty important to the ROS ecosystem.
Design Feedback
I totally understand the benefits of keeping much of the internals of actionlib the same between ROS1 & ROS2 (ease of portability, ease of interoperability, etc). However, if you are considering making more drastic changes (which may be necessary given some of the additional features in the design doc), I’d seriously consider redefining the client state machine. As it stands, implementing a SimpleActionClient is unnecessarily complex, since the SimpleActionClient states are not a straight reduction of the ActionClient states. This means that the simple action client needs to know about both the current action client state and the previous action client state in order to compute the current simple action client state (see the diagram on the actionlib/DetailedDescription page of the ROS Wiki).
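To make the "not a straight reduction" point concrete, here is a small sketch. The state names are taken from the ROS 1 actionlib client states as best recalled, but the mapping itself is an illustrative assumption and is not copied from the actionlib source. The point it shows is that a pure function of the current client state is not enough, so the reduction has to carry some history.

```python
from enum import Enum, auto


class CommState(Enum):
    """Detailed client states (names as recalled from ROS 1 actionlib)."""
    WAITING_FOR_GOAL_ACK = auto()
    PENDING = auto()
    ACTIVE = auto()
    WAITING_FOR_RESULT = auto()
    WAITING_FOR_CANCEL_ACK = auto()
    RECALLING = auto()
    PREEMPTING = auto()
    DONE = auto()


class SimpleState(Enum):
    """Reduced states exposed by a simple client."""
    PENDING = auto()
    ACTIVE = auto()
    DONE = auto()


# Assumed mapping for states that reduce the same way regardless of history.
DIRECT = {
    CommState.WAITING_FOR_GOAL_ACK: SimpleState.PENDING,
    CommState.PENDING: SimpleState.PENDING,
    CommState.RECALLING: SimpleState.PENDING,
    CommState.ACTIVE: SimpleState.ACTIVE,
    CommState.PREEMPTING: SimpleState.ACTIVE,
    CommState.DONE: SimpleState.DONE,
}


def next_simple_state(previous: SimpleState, comm: CommState) -> SimpleState:
    """Reduce a detailed client state, falling back on history when ambiguous."""
    if comm in DIRECT:
        return DIRECT[comm]
    # WAITING_FOR_RESULT and WAITING_FOR_CANCEL_ACK can be reached from both a
    # pending and an active goal, so the current state alone cannot decide.
    return previous
```

If every detailed state mapped to exactly one simple state, the `previous` argument could be dropped and the simple client would become a thin lookup table over the full client.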
Contributing
The Virtana team (www.virtanatech.com) has several early career roboticists who’d love to get more open source contributions under their belts. Once the ROS2 actionlib team gets more of the implementation & design firmed up, we’d love to pitch in wherever issues fit the “good first issue” label.
I agree that the client state machine should be refined, if possible. I’ve found that a concise state machine, like in the Simple Action Client, is useful for writing applications on top of. Maybe we can try to reconcile the Simple Action Client and underlying Action Client state machines.
On a similar note (echoing @andreaskoepf’s last point), the possible state transitions in the server seem more complex than they need to be. This leads to concurrency issues (mentioned on the wiki). It would be nice to have a more robust, and perhaps more intuitive, state machine for the server from a user perspective.
IMO, it would be ideal for the server and client to follow the same state machine, rather than having subtle differences as in the ROS 1 documentation.
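For reference while discussing simplifications, the server-side goal transitions can be written down as a single table. The sketch below is an approximation of the ROS 1 server state machine written from memory, so every entry should be checked against the wiki before relying on it; the event names and the `transition` helper are hypothetical. It reproduces the behaviour mentioned earlier in the thread, where aborting a goal that is still PENDING is rejected.

```python
from enum import Enum, auto


class GoalState(Enum):
    PENDING = auto()
    ACTIVE = auto()
    PREEMPTING = auto()
    RECALLING = auto()
    REJECTED = auto()
    SUCCEEDED = auto()
    ABORTED = auto()
    PREEMPTED = auto()
    RECALLED = auto()


S = GoalState

# event -> {state in which the event is allowed: resulting state}
# Assumed table, reconstructed from memory of the ROS 1 documentation.
TRANSITIONS = {
    "set_accepted": {S.PENDING: S.ACTIVE, S.RECALLING: S.PREEMPTING},
    "set_rejected": {S.PENDING: S.REJECTED, S.RECALLING: S.REJECTED},
    "set_succeeded": {S.ACTIVE: S.SUCCEEDED, S.PREEMPTING: S.SUCCEEDED},
    "set_aborted": {S.ACTIVE: S.ABORTED, S.PREEMPTING: S.ABORTED},
    "set_canceled": {
        S.PENDING: S.RECALLED,
        S.RECALLING: S.RECALLED,
        S.ACTIVE: S.PREEMPTED,
        S.PREEMPTING: S.PREEMPTED,
    },
    "cancel_requested": {S.PENDING: S.RECALLING, S.ACTIVE: S.PREEMPTING},
}


class InvalidTransition(Exception):
    """Raised when an event is unknown or not allowed in the current state."""


def transition(state: GoalState, event: str) -> GoalState:
    """Return the next state, or raise InvalidTransition."""
    try:
        return TRANSITIONS[event][state]
    except KeyError:
        raise InvalidTransition(f"{event} is not valid in {state.name}") from None
```

With this table, `transition(S.PENDING, "set_aborted")` raises `InvalidTransition`, while `transition(S.PENDING, "set_canceled")` returns `S.RECALLED`. Keeping the rules in one data structure also makes it easy to see which entries would change if the server and client were to share one state machine.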
@dirk-thomas and @sloretz, since @gerkey said you are coordinating efforts from the OSRF side, what is the status of the design and how can we move forward quickly?
I believe we need some clarity on the high-level APIs and the state machine design. As I mentioned before, I’m willing to help in the action file parsing and message generation tools and the rclcpp layer if needed.
@mkhansen Currently we’re writing an update for @gbiggs’s design doc and examples of what the client library APIs should look like. Moving quickly is important to us too. I expect implementation will start in rcl in parallel with opening design and example PRs so that feedback is incorporated as we go.
I’d like to raise an issue that is perhaps not relevant to most but that you may not have considered: the current action client & server implementations are insanely slow with large workloads. This is (probably) mostly an implementation issue, but please keep it in mind when conducting tests.
I have previously worked on planning servers used for computing robot arm motion plans and have needed these servers to service “highly parallel” workloads. Think about the MoveIt planning server, but with the ability to submit many goals and to provide an ordering to them.
The applications I worked with required the computation of thousands of such plans. I used actions to track the motion plans through their life-cycle (e.g. pending, planning, done). If I opened 1000 plans, I would frequently see the planning finish within 5 seconds, followed by up to 50 seconds of serialization as goals were passed back and forth.
Unfortunately, I can’t share this particular code but I can write a representative test case if you are interested.
Thanks to all for the interesting responses so far. Lots of good information on what to target for improvements.
@Jmeyer A representative test case would be awesome. We can use it to find the root cause in the current design, and we can port it to ROS2 and make sure the new implementation doesn’t suffer from the same bottleneck.
I was looking at the ROS2 D-release roadmap.
It looks like there are no items related to actions.
From what I’ve seen, actions currently lack two useful features: the possibility of using them with node composition, and some simplified APIs similar to the simple_action_server and simple_action_client of ROS1.
Not that I know of, but I know there are some strong feelings about such things. I for one would rather see the API well-documented along with some simple tutorials covering common cases as opposed to seeing those classes come back. I’ve seen them cause a lot of confusion (e.g. those hidden threads bite a lot of folks).
I believe actions are usable as part of node composition. I’ve created an example here:
Although, I’ll admit that there is some refactoring that can be done to make things more user-friendly. For example, ros2/rclcpp#635.
Regarding “simple” actions, I don’t think it is very difficult for a user to achieve the desired behaviour with the current implementation (see rclpy “single goal” example). I can see the benefit of wrapping the boiler-plate code into convenient classes for “simple” actions, but I don’t think it is a high priority.
I agree with @kyrofa that tutorials would be a good thing to have for Dashing. I’ve recently created a ticket on GitHub to discuss ideas and track progress.
@jacob I’m trying to understand ROS actionlib and find a way to implement fire-and-forget methods. I cannot use ROS services for this as it’s a blocking call. Can you enlighten me with your view on “whether rosaction shall be used to implement fire and forget mechanism using SimpleActionClient and SimpleActionServer?”
You might want to ask your question on answers.ros.org instead. This thread is about actions in ROS 2. Both actions and services in ROS 2 provide an asynchronous API, which means their calls do not block.