Thanks for the summary @wjwwood @carlossv. Here are some thoughts and questions from my side.
Besides the better handling of timers with this proposal, the benefits of the event queue are the naturally given temporal order of the events in the queue and that there is no need to iterate over all entities to check for work. The reason for not having a work queue is that we would lose the history cache of DDS; that functionality would have to be re-implemented on higher layers when needed. The history cache is preserved with the event queue, but it is also the source of workarounds.
With the history queue and the event queue there are now two queues that are accessed at different times and that behave differently. When the history queue in DDS overflows and drops samples, the number of available samples in DDS and the number of events in the event queue diverge. This destroys the temporal order and leads to “empty” takes from rmw. E.g. if you are spinning at 10 Hz and want to process the latest greatest velocity that arrives at 100 Hz, the history QoS depth can be set to 1. Per spin, this would result in around 9 empty takes and also 9 unnecessary pushes to the event queue, with expensive mutex operations and possibly context switches. Maybe there are better ways to solve this use case?
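To make the numbers concrete, here is a small toy model of the two queues. It is not rclcpp or rmw code: `simulate`, `history` and `events` are made-up names, and the history cache is simply modelled as a bounded deque that silently drops the oldest sample.

```python
from collections import deque


def simulate(pub_hz=100, spin_hz=10, history_depth=1, duration_s=1):
    """Count successful and empty takes for a keep-last history plus an event queue."""
    history = deque(maxlen=history_depth)  # stand-in for the DDS keep-last history cache
    events = deque()                       # stand-in for the executor's event queue
    samples_per_spin = pub_hz // spin_hz
    taken = empty = 0
    seq = 0
    for _ in range(spin_hz * duration_s):
        # Samples arriving between two spins: one event is pushed per sample,
        # while the bounded history drops old samples without telling anyone.
        for _ in range(samples_per_spin):
            history.append(seq)
            events.append("subscription_event")
            seq += 1
        # Spin: the executor performs one take per queued event.
        while events:
            events.popleft()
            if history:
                history.popleft()
                taken += 1
            else:
                empty += 1  # event without a matching sample
    return taken, empty


print(simulate())                  # history depth 1
print(simulate(history_depth=10))  # history large enough for one spin period
```

With depth 1 the model gives one useful take and nine empty takes per spin; with a depth that covers a full spin period the two queues stay in sync.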
We discussed the idea of having an ID that indicates the specific sample connected with the event. With a take you would then tell the rmw layer which sample you want, and the temporal order can be preserved. This is dangerous because, with worst-case timing, all the samples you want to take are no longer available and you get no samples at all. Another option to preserve the temporal order would be to first take everything from rmw and sort it before executing the callbacks.
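The "take everything, then sort" option could look roughly like the sketch below. It assumes every taken sample carries a timestamp that is comparable across topics (source or reception time); `Sample` and `take_all_sorted` are hypothetical names, not existing API.

```python
from typing import Any, NamedTuple


class Sample(NamedTuple):
    stamp: float  # assumed per-sample timestamp, comparable across topics
    topic: str
    data: Any


def take_all_sorted(taken_per_topic):
    """Merge the samples taken from every subscription into one time-ordered list.

    taken_per_topic maps a topic name to a list of (stamp, data) tuples,
    i.e. whatever a "take all" on that subscription returned.
    """
    merged = [
        Sample(stamp, topic, data)
        for topic, samples in taken_per_topic.items()
        for stamp, data in samples
    ]
    # sorted() is stable, so samples with equal stamps keep their take order.
    return sorted(merged, key=lambda s: s.stamp)


ordered = take_all_sorted({
    "image": [(0.10, "img0"), (0.20, "img1")],
    "odom": [(0.05, "odom0"), (0.15, "odom1")],
})
for sample in ordered:
    print(sample.stamp, sample.topic, sample.data)
```

The callbacks would then be executed in the order of `ordered`, independent of the order in which the events were queued.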
The “empty” events can be avoided with the discussed dirty flag. But I have the feeling that we would start building something similar to a waitset just to save one or two iterations over the entities that remain with an optimized waitset approach.
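As I understand the dirty flag idea, it would work roughly like this single-threaded sketch: an entity only gets a new event if it does not already have one pending. `FlaggedEventQueue` and its methods are made up for illustration; a real version would need an atomic flag or a lock, which is exactly the bookkeeping that starts to resemble a waitset.

```python
from collections import deque


class FlaggedEventQueue:
    """Event queue that holds at most one pending event per entity."""

    def __init__(self):
        self._events = deque()
        self._dirty = set()  # entities that already have a pending event
        self.pushes = 0

    def notify(self, entity_id):
        """Called when new data arrives; returns True if an event was pushed."""
        if entity_id in self._dirty:
            return False  # already flagged, skip the push
        self._dirty.add(entity_id)
        self._events.append(entity_id)
        self.pushes += 1
        return True

    def pop(self):
        """Called by the executor; clears the flag so the entity can be queued again."""
        entity_id = self._events.popleft()
        self._dirty.discard(entity_id)
        return entity_id

    def __len__(self):
        return len(self._events)


queue = FlaggedEventQueue()
for _ in range(10):
    queue.notify("velocity")  # ten samples, but only the first one pushes an event
queue.notify("image")
print(len(queue), queue.pushes)
```

After a `pop()` the executor would have to take whatever is currently in the history (everything or only the latest sample), since one event can now stand for several samples.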
One use case I, and maybe others, see is to have one or a few triggers that wake up the spin, while the majority of the entities are then processed according to what is available when the trigger comes. E.g. an image topic is the trigger, and from the odometry topic the latest greatest shall be processed once the image has arrived. I feel that the waitset approach could be extended to support this. You can argue that this could also be put on top of the event queue, but the consequence would be even more unnecessary events and context switches.
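A minimal sketch of the trigger idea, with made-up names (`TriggeredSpin`, `on_sample`) and a plain dict standing in for a keep-last-1 history per non-trigger topic:

```python
class TriggeredSpin:
    """Only the trigger topic wakes up processing; other topics are just kept."""

    def __init__(self, trigger_topic, on_trigger):
        self.trigger_topic = trigger_topic
        self.on_trigger = on_trigger
        self.latest = {}   # stand-in for a depth-1 history per non-trigger topic
        self.wakeups = 0

    def on_sample(self, topic, data):
        if topic != self.trigger_topic:
            self.latest[topic] = data  # overwrite, no event and no wakeup
            return
        self.wakeups += 1
        # Hand over the trigger sample plus a snapshot of what is available now.
        self.on_trigger(data, dict(self.latest))


calls = []
spin = TriggeredSpin("image", lambda image, others: calls.append((image, others)))
for odom in ("odom0", "odom1", "odom2"):
    spin.on_sample("odometry", odom)
spin.on_sample("image", "img0")
print(spin.wakeups, calls)
```

Three odometry samples arrive, but the processing is woken up only once, by the image, and sees only the latest odometry sample.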
The proposal seems to have a great performance boost compared to the current waitset implementation. Since the proposal is on the table, this could lead to an immediate improvement. But I also feel that there are different goals like responsiveness and determinism that are hard to combine with one approach.
Would it make sense to have an event queue executor that focuses on performance and makes no further effort to guarantee temporal order etc., and to have, in the future, another optimized waitset-based implementation? The latter would focus more on determinism, with guaranteed temporal order or user-defined order, and could also be used in scenarios where only some of the entities are triggering, as I described above. Maybe it could even be combined with polling subscriptions?
Sorry for the lengthy post 