Teleoperation Design (#602)

Posted by @cnboonhan:

I would like to drop a discussion here on possible ideas we can keep in mind should we want to add teleoperation capabilities to rmf-web. Teleoperation in this case would involve some sort of view of the robot, such as a camera feed, or perhaps a view of its navigation map. Here are some possible benefits of remote control:

  1. unstick delivery robots that are blocked by temporary obstructions during their runs and have themselves become obstructions
  2. finely reposition telepresence robots (for example, rotating the robot slightly to improve view angles)
  3. get useful debug information without physically moving to the robot's location
  4. allow robots that do not have autonomous navigation set up to still be used with a human operator

Documented some ideas for architecture here: https://drive.google.com/file/d/1rc5dOl3ec3SUQaf2dcFvpaXiZ5QPTlZJ/view?usp=sharing with a brief example of the UI here: https://www.figma.com/proto/2S5c0468vvSquBcSg7slgY/rmf-web-telepresence?node-id=11197%3A9757&scaling=contain&page-id=0%3A1&starting-point-node-id=10686%3A4165&show-proto-sidebar=1

(note that in the Figma, I only have the prototype implemented for dark mode)

and would love to get feedback.

Posted by @youliangtan:

This feature looks helpful, and with my limited knowledge the architecture looks feasible too. One side note: we will need to ensure that rmf knows the robot is in “TELEOP_MODE” and gives up control of the robot to the human operator. This happens in process (2) of the diagram.
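
To make the handoff concrete, here is a minimal Python sketch of what process (2) might look like: the api-server sends a mode-change request, and RMF stops commanding the robot while it is in teleop mode. All message and field names here are illustrative assumptions, not existing rmf_api_msgs definitions.

```python
# Hypothetical sketch of the teleop handoff: before an operator takes over,
# the api-server asks the fleet adapter to put the robot into a teleop mode,
# and RMF must not issue navigation commands until the mode is released.
# Message and field names are illustrative, not part of rmf_api_msgs.

def make_mode_request(fleet: str, robot: str, mode: str) -> dict:
    """Build a mode-change request payload for the fleet adapter."""
    return {"type": "robot_mode_request", "fleet": fleet, "robot": robot, "mode": mode}

class RobotSession:
    """Tracks whether RMF may currently command the robot."""
    def __init__(self) -> None:
        self.mode = "AUTONOMOUS"

    def handle_mode_request(self, req: dict) -> None:
        self.mode = req["mode"]

    def rmf_may_command(self) -> bool:
        # RMF gives up control while the robot is in teleop mode
        return self.mode != "TELEOP_MODE"
```

The key invariant is that `rmf_may_command()` becomes false as soon as the teleop request is handled, and stays false until the operator hands the robot back.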

Posted by @koonpeng:

Thanks for the proposal. Some questions I have:

Teleoperation view

  • In step 3, how would the api server know the url hosted by the media server?
  • In step 4, what is the protocol of the control url?
  • In the teleoperation control part, it is mentioned that the client connects to the api server which translates the teleoperation commands, but in the teleoperation view diagram, it shows the client connecting directly to the robot.
  • Do we need the type field in the schemas? Is it enough to identify the type based on the url, i.e. should we share one url for all teleoperation apis, or put each “type” of request into different urls?
  • How would the media server work with the robot to ensure it only streams something if there is at least one operator connected?

Teleoperation control

  • How would it work if multiple operators attempt to control the same robot?
  • Both the operator and the robot join the same socketio room; how would the api server distinguish between operators and robots?

On another note, in keeping with the microservice architecture that rmf api messages is moving towards, would it be possible for the “api server” mentioned here to be its own standalone application?

Posted by @cnboonhan:

Thanks for your input! The responses below assume we are changing the rmf-web api-server, but I'm entirely open (and think it might be a good idea) to making it its own standalone app.

  • In step 3, how would the api server know the url hosted by the media server?
    A straightforward way could be to extend the api server's AppConfig with an additional dictionary of links. This is probably good for a first attempt, but it would be static and unable to account for changes; eventually it would be ideal to have a secondary mechanism for such dynamic changes. I have not thought too deeply about this, thanks for bringing it up. I think it will be necessary to better understand how the media server creates WebRTC “rooms” in order to properly design the URL “lookup” mechanism that builds on it.
  • In step 4, what is the protocol of the control url?
    I was thinking of using the FastIO libraries in the api-server and reusing the api-client Swagger bindings in the Adapter process.
  • In the teleoperation control part, it is mentioned that the client connects to the api server which translates the teleoperation commands, but in the teleoperation view diagram, it shows the client connecting directly to the robot.
    Yes, I was trying to think about how we could represent handling different teleoperation “views”, and I didn't spend enough time thinking about how to represent this clearly. The idea was that we could have different mechanisms to teleoperate. Some robots may not have a camera view available at all, but we might still want to allow a robot-specific way to teleoperate. For example, I know that the MiR has a web dashboard with a nice view of its navigation map and a joystick for teleoperation. It could be possible to serve this dashboard directly through the TeleoperationView to capitalize on this. In that case, we would probably bypass the entire WebRTC stack, as we are not streaming a camera feed (it doesn't exist / isn't available).
  • Do we need the type field in the schemas? Is it enough to identify the type based on the url, i.e. should we share one url for all teleoperation apis, or put each “type” of request into different urls?
    I admit I blatantly copy-pasted examples from the rmf_api_msgs schemas to try to be as consistent as possible, so I might have misunderstood the schemas' intended meaning. I understand the “type” as a way to identify the intended encoding of the string in the JSON, which is of type “String”. I feel I might be misunderstanding your question?
  • How would the media server work with the robot to ensure it only streams something if there is at least one operator connected?
    I was hoping the media server would be intelligent about it. :crossed_fingers:
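
The static first-pass lookup described in the first answer above could look something like the following sketch. The field names (`teleop_view_urls` in particular) are assumptions for illustration, not existing rmf-web AppConfig fields.

```python
# Sketch of extending the api-server's AppConfig with a static dictionary
# of links, so step 3 can resolve which media-server URL hosts a robot's
# teleoperation view. Field names are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class AppConfig:
    host: str = "localhost"
    port: int = 8000
    # robot name -> media server URL hosting that robot's WebRTC room
    teleop_view_urls: Dict[str, str] = field(default_factory=dict)

def lookup_view_url(config: AppConfig, robot: str) -> Optional[str]:
    """Step 3: resolve the media server URL for a robot, if configured."""
    return config.teleop_view_urls.get(robot)
```

As noted above, this is static: a robot added after deployment, or a dynamically created room, would not be resolvable without editing the config and restarting.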

Teleoperation control

  • How would it work if multiple operators attempt to control the same robot?
    That is a great question, and I don't know what's best. I am tempted to allow a free-for-all :laughing: . My issue with a lock mechanism is that I am afraid of the many usability issues: when does a lock release, what happens if that release fails to reach the server, timeouts, etc. It becomes hard to test and to predict deployment situations, for the benefit it provides. Since only authorized users can control the robots (in theory), it could be sufficient to simply report who the “last operating user” was, so anyone with any beef can just send a strongly worded email.
  • Both the operator and the robot joins the same socketio room, how would the api server identify between operators and robots?
    I would like to simplify this by saying it might be fine if we didn't differentiate them, since the bandwidth required should hopefully not reach a point of causing issues, and each endpoint can ignore whatever data it doesn't find useful. Perhaps this assumption is flawed?
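
The “last operating user” idea from the multi-operator answer above could be as simple as the following audit-trail sketch, which accepts every command (free-for-all) but remembers who most recently controlled each robot. The class and method names are illustrative.

```python
# Minimal sketch of the free-for-all policy with a "last operating user"
# audit trail, as an alternative to a lock: every command is accepted,
# but the server records who sent the most recent one per robot.

import time
from typing import Dict, Optional, Tuple

class TeleopAudit:
    def __init__(self) -> None:
        # robot name -> (operator name, unix timestamp of last command)
        self._last: Dict[str, Tuple[str, float]] = {}

    def record_command(self, robot: str, operator: str) -> None:
        self._last[robot] = (operator, time.time())

    def last_operator(self, robot: str) -> Optional[str]:
        entry = self._last.get(robot)
        return entry[0] if entry else None
```

This sidesteps the lock-release and timeout questions entirely, at the cost of allowing operators to fight over a robot; accountability comes only after the fact.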

On another note, in keeping with the microservice architecture that rmf api messages is moving towards, would it be possible for the “api server” mentioned here to be its own standalone application?
I think that would be a great idea to keep a separation of concerns. That said, I would like to keep the architecture and tools as similar as possible to the existing api-server, since it seems we can reuse many of them for a more uniform code base; I think many technologies in the api-server are highly relevant and would make everything more maintainable.

Posted by @koonpeng:

In step 3, how would the api server know the url hosted by the media server?

A straightforward way could be to extend the api server's AppConfig with an additional dictionary of links. This is probably good for a first attempt, but it would be static and unable to account for changes; eventually it would be ideal to have a secondary mechanism for such dynamic changes. I have not thought too deeply about this, thanks for bringing it up. I think it will be necessary to better understand how the media server creates WebRTC “rooms” in order to properly design the URL “lookup” mechanism that builds on it.

I would avoid using config to specify the url: it requires knowledge of how the media server works, and it does not allow dynamically created rooms.
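
One way to address both objections is to let the media server register its rooms with the api-server at runtime instead of baking URLs into config. The sketch below is a plain in-memory registry; the api-server could expose it over REST endpoints. All names here are assumptions, not an existing rmf-web API.

```python
# Sketch of dynamic room registration: the media server registers (and
# deregisters) room URLs with the api-server at runtime, so rooms can be
# created on demand and the api-server needs no knowledge of how the
# media server works internally. Names are illustrative assumptions.

from typing import Dict, Optional

class RoomRegistry:
    """In-memory registry the api-server could expose over REST."""
    def __init__(self) -> None:
        self._rooms: Dict[str, str] = {}

    def register(self, robot: str, room_url: str) -> None:
        # called by the media server when it creates a room for a robot
        self._rooms[robot] = room_url

    def deregister(self, robot: str) -> None:
        # called when the room is torn down
        self._rooms.pop(robot, None)

    def resolve(self, robot: str) -> Optional[str]:
        # called by the api-server in step 3 to answer a client request
        return self._rooms.get(robot)
```

Compared with the static config dictionary, the lookup result here always reflects the rooms that currently exist.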

In step 4, what is the protocol of the control url?

I was thinking of using the FastIO libraries in the api-server and reusing the api-client Swagger bindings in the Adapter process.

In the teleoperation control part, it is mentioned that the client connects to the api server which translates the teleoperation commands, but in the teleoperation view diagram, it shows the client connecting directly to the robot.

Yes, I was trying to think about how we could represent handling different teleoperation “views”, and I didn't spend enough time thinking about how to represent this clearly. The idea was that we could have different mechanisms to teleoperate. Some robots may not have a camera view available at all, but we might still want to allow a robot-specific way to teleoperate. For example, I know that the MiR has a web dashboard with a nice view of its navigation map and a joystick for teleoperation. It could be possible to serve this dashboard directly through the TeleoperationView to capitalize on this. In that case, we would probably bypass the entire WebRTC stack, as we are not streaming a camera feed (it doesn't exist / isn't available).

If the dashboard is using the api-client library, that means the robot/teleop adapter has to support a standardized teleop protocol; if so, I don't see the need for the 2nd flow, which goes through the api-server. For retrieving the MiR web dashboard and showing it on the rmf dashboard, there are a number of problems I see which are not addressed in this proposal:

  • How does the rmf dashboard know the url of the robot dashboard?
  • How can the rmf dashboard log into the robot dashboard?
  • The robot dashboard may need cookies for sessions/logins etc. Assuming we are using iframes, these would be 3rd-party cookies, which are blocked by default on Safari and may not work on other browsers if the robot dashboard did not set its cookies correctly.

Do we need the type field in the schemas? Is it enough to identify the type based on the url, i.e. should we share one url for all teleoperation apis, or put each “type” of request into different urls?

I admit I blatantly copy-pasted examples from the rmf_api_msgs schemas to try to be as consistent as possible, so I might have misunderstood the schemas' intended meaning. I understand the “type” as a way to identify the intended encoding of the string in the JSON, which is of type “String”. I feel I might be misunderstanding your question?

The schemas provided have the string with a fixed constant, so the field is redundant and can be implied from the request endpoint: if the server receives a request on /teleoperation_view_request, it already knows the payload is a Teleoperation View Request JSON, so it doesn't need the type field to identify the payload.
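
To illustrate the point, here is a tiny dispatch sketch in Python: the endpoint path alone selects the handler, so a `type` field inside the payload carries no extra information. The paths and payload shapes are illustrative.

```python
# Illustration of why a "type" field is redundant when each request kind
# has its own endpoint: the dispatch key is the path the request arrived
# on, not anything inside the JSON payload. Paths are illustrative.

def handle_request(path: str, payload: dict) -> str:
    handlers = {
        "/teleoperation_view_request": lambda p: f"view request for {p['robot']}",
        "/teleoperation_control_request": lambda p: f"control request for {p['robot']}",
    }
    # the payload never needs to repeat which kind of request it is
    return handlers[path](payload)
```

A single shared URL would be the opposite trade-off: then the payload would need a discriminator field so the server can tell the request kinds apart.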

Both the operator and the robot join the same socketio room; how would the api server distinguish between operators and robots?

I would like to simplify this by saying it might be fine if we didn't differentiate them, since the bandwidth required should hopefully not reach a point of causing issues, and each endpoint can ignore whatever data it doesn't find useful. Perhaps this assumption is flawed?

Bandwidth aside, I can think of some scenarios where the server needs to know the type of client to provide certain functionality:

  • Detecting / reporting if the teleop control is available.
  • Security
    • We don’t want other adapters to listen in on the commands sent to other robots (probably ok for same fleet, but not on another fleet).
    • We also do not want teleop adapters to be able to submit tasks, open doors etc.
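
The security scenarios above amount to role-based permissions: the server must know whether a connection is an operator or a teleop adapter to decide what it may do. A minimal sketch, with role names and permission strings that are purely illustrative:

```python
# Sketch of why the server needs to know the client type: an operator may
# send teleop commands and view streams, while a teleop adapter may only
# receive commands, and must not be able to submit tasks or open doors.
# Role and permission names are assumptions for illustration.

ROLE_PERMISSIONS = {
    "operator": {"send_teleop_command", "view_stream"},
    "teleop_adapter": {"receive_teleop_command"},
}

def authorize(role: str, action: str) -> bool:
    """Allow an action only if the client's role grants it."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

If operators and robots were indistinguishable, every connection in the room would implicitly hold the union of both permission sets.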

Posted by @cnboonhan:

I tried to prune the nested responses; hopefully the formatting and context are ok.

I would avoid using config to specify the url: it requires knowledge of how the media server works, and it does not allow dynamically created rooms.
That’s a good point, I would avoid this too for the long term. Will think of a better way once I have some more experience with media server internals.

If the dashboard is using the api-client library, that means the robot/teleop adapter has to support a standardized teleop protocol; if so, I don't see the need for the 2nd flow, which goes through the api-server. For retrieving the MiR web dashboard and showing it on the rmf dashboard, there are a number of problems I see which are not addressed in this proposal:

* How does the rmf dashboard know the url of the robot dashboard?

* How can the rmf dashboard log into the robot dashboard?

* The robot dashboard may need cookies for sessions/logins etc. Assuming we are using iframes, these would be 3rd-party cookies, which are blocked by default on Safari and may not work on other browsers if the robot dashboard did not set its cookies correctly.

Agreed. I was in fact trying to get at two “modes” of teleoperation, hence the dotted lines and seemingly “dual paths”. A basic standard API would exist using FastIO/api-client, covering basic WASD movement and speed controls. More “advanced” features would be too variable to generalize; to cover those, I was thinking of the discussed mechanism of loading a custom teleoperation page. We would have to make some assumptions / constraints on such a custom page in order to get anywhere:

  • It probably has to be easily accessible over a URL
  • There must be some mechanism to allow a third party to view it in the dashboard. Alternatively, a redirect to open a new page could be possible.
  • Similarly to the first point, you rightly point out that, in the longer term, there should ideally be a standard mechanism for robots / teleoperation servers to dynamically “report” their access details, such as a URL or WebRTC room details, for clients to query. I will gather more information and improve on this as I learn more and prototype. These details should ideally be bundled together, I think.
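
The “basic” standardized command covering WASD movement and speed could be as small as the sketch below. The field names and value ranges are illustrative, not a proposed rmf_api_msgs schema.

```python
# Minimal sketch of a standardized teleop command that any adapter could
# implement: WASD-style direction plus a normalized speed. Field names
# and ranges are illustrative assumptions, not a proposed schema.

from dataclasses import dataclass

VALID_DIRECTIONS = {"forward", "backward", "left", "right", "stop"}

@dataclass
class TeleopCommand:
    robot: str
    direction: str   # one of VALID_DIRECTIONS
    speed: float     # normalized, 0.0 (stopped) to 1.0 (full speed)

    def validate(self) -> bool:
        return self.direction in VALID_DIRECTIONS and 0.0 <= self.speed <= 1.0
```

Anything beyond this (camera pan, arm control, dashboard-specific features) would fall into the second, custom-page mode rather than the standard API.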

The schemas provided have the string with a fixed constant, so the field is redundant and can be implied from the request endpoint: if the server receives a request on /teleoperation_view_request, it already knows the payload is a Teleoperation View Request JSON, so it doesn't need the type field to identify the payload.

I think I understand and will keep this in mind. I would like to be as consistent as possible with the other messages in rmf_api_msgs; I will follow up on this.

Bandwidth aside, I can think of some scenarios where the server needs to know the type of client to provide certain functionality:

* Detecting / reporting if the teleop control is available.

* Security
  
  * We don't want other adapters to listen in on the commands sent to other robots (probably ok for same fleet, but not on another fleet).
  * We also do not want teleop adapters to be able to submit tasks, open doors etc.

You are right. I'm thinking we might want to use a separate room for teleoperation, as a starting point and a blanket security step. I did not fully consider that, even in a separate room, we might not want just anyone to listen in on the teleoperation commands of other robots. Naively speaking, I think I will want to analyze the authorization framework (authz) more closely and try to follow / use it as much as possible.
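
The per-robot room with an authorization gate could be sketched as below: a client may only join the room of a robot it is permitted to teleoperate, so adapters or operators of one fleet cannot listen in on another's commands. The permission model here is a plain stand-in for the real authz framework.

```python
# Sketch of per-robot teleop rooms with an authorization gate on join.
# The client -> allowed-robots mapping stands in for whatever the real
# authz framework provides; all names are illustrative assumptions.

from typing import Dict, Set

class TeleopRooms:
    def __init__(self, permissions: Dict[str, Set[str]]) -> None:
        # client id -> set of robots that client may teleoperate
        self._permissions = permissions
        # robot name -> clients currently in that robot's room
        self._rooms: Dict[str, Set[str]] = {}

    def join(self, client: str, robot: str) -> bool:
        if robot not in self._permissions.get(client, set()):
            return False  # refuse: not authorized for this robot's room
        self._rooms.setdefault(robot, set()).add(client)
        return True

    def members(self, robot: str) -> Set[str]:
        return self._rooms.get(robot, set())
```

Because the check happens at join time, nothing broadcast into a robot's room can reach a client that was never authorized for that robot.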