
Discussion on detection of people and obstacles in a dense crowd


I’m not at the point of being able to ask a proper question yet. However, I am starting on a project that will hopefully work as a telepresence robot for a friend of mine who is housebound. This would, hopefully, allow her to attend science fiction conventions.

The problem is that science fiction conventions are extremely crowded and I don’t want the robot to be able to run into anybody or anything, at least no more than a human would in a similar situation.

The robot will have a lot of sensors to detect objects at various levels, but I think that vision is the best way to detect people in this situation. And I’m sure that it won’t help that many of the people will be in costume.

Does anybody have any ideas about how to approach this problem?

The planner that determines the robot's route would take in the above data, in addition to the goals of the remote user, while also keeping me within a certain distance. But right now I am only concerned about detecting the "crowd." Finding a way through the crowd is a different problem. It is similar to the problem of a child in a similar situation who has orders to keep their parents in sight and not to run into people.

Yes, it would be nice if the robot were fully autonomous. But I actually think a robot like this would be teleoperated, that is, moved with a joystick. So there is a human driver using the vision system remotely. Still, there would be some lag, and human drivers are not perfect. It might be good enough that if the robot detected an imminent collision, it would simply apply the brakes hard and stop. The human driver would then figure out what to do.

I’m using the common HC-SR04 as a last-resort collision sensor. In theory my robot should never crash, as there are other sensors to prevent this, but the HC-SR04 is a backup. Inside the base controller there is a loop that runs once for every twist message. The base controller looks at the distance reading from the ultrasonic sensors and decides if it “wants to” execute the twist message. This decision is made completely outside of ROS. The point is that this backup should only ever detect an obstacle if there has been a failure in the ROS-based system, so the base controller, the process that actually commands the traction motors, does the “pinging” itself and will refuse to power into a fixed object.
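The gate described above can be sketched as a tiny function. This is a minimal illustration, not the actual base controller: the function name, threshold, and units are all assumptions.

```python
# Sketch of the non-ROS gate: the base controller executes a velocity
# command only when the latest ultrasonic reading leaves stopping room.
# The threshold and names below are illustrative, not real firmware.

STOP_DISTANCE_CM = 30.0   # refuse to drive forward closer than this

def should_execute(forward_velocity, ultrasonic_cm):
    """Return True if the twist command is safe to pass to the motors."""
    if forward_velocity <= 0:
        return True               # stopping or reversing is always allowed
    return ultrasonic_cm > STOP_DISTANCE_CM
```

The key design property is that this check runs in the same process that commands the motors, so a fault anywhere upstream in ROS cannot bypass it.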

You may or may not want to use this same design, but if the robot is operating in a crowd and its owner is not within arm’s length with a hand ready to punch an “e-stop” button, you need maybe TWO fail-safe checks, not one. In other words, if the ROS planner says “go,” it must also have an OK from TWO independent non-ROS systems. Perhaps the second one is a mechanical switch that detects physical contact between a bumper bar and the obstacle. For your robot I’m thinking of a ring that encircles the robot and is held by springs; if a spring moves, say, 1/4 inch, it means that your vision system, the human driver, and the ultrasonic sensors have all failed to detect the obstacle. This system then uses mechanical switches to disconnect the motors from power via a mechanical relay (no software in the loop).
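The "two independent OKs" rule above reduces to a simple conjunction: the planner's command reaches the motors only if both non-ROS interlocks also agree. A minimal sketch (names are illustrative; the real bumper relay is pure hardware with no software in the loop):

```python
# Motion requires agreement from the ROS planner AND both independent
# non-ROS interlocks: the ultrasonic gate and the bumper ring.
# This models the logic only; the bumper path is a mechanical relay.

def drive_allowed(planner_go, sonar_clear, bumper_clear):
    """Any single 'no' vetoes motion."""
    return planner_go and sonar_clear and bumper_clear
```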

Safety is hard. The easy task is to design the machine to be safe when it is operating as designed. The harder task, which is 100% required to operate remotely in a crowd, is ensuring the machine remains safe even after unanticipated failure modes. As an example, I had an electrical fire last week in a prototype. Lithium batteries have high power density, and I ended up vaporizing a power cable. My design was not fail-safe, in that there was obviously a failure mode that could cause a fire.

So your anti-collision system must work even if there is a bug in the software and even if there is a mechanical fault. Vision is CLEARLY too complex to be unconditionally safe, but it could be a very good primary system given enough redundant backups.

Back to my work: I’d like to be able to use one camera, but I’m undecided. Getting 3D data from one camera requires very good motion estimation. I don’t think my IMU and odometry will be good enough, so I may need stereo vision. In your mixed indoor/outdoor use case, moving in a crowd, I think you will need stereo vision to get usable depth. Your obstacles are all moving, so you will need to snap the pair of images simultaneously, not sequentially.
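The reason stereo sidesteps the motion-estimation problem: with two simultaneously captured images, depth follows directly from disparity via Z = f·B/d, with no motion estimate needed. A sketch of the standard relation (the camera numbers below are made up for illustration):

```python
# Depth from stereo disparity for a calibrated, rectified pair:
#   Z = f * B / d
# f = focal length (pixels), B = baseline (meters), d = disparity (pixels).
# The example values are illustrative, not from any particular camera.

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Return depth in meters for a matched feature pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# e.g. f = 600 px, B = 0.12 m, d = 24 px  ->  depth of 3.0 m
```

This also shows why the frames must be simultaneous: if either camera (or the obstacle) moves between exposures, the measured disparity no longer corresponds to a single geometry.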

Summary: I think stereo vision is a great primary sensor. But for remote operation you’d better have multiple independent backups that are each so simple they can’t fail. Being truly independent, the chance of all of them failing at the same time is the product of the individual probabilities, a tiny number.
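To make the product-of-probabilities argument concrete, here is the arithmetic with made-up failure rates (the numbers are purely illustrative; the point is the scale, and it only holds if the backups really are independent):

```python
# If each backup is truly independent, the chance that all of them fail
# at the same moment is the product of the individual failure
# probabilities. These numbers are invented to show the scale.

p_vision = 1e-2   # primary vision system misses an obstacle
p_sonar  = 1e-3   # ultrasonic backup also fails
p_bumper = 1e-4   # mechanical bumper ring also fails

p_all_fail = p_vision * p_sonar * p_bumper
# on the order of 1e-9: roughly one in a billion
```

The caveat is the word "independent": if two backups share a power supply, a processor, or a software stack, their failures are correlated and the product no longer applies.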

Telepresence seems easy at first: basically it is a remote-control car with a webcam glued to the top. But the problem is that telepresence, by definition, means the operator is not present. I think that implies extreme reliability and safety.

People tracking for people who don’t look like people all the time, interesting! In a crowded environment, no less.

You might look into the project that had a robot driving around Amsterdam Schiphol Airport:

Following people in a less crowded environment is a task in RoboCup@Home. At TechUnited, we use vision to detect a person and then a laser scanner at torso height to track the operator.

Thank you very much, Loy.

And I forgot to mention: while most of the people in the crowd are adults, there are many children and probably a few R2D2 robots. This is going to be an interesting project. I suppose I will start out with collision testing in my motorhome, where there is just me to damage. Then I will add other factors by inviting other people in and putting random boxes down to simulate a changing environment.

Then I might bring Groucho outside and see how he handles there, with me close by with a remote cut-off switch. Unfortunately, I have to carry him in and out of the motorhome, which puts an upper limit on his size and weight, though I will make him able to be split into multiple pieces if possible.


I agree about the need for safety.

In some of my previous robots, I had a “reflex system” that handled some failure cases. This was a simple processor that took input from certain sensors, stopped the robot, and then sent a message to the planner about why it had stopped. Until the planner sent the reflex processor a “chill out” message, the robot didn’t move.

The reflex processor killed the motors with relays so that nothing could turn them on until the reflex processor was satisfied.
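The reflex behavior amounts to a latch: any tripped sensor opens the motor relays, and they stay open until the planner explicitly acknowledges. A minimal sketch of that state machine (class and method names are my own invention, modeled on the description above):

```python
# Minimal sketch of the "reflex" latch: a tripped sensor kills the
# motor relays, and they stay off until the planner acknowledges with
# a "chill out" message. Names here are illustrative.

class ReflexLatch:
    def __init__(self):
        self.motors_enabled = True
        self.stop_reason = None

    def sensor_tripped(self, reason):
        """Called by the reflex processor when any safety sensor fires."""
        self.motors_enabled = False   # open the motor relays
        self.stop_reason = reason     # reported to the planner

    def chill_out(self):
        """Planner acknowledged the stop; re-arm the motors."""
        self.motors_enabled = True
        self.stop_reason = None
```

The important property is that the latch lives on its own simple processor: the planner can ask for the motors back, but it cannot override a stop it has not acknowledged.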

I used touch-sensors and floor sensors to make sure the robot wasn’t heading over some stairs.

These were very simple robots compared to Groucho and they didn’t use ROS.

I usually design my robots’ brains with multiple functions/subsystems. The reflex processor is the one that is guaranteed to be a separate processor that is very simple.

I will use multiple segments for my touch ring. This will give me a better idea of where the robot touched the object. I may even have certain touch sensors control the motor relays themselves after a given amount of pressure.
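With the ring split into N segments, the index of a tripped segment gives a rough bearing to the contact point. A sketch, assuming a layout I made up for illustration (segment 0 centered on the robot's front, numbering counterclockwise):

```python
# Rough contact bearing from a segmented touch ring. The layout is an
# assumption: segment 0 faces forward, segments count counterclockwise.

def contact_bearing_deg(segment_index, num_segments=8):
    """Approximate bearing (degrees CCW from front) of a contact."""
    return (360.0 / num_segments) * segment_index
```

With eight segments that resolves contacts to 45-degree sectors, which is enough for the reflex processor to decide, for example, that it is safe to back straight away from a frontal contact.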

I have two Intel RealSense cameras that I plan to use for Groucho. I also have six webcams coming, so I’ll be able to use stereo vision if need be. I am still designing Groucho’s head. I want it to look metallic while also looking like a caricature of Groucho Marx. Plus, I will take some ideas from Flash Robot, the most expressive head I’ve seen while still looking like a robot. I have 12 Dynamixel AX-12 servos that can be used just for the head if need be.

There will be at least one camera in the head for additional vision at a taller height. There will be at least 3 webcams in Groucho, in addition to the two RealSense cameras.

I’ll be using an Intel NUC (latest generation) for Groucho’s main brain. I may preprocess some of the cameras with another computer if that is needed. Or perhaps use a second NUC for vision processing. At this point, I haven’t done much vision processing so I will have to do more before I can say anything. The base will have enough room for several processors.

And I think I’ll add some temperature sensors for the lithium batteries to the reflex processor. Thank you for sharing that story with me.
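Routing the battery temperatures through the reflex processor fits the same veto pattern as the other interlocks: a thermal fault opens the motor relays like any other tripped sensor. A sketch, with an invented threshold for illustration:

```python
# Battery thermal check for the reflex processor. The cutoff value is
# illustrative; a real limit would come from the cell datasheet.

MAX_CELL_TEMP_C = 60.0

def battery_ok(cell_temps_c):
    """Return True only if every monitored cell is below the cutoff."""
    return all(t < MAX_CELL_TEMP_C for t in cell_temps_c)
```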

DangerousThink, AKA DT, Jay

The JackRabbot program at Stanford is using ROS to study robot navigation and SLAM in social situations. For example, a person would never walk between two other people having a conversation.