A very cool project would be to extend this and try a few policy gradient methods. You could even go ahead and compare them with the value-based method you just tried (DQN). A while ago I wrote a tutorial comparing different methods on a simple environment, but yours is indeed much cooler.
I just released https://github.com/TensorSwarm/TensorSwarm, which allows you to train over 100 robots at the same time. At the moment Proximal Policy Optimization (PPO) is used, as it seems to provide the best results.
The backend is a ROS service, so you should be able to adapt your robots pretty easily. I’d also be glad to provide you with some support.
I’m studying the use of reinforcement learning for TurtleBot3, and I have tried various reinforcement learning algorithms with gym-gazebo. The next step is to try it on a real TurtleBot3 rather than in simulation. Have you tried it on a real TurtleBot3?
Hello @Ozdenur_Ucar, glad to hear you’ve been using gym-gazebo. We are about to release a few additional algorithms, so if you’re interested, I’d suggest staying tuned for them!
Hi everyone
I am currently training a TurtleBot3 using the DQN example from ROBOTIS.com. Later I will move on to real-world tests. My current simulation in Gazebo shows a lot of collisions for the TurtleBot3. Is there any way I could reduce the collisions? Or do I have to use a different optimizer?