Sterowanie autonomicznym bezzałogowym statkiem powietrznym z wykorzystaniem uczenia przez wzmacnianie

Article in Polish. DOI: 10.14313/PAR_250/85

Paweł Miera, Hubert Szolc, Tomasz Kryjak
AGH Akademia Górniczo-Hutnicza im. S. Staszica, Wydział Elektrotechniki, Automatyki, Informatyki i Inżynierii Biomedycznej, Laboratorium Systemów Wizyjnych, Zespół Wbudowanych Systemów Wizyjnych, al. Mickiewicza 30, 30-059 Kraków


Streszczenie

Uczenie przez wzmacnianie ma coraz większe znaczenie w sterowaniu robotami, a symulacja odgrywa w tym procesie kluczową rolę. W obszarze bezzałogowych statków powietrznych (BSP, w tym dronów) obserwujemy wzrost liczby publikowanych prac naukowych zajmujących się tym zagadnieniem i wykorzystujących wspomniane podejście. W artykule omówiono opracowany system autonomicznego sterowania dronem, który ma za zadanie lecieć w zadanym kierunku (zgodnie z przyjętym układem odniesienia) i omijać napotykane w lesie drzewa na podstawie odczytów z obrotowego sensora LiDAR. Do jego przygotowania wykorzystano algorytm Proximal Policy Optimization (PPO), stanowiący przykład uczenia przez wzmacnianie (ang. reinforcement learning, RL). Do realizacji tego celu opracowano własny symulator w języku Python. Przy testach uzyskanego algorytmu sterowania wykorzystano również środowisko Gazebo, zintegrowane z Robot Operating System (ROS). Rozwiązanie zaimplementowano w układzie eGPU Nvidia Jetson Nano i przeprowadzono testy w rzeczywistości. Podczas nich dron skutecznie zrealizował postawione zadania i był w stanie w powtarzalny sposób omijać drzewa podczas przelotu przez las.

Słowa kluczowe

autonomiczne sterowanie, dron, Gazebo, ROS, uczenie przez wzmacnianie

Control of an Autonomous Unmanned Aerial Vehicle Using Reinforcement Learning

Abstract

Reinforcement learning is of increasing importance in robot control, and simulation plays a key role in this process. In the field of unmanned aerial vehicles (UAVs, including drones), the number of published papers that address this problem with this approach is also growing. This article presents an autonomous drone control system whose task is to fly in a given direction (in the adopted reference frame) and avoid the trees it encounters in a forest, based on readings from a rotating LiDAR sensor. The controller was trained with the Proximal Policy Optimization (PPO) algorithm, an example of reinforcement learning (RL). A custom simulator written in Python was developed for this purpose. The Gazebo environment, integrated with the Robot Operating System (ROS), was also used to test the resulting control algorithm. Finally, the solution was implemented on an Nvidia Jetson Nano eGPU and verified in real-world tests. In these tests, the drone successfully completed the set task and was able to repeatably avoid trees while flying through the forest.
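To make the sensing side of such a system more concrete, the following is a minimal sketch of how a 2D rotating LiDAR scan could be simulated in Python. It is not the authors' simulator: the function name `simulate_lidar_scan`, the modelling of trees as circles, and the values of `num_rays` and `max_range` are all assumptions chosen for illustration.

```python
import math


def simulate_lidar_scan(drone_xy, trees, num_rays=36, max_range=12.0):
    """Return one range reading per ray for a 2D scan around the drone.

    Hypothetical helper for illustration only.
    trees: list of (cx, cy, radius) circles standing in for tree trunks.
    Rays that hit nothing report max_range.
    """
    px, py = drone_xy
    ranges = []
    for i in range(num_rays):
        angle = 2.0 * math.pi * i / num_rays
        dx, dy = math.cos(angle), math.sin(angle)
        nearest = max_range
        for cx, cy, radius in trees:
            # Vector from the drone to the trunk centre.
            fx, fy = cx - px, cy - py
            # Signed distance along the ray to the point closest to the centre.
            along = fx * dx + fy * dy
            # Squared perpendicular distance from the centre to the ray line.
            perp_sq = fx * fx + fy * fy - along * along
            if perp_sq > radius * radius:
                continue  # the ray passes beside this trunk
            half_chord = math.sqrt(radius * radius - perp_sq)
            hit = along - half_chord  # first intersection with the circle
            # Ignore trunks behind the sensor and hits beyond the current best.
            if 0.0 <= hit < nearest:
                nearest = hit
        ranges.append(nearest)
    return ranges


# Example: one trunk of radius 0.5 m centred 5 m straight ahead of the drone.
scan = simulate_lidar_scan((0.0, 0.0), [(5.0, 0.0, 0.5)])
print(round(scan[0], 3))  # 4.5
```

A vector of range readings like this one could serve as the observation passed to a PPO policy, with the reward and the action space defined separately; those parts are outside the scope of this sketch.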

Keywords

automatic control, drone, Gazebo, reinforcement learning, RL, ROS
