TelloHand

Written by Alessio on 3/15/2023

Sight to Action

While studying at ITI "Enrico Medi", I developed TelloHand, a system that controls a DJI Tello drone through computer vision. A Python pipeline connects the drone's camera to the processing device and drives flight control from hand-gesture recognition. The main goal was to optimize the computer vision algorithms and reduce latency enough to deliver commands in real time.

Technical Architecture

The DJI Tello exposes a UDP video stream and accepts string-based commands, so the software has to translate visual input into flight commands as efficiently as possible.
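
To make the command side concrete, here is a minimal sketch of sending string-based commands to the drone over UDP. The address 192.168.10.1:8889 and the "command", "streamon" and "rc" strings come from the public Tello SDK; the helper names (build_rc_command, send_command, main) are my own for illustration and are not taken from the TelloHand repository. The sketch does not wait for the drone's replies, which a real controller should do.

```python
import socket

# Default command address documented in the Tello SDK; adjust if your setup differs.
TELLO_ADDR = ("192.168.10.1", 8889)


def build_rc_command(left_right, forward_back, up_down, yaw):
    """Format an `rc` command, clamping each axis to the SDK range -100..100."""
    axes = [max(-100, min(100, int(v)))
            for v in (left_right, forward_back, up_down, yaw)]
    return "rc {} {} {} {}".format(*axes)


def send_command(sock, command, addr=TELLO_ADDR):
    """Send one plain-text SDK command as a single UTF-8 datagram."""
    sock.sendto(command.encode("utf-8"), addr)


def main():
    # Hypothetical session: enter SDK mode, then ask the drone to start streaming video.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        send_command(sock, "command")
        send_command(sock, "streamon")
    finally:
        sock.close()
```

Keeping the string formatting separate from the socket call makes the command logic easy to check without a drone on the network.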

Building the Pipeline

The software pipeline intercepts and decodes the H.264 video stream from the drone over Wi-Fi. For hand tracking, I integrated Google's MediaPipe library, using its CPU-optimized inference. I implemented geometric logic to interpret finger landmarks, translating them into specific commands at 30 frames per second to ensure smooth responsiveness.
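
As an illustration of what the geometric logic can look like, the sketch below counts extended fingers by comparing each fingertip with its middle (PIP) joint. The landmark indices follow MediaPipe's 21-point hand model; the gesture table and the function names are assumptions made for this example, not the mapping used in TelloHand, and the check assumes an upright hand with the thumb ignored.

```python
# MediaPipe hand landmark indices as (tip, PIP joint) for the four non-thumb fingers.
FINGERS = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}

# Hypothetical gesture table: number of extended fingers -> Tello SDK command.
GESTURE_COMMANDS = {0: "land", 1: "up 20", 2: "down 20", 4: "takeoff"}


def extended_fingers(landmarks):
    """Return the names of extended fingers.

    `landmarks` is a sequence of 21 (x, y) pairs in normalized image
    coordinates, where y grows downward. A finger counts as extended
    when its tip is above its PIP joint.
    """
    return [name for name, (tip, pip) in FINGERS.items()
            if landmarks[tip][1] < landmarks[pip][1]]


def gesture_to_command(landmarks):
    """Map the extended-finger count to a command, or None if unmapped."""
    return GESTURE_COMMANDS.get(len(extended_fingers(landmarks)))
```

With MediaPipe's Hands solution, the input could be built as [(lm.x, lm.y) for lm in results.multi_hand_landmarks[0].landmark]. Working on plain coordinate pairs keeps the rule independent of the library and cheap enough to run on every frame.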

Managing Latency

In a real-time control system, latency directly compromises flight safety. To minimize delays, I optimized the frame buffer and implemented a multithreaded architecture, reducing input lag to near zero with optimized Python and OpenCV code.
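
One common way to combine a small frame buffer with threading is a "latest frame only" reader: a background thread keeps pulling frames and overwrites a single slot, so the processing loop never works on stale images. The sketch below shows that pattern under my own assumptions; the class name LatestFrameReader and the read_fn parameter are hypothetical and not taken from the TelloHand code.

```python
import threading


class LatestFrameReader:
    """Read frames on a background thread and keep only the newest one."""

    def __init__(self, read_fn):
        # read_fn must return (ok, frame), like cv2.VideoCapture.read.
        self._read_fn = read_fn
        self._lock = threading.Lock()
        self._frame = None
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stopped.is_set():
            ok, frame = self._read_fn()
            if not ok:
                # No frame available: back off briefly, then retry.
                self._stopped.wait(0.005)
                continue
            with self._lock:
                # Overwrite instead of queueing, so old frames are dropped.
                self._frame = frame

    def latest(self):
        """Return the most recent frame, or None if nothing has arrived yet."""
        with self._lock:
            return self._frame

    def stop(self):
        self._stopped.set()
        self._thread.join(timeout=1.0)
```

With OpenCV, read_fn could be the read method of a cv2.VideoCapture opened on the Tello stream, typically udp://0.0.0.0:11111 once "streamon" has been sent. Dropping old frames trades completeness for freshness, which is usually the right choice when each frame is turned into a flight command.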

References

Here is the source code for the project:

https://github.com/kairosci/tello-hand