TelloHand
Sight to Action
During my studies at ITI "Enrico Medi", I developed TelloHand, a system for controlling a DJI Tello drone through computer vision. A Python pipeline connects the drone's camera to the host computer and translates recognized hand gestures into flight commands. The main goal was to optimize the computer-vision algorithms and minimize latency so that commands reach the drone in real time.
Technical Architecture
The DJI Tello exposes a UDP video stream and accepts string-based commands, requiring a highly efficient software architecture to translate visual inputs into flight commands.
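The command side of this architecture can be sketched with Python's standard `socket` module. A real Tello listens for plain-text SDK commands such as "command" and "takeoff" on UDP port 8889 at 192.168.10.1; in this runnable sketch a local echo socket stands in for the drone, so only the send/receive pattern is shown, not the project's actual control loop.

```python
import socket
import threading

def send_command(sock, command, addr, timeout=2.0):
    """Send one ASCII SDK command over UDP and return the drone's text reply."""
    sock.settimeout(timeout)
    sock.sendto(command.encode("ascii"), addr)
    reply, _ = sock.recvfrom(1024)
    return reply.decode("ascii")

def fake_drone(server):
    """Stand-in drone: acknowledge every command with 'ok'."""
    while True:
        _, sender = server.recvfrom(1024)
        server.sendto(b"ok", sender)

if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))   # ephemeral local port instead of the real 8889
    addr = server.getsockname()
    threading.Thread(target=fake_drone, args=(server,), daemon=True).start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print(send_command(client, "command", addr))  # enter SDK mode -> "ok"
    print(send_command(client, "takeoff", addr))  # -> "ok"
```

Against the real drone, the only change is sending to ("192.168.10.1", 8889) after joining the Tello's Wi-Fi network.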
Building the Pipeline
The software pipeline intercepts and decodes the H.264 video stream from the drone over Wi-Fi. For hand tracking, I integrated Google's MediaPipe library, using its CPU-optimized inference. I implemented geometric logic to interpret finger landmarks, translating them into specific commands at 30 frames per second to ensure smooth responsiveness.
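The geometric logic over finger landmarks can be illustrated with plain Python. MediaPipe Hands returns 21 landmarks per hand in normalized image coordinates (y grows downward); the tip/joint indices below are MediaPipe's standard numbering, while the finger-count-to-command mapping is a hypothetical example, not the project's actual mapping.

```python
# MediaPipe Hands landmark indices: fingertips and the PIP joint below each tip.
FINGER_TIPS = (8, 12, 16, 20)   # index, middle, ring, pinky
FINGER_PIPS = (6, 10, 14, 18)

def count_extended_fingers(landmarks):
    """landmarks: list of 21 (x, y) tuples in normalized image coordinates.
    For an upright hand, a finger counts as extended when its tip sits
    above its PIP joint, i.e. has a smaller y value."""
    return sum(1 for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
               if landmarks[tip][1] < landmarks[pip][1])

def gesture_to_command(landmarks):
    """Map the extended-finger count to an illustrative flight command."""
    count = count_extended_fingers(landmarks)
    return {0: "land", 1: "up 20", 2: "down 20", 4: "takeoff"}.get(count, "hover")
```

In the live pipeline, the `(x, y)` pairs come from `results.multi_hand_landmarks` after calling MediaPipe's `process()` on each decoded frame.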
Managing Latency
In a real-time control system, latency directly compromises flight safety. To minimize delays, I tuned the frame buffer to discard stale frames and moved video decoding and gesture inference onto separate threads, keeping input lag low enough for responsive control with optimized Python and OpenCV code.
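One common way to realize this stale-frame-dropping, multi-threaded design is a single-slot buffer: the capture thread overwrites one slot, so the processing thread always sees the newest frame and older frames are discarded rather than queued. This is a minimal sketch of that pattern (the `LatestFrame` class and the string frames are illustrative; the source repository may structure this differently), using only the standard library so it runs without a drone or OpenCV.

```python
import threading

class LatestFrame:
    """Single-slot frame buffer: the producer overwrites the slot, the
    consumer always reads the newest frame, so stale frames are dropped
    instead of accumulating latency in a queue."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0  # how many frames have been written so far

    def put(self, frame):
        with self._cond:
            self._frame = frame          # overwrite: older frames are lost
            self._seq += 1
            self._cond.notify()

    def get(self, last_seq=0):
        """Block until a frame newer than last_seq is available."""
        with self._cond:
            self._cond.wait_for(lambda: self._seq > last_seq)
            return self._frame, self._seq

if __name__ == "__main__":
    buf = LatestFrame()

    def capture():
        for i in range(100):             # stands in for reading the UDP video stream
            buf.put(f"frame-{i}")

    threading.Thread(target=capture, daemon=True).start()
    frame, seq = buf.get()
    print(frame)                         # a recent frame; intermediate ones were dropped
```

The consumer passes the last sequence number it processed back into `get`, so it never re-processes the same frame and never waits behind a backlog.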
References
Here is the source code for the project:
https://github.com/kairosci/tello-hand