
Abstract:

Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this paper, we present a robotic system to detect, track, and follow any object in real time. Our approach, dubbed follow anything (FAn), is open-vocabulary and multimodal: it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, image, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn detects and segments objects by matching multimodal queries (text, images, clicks) against an input image sequence. These detected and segmented objects are tracked across image frames, all while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source all our code on our project webpage: https://github.com/alaamaalouf/FollowAnything. We also encourage the reader to watch our 5-minute explainer video.
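To make the matching step described above more concrete, here is a minimal sketch of open-vocabulary detection by similarity matching: a query embedding (from a text, image, or click encoder) is compared against descriptors of candidate segments, and the best match above a threshold is selected. This is not the FAn implementation; the feature dimensions, threshold, and the stand-in `query`/`segments` arrays are hypothetical placeholders for foundation-model features.

```python
# Minimal sketch (not the FAn code): score candidate segment descriptors
# against a multimodal query embedding via cosine similarity.
import numpy as np


def cosine_similarity(query: np.ndarray, descriptors: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and N descriptor vectors."""
    q = query / (np.linalg.norm(query) + 1e-8)
    d = descriptors / (np.linalg.norm(descriptors, axis=-1, keepdims=True) + 1e-8)
    return d @ q


def select_segment(query_embedding: np.ndarray,
                   segment_descriptors: np.ndarray,
                   threshold: float = 0.25):
    """Return the index of the best-matching segment, or None if no match clears the threshold."""
    scores = cosine_similarity(query_embedding, segment_descriptors)
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None


# Usage with random stand-in features (real descriptors would come from a
# pre-trained encoder such as a CLIP/DINO-style model):
rng = np.random.default_rng(0)
query = rng.normal(size=512)            # e.g. an embedded text/image/click query
segments = rng.normal(size=(10, 512))   # descriptors for 10 candidate masks
print(select_segment(query, segments))
```

In a tracking-and-following pipeline, the selected segment would then seed a tracker, and the tracked location would drive the vehicle's control commands each frame.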
