co-tracker

Point tracker

A model for tracking any point in a video, combining a transformer-based architecture with the benefits of optical flow

CoTracker is a model for tracking any point (pixel) in a video.

GitHub

4k stars
34 watching
250 forks
Language: Jupyter Notebook
Last commit: 29 days ago
Topics: optical-flow, point-tracking, track-anything

Related projects:

Repository | Description | Stars
visionml/pytracking | A comprehensive framework for building and training visual object tracking models using PyTorch. | 3,247
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,580
facebookresearch/segment-anything | This project provides code and tools for running inference with a visual segmentation model that can generate object masks from input prompts. | 47,627
facebookresearch/sam2 | An open-source software project providing code and tools for running inference with a deep learning model designed for visual segmentation in images and videos. | 12,353
eduardolundgren/tracking.js | A computer vision library for the web that provides various algorithms and techniques for tracking objects and features in images. | 9,443
facebookresearch/slowfast | Provides state-of-the-art video understanding codebase with efficient training methods and pre-trained models for various tasks. | 6,623
clementpinard/sfmlearner-pytorch | PyTorch implementation of unsupervised depth and ego-motion learning from video sequences. | 1,014
doubiiu/tooncrafter | Generates cartoon-style videos from two images using pre-trained diffusion models. | 5,353
facebookresearch/dinov2 | A PyTorch implementation of a self-supervised learning method for learning robust visual features without supervision. | 9,211
mkocabas/vibe | A video pose and shape estimation method that predicts body parameters for each frame of an input video. | 2,897
facebookresearch/detectron2 | A platform for object detection and segmentation tasks using machine learning algorithms. | 30,539
benaclejames/vrcfacetracking | Provides eye and lip tracking functionality for VRChat avatars using OSC protocol and custom parameter mappings. | 627
thudm/cogvideo | Generates videos from text and images using large language models. | 9,156
facebookresearch/imagebind | An AI framework that combines data from multiple sources into a single embedding space, enabling various applications such as cross-modal retrieval and generation. | 8,362
martinruenz/co-fusion | Enables a robot to maintain scene descriptions at the object level by segmenting and tracking objects in real-time from RGB-D images. | 501