GroundingDINO

Open-world detector

An open-set object detection model that detects and localizes objects in images from natural-language descriptions.

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

GitHub

7k stars
42 watching
685 forks
Language: Python
Last commit: 3 months ago
Linked from 1 awesome list

object-detection, open-world, open-world-detection, vision-language, vision-language-transformer

Related projects:

Repository | Description | Stars
idea-research/dino | An implementation of a deep learning-based object detection model with improved anchor boxes for end-to-end detection tasks. | 2,258
facebookresearch/dinov2 | A PyTorch implementation of a self-supervised learning method for learning robust visual features without supervision. | 9,211
theshadow29/zsgnet-pytorch | An implementation of a computer vision model that grounds objects in images using natural language queries. | 69
jhcho99/coformer | An implementation of a deep learning model for grounding situation recognition in images. | 43
tencentarc/gfpgan | An algorithm for restoring damaged or obscured faces in images. | 35,898
amdegroot/ssd.pytorch | An implementation of a deep learning-based object detection system in PyTorch. | 5,146
cszn/kair | Image restoration toolbox with training and testing codes for various deep learning-based methods. | 2,957
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,580
junyanz/interactive-deep-colorization | A system for automatically colorizing black and white images with user interactions. | 2,694
thu-mig/yolov10 | Real-time object detection using a neural network architecture. | 9,936
huawei-noah/efficient-ai-backbones | A collection of efficient AI backbone architectures developed by Huawei Noah's Ark Lab. | 4,054
layumi/person_reid_baseline_pytorch | A PyTorch implementation of an Object Re-ID baseline with various training methods and architectures. | 4,126
roboflow/notebooks | A collection of tutorials and examples on using various computer vision models and techniques. | 5,547
devendrachaplot/deeprl-grounding | Trains an RL agent to execute natural language instructions in a 3D environment using a combination of A3C and gated attention mechanisms. | 237
mlfoundations/open_flamingo | A framework for training large multimodal models to generate text conditioned on images or other text. | 3,742