GroundingDINO

Open-world detector

An object detection model for open-world (open-set) scenarios that detects and localizes objects specified by natural-language descriptions.

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
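To illustrate what "detecting objects based on language descriptions" means in practice, here is a minimal, framework-free sketch of how text-grounded detections are typically selected from a model like this one. It assumes the model returns, for each query, one logit per prompt token plus a normalized (cx, cy, w, h) box, as DETR-style detectors do. The helpers `build_caption` and `filter_detections` are hypothetical, not the repository's API. The sketch uses whole-word tokens in place of the subword tokens a real text encoder produces. The 0.35 box threshold and 0.25 text threshold echo values used in the project's demo, but treat them as assumptions here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types for this sketch: a box is a normalized (cx, cy, w, h) tuple.
Box = Tuple[float, float, float, float]
Detection = Tuple[Box, str, float]


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def build_caption(categories: Sequence[str]) -> str:
    """Hypothetical helper: join category names into a lower-cased,
    ' . '-separated prompt that ends with a period."""
    return " . ".join(c.strip().lower() for c in categories) + " ."


def filter_detections(
    logits: Sequence[Sequence[float]],  # one row per query, one column per token
    boxes: Sequence[Box],               # one box per query
    tokens: Sequence[str],              # word-level tokens (assumption; real models use subwords)
    box_threshold: float = 0.35,        # assumed default
    text_threshold: float = 0.25,       # assumed default
) -> List[Detection]:
    """Keep queries whose best token score exceeds box_threshold and label
    each kept box with every token scoring above text_threshold."""
    detections: List[Detection] = []
    for row, box in zip(logits, boxes):
        scores = [sigmoid(v) for v in row]
        best = max(scores)
        if best <= box_threshold:
            continue  # no prompt token matches this query strongly enough
        # Collect matching tokens, skipping the '.' separators between categories.
        phrase = " ".join(
            tok for tok, s in zip(tokens, scores) if s > text_threshold and tok != "."
        )
        detections.append((box, phrase, best))
    return detections


if __name__ == "__main__":
    caption = build_caption(["Cat", "Remote control"])
    tokens = caption.split()  # ['cat', '.', 'remote', 'control', '.']
    logits = [
        [2.0, -3.0, -3.0, -3.0, -3.0],   # strongly matches "cat"
        [-3.0, -3.0, 1.0, 0.0, -3.0],    # matches "remote control"
        [-2.0, -2.0, -2.0, -2.0, -2.0],  # background query, filtered out
    ]
    boxes = [(0.3, 0.4, 0.2, 0.3), (0.7, 0.6, 0.1, 0.05), (0.5, 0.5, 0.9, 0.9)]
    print(caption)
    for box, phrase, score in filter_detections(logits, boxes, tokens):
        print(f"{phrase}: {score:.2f} at {box}")
```

Gating each box on its best token score, then labeling it with every token above a lower text threshold, lets a single query match a multi-word phrase such as "remote control" rather than only its strongest word.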


7k stars
43 watching
711 forks
Language: Python
Last commit: 5 months ago
Linked from 1 awesome list

Topics: object-detection, open-world, open-world-detection, vision-language, vision-language-transformer


Related projects:

Repository | Description | Stars
idea-research/dino | An implementation of a deep learning-based object detection model with improved anchor boxes for end-to-end detection tasks. | 2,295
facebookresearch/dinov2 | A PyTorch implementation of a self-supervised learning method for learning robust visual features without supervision. | 9,425
theshadow29/zsgnet-pytorch | An implementation of a computer vision model that grounds objects in images using natural language queries. | 69
jhcho99/coformer | An implementation of a deep learning model for grounding situation recognition in images | 45
tencentarc/gfpgan | An algorithm for restoring damaged or obscured faces in images | 36,009
amdegroot/ssd.pytorch | An implementation of a deep learning-based object detection system in PyTorch. | 5,160
cszn/kair | Image restoration toolbox with training and testing codes for various deep learning-based methods | 2,994
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,668
junyanz/interactive-deep-colorization | A system for automatically colorizing black and white images with user interactions. | 2,701
thu-mig/yolov10 | Real-time object detection using a neural network architecture | 10,116
huawei-noah/efficient-ai-backbones | A collection of efficient AI backbone architectures developed by Huawei Noah's Ark Lab. | 4,098
layumi/person_reid_baseline_pytorch | A PyTorch implementation of an Object Re-ID baseline with various training methods and architectures | 4,149
roboflow/notebooks | This repository contains tutorials and examples on using state-of-the-art computer vision models and techniques | 5,678
devendrachaplot/deeprl-grounding | Trains an RL agent to execute natural language instructions in a 3D environment using a combination of A3C and gated attention mechanisms. | 237
mlfoundations/open_flamingo | A framework for training large multimodal models to generate text conditioned on images or other text. | 3,781