GroundingDINO
Open-world detector
An implementation of an object detection model for open-world scenarios that can detect and recognize objects from natural-language descriptions.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
7k stars
43 watching
711 forks
Language: Python
last commit: 6 months ago
Linked from 1 awesome list
Topics: object-detection, open-world, open-world-detection, vision-language, vision-language-transformer
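Detection here is driven by a text prompt rather than a fixed label set. The sketch below shows, under stated assumptions, how a prompt and two score thresholds could turn per-box token scores into labeled detections. It does not call the GroundingDINO package. `build_prompt`, `filter_detections`, and `Detection` are hypothetical helpers, the logits are made-up inputs rather than model output, and whitespace-separated words stand in for the tokenizer's sub-word tokens. The " . "-separated caption and the box/text threshold pair (0.35 / 0.25) appear to follow the project's README demo, but check the repository's inference utilities for the actual behavior.

```python
# Hypothetical helpers for illustration only; this does not use the GroundingDINO package.
import math
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]
    score: float  # highest token probability for this box
    phrase: str  # prompt words whose probability passed text_threshold


def build_prompt(categories: list[str]) -> str:
    """Join category names into a caption such as 'dog . traffic light .'."""
    return " . ".join(c.strip().lower() for c in categories) + " ."


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def filter_detections(
    boxes: list[tuple[float, float, float, float]],
    token_logits: list[list[float]],
    prompt_tokens: list[str],
    box_threshold: float = 0.35,
    text_threshold: float = 0.25,
) -> list[Detection]:
    """Keep a box if its best token probability exceeds box_threshold, then label
    it with the prompt words whose probability exceeds text_threshold."""
    detections = []
    for box, logits in zip(boxes, token_logits):
        probs = [sigmoid(x) for x in logits]
        best = max(probs)
        if best <= box_threshold:
            continue  # no part of the prompt matches this box strongly enough
        words = [t for t, p in zip(prompt_tokens, probs) if p > text_threshold and t != "."]
        detections.append(Detection(box=box, score=best, phrase=" ".join(words)))
    return detections


if __name__ == "__main__":
    prompt = build_prompt(["Dog", "traffic light"])  # "dog . traffic light ."
    tokens = prompt.split()  # ['dog', '.', 'traffic', 'light', '.']
    boxes = [(0.5, 0.5, 0.2, 0.3), (0.1, 0.2, 0.05, 0.1), (0.8, 0.1, 0.1, 0.2)]
    token_logits = [  # made-up scores: one row per box, one column per token
        [3.0, -5.0, -4.0, -4.0, -5.0],   # matches "dog"
        [-4.0, -5.0, 2.0, 1.5, -5.0],    # matches "traffic light"
        [-3.0, -5.0, -3.0, -3.0, -5.0],  # background, dropped by box_threshold
    ]
    for det in filter_detections(boxes, token_logits, tokens):
        print(det.phrase, round(det.score, 3))
    # prints:
    # dog 0.953
    # traffic light 0.881
```

Splitting the decision into a box threshold and a text threshold keeps two questions separate in this sketch: whether a box matches the prompt at all, and which words of the prompt it matches.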
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An implementation of a deep learning-based object detection model with improved anchor boxes for end-to-end detection tasks. | 2,295 |
| | A PyTorch implementation of a self-supervised learning method for learning robust visual features without supervision. | 9,425 |
| | An implementation of a computer vision model that grounds objects in images using natural language queries. | 69 |
| | An implementation of a deep learning model for grounding situation recognition in images. | 45 |
| | An algorithm for restoring damaged or obscured faces in images. | 36,009 |
| | An implementation of a deep learning-based object detection system in PyTorch. | 5,160 |
| | Image restoration toolbox with training and testing code for various deep learning-based methods. | 2,994 |
| | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,668 |
| | A system for automatically colorizing black-and-white images with user interaction. | 2,701 |
| | Real-time object detection using a neural network architecture. | 10,116 |
| | A collection of efficient AI backbone architectures developed by Huawei Noah's Ark Lab. | 4,098 |
| | A PyTorch implementation of an object re-ID baseline with various training methods and architectures. | 4,149 |
| | This repository contains tutorials and examples for using state-of-the-art computer vision models and techniques. | 5,678 |
| | Trains an RL agent to execute natural language instructions in a 3D environment using a combination of A3C and gated attention mechanisms. | 237 |
| | A framework for training large multimodal models to generate text conditioned on images or other text. | 3,781 |