GroundingDINO
Open-world detector
An object detection model for open-world scenarios that detects and recognizes objects from natural language descriptions.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
7k stars
43 watching
711 forks
Language: Python
Last commit: about 1 year ago
Linked from 1 awesome list
object-detection, open-world, open-world-detection, vision-language, vision-language-transformer
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An implementation of a deep learning-based object detection model with improved anchor boxes for end-to-end detection tasks. | 2,295 |
| | A PyTorch implementation of a self-supervised learning method for learning robust visual features without supervision. | 9,425 |
| | An implementation of a computer vision model that grounds objects in images using natural language queries. | 69 |
| | An implementation of a deep learning model for grounded situation recognition in images. | 45 |
| | An algorithm for restoring damaged or obscured faces in images. | 36,009 |
| | An implementation of a deep learning-based object detection system in PyTorch. | 5,160 |
| | An image restoration toolbox with training and testing code for various deep learning-based methods. | 2,994 |
| | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,668 |
| | A system for automatically colorizing black and white images with user interactions. | 2,701 |
| | Real-time object detection using a neural network architecture. | 10,116 |
| | A collection of efficient AI backbone architectures developed by Huawei Noah's Ark Lab. | 4,098 |
| | A PyTorch implementation of an object re-identification (Re-ID) baseline with various training methods and architectures. | 4,149 |
| | Tutorials and examples on using state-of-the-art computer vision models and techniques. | 5,678 |
| | Trains an RL agent to execute natural language instructions in a 3D environment using a combination of A3C and gated attention mechanisms. | 237 |
| | A framework for training large multimodal models to generate text conditioned on images or other text. | 3,781 |