GroundingDINO
Open-world detector
An object detection model for open-world scenarios that detects and recognizes objects from language descriptions.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
7k stars
42 watching
685 forks
Language: Python
last commit: 3 months ago
Linked from 1 awesome list
object-detection, open-world, open-world-detection, vision-language, vision-language-transformer
Related projects:
Repository | Description | Stars |
---|---|---|
idea-research/dino | An implementation of a deep learning-based object detection model with improved anchor boxes for end-to-end detection tasks. | 2,258 |
facebookresearch/dinov2 | A PyTorch implementation of a self-supervised learning method for learning robust visual features without supervision. | 9,211 |
theshadow29/zsgnet-pytorch | An implementation of a computer vision model that grounds objects in images using natural language queries. | 69 |
jhcho99/coformer | An implementation of a deep learning model for grounded situation recognition in images. | 43 |
tencentarc/gfpgan | An algorithm for restoring damaged or obscured faces in images. | 35,898 |
amdegroot/ssd.pytorch | An implementation of a deep learning-based object detection system in PyTorch. | 5,146 |
cszn/kair | An image restoration toolbox with training and testing code for various deep learning-based methods. | 2,957 |
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,580 |
junyanz/interactive-deep-colorization | A system for automatically colorizing black and white images with user interactions. | 2,694 |
thu-mig/yolov10 | A neural network architecture for real-time object detection. | 9,936 |
huawei-noah/efficient-ai-backbones | A collection of efficient AI backbone architectures developed by Huawei Noah's Ark Lab. | 4,054 |
layumi/person_reid_baseline_pytorch | A PyTorch implementation of an object Re-ID baseline with various training methods and architectures. | 4,126 |
roboflow/notebooks | A collection of tutorials and examples on using various computer vision models and techniques. | 5,547 |
devendrachaplot/deeprl-grounding | Trains an RL agent to execute natural language instructions in a 3D environment using a combination of A3C and gated attention mechanisms. | 237 |
mlfoundations/open_flamingo | A framework for training large multimodal models to generate text conditioned on images or other text. | 3,742 |