GroundingDINO
Open-world detector
An implementation of an open-world object detection model that detects and recognizes objects from natural-language descriptions.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
7k stars
43 watching
711 forks
Language: Python
last commit: 5 months ago
Linked from 1 awesome list
object-detection, open-world, open-world-detection, vision-language, vision-language-transformer
Related projects:
Repository | Description | Stars |
---|---|---|
idea-research/dino | An implementation of a deep learning-based object detection model with improved anchor boxes for end-to-end detection tasks. | 2,295 |
facebookresearch/dinov2 | A PyTorch implementation of a self-supervised learning method for learning robust visual features without supervision. | 9,425 |
theshadow29/zsgnet-pytorch | An implementation of a computer vision model that grounds objects in images using natural language queries. | 69 |
jhcho99/coformer | An implementation of a deep learning model for grounded situation recognition in images. | 45 |
tencentarc/gfpgan | An algorithm for restoring damaged or obscured faces in images | 36,009 |
amdegroot/ssd.pytorch | An implementation of a deep learning-based object detection system in PyTorch. | 5,160 |
cszn/kair | Image restoration toolbox with training and testing codes for various deep learning-based methods | 2,994 |
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,668 |
junyanz/interactive-deep-colorization | A system for automatically colorizing black and white images with user interactions. | 2,701 |
thu-mig/yolov10 | Real-time object detection using a neural network architecture | 10,116 |
huawei-noah/efficient-ai-backbones | A collection of efficient AI backbone architectures developed by Huawei Noah's Ark Lab. | 4,098 |
layumi/person_reid_baseline_pytorch | A PyTorch implementation of an Object Re-ID baseline with various training methods and architectures | 4,149 |
roboflow/notebooks | This repository contains tutorials and examples on using state-of-the-art computer vision models and techniques | 5,678 |
devendrachaplot/deeprl-grounding | Trains an RL agent to execute natural language instructions in a 3D environment using a combination of A3C and gated attention mechanisms. | 237 |
mlfoundations/open_flamingo | A framework for training large multimodal models to generate text conditioned on images or other text. | 3,781 |