bubogpt
Visual grounding framework
An open-source framework enabling multi-modal LLMs to jointly understand text, vision, and audio, and to ground knowledge in visual objects.
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
502 stars
10 watching
35 forks
Language: Python
Last commit: over 1 year ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| theshadow29/zsgnet-pytorch | An implementation of a computer vision model that grounds objects in images using natural language queries. | 69 |
| google-research/visu3d | An abstraction layer between various deep learning frameworks and your program. | 147 |
| hxyou/idealgpt | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 |
| lxtgh/omg-seg | An end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. | 781 |
| jy0205/lavit | A unified framework for training large language models to understand and generate visual content. | 528 |
| tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy. | 243 |
| jhcho99/coformer | An implementation of a deep learning model for grounded situation recognition in images. | 43 |
| penghao-wu/vstar | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs. | 527 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs. | 269 |
| davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
| asappresearch/flambe | An ML framework for accelerating research and its integration into production workflows. | 262 |
| nvlabs/bongard-hoi | A benchmarking tool and software framework for evaluating few-shot visual reasoning capabilities in computer vision models. | 64 |
| aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models. | 83 |
| s-gupta/visual-concepts | A framework for detecting visual concepts in images by leveraging image captions and pre-trained models. | 151 |