BuboGPT

Visual grounding framework

An open-source framework that enables multi-modal LLMs to jointly understand text, vision, and audio, and to ground that knowledge in visual objects.
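In miniature, "grounding knowledge in visual objects" means linking entity mentions in a model's text output to detected regions of the image. The sketch below is a hypothetical illustration of that idea only; the `Region` class and `ground_mentions` function are invented for this example and are not part of BuboGPT's actual codebase or API.

```python
# Hypothetical sketch of visual grounding: link entity mentions in an
# LLM's answer text to detected image regions by label matching.
# (Illustrative only -- not BuboGPT's real interface.)
from dataclasses import dataclass


@dataclass
class Region:
    """A detected object with its bounding box (x1, y1, x2, y2)."""
    label: str
    box: tuple


def ground_mentions(answer: str, regions: list) -> dict:
    """Return {label: box} for each region whose label appears in the answer."""
    answer_lower = answer.lower()
    return {r.label: r.box for r in regions if r.label.lower() in answer_lower}


regions = [Region("dog", (10, 20, 120, 200)), Region("frisbee", (130, 40, 180, 90))]
answer = "A dog is jumping to catch a frisbee in the park."
print(ground_mentions(answer, regions))
# → {'dog': (10, 20, 120, 200), 'frisbee': (130, 40, 180, 90)}
```

A real system would replace the string match with an off-the-shelf detector plus an entity-matching module, but the output shape, text spans tied to boxes, is the same.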

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

GitHub

502 stars · 10 watching · 35 forks
Language: Python
Last commit: over 1 year ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| theshadow29/zsgnet-pytorch | An implementation of a computer vision model that grounds objects in images using natural language queries. | 69 |
| google-research/visu3d | An abstraction layer between various deep learning frameworks and your program. | 147 |
| hxyou/idealgpt | A deep learning framework for iteratively decomposing vision and language reasoning via large language models. | 32 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. | 781 |
| jy0205/lavit | A unified framework for training large language models to understand and generate visual content. | 528 |
| tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy. | 243 |
| jhcho99/coformer | An implementation of a deep learning model for grounding situation recognition in images. | 43 |
| penghao-wu/vstar | A PyTorch implementation of a guided visual search mechanism for multimodal LLMs. | 527 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs. | 269 |
| davidmascharka/tbd-nets | An open-source implementation of a deep learning model designed to improve the balance between performance and interpretability in visual reasoning tasks. | 348 |
| asappresearch/flambe | An ML framework for accelerating research and its integration into production workflows. | 262 |
| nvlabs/bongard-hoi | A benchmarking tool and software framework for evaluating few-shot visual reasoning capabilities in computer vision models. | 64 |
| aifeg/benchlmm | An open-source benchmarking framework for evaluating the cross-style visual capability of large multimodal models. | 83 |
| s-gupta/visual-concepts | A framework for detecting visual concepts in images by leveraging image captions and pre-trained models. | 151 |