OMG-Seg

Visual Model

An end-to-end model that handles multiple visual perception and reasoning tasks with a single shared encoder, decoder, and large language model.

Codebase for OMG-Seg (CVPR 2024) and OMG-LLaVA (NeurIPS 2024)
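The shared encoder/decoder design described above can be sketched in miniature. The code below is a hypothetical toy illustration, not the actual OMG-Seg implementation: a single encoder produces feature tokens, a single query-based decoder produces query embeddings, and multiple task heads (here, segmentation masks and classification) reuse those same embeddings. All class and function names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedEncoder:
    """Toy stand-in for a single shared image encoder (hypothetical)."""
    def __init__(self, dim=8):
        self.w = rng.standard_normal((3, dim))

    def __call__(self, image):
        # image: (H*W, 3) flattened RGB pixels -> (H*W, dim) feature tokens
        return image @ self.w

class SharedDecoder:
    """Toy query-based decoder: N learned queries attend to the features."""
    def __init__(self, num_queries=4, dim=8):
        self.queries = rng.standard_normal((num_queries, dim))

    def __call__(self, feats):
        attn = self.queries @ feats.T                      # (N, H*W) similarity
        attn = np.exp(attn - attn.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)            # softmax over pixels
        return attn @ feats                                # (N, dim) query embeddings

def segment(emb, feats):
    """Task head 1: per-query mask logits over all pixels."""
    return emb @ feats.T                                   # (N, H*W)

def classify(emb, num_classes=5):
    """Task head 2: per-query class logits, reusing the same embeddings."""
    w = rng.standard_normal((emb.shape[1], num_classes))
    return emb @ w                                         # (N, num_classes)

# One forward pass: both tasks share the encoder and decoder outputs.
image = rng.standard_normal((16, 3))   # a 4x4 "image", flattened
feats = SharedEncoder()(image)
emb = SharedDecoder()(feats)
masks = segment(emb, feats)            # (4, 16)
logits = classify(emb)                 # (4, 5)
```

The point of the sketch is the sharing: both task heads consume the same query embeddings, so adding a task means adding a small head rather than a new backbone.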

GitHub

1k stars
23 watching
49 forks
Language: Python
Last commit: about 2 months ago

Related projects:

| Repository | Description | Stars |
|---|---|---|
| opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
| vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
| l0sg/relational-rnn-pytorch | An implementation of DeepMind's Relational Recurrent Neural Networks (Santoro et al. 2018) in PyTorch for word language modeling | 244 |
| deepcs233/visual-cot | Develops a multi-modal language model with a comprehensive dataset and benchmark for chain-of-thought reasoning | 134 |
| luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 508 |
| opennlg/openba | A pre-trained language model designed for various NLP tasks, including dialogue generation, code completion, and retrieval | 94 |
| gt-vision-lab/vqa_lstm_cnn | A Visual Question Answering model using a deeper LSTM and normalized CNN architecture | 376 |
| 360cvgroup/360vl | A large multi-modal model developed using the Llama3 language model, designed to improve image understanding capabilities | 30 |
| gordonhu608/mqt-llava | A vision-language model that uses a query transformer to encode images as visual tokens and allows flexible choice of the number of visual tokens | 97 |
| openseg-group/openseg.pytorch | Provides a PyTorch implementation of several computer vision tasks including object detection, segmentation, and parsing | 1,190 |
| airaria/visual-chinese-llama-alpaca | Develops a multimodal Chinese language model with visual capabilities | 424 |
| yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 137 |
| llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 704 |
| tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 243 |