OMG-Seg
Visual Model
Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.
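A minimal sketch of that single-encoder / single-decoder / LLM layout, assuming a PyTorch-style composition; the module names, dimensions, and the HuggingFace-like `inputs_embeds` interface are illustrative placeholders, not the actual OMG-LLaVA or OMG-Seg code:

```python
# Illustrative sketch only -- names and dimensions are assumptions,
# not the actual OMG-LLaVA / OMG-Seg implementation.
import torch
import torch.nn as nn


class SingleEncoderDecoderLLM(nn.Module):
    """One shared visual encoder feeds both a universal mask decoder
    (perception) and, via a projector, a large language model (reasoning)."""

    def __init__(self, visual_encoder, mask_decoder, llm,
                 vis_dim=1024, llm_dim=4096):
        super().__init__()
        self.visual_encoder = visual_encoder          # shared, typically frozen
        self.mask_decoder = mask_decoder              # universal segmentation head
        self.llm = llm                                # large language model
        self.projector = nn.Linear(vis_dim, llm_dim)  # visual tokens -> LLM space

    def forward(self, image, text_embeds):
        # A single encoding pass is reused by every task head.
        vis_tokens = self.visual_encoder(image)        # (B, N, vis_dim)
        masks = self.mask_decoder(vis_tokens)          # perception output
        llm_inputs = torch.cat(
            [self.projector(vis_tokens), text_embeds], dim=1
        )
        # Assumes an LLM that accepts pre-computed input embeddings.
        answer = self.llm(inputs_embeds=llm_inputs)    # reasoning output
        return masks, answer
```

Sharing one encoding pass across the segmentation head and the language model is what lets a single checkpoint cover both perception and reasoning tasks.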
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
1k stars
22 watching
50 forks
Language: Python
Last commit: 2 months ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| | A large language model designed to process and generate visual information | 956 |
| | A guide to using pre-trained large language models in source code analysis and generation | 1,789 |
| | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| | An implementation of DeepMind's Relational Recurrent Neural Networks (Santoro et al. 2018) in PyTorch for word language modeling | 245 |
| | A framework for training multi-modal language models with a focus on visual inputs and interpretable thoughts | 162 |
| | An open-source implementation of a vision-language instructed large language model | 513 |
| | A pre-trained language model designed for various NLP tasks, including dialogue generation, code completion, and retrieval | 94 |
| | A Visual Question Answering model using a deeper LSTM and a normalized CNN architecture | 377 |
| | A large multi-modal model built on the Llama3 language model, designed to improve image understanding capabilities | 32 |
| | A vision-language model that uses a query transformer to encode images as visual tokens and allows a flexible choice of the number of visual tokens | 101 |
| | Provides a PyTorch implementation of several computer vision tasks, including object detection, segmentation, and parsing | 1,191 |
| | Develops a multimodal Chinese language model with visual capabilities | 429 |
| | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 143 |
| | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
| | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 259 |