SliME
Multimodal model developer
Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types.
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
143 stars
4 watching
7 forks
Language: Python
last commit: about 1 month ago

Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| yuliang-liu/monkey | An end-to-end image captioning system that uses large multimodal models and provides tools for training, inference, and demo usage. | 1,849 |
| yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models. | 75 |
| xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 78 |
| yfzhang114/mme-realworld | A multimodal large language model benchmark designed to simulate real-world challenges and measure model performance in practical scenarios. | 86 |
| xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with an advanced mixture-of-experts architecture. | 37 |
| openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages. | 1,098 |
| pleisto/yuren-baichuan-7b | A multimodal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks. | 73 |
| xverse-ai/xverse-7b | A multilingual large language model developed by XVERSE Technology Inc. | 50 |
| eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 476 |
| neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts. | 92 |
| zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |
| umass-foundation-model/3d-llm | Develops a large language model capable of processing 3D representations as inputs. | 979 |
| felixgithub2017/mmcu | Measures the understanding of massive multitask Chinese datasets using large language models. | 87 |