SliME

Multimodal model developer

Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types.

✨✨ Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

GitHub

137 stars
4 watching
7 forks
Language: Python
Last commit: 14 days ago

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs. | 1,825 |
| yfzhang114/llava-align | Debiasing techniques to minimize hallucinations in large visual language models. | 71 |
| xverse-ai/xverse-v-13b | A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences. | 77 |
| yfzhang114/mme-realworld | A benchmark dataset designed to evaluate the performance of multimodal large language models in realistic, high-resolution real-world scenarios. | 78 |
| xverse-ai/xverse-moe-a36b | Develops and publishes large multilingual language models with an advanced mixture-of-experts architecture. | 36 |
| openbmb/viscpm | A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages. | 1,089 |
| pleisto/yuren-baichuan-7b | A multimodal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks. | 72 |
| xverse-ai/xverse-7b | A multilingual large language model developed by XVERSE Technology Inc. | 50 |
| eleutherai/polyglot | Large language models designed to perform well in multiple languages and address performance issues with current multilingual models. | 475 |
| neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts. | 91 |
| zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
| multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 14 |
| lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
| umass-foundation-model/3d-llm | Develops a large language model capable of processing 3D representations as inputs. | 961 |
| felixgithub2017/mmcu | Evaluates the semantic understanding capabilities of large Chinese language models using a multimodal dataset. | 87 |