SliME

Multimodal model developer

Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types.

✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

GitHub

143 stars

4 watching

7 forks

Language: Python

last commit: 11 months ago

Related projects:

Repository	Description	Stars
yuliang-liu/monkey	An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage.	1,849
yfzhang114/llava-align	Debiasing techniques to minimize hallucinations in large visual language models	75
xverse-ai/xverse-v-13b	A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences.	78
yfzhang114/mme-realworld	A multimodal large language model benchmark designed to simulate real-world challenges and measure the performance of such models in practical scenarios.	86
xverse-ai/xverse-moe-a36b	Develops and publishes large multilingual language models with advanced mixing-of-experts architecture.	37
openbmb/viscpm	A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages	1,098
pleisto/yuren-baichuan-7b	A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks	73
xverse-ai/xverse-7b	A multilingual large language model developed by XVERSE Technology Inc.	50
eleutherai/polyglot	Large language models designed to perform well in multiple languages and address performance issues with current multilingual models.	476
neulab/pangea	An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts	92
zhourax/vega	Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs.	33
multimodal-art-projection/omnibench	Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously.	15
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model.	1,336
umass-foundation-model/3d-llm	Developing a Large Language Model capable of processing 3D representations as inputs	979
felixgithub2017/mmcu	Measures the understanding of massive multitask Chinese datasets using large language models	87