XVERSE-V-13B

Multimodal model

A large multimodal model for visual question answering, trained on 2.1B image-text pairs and 8.2M instruction sequences.
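
Below is a minimal inference sketch for visual question answering with a model of this kind. It assumes the checkpoint is published on the Hugging Face Hub under an id such as "xverse/XVERSE-V-13B" and that the remote code exposes a chat-style entry point; both the model id and the chat(...) call are assumptions, so check the repository's README for the actual loading and inference API.

```python
# Hedged sketch: load a 13B multimodal checkpoint and ask a question about an image.
# The Hub id and the model.chat(...) interface are assumptions, not the confirmed API.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xverse/XVERSE-V-13B"  # assumed Hub id; replace with the real one

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision so a 13B model fits on one large GPU
    device_map="auto",
    trust_remote_code=True,
).eval()

image = Image.open("example.jpg")            # hypothetical input image
question = "What is shown in this picture?"

# chat(...) is modeled on the interface used by other open multimodal releases;
# the real signature for this repository may differ.
response = model.chat(tokenizer, query=question, image=image)
print(response)
```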

GitHub

77 stars
4 watching
4 forks
Language: Python
Last commit: 7 months ago

Related projects:

xverse-ai/xverse-13b (649 stars): A large language model developed to support multiple languages and applications.
xverse-ai/xverse-moe-a36b (36 stars): Large multilingual language models built on a mixture-of-experts architecture.
xverse-ai/xverse-moe-a4.2b (36 stars): A multilingual large language model from XVERSE Technology Inc. with a mixture-of-experts architecture, fine-tuned for tasks such as conversation, question answering, and natural language understanding.
xverse-ai/xverse-65b (132 stars): A large transformer-based language model from XVERSE Technology Inc., fine-tuned on diverse datasets for various applications.
xverse-ai/xverse-7b (50 stars): A multilingual large language model developed by XVERSE Technology Inc.
yfzhang114/slime (137 stars): Large multimodal models for high-resolution understanding and analysis of text, images, and other data types.
zhourax/vega (33 stars): A multimodal task and dataset for assessing vision-language models' ability to handle interleaved image-text inputs.
openbmb/viscpm (1,089 stars): A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages.
tsb0601/mmvp (288 stars): An evaluation framework for multimodal language models' visual capabilities, using image and question benchmarks.
tiger-ai-lab/uniir (110 stars): Trains and evaluates a universal multimodal retrieval model across a range of information retrieval tasks.
opengvlab/multi-modality-arena (467 stars): An evaluation platform for comparing multi-modality models on visual question-answering tasks.
nvlabs/eagle (539 stars): High-resolution multimodal LLMs built by combining vision encoders and various input resolutions.
mlpc-ucsd/bliva (269 stars): A multimodal LLM designed to handle text-rich visual questions.
vimalabs/vima (774 stars): An implementation of a general-purpose robot learning model using multimodal prompts.
multimodal-art-projection/omnibench (14 stars): Evaluates multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously.