XVERSE-V-13B
Multimodal model
A large multimodal model for visual question answering, trained on a dataset of 2.1B image-text pairs and 8.2M instruction sequences.
78 stars
4 watching
4 forks
Language: Python
Last commit: 10 months ago

Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A large language model developed to support multiple languages and applications | 648 |
| | Develops and publishes large multilingual language models with an advanced mixture-of-experts architecture. | 37 |
| | Developed by XVERSE Technology Inc. as a multilingual large language model with a unique mixture-of-experts architecture, fine-tuned for tasks such as conversation, question answering, and natural language understanding. | 36 |
| | A large language model developed by XVERSE Technology Inc. using a transformer architecture and fine-tuned on diverse datasets for various applications. | 132 |
| | A multilingual large language model developed by XVERSE Technology Inc. | 50 |
| | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
| | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
| | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,098 |
| | An evaluation framework for multimodal language models' visual capabilities using image and question benchmarks. | 296 |
| | Trains and evaluates a universal multimodal retrieval model to perform various information retrieval tasks. | 114 |
| | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 478 |
| | Develops high-resolution multimodal LLMs by combining vision encoders and various input resolutions | 549 |
| | A multimodal LLM designed to handle text-rich visual questions | 270 |
| | An implementation of a general-purpose robot learning model using multimodal prompts | 781 |
| | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |