AnyGPT
Multimodal converter
An open-source multimodal language model that processes and converts different data types, such as speech, text, images, and music, into a unified discrete-sequence format.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
798 stars
20 watching
64 forks
Language: Python
Last commit: 5 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,098 |
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
openmotionlab/motiongpt | Develops a unified motion-language model that generates high-quality human motion from text and text descriptions from human motion data | 1,531 |
vita-mllm/vita | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real-time. | 1,005 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
yuliang-liu/monkey | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 |
opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 478 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
opengvlab/visionllm | A large language model designed to process and generate visual information | 956 |
r2d4/openlm | A library that provides a unified API for interacting with various large language models (LLMs) | 367 |
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 |
openai/finetune-transformer-lm | Code and model for improving language understanding through generative pre-training with a transformer-based architecture. | 2,167 |
neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 92 |
rgomez90/matrix-bot | A simple Matrix bot that listens for uploaded files and converts Quarto files to PDF. | 2 |