AnyGPT
Multimodal converter
An open-source multimodal language model that can process and convert different data types such as speech, text, images, and music into a unified format.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
779 stars
21 watching
61 forks
Language: Python
last commit: 3 months ago

Related projects:
Repository | Description | Stars |
---|---|---|
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,089 |
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,477 |
multimodal-art-projection/omnibench | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 14 |
openmotionlab/motiongpt | Develops a unified model to generate high-quality motions and text descriptions from human motion data | 1,505 |
vita-mllm/vita | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time. | 961 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks. | 781 |
yuliang-liu/monkey | A toolkit for building conversational AI models that can process image and text inputs. | 1,825 |
opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 467 |
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 |
opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
r2d4/openlm | Library that provides a unified API to interact with various Large Language Models (LLMs) | 366 |
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137 |
openai/finetune-transformer-lm | Code and model for improving language understanding through generative pre-training with a transformer-based architecture. | 2,160 |
neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 91 |
rgomez90/matrix-bot | A simple Matrix bot that listens to uploaded files and converts Quarto files to PDF. | 2 |