AnyGPT

Multimodal converter

An open-source multimodal language model that processes speech, text, images, and music by converting them into a unified discrete representation.

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

GitHub

798 stars

20 watching

64 forks

Language: Python

Last commit: almost 2 years ago

Related projects:

Repository	Description	Stars
openbmb/viscpm	A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages	1,098
open-mmlab/multimodal-gpt	Trains a multimodal chatbot that combines visual and language instructions to generate responses	1,478
multimodal-art-projection/omnibench	Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously	15
openmotionlab/motiongpt	Develops a unified model to generate high-quality motions and text descriptions from human motion data	1,531
vita-mllm/vita	A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time	1,005
mbzuai-oryx/groundinglmm	An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations	797
yuliang-liu/monkey	An end-to-end image captioning system that uses large multimodal models and provides tools for training, inference, and demo usage	1,849
opengvlab/multi-modality-arena	An evaluation platform for comparing multi-modality models on visual question-answering tasks	478
zhourax/vega	Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs	33
opengvlab/visionllm	A large language model designed to process and generate visual information	956
r2d4/openlm	Library that provides a unified API to interact with various Large Language Models (LLMs)	367
yfzhang114/slime	Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types	143
openai/finetune-transformer-lm	Code and model for improving language understanding through generative pre-training with a transformer-based architecture	2,167
neulab/pangea	An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts	92
rgomez90/matrix-bot	A simple Matrix bot that listens for uploaded files and converts Quarto files to PDF	2