AnyGPT

Multimodal converter

An open-source multimodal language model that represents speech, text, images, and music in a unified discrete-sequence format, allowing it to process these data types and convert between them.

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

GitHub

779 stars
21 watching
61 forks
Language: Python
Last commit: 3 months ago

Related projects:

Repository | Description | Stars
openbmb/viscpm | A family of large multimodal models that support multimodal conversation and text-to-image generation in multiple languages. | 1,089
open-mmlab/multimodal-gpt | Trains a multimodal chatbot that combines visual and language instructions to generate responses. | 1,477
multimodal-art-projection/omnibench | Benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 14
openmotionlab/motiongpt | Develops a unified model that generates high-quality motions and text descriptions from human motion data. | 1,505
vita-mllm/vita | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time. | 961
mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks. | 781
yuliang-liu/monkey | A toolkit for building conversational AI models that process image and text inputs. | 1,825
opengvlab/multi-modality-arena | An evaluation platform for comparing multi-modality models on visual question-answering tasks. | 467
zhourax/vega | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33
opengvlab/visionllm | A large language model designed to process and generate visual information. | 915
r2d4/openlm | A library that provides a unified API for interacting with various large language models (LLMs). | 366
yfzhang114/slime | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 137
openai/finetune-transformer-lm | Code and a model for improving language understanding through generative pre-training with a transformer-based architecture. | 2,160
neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts. | 91
rgomez90/matrix-bot | A simple Matrix bot that listens for uploaded files and converts Quarto files to PDF. | 2