 AnyGPT
 Multimodal converter
 An open-source multimodal language model that can process and convert different data types such as speech, text, images, and music into a unified format.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
798 stars
 20 watching
 64 forks
 
Language: Python 
Last commit: about 1 year ago

Related projects:
| Repository | Description | Stars | 
|---|---|---|
|  | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages | 1,098 | 
|  | Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478 | 
|  | Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 | 
|  | Develops a unified model to generate high-quality motions and text descriptions from human motion data | 1,531 | 
|  | A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real-time. | 1,005 | 
|  | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 | 
|  | An end-to-end image captioning system that uses large multi-modal models and provides tools for training, inference, and demo usage. | 1,849 | 
|  | An evaluation platform for comparing multi-modality models on visual question-answering tasks | 478 | 
|  | Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs. | 33 | 
|  | A large language model designed to process and generate visual information | 956 | 
|  | Library that provides a unified API to interact with various Large Language Models (LLMs) | 367 | 
|  | Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types. | 143 | 
|  | Provides code and a model for improving language understanding through generative pre-training with a transformer-based architecture. | 2,167 | 
|  | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 92 | 
|  | A simple Matrix bot that listens to uploaded files and converts Quarto files to PDF. | 2 |