AnyGPT
Multimodal converter
An open-source multimodal language model that processes speech, text, images, and music by converting them into a unified discrete-sequence format.
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
798 stars
20 watching
64 forks
Language: Python
Last commit: about 1 year ago

Related projects:
| Description | Stars |
|---|---|
| A family of large multimodal models supporting multimodal conversation and text-to-image generation in multiple languages | 1,098 |
| Trains a multimodal chatbot that combines visual and language instructions to generate responses | 1,478 |
| Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously | 15 |
| Develops a unified model that generates high-quality motions and text descriptions from human motion data | 1,531 |
| A large multimodal language model designed to process and analyze video, image, text, and audio inputs in real time | 1,005 |
| An end-to-end trained model that generates natural-language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| An end-to-end image captioning system built on large multimodal models, with tools for training, inference, and demos | 1,849 |
| An evaluation platform for comparing multimodal models on visual question-answering tasks | 478 |
| Develops a multimodal task and dataset to assess vision-language models' ability to handle interleaved image-text inputs | 33 |
| A large language model designed to process and generate visual information | 956 |
| A library that provides a unified API for interacting with various large language models (LLMs) | 367 |
| Develops large multimodal models for high-resolution understanding and analysis of text, images, and other data types | 143 |
| Code and model for improving language understanding through generative pre-training with a transformer-based architecture | 2,167 |
| An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 92 |
| A simple Matrix bot that listens for uploaded files and converts Quarto files to PDF | 2 |