Otter
Multi-modal AI model
A multi-modal AI model built for improved instruction-following and in-context learning, based on the OpenFlamingo architecture and trained on the MIMIC-IT dataset.
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
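In-context learning here means that the model is shown a few interleaved image-instruction-answer demonstrations and then answers a new query in the same pattern. The sketch below shows how such a few-shot prompt might be assembled. It assumes a Flamingo-style interleaved template with `<image>`, `<answer>`, and `<|endofchunk|>` tokens and `User:`/`GPT:` turns; check the exact format against the Otter repository before relying on it. `Example` and `build_incontext_prompt` are hypothetical helpers and are not part of Otter's API. The images themselves would be passed separately to the vision encoder, one per `<image>` placeholder.

```python
from dataclasses import dataclass

# Assumed special tokens (Flamingo-style interleaved format); these are
# illustrative and should be verified against the Otter repository.
IMAGE_TOKEN = "<image>"
ANSWER_TOKEN = "<answer>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class Example:
    """One in-context demonstration: an instruction and its answer."""
    instruction: str
    answer: str


def build_incontext_prompt(examples: list[Example], query: str) -> str:
    """Hypothetical helper: place demonstrations before the query.

    Each demonstration gets one image placeholder and is closed with the
    end-of-chunk token. The query stops right after the answer token, so
    the model generates its answer from that point.
    """
    parts = [
        f"{IMAGE_TOKEN}User: {ex.instruction} GPT:{ANSWER_TOKEN} {ex.answer}{END_OF_CHUNK}"
        for ex in examples
    ]
    parts.append(f"{IMAGE_TOKEN}User: {query} GPT:{ANSWER_TOKEN}")
    return "".join(parts)


if __name__ == "__main__":
    demo = [Example("What is in the image?", "An otter floating on its back.")]
    print(build_incontext_prompt(demo, "What is in the image?"))
```

Each demonstration adds one `<image>` placeholder. A prompt with one demonstration and one query therefore expects two images, supplied in the same order.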
4k stars
100 watching
243 forks
Language: Python
Last commit: 12 months ago
Topics: artificial-inteligence, chatgpt, deep-learning, embodied-ai, foundation-models, gpt-4, instruction-tuning, large-scale-models, machine-learning, multi-modality, visual-language-learning
Related projects:
| Repository | Description | Stars |
|---|---|---:|
|  | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
|  | A framework for training large multimodal models to generate text conditioned on images or other text. | 3,781 |
|  | An open-source framework for training large language models with vision capabilities. | 3,229 |
|  | A deep learning framework for generating videos from text inputs and visual features. | 3,071 |
|  | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
|  | A toolkit for easy and high-performance deployment of deep learning models on various hardware platforms | 3,034 |
|  | Provides a unified framework to test generative language models on various evaluation tasks. | 7,200 |
|  | An instruction-following Chinese LLaMA-based model project aimed at training and fine-tuning models on specific hardware configurations for efficient deployment. | 4,152 |
|  | A Python-based framework for serving large language models with low latency and high scalability. | 2,691 |
|  | An open-source toolkit for pretraining and fine-tuning large language models | 2,732 |
|  | A toolset for deploying deep learning models on various devices and platforms | 2,797 |
|  | A text-to-image synthesis model with a modular design, utilizing a frozen text encoder and cascaded pixel diffusion modules to generate photorealistic images. | 7,699 |
|  | A low-code framework for building custom deep learning models and neural networks | 11,236 |
|  | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,948 |
|  | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,870 |