Otter

Multi-modal AI model

A multi-modal AI model developed for improved instruction-following and in-context learning, built on a large-scale vision-language architecture and trained on multi-modal instruction datasets.

🦦 Otter, a multi-modal model based on OpenFlamingo (the open-source version of DeepMind's Flamingo), trained on the MIMIC-IT dataset and showcasing improved instruction-following and in-context learning ability.

GitHub

4k stars
100 watching
242 forks
Language: Python
last commit: 9 months ago
Topics: artificial-intelligence, chatgpt, deep-learning, embodied-ai, foundation-models, gpt-4, instruction-tuning, large-scale-models, machine-learning, multi-modality, visual-language-learning

Related projects:

Repository | Description | Stars
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions. | 20,232
mlfoundations/open_flamingo | A framework for training large multimodal models to generate text conditioned on images or other text. | 3,742
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,211
pku-yuangroup/video-llava | Enables large language models to perform visual reasoning on images and videos simultaneously by learning united visual representations before projection. | 2,990
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy. | 5,754
paddlepaddle/fastdeploy | A toolkit for easy, high-performance deployment of deep learning models on various hardware platforms. | 2,998
eleutherai/lm-evaluation-harness | Provides a unified framework to test generative language models on various evaluation tasks. | 6,970
facico/chinese-vicuna | An instruction-following Chinese LLaMA-based model project aimed at training and fine-tuning models on specific hardware configurations for efficient deployment. | 4,142
modeltc/lightllm | An LLM inference and serving framework with a lightweight design, scalability, and high-speed performance for large language models. | 2,609
alpha-vllm/llama2-accessory | An open-source toolkit for pretraining and fine-tuning large language models. | 2,720
open-mmlab/mmdeploy | A toolset for deploying deep learning models on various devices and platforms. | 2,774
deep-floyd/if | A text-to-image synthesis model with a modular design, using a frozen text encoder and cascaded pixel diffusion modules to generate photorealistic images. | 7,688
ludwig-ai/ludwig | A low-code framework for building custom deep learning models and neural networks. | 11,189
open-mmlab/mmcv | A foundational library for computer vision research and deep learning training, with high-quality implementations of common CPU and CUDA ops. | 5,906
openbmb/minicpm-v | A multimodal language model designed to understand image, video, and text inputs and generate high-quality text outputs. | 12,619