OFA

Sequence-to-sequence framework

Provides a sequence-to-sequence learning framework that unifies modalities and tasks through pretraining and fine-tuning

Official repository of OFA (ICML 2022). Paper: "OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework"

2k stars
21 watching
248 forks
Language: Python
Last commit: 10 months ago
chinese, image-captioning, multimodal, pretrained-models, pretraining, prompt, prompt-tuning, referring-expression-comprehension, text-to-image-synthesis, vision-language, visual-question-answering

Related projects:

Repository | Description | Stars
yangjianxin1/ofa-chinese | Transforms the OFA-Chinese model to work with the Hugging Face Transformers framework | 123
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683
jina-ai/dalle-flow | An interactive workflow for generating high-definition images from text prompts using a human-in-the-loop approach | 2,837
orpatashnik/styleclip | An implementation of a method to manipulate images by driving the style with text | 4,025
ofa-sys/touchstone | A tool to evaluate vision-language models by comparing their performance on various tasks such as image recognition and text generation | 79
vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data | 25,490
lucidrains/dalle2-pytorch | An implementation of DALL-E 2's text-to-image synthesis neural network in PyTorch | 11,184
clovaai/deep-text-recognition-benchmark | Provides a benchmarking framework and implementation for deep learning-based text recognition models | 3,769
openai/clip | A neural network trained on image and text pairs to predict the most relevant text snippet given an image | 26,460
ucsc-vlaa/sight-beyond-text | An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models | 19
amazon-science/mm-cot | An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference | 3,833
ucas-haoranwei/got-ocr2.0 | An end-to-end OCR system implementing General OCR Theory towards a unified model | 6,334
pathak22/context-encoder | Unsupervised feature learning by image inpainting using Generative Adversarial Networks (GANs) | 887
systemerrorwang/white-box-cartoonization | An implementation of a deep learning-based facial cartoonization system using TensorFlow | 3,958
doubiiu/dynamicrafter | Generates animated videos from open-domain images by leveraging pre-trained video diffusion priors | 2,668