OFA

Sequence-to-sequence framework

A unified sequence-to-sequence learning framework that handles multiple modalities and tasks through pretraining and fine-tuning

Official repository of OFA (ICML 2022). Paper: "OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework"

GitHub

2k stars
21 watching
248 forks
Language: Python
Last commit: 7 months ago
chinese, image-captioning, multimodal, pretrained-models, pretraining, prompt, prompt-tuning, referring-expression-comprehension, text-to-image-synthesis, vision-language, visual-question-answering

Related projects:

Repository | Description | Stars
yangjianxin1/ofa-chinese | Adapts the OFA-Chinese model to work with the Hugging Face Transformers framework | 123
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232
jina-ai/dalle-flow | An interactive workflow for generating high-definition images from text prompts using a human-in-the-loop approach | 2,834
orpatashnik/styleclip | An implementation of a text-driven method for manipulating image style | 4,000
ofa-sys/touchstone | A tool to evaluate vision-language models by comparing their performance on various tasks such as image recognition and text generation | 78
vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data | 25,422
lucidrains/dalle2-pytorch | An implementation of DALL-E 2's text-to-image synthesis neural network in PyTorch | 11,148
clovaai/deep-text-recognition-benchmark | A benchmarking framework and implementation for deep learning-based text recognition models | 3,755
openai/clip | A neural network trained on image and text pairs to predict the most relevant text snippet given an image | 25,919
ucsc-vlaa/sight-beyond-text | Official implementation of a research paper exploring the use of multi-modal training to enhance language models' truthfulness and ethics | 19
amazon-science/mm-cot | An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference | 3,810
ucas-haoranwei/got-ocr2.0 | A Python implementation of an end-to-end OCR model based on General OCR Theory, supporting various formats and fine-grained recognition | 6,011
pathak22/context-encoder | Unsupervised feature learning by image inpainting using Generative Adversarial Networks (GANs) | 885
systemerrorwang/white-box-cartoonization | An implementation of a deep learning-based facial cartoonization system using TensorFlow | 3,958
doubiiu/dynamicrafter | Generates animated videos from open-domain images by leveraging pre-trained video diffusion priors | 2,580