OFA

Sequence-to-sequence framework

A unified sequence-to-sequence learning framework that handles multiple modalities and tasks through pretraining and fine-tuning.

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

GitHub

2k stars

21 watching

248 forks

Language: Python

Last commit: about 1 year ago

Topics: chinese, image-captioning, multimodal, pretrained-models, pretraining, prompt, prompt-tuning, referring-expression-comprehension, text-to-image-synthesis, vision-language, visual-question-answering

Related projects:

Repository	Description	Stars
yangjianxin1/ofa-chinese	Adapts the OFA-Chinese model to work with the Hugging Face Transformers framework	123
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions	20,683
jina-ai/dalle-flow	An interactive workflow for generating high-definition images from text prompts using a human-in-the-loop approach	2,837
orpatashnik/styleclip	An implementation of a method for manipulating image style with text prompts	4,025
ofa-sys/touchstone	A tool to evaluate vision-language models by comparing their performance on various tasks such as image recognition and text generation	79
vision-cair/minigpt-4	Enabling vision-language understanding by fine-tuning large language models on visual data.	25,490
lucidrains/dalle2-pytorch	An implementation of DALL-E 2's text-to-image synthesis neural network in PyTorch	11,184
clovaai/deep-text-recognition-benchmark	Provides a benchmarking framework and implementation for deep learning-based text recognition models	3,769
openai/clip	A neural network trained on image and text pairs to predict the most relevant text snippet given an image	26,460
ucsc-vlaa/sight-beyond-text	An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models	19
amazon-science/mm-cot	An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference	3,833
ucas-haoranwei/got-ocr2.0	An end-to-end OCR system implementing General OCR Theory towards a unified model	6,334
pathak22/context-encoder	Unsupervised feature learning by image inpainting using Generative Adversarial Networks (GANs)	887
systemerrorwang/white-box-cartoonization	An implementation of a deep learning-based image cartoonization system using TensorFlow	3,958
doubiiu/dynamicrafter	Generates animated videos from open-domain images by leveraging pre-trained video diffusion priors	2,668