OFA
Sequence-to-sequence framework
Provides a sequence-to-sequence learning framework that unifies modalities and tasks through pretraining and fine-tuning
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
2k stars
21 watching
248 forks
Language: Python
Last commit: 7 months ago
Tags: chinese, image-captioning, multimodal, pretrained-models, pretraining, prompt, prompt-tuning, referring-expression-comprehension, text-to-image-synthesis, vision-language, visual-question-answering
Related projects:
Repository | Description | Stars |
---|---|---|
yangjianxin1/ofa-chinese | Transforms the OFA-Chinese model to work with the Hugging Face Transformers framework | 123 |
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
jina-ai/dalle-flow | An interactive workflow for generating high-definition images from text prompts using a human-in-the-loop approach | 2,834 |
orpatashnik/styleclip | An implementation of a method for manipulating the style of images using text. | 4,000 |
ofa-sys/touchstone | A tool to evaluate vision-language models by comparing their performance on various tasks such as image recognition and text generation. | 78 |
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
lucidrains/dalle2-pytorch | An implementation of DALL-E 2's text-to-image synthesis neural network in PyTorch | 11,148 |
clovaai/deep-text-recognition-benchmark | Provides a benchmarking framework and implementation for deep learning-based text recognition models | 3,755 |
openai/clip | A neural network trained on image and text pairs to predict the most relevant text snippet given an image | 25,919 |
ucsc-vlaa/sight-beyond-text | The official implementation of a research paper exploring how multi-modal training can improve language models' truthfulness and ethics. | 19 |
amazon-science/mm-cot | An implementation of multimodal chain-of-thought reasoning in language models using a decoupled training framework for rationale generation and answer inference. | 3,810 |
ucas-haoranwei/got-ocr2.0 | A Python implementation of a unified end-to-end OCR model (General OCR Theory), supporting various formats and fine-grained recognition. | 6,011 |
pathak22/context-encoder | Unsupervised feature learning by image inpainting using Generative Adversarial Networks (GANs) | 885 |
systemerrorwang/white-box-cartoonization | An implementation of a deep learning-based image cartoonization system using TensorFlow | 3,958 |
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,580 |