 unsupervised_captioning
 Image captioner
 An unsupervised image captioning framework that generates captions for images without requiring paired image-caption training data.
Code for Unsupervised Image Captioning
215 stars
 7 watching
 51 forks
 
Language: Python 
Last commit: over 2 years ago

Related projects:

| Repository | Description | Stars | 
|---|---|---|
|  | An image caption generation system using a neural network architecture with pre-trained models. | 64 | 
|  | This implementation allows users to generate captions from images using a neural network model with visual attention. | 790 | 
|  | A deep learning model for generating image captions with semantic attention | 51 | 
|  | A project for pre-training models to support image captioning and question answering tasks. | 416 | 
|  | An image caption generation model that uses abstract scene graphs for fine-grained control over the generated captions. | 200 | 
|  | A PyTorch implementation of image captioning models via scene graph decomposition. | 96 | 
|  | An approach to image captioning that leverages the CLIP model and fine-tunes a language model without requiring additional supervision or object annotation. | 1,326 | 
|  | Trains image paragraph captioning models to generate diverse and accurate captions | 90 | 
|  | Automates the process of generating multiple rewritten image captions by fine-tuning large vision-language models | 8 | 
|  | Enhances language models to generate text based on visual descriptions of images | 352 | 
|  | An image caption generation system utilizing machine learning models and deep neural networks. | 84 | 
|  | A tool generating descriptive captions from images with customizable controls and text styles. | 1,693 | 
|  | A PyTorch-based tool for generating captions from images | 128 | 
|  | An unsupervised object detection and segmentation framework that can learn from image data alone, without requiring human annotations. | 954 | 
|  | A method for generating and evaluating video captions using adversarial inference, trained on large datasets of text and multimedia features. | 34 |