Caption-Anything

Captioner

A tool generating descriptive captions from images with customizable controls and text styles.

Caption-Anything is a versatile tool that combines image segmentation, visual captioning, and ChatGPT to generate tailored captions, with diverse controls for user preferences.

Hugging Face Spaces:
https://huggingface.co/spaces/TencentARC/Caption-Anything
https://huggingface.co/spaces/VIPLab/Caption-Anything
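The three stages named in the description (segmentation, visual captioning, language-model refinement) can be pictured as a simple composition. The sketch below is only an illustration of that idea, not the project's actual code or API: `caption_region`, `CaptionControls`, and the `stub_*` functions are all hypothetical names, and the stubs stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy stand-ins: an image and a mask are both grids of integers.
Image = List[List[int]]
Mask = List[List[int]]
Click = Tuple[int, int]  # (row, col) of a user click


@dataclass
class CaptionControls:
    """Hypothetical user preferences; not the project's real options."""
    style: str = "neutral"
    max_words: int = 20


def caption_region(
    image: Image,
    click: Click,
    segment: Callable[[Image, Click], Mask],
    describe: Callable[[Image, Mask], str],
    refine: Callable[[str, CaptionControls], str],
    controls: CaptionControls,
) -> str:
    """Chain the three stages: segment, then caption, then refine."""
    mask = segment(image, click)      # stage 1: pick out the clicked region
    raw = describe(image, mask)       # stage 2: caption that region
    return refine(raw, controls)      # stage 3: apply the user's controls


def stub_segment(image: Image, click: Click) -> Mask:
    # Placeholder for a segmentation model: marks only the clicked pixel.
    row, col = click
    return [
        [1 if (r, c) == (row, col) else 0 for c in range(len(image[r]))]
        for r in range(len(image))
    ]


def stub_describe(image: Image, mask: Mask) -> str:
    # Placeholder for a captioning model: reports the mask size.
    count = sum(sum(row) for row in mask)
    return f"a region of {count} pixel(s)"


def stub_refine(text: str, controls: CaptionControls) -> str:
    # Placeholder for a language model: truncates and tags the style.
    words = text.split()[: controls.max_words]
    return f"[{controls.style}] " + " ".join(words)


if __name__ == "__main__":
    image = [[0, 0], [0, 0]]
    print(
        caption_region(
            image, (0, 1), stub_segment, stub_describe, stub_refine,
            CaptionControls(style="formal", max_words=3),
        )
    )
```

Passing the stages in as callables keeps the sketch independent of any particular model; a real segmenter, captioner, or chat model could be swapped in behind the same three signatures.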

GitHub

2k stars
16 watching
102 forks
Language: Python
last commit: about 1 year ago
Topics: chatgpt, controllable-generation, controllable-image-captioning, image-captioning, segment-anything

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| fengyang0317/unsupervised_captioning | An unsupervised image captioning framework that generates captions from images without paired data. | 215 |
| apple2373/chainer-caption | An image caption generation system using a neural network architecture with pre-trained models. | 64 |
| tpkahlon/captcha-image | A library to generate images with distorted text and background patterns for security purposes. | 8 |
| eladhoffer/captiongen | A PyTorch-based tool for generating captions from images. | 128 |
| lumingyin/quickcaption | An automated captioning and transcription tool for video and audio files. | 74 |
| kacky24/stylenet | A PyTorch implementation of a framework for generating styled captions for images and videos. | 63 |
| lukemelas/image-paragraph-captioning | Trains image paragraph captioning models to generate diverse and accurate captions. | 90 |
| xiadingz/video-caption.pytorch | A PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 401 |
| rmokady/clip_prefix_caption | An approach to image captioning that leverages the CLIP model and fine-tunes a language model without requiring additional supervision or object annotation. | 1,315 |
| cshizhe/asg2cap | An image caption generation model that uses abstract scene graphs for fine-grained control over the generated captions. | 200 |
| vision-cair/chatcaptioner | Enables automatic generation of descriptive text from images and videos based on user input. | 452 |
| yiwuzhong/sub-gc | A PyTorch implementation of image captioning models via scene graph decomposition. | 96 |
| contextualai/lens | Enhances language models to generate text based on visual descriptions of images. | 351 |
| jaywongwang/densevideocaptioning | An implementation of a dense video captioning model with attention-based fusion and context gating. | 148 |
| nickjiang2378/vl-interp | An official PyTorch implementation of a method to interpret and edit vision-language representations to mitigate hallucinations in image captions. | 31 |