Caption-Anything
Captioner
A tool that generates descriptive captions for images, with customizable controls and text styles.
Caption-Anything is a versatile tool that combines image segmentation, visual captioning, and ChatGPT to generate captions tailored to diverse user preferences.
Demos: https://huggingface.co/spaces/TencentARC/Caption-Anything and https://huggingface.co/spaces/VIPLab/Caption-Anything
2k stars
16 watching
102 forks
Language: Python
last commit: about 1 year ago
Topics: chatgpt, controllable-generation, controllable-image-captioning, image-captioning, segment-anything
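The pipeline described above has three stages: segment the user-selected region, caption the crop, then restyle the caption with a language model. A minimal sketch of that wiring is below; the function and class names are illustrative stand-ins (with stubbed stage logic), not the project's actual API.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the three stages Caption-Anything combines:
# a segmenter (e.g. Segment Anything), a visual captioner (e.g. BLIP),
# and a text refiner (e.g. ChatGPT). The real project wires actual models
# into each stage; here each one is a stub so the control flow is clear.

def segment(image, click_xy):
    """Return the region around the clicked point (stub: whole image)."""
    return image

def caption(region):
    """Return a raw caption for the region (stub: canned text)."""
    return "a dog sitting on grass"

def refine(raw_caption, style):
    """Restyle the caption per user preference (stub: style prefix)."""
    return f"[{style}] {raw_caption}"

@dataclass
class CaptionRequest:
    image: object           # image data (array, PIL image, ...)
    click_xy: tuple         # user's click, selecting the object to caption
    style: str = "factual"  # requested caption style

def caption_anything(req: CaptionRequest) -> str:
    """Run the three-stage pipeline: segment -> caption -> refine."""
    region = segment(req.image, req.click_xy)
    raw = caption(region)
    return refine(raw, req.style)
```

For example, `caption_anything(CaptionRequest(image=None, click_xy=(10, 20), style="humorous"))` passes the clicked region through all three stages and returns the restyled caption.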
Related projects:
| Repository | Description | Stars |
|---|---|---|
fengyang0317/unsupervised_captioning | An unsupervised image captioning framework that allows generating captions from images without paired data. | 215 |
apple2373/chainer-caption | An image caption generation system using a neural network architecture with pre-trained models. | 64 |
tpkahlon/captcha-image | A library to generate images with distorted text and background patterns for security purposes. | 8 |
eladhoffer/captiongen | A PyTorch-based tool for generating captions from images | 128 |
lumingyin/quickcaption | Automated captioning and transcription tool for video and audio files | 74 |
kacky24/stylenet | A PyTorch implementation of a framework for generating captions with styles for images and videos. | 63 |
lukemelas/image-paragraph-captioning | Trains image paragraph captioning models to generate diverse and accurate captions | 90 |
xiadingz/video-caption.pytorch | PyTorch implementation of video captioning, combining deep learning and computer vision techniques. | 401 |
rmokady/clip_prefix_caption | An approach to image captioning that leverages the CLIP model and fine-tunes a language model without requiring additional supervision or object annotation. | 1,315 |
cshizhe/asg2cap | An image caption generation model that uses abstract scene graphs for fine-grained control over caption generation. | 200 |
vision-cair/chatcaptioner | Enables automatic generation of descriptive text from images and videos based on user input. | 452 |
yiwuzhong/sub-gc | A PyTorch implementation of image captioning models via scene graph decomposition. | 96 |
contextualai/lens | Enhances language models to generate text based on visual descriptions of images | 351 |
jaywongwang/densevideocaptioning | An implementation of a dense video captioning model with attention-based fusion and context gating | 148 |
nickjiang2378/vl-interp | An official PyTorch implementation of a method for interpreting and editing vision-language representations to mitigate hallucinations in image captions. | 31 |