IdealGPT
Vision Reasoning Framework
A deep learning framework for iteratively decomposing vision and language reasoning via large language models.
Official code for IdealGPT.
32 stars
2 watching
8 forks
Language: Python
Last commit: about 1 year ago
Related projects:
Repository | Description | Stars |
---|---|---|
shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications. | 43 |
yuxie11/r2d2 | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese. | 157 |
fyu/dilation | A deep learning framework implementing dilated convolutions for semantic image segmentation. | 781 |
tobypde/frrn | A software framework for training and evaluating full-resolution residual networks on semantic image segmentation tasks. | 280 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs. | 506 |
ivaylo-popov/theano-lights | A deep learning framework built on top of Theano, providing a wide range of models and training techniques for research and development. | 267 |
wpiroboticsprojects/grip | A computer vision framework for robotics applications that simplifies the creation of vision systems and generates code in multiple programming languages. | 379 |
jy0205/lavit | A unified framework for training large language models to understand and generate visual content. | 528 |
yaodongyu/tct | An approach to training and optimizing machine learning models in a decentralized setting by convexifying the optimization process. | 4 |
guopengf/auto-fedrl | A reinforcement learning-based framework for optimizing hyperparameters in distributed machine learning environments. | 15 |
sarababakn/mfcl-neurips23 | A framework for mitigating catastrophic forgetting in federated learning for vision tasks using data synthesis from past distributions. | 15 |
pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data. | 847 |
jonfanlab/glonet | A software framework for training neural networks to optimize dielectric metasurfaces using physics-driven generative models and global optimization algorithms. | 101 |
tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy. | 243 |