IdealGPT
Vision Reasoning Framework
A deep learning framework for iteratively decomposing vision and language reasoning via large language models.
Official Code of IdealGPT
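The description above (iteratively decomposing vision-and-language reasoning via large language models) can be read as a loop. An LLM splits the main question into sub-questions, a vision-language model answers them, and an LLM reasons over the collected answers. The loop repeats until that reasoner is confident or a round limit is reached. The sketch below is a minimal, model-free illustration that assumes this structure. `iterative_decompose`, `Trace`, `max_rounds`, and the toy stubs are hypothetical names, not the repository's API. In practice, real model calls would replace the plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch, not IdealGPT's actual code: each "model" is a plain callable.
QAPairs = List[Tuple[str, str]]
Questioner = Callable[[str, QAPairs], List[str]]   # LLM: proposes sub-questions
Answerer = Callable[[str], str]                    # VLM: answers one sub-question
Reasoner = Callable[[str, QAPairs], Tuple[str, bool]]  # LLM: (answer, confident?)


@dataclass
class Trace:
    qa_pairs: QAPairs = field(default_factory=list)
    answer: str = ""
    confident: bool = False
    rounds: int = 0


def iterative_decompose(question: str, ask: Questioner, answer_sub: Answerer,
                        reason: Reasoner, max_rounds: int = 4) -> Trace:
    trace = Trace()
    for _ in range(max_rounds):
        trace.rounds += 1
        # 1. Propose new sub-questions, conditioned on the evidence gathered so far.
        for sub_q in ask(question, trace.qa_pairs):
            # 2. Answer each sub-question (in a real system, by looking at the image).
            trace.qa_pairs.append((sub_q, answer_sub(sub_q)))
        # 3. Reason over all collected sub-answers and report confidence.
        trace.answer, trace.confident = reason(question, trace.qa_pairs)
        if trace.confident:
            break  # stop early once the reasoner is confident
    return trace


# Toy stubs standing in for the models (hypothetical, for illustration only).
def toy_ask(question: str, known: QAPairs) -> List[str]:
    return ["What color is the car?"] if not known else ["Is the car parked?"]


def toy_vlm(sub_q: str) -> str:
    return {"What color is the car?": "red", "Is the car parked?": "yes"}[sub_q]


def toy_reason(question: str, known: QAPairs) -> Tuple[str, bool]:
    facts = dict(known)
    if "Is the car parked?" in facts:
        return f"A {facts['What color is the car?']} car is parked.", True
    return "unsure", False


result = iterative_decompose("Describe the car.", toy_ask, toy_vlm, toy_reason)
print(result.answer, result.rounds)  # → A red car is parked. 2
```

The toy reasoner is not confident after the first round, which only learns the color, so a second round of sub-questions runs before the loop stops. The `max_rounds` cap keeps an always-unsure reasoner from looping forever.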
32 stars
2 watching
8 forks
Language: Python
Last commit: over 1 year ago

Related projects:
Repository | Description | Stars |
---|---|---|
shizhediao/davinci | An implementation of a unified modal learning framework for generative vision-language models | 43 |
yuxie11/r2d2 | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese | 157 |
fyu/dilation | A deep learning framework implementing dilated convolutions for semantic image segmentation | 782 |
tobypde/frrn | A software framework for training and evaluating full-resolution residual networks for semantic image segmentation tasks | 280 |
nvlabs/prismer | A deep learning framework for training multimodal models with vision and language capabilities | 1,299 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 517 |
ivaylo-popov/theano-lights | A deep learning framework built on top of Theano, providing a wide range of models and training techniques for research and development | 267 |
wpiroboticsprojects/grip | A computer vision framework for robotics applications that simplifies the creation of vision systems and generates code in multiple programming languages | 380 |
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 544 |
yaodongyu/tct | An approach to train and optimize machine learning models in a decentralized setting by convexifying the optimization process | 4 |
guopengf/auto-fedrl | A reinforcement learning-based framework for optimizing hyperparameters in distributed machine learning environments | 15 |
sarababakn/mfcl-neurips23 | An approach to mitigating catastrophic forgetting in federated class incremental learning for vision tasks using a generative model and data-free methods | 15 |
pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data | 895 |
jonfanlab/glonet | A software framework for training neural networks to optimize dielectric metasurfaces using physics-driven generative models and global optimization algorithms | 101 |
tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 259 |