IdealGPT

Vision Reasoning Framework

A deep learning framework for iteratively decomposing vision and language reasoning via large language models.

Official Code of IdealGPT

GitHub

32 stars
2 watching
8 forks
Language: Python
Last commit: about 1 year ago

Related projects:

Repository | Description | Stars
shizhediao/davinci | An implementation of vision-language models for multimodal learning tasks, enabling generative vision-language models to be fine-tuned for various applications | 43
yuxie11/r2d2 | A framework for large-scale cross-modal benchmarks and vision-language tasks in Chinese | 157
fyu/dilation | A deep learning framework implementing dilated convolutions for semantic image segmentation | 781
tobypde/frrn | A software framework for training and evaluating full-resolution residual networks for semantic image segmentation tasks | 280
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities | 1,298
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506
ivaylo-popov/theano-lights | A deep learning framework built on top of Theano, providing a wide range of models and training techniques for research and development | 267
wpiroboticsprojects/grip | A computer vision framework for robotics applications that simplifies the creation of vision systems and generates code in multiple programming languages | 379
jy0205/lavit | A unified framework for training large language models to understand and generate visual content | 528
yaodongyu/tct | An approach to train and optimize machine learning models in a decentralized setting by convexifying the optimization process | 4
guopengf/auto-fedrl | A reinforcement learning-based framework for optimizing hyperparameters in distributed machine learning environments | 15
sarababakn/mfcl-neurips23 | A framework for mitigating catastrophic forgetting in federated learning for vision tasks using data synthesis from past distributions | 15
pku-yuangroup/chat-univi | A framework for unified visual representation in image and video understanding models, enabling efficient training of large language models on multimodal data | 847
jonfanlab/glonet | A software framework for training neural networks to optimize dielectric metasurfaces using physics-driven generative models and global optimization algorithms | 101
tianyi-lab/hallusionbench | An image-context reasoning benchmark designed to challenge large vision-language models and help improve their accuracy | 243