big_vision
Vision trainer
Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
2k stars
39 watching
162 forks
Language: Jupyter Notebook
Last commit: 3 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,620 |
| | Pre-trained models and code for fine-tuning image recognition tasks using deep learning frameworks | 1,516 |
| | Enabling vision-language understanding by fine-tuning large language models on visual data | 25,490 |
| | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms | 6,519 |
| | A comprehensive PyTorch-based framework for computer vision tasks | 2,249 |
| | A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models on their own projects | 136,357 |
| | A toolkit for building and deploying deep learning models in computer vision | 5,850 |
| | A collection of libraries and projects focused on research around attention-based models for computer vision and beyond, providing optimized tools and baselines | 3,363 |
| | A system that uses large language and vision models to generate and process visual instructions | 20,683 |
| | Provides a framework for training large-scale language models on GPUs with advanced features and optimizations | 6,997 |
| | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency | 195 |
| | An open-source framework for training large language models with vision capabilities | 3,229 |
| | Provides tools and libraries for training and fine-tuning large language models using transformer architectures | 6,215 |
| | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,179 |
| | A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks | 2,899 |