big_vision

Vision trainer

Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

GitHub

2k stars
39 watching
162 forks
Language: Jupyter Notebook
Last commit: 3 months ago
Linked from 1 awesome list



Related projects:

Repository | Description | Stars
google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,620
google-research/big_transfer | Pre-trained models and code for fine-tuning image recognition tasks using deep learning frameworks | 1,516
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data | 25,490
facebookresearch/metaseq | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms | 6,519
donnyyou/torchcv | A comprehensive PyTorch-based framework for computer vision tasks | 2,249
huggingface/transformers | A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models on their own projects | 136,357
dmlc/gluon-cv | A toolkit for building and deploying deep learning models in computer vision | 5,850
google-research/scenic | A collection of libraries and projects focused on research around attention-based models for computer vision and beyond, providing optimized tools and baselines | 3,363
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,683
eleutherai/gpt-neox | Provides a framework for training large-scale language models on GPUs with advanced features and optimizations | 6,997
google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency | 195
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,229
google-research/text-to-text-transfer-transformer | Provides tools and libraries for training and fine-tuning large language models using transformer architectures | 6,215
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,179
google/big-bench | A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks | 2,899