big_vision

Vision trainer

Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

GitHub

2k stars
41 watching
157 forks
Language: Jupyter Notebook
Last commit: 3 months ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,502
google-research/big_transfer | Pre-trained models and code for fine-tuning image recognition tasks using deep learning frameworks | 1,513
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data | 25,422
facebookresearch/metaseq | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms | 6,517
donnyyou/torchcv | A comprehensive PyTorch-based framework for computer vision tasks | 2,250
huggingface/transformers | A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models on their own projects | 135,022
dmlc/gluon-cv | A toolkit for building and deploying deep learning models in computer vision | 5,833
google-research/scenic | A collection of libraries and projects focused on research around attention-based models for computer vision and beyond, providing optimized tools and baselines | 3,332
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,359
eleutherai/gpt-neox | Provides a framework for training large-scale language models on GPUs with advanced features and optimizations | 6,941
google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency | 193
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities | 3,211
google-research/text-to-text-transfer-transformer | Provides tools and libraries for training and fine-tuning large language models using transformer architectures | 6,181
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,079
google/big-bench | A benchmark designed to evaluate the capabilities of large language models by simulating various tasks and measuring their performance | 2,868