big_vision

Vision trainer

Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

GitHub

2k stars

39 watching

162 forks

Language: Jupyter Notebook

Last commit: over 1 year ago

Linked from 1 awesome list

Backlinks from these awesome lists:

amrzv/awesome-colab-notebooks

Related projects:

Repository	Description	Stars
google-research/vision_transformer	Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax	10,620
google-research/big_transfer	Pre-trained models and code for fine-tuning on image recognition tasks using deep learning frameworks	1,516
vision-cair/minigpt-4	Enabling vision-language understanding by fine-tuning large language models on visual data	25,490
facebookresearch/metaseq	A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms	6,519
donnyyou/torchcv	A comprehensive PyTorch-based framework for computer vision tasks	2,249
huggingface/transformers	A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models on their own projects.	136,357
dmlc/gluon-cv	A toolkit for building and deploying deep learning models in computer vision	5,850
google-research/scenic	A collection of libraries and projects focused on research around attention-based models for computer vision and beyond, providing optimized tools and baselines	3,363
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions	20,683
eleutherai/gpt-neox	Provides a framework for training large-scale language models on GPUs with advanced features and optimizations.	6,997
google-research/nested-transformer	An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency	195
dvlab-research/mgm	An open-source framework for training large language models with vision capabilities	3,229
google-research/text-to-text-transfer-transformer	Provides tools and libraries for training and fine-tuning large language models using transformer architectures	6,215
qwenlm/qwen-vl	A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks	5,179
google/big-bench	A benchmark designed to probe large language models and extrapolate their future capabilities through a diverse set of tasks	2,899