big_vision
Vision trainer
Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
2k stars
41 watching
157 forks
Language: Jupyter Notebook
Last commit: 3 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,502 |
google-research/big_transfer | Pre-trained models and code for fine-tuning on image recognition tasks using deep learning frameworks | 1,513 |
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
facebookresearch/metaseq | A codebase for working with Open Pre-trained Transformers, enabling deployment and fine-tuning of transformer models on various platforms. | 6,517 |
donnyyou/torchcv | A comprehensive PyTorch-based framework for computer vision tasks | 2,250 |
huggingface/transformers | A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models in their own projects. | 135,022 |
dmlc/gluon-cv | A toolkit for building and deploying deep learning models in computer vision | 5,833 |
google-research/scenic | A collection of libraries and projects focused on research around attention-based models for computer vision and beyond, providing optimized tools and baselines. | 3,332 |
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,359 |
eleutherai/gpt-neox | Provides a framework for training large-scale language models on GPUs with advanced features and optimizations. | 6,941 |
google-research/nested-transformer | An implementation of a transformer-based vision model that aggregates local transformers on image blocks to improve accuracy and efficiency. | 193 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,211 |
google-research/text-to-text-transfer-transformer | Provides tools and libraries for training and fine-tuning large language models using transformer architectures | 6,181 |
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks | 5,079 |
google/big-bench | A benchmark designed to evaluate the capabilities of large language models by simulating various tasks and measuring their performance | 2,868 |