scenic
Computer Vision Toolkit
A collection of libraries and projects for research on attention-based models for computer vision and beyond, providing optimized tools and baselines.
Scenic: A Jax Library for Computer Vision Research and Beyond
3k stars
40 watching
438 forks
Language: Python
last commit: about 1 month ago
Linked from 1 awesome list
Topics: attention, computer-vision, deep-learning, jax, research, transformers, vision-transformer
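The attention-based vision models described above typically turn an image into a sequence of patch tokens and mix those tokens with self-attention. The sketch below illustrates that idea in plain JAX under stated assumptions. It is not Scenic's API: `patchify` and `dot_product_attention` are hypothetical helpers written for this example, and the learned query/key/value projections, multiple heads, and position embeddings of a real Vision Transformer are omitted for brevity.

```python
import jax
import jax.numpy as jnp


def patchify(images, patch_size):
    """Split images [B, H, W, C] into flattened patch tokens [B, N, P*P*C].

    Hypothetical helper for illustration; H and W must be divisible by patch_size.
    """
    b, h, w, c = images.shape
    p = patch_size
    # Carve the height and width axes into (num_patches, patch_size) pairs.
    x = images.reshape(b, h // p, p, w // p, p, c)
    # Put the two patch-grid axes first, then the within-patch axes.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Flatten the grid into a token sequence and each patch into a vector.
    return x.reshape(b, (h // p) * (w // p), p * p * c)


def dot_product_attention(query, key, value):
    """Single-head scaled dot-product attention (no learned projections).

    query: [B, Q, D]; key and value: [B, K, D]. Returns ([B, Q, D], [B, Q, K]).
    """
    depth = query.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(depth).
    logits = jnp.einsum("bqd,bkd->bqk", query, key) / jnp.sqrt(depth)
    # Softmax over the key axis: each query's weights sum to 1.
    weights = jax.nn.softmax(logits, axis=-1)
    # Each output token is a weighted average of the value vectors.
    return jnp.einsum("bqk,bkd->bqd", weights, value), weights


# Usage: 2 random 32x32 RGB "images" -> 16 tokens of 8*8*3 = 192 values each.
rng = jax.random.PRNGKey(0)
images = jax.random.normal(rng, (2, 32, 32, 3))
tokens = patchify(images, patch_size=8)
out, weights = dot_product_attention(tokens, tokens, tokens)
print(tokens.shape, out.shape, weights.shape)  # (2, 16, 192) (2, 16, 192) (2, 16, 16)
```

Passing the same tokens as query, key, and value makes this self-attention: every patch can draw information from every other patch in a single step. In a full Vision Transformer, each token would first pass through a learned linear embedding plus a position embedding, and attention would use separate learned projections across several heads.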
Related projects:
Repository | Description | Stars |
---|---|---|
google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax | 10,450 |
roboflow/notebooks | A collection of tutorials and examples on using various computer vision models and techniques. | 5,547 |
google-research/big_vision | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines. | 2,334 |
jshilong/gpt4roi | Training and deploying large language models on computer vision tasks using region-of-interest inputs | 506 |
uber-research/upsnet | Develops an instance segmentation and panoptic segmentation model for computer vision tasks. | 649 |
haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions | 20,232 |
google-research/cad-estate | A large dataset of 3D object and room layout annotations on RGB videos, designed to test automatic scene understanding methods. | 105 |
google-research/big_transfer | Pre-trained models and code for fine-tuning them on image recognition tasks in several deep learning frameworks | 1,513 |
google/jaxopt | An open-source library of hardware-accelerated, batchable, and differentiable optimizers in JAX. | 933 |
nexusapoorvacus/deepvariationstructuredrl | An implementation of reinforcement learning for visual relationship and attribute detection using PyTorch. | 63 |
huggingface/transformers | A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune these models and deploy them in their own projects. | 135,022 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077 |
rastapasta/react-native-gl-model-view | A React Native component that displays and animates 3D models loaded from Wavefront OBJ files. | 419 |
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
matthias-wright/flaxmodels | Provides pre-trained deep learning models for the Jax/Flax ecosystem. | 238 |