scenic

Computer Vision Toolkit

A collection of libraries and projects for research into attention-based models for computer vision and beyond, providing optimized tools and baselines.

Scenic: A Jax Library for Computer Vision Research and Beyond

GitHub

3k stars
40 watching
438 forks
Language: Python
last commit: about 1 month ago
Linked from 1 awesome list

attention, computer-vision, deep-learning, jax, research, transformers, vision-transformer

Related projects:

| Repository | Description | Stars |
|---|---|---|
| google-research/vision_transformer | Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax. | 10,450 |
| roboflow/notebooks | A collection of tutorials and examples on using various computer vision models and techniques. | 5,547 |
| google-research/big_vision | Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines. | 2,334 |
| jshilong/gpt4roi | Trains and deploys large language models on computer vision tasks using region-of-interest inputs. | 506 |
| uber-research/upsnet | An instance segmentation and panoptic segmentation model for computer vision tasks. | 649 |
| haotian-liu/llava | A system that uses large language and vision models to generate and process visual instructions. | 20,232 |
| google-research/cad-estate | A large dataset of 3D object and room layout annotations on RGB videos, designed to test automatic scene understanding methods. | 105 |
| google-research/big_transfer | Pre-trained models and code for fine-tuning on image recognition tasks using deep learning frameworks. | 1,513 |
| google/jaxopt | An open-source project providing hardware-accelerated, batchable and differentiable optimizers in JAX for deep learning. | 933 |
| nexusapoorvacus/deepvariationstructuredrl | An implementation of reinforcement learning for visual relationship and attribute detection using PyTorch. | 63 |
| huggingface/transformers | A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models in their own projects. | 135,022 |
| deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications. | 2,077 |
| rastapasta/react-native-gl-model-view | A React Native component that displays and animates 3D models loaded from Wavefront OBJ files. | 419 |
| vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
| matthias-wright/flaxmodels | Provides pre-trained deep learning models for the JAX/Flax ecosystem. | 238 |