scenic

Computer Vision Toolkit

A collection of libraries and projects focused on research around attention-based models for computer vision and beyond, providing optimized tools and baselines.

Scenic: A Jax Library for Computer Vision Research and Beyond

GitHub

3k stars

39 watching

444 forks

Language: Python

last commit: 8 months ago

Linked from 1 awesome list

attention, computer-vision, deep-learning, jax, research, transformers, vision-transformer

Backlinks from these awesome lists:

n2cholas/awesome-jax

Related projects:

Repository	Description	Stars
google-research/vision_transformer	Provides pre-trained models and code for training vision transformers and mixers using JAX/Flax	10,620
roboflow/notebooks	Tutorials and examples on using state-of-the-art computer vision models and techniques	5,678
google-research/big_vision	Supports large-scale vision model training on GPU machines or Google Cloud TPUs using scalable input pipelines.	2,439
jshilong/gpt4roi	Training and deploying large language models on computer vision tasks using region-of-interest inputs	517
uber-research/upsnet	Develops an instance segmentation and panoptic segmentation model for computer vision tasks.	648
haotian-liu/llava	A system that uses large language and vision models to generate and process visual instructions	20,683
google-research/cad-estate	A large dataset of 3D object and room layout annotations on RGB videos, designed to test automatic scene understanding methods.	106
google-research/big_transfer	Pre-trained models and code for fine-tuning image recognition tasks using deep learning frameworks	1,516
google/jaxopt	An open-source project providing hardware-accelerated, batchable, and differentiable optimizers in JAX for deep learning.	941
nexusapoorvacus/deepvariationstructuredrl	An implementation of reinforcement learning for visual relationship and attribute detection using PyTorch.	63
huggingface/transformers	A collection of pre-trained machine learning models for various natural language and computer vision tasks, enabling developers to fine-tune and deploy these models on their own projects.	136,357
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145
rastapasta/react-native-gl-model-view	A React Native component that displays and animates 3D models loaded from Wavefront OBJ files.	419
vision-cair/minigpt-4	Enabling vision-language understanding by fine-tuning large language models on visual data.	25,490
matthias-wright/flaxmodels	Provides pre-trained deep learning models for the JAX/Flax ecosystem.	240