LongVA

Language-to-Vision Model

This project provides a model for long context transfer from language to vision using a deep learning framework.

Long Context Transfer from Language to Vision

GitHub

334 stars
8 watching
17 forks
Language: Python
last commit: 26 days ago

Related projects:

Repository Description Stars
luogen1996/lavin An open-source implementation of a vision-language instructed large language model 508
vpgtrans/vpgtrans Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs 269
vhellendoorn/code-lms A guide to using pre-trained large language models in source code analysis and generation 1,782
evolvinglmms-lab/lmms-eval Tools and evaluation suite for large multimodal models 2,058
opengvlab/visionllm A large language model designed to process and generate visual information 915
byungkwanlee/collavo Develops a PyTorch implementation of an enhanced vision language model 93
umass-foundation-model/3d-llm Developing a Large Language Model capable of processing 3D representations as inputs 961
boheumd/ma-lmm This project develops an AI model for long-term video understanding 244
freedomintelligence/longllava A system for scaling large language models to process and understand visual information from multiple images efficiently. 179
nvlabs/prismer A deep learning framework for training multi-modal models with vision and language capabilities. 1,298
dvlab-research/lisa A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. 1,861
lxtgh/omg-seg Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. 1,300
yiren-jian/blitext Develops and trains models for vision-language learning with decoupled language pre-training 24
vivo-ai-lab/bluelm Develops and releases large language models trained on vast amounts of data for various applications, including natural language understanding, text generation, and more. 852
deepseek-ai/deepseek-vl A multimodal AI model that enables real-world vision-language understanding applications 2,077