LongVA
Language-to-Vision Model
This project provides a model for long context transfer from language to vision using a deep learning framework.
Long Context Transfer from Language to Vision
334 stars
8 watching
17 forks
Language: Python
last commit: 26 days ago Related projects:
Repository | Description | Stars |
---|---|---|
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 508 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 269 |
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,782 |
evolvinglmms-lab/lmms-eval | Tools and evaluation suite for large multimodal models | 2,058 |
opengvlab/visionllm | A large language model designed to process and generate visual information | 915 |
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93 |
umass-foundation-model/3d-llm | Developing a Large Language Model capable of processing 3D representations as inputs | 961 |
boheumd/ma-lmm | This project develops an AI model for long-term video understanding | 244 |
freedomintelligence/longllava | A system for scaling large language models to process and understand visual information from multiple images efficiently. | 179 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,298 |
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. | 1,861 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,300 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
vivo-ai-lab/bluelm | Develops and releases large language models trained on vast amounts of data for various applications, including natural language understanding, text generation, and more. | 852 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,077 |