LongVA
Long context transfer
An open-source project that transfers the long-context capability of a language model to the vision modality.
Long Context Transfer from Language to Vision
347 stars
7 watching
18 forks
Language: Python
Last commit: about 2 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 513 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
vhellendoorn/code-lms | A guide to using pre-trained large language models in source code analysis and generation | 1,789 |
evolvinglmms-lab/lmms-eval | Tools and evaluation framework for accelerating the development of large multimodal models by providing an efficient way to assess their performance | 2,164 |
opengvlab/visionllm | A large language model designed to process and generate visual information | 956 |
byungkwanlee/collavo | Develops a PyTorch implementation of an enhanced vision language model | 93 |
umass-foundation-model/3d-llm | Develops a large language model that takes 3D representations as input | 979 |
boheumd/ma-lmm | This project develops an AI model for long-term video understanding | 254 |
freedomintelligence/longllava | A system for scaling large language models to process and understand visual information from multiple images efficiently. | 183 |
nvlabs/prismer | A deep learning framework for training multi-modal models with vision and language capabilities. | 1,299 |
dvlab-research/lisa | A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge. | 1,923 |
lxtgh/omg-seg | Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model. | 1,336 |
yiren-jian/blitext | Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
vivo-ai-lab/bluelm | Develops and releases large language models trained on large-scale data for applications such as natural language understanding and text generation | 864 |
deepseek-ai/deepseek-vl | A multimodal AI model that enables real-world vision-language understanding applications | 2,145 |