LongVA

Long context transfer

An open-source project that transfers a language model's long-context capability to vision, so that a multimodal model can process long visual inputs.

Long Context Transfer from Language to Vision

GitHub

347 stars
7 watching
18 forks
Language: Python
Last commit: about 2 months ago

Related projects:

Repository | Description | Stars
luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 513
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270
vhellendoorn/code-lms | A guide to using pre-trained large language models for source code analysis and generation | 1,789
evolvinglmms-lab/lmms-eval | Tools and an evaluation framework that speed up the development of large multimodal models by making performance assessment efficient | 2,164
opengvlab/visionllm | A large language model designed to process and generate visual information | 956
byungkwanlee/collavo | A PyTorch implementation of an enhanced vision-language model | 93
umass-foundation-model/3d-llm | A large language model that takes 3D representations as input | 979
boheumd/ma-lmm | An AI model for long-term video understanding | 254
freedomintelligence/longllava | A system for scaling large language models to efficiently process and understand visual information from multiple images | 183
nvlabs/prismer | A deep learning framework for training multimodal models with vision and language capabilities | 1,299
dvlab-research/lisa | A system that uses large language models to generate image segmentation masks from complex queries and world knowledge | 1,923
lxtgh/omg-seg | An end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model | 1,336
yiren-jian/blitext | Models for vision-language learning with decoupled language pre-training | 24
vivo-ai-lab/bluelm | Large language models trained on large-scale data for applications including natural language understanding and text generation | 864
deepseek-ai/deepseek-vl | A multimodal AI model for real-world vision-language understanding applications | 2,145