LongVA

Long context transfer

An open-source project that transfers a language model's long-context capability to vision, so that the model can process long visual inputs.

Long Context Transfer from Language to Vision

GitHub

347 stars

7 watching

18 forks

Language: Python

Last commit: 9 months ago

Related projects:

Repository	Description	Stars
luogen1996/lavin	An open-source implementation of a vision-language instruction-tuned large language model	513
vpgtrans/vpgtrans	Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs	270
vhellendoorn/code-lms	A guide to using pre-trained large language models in source code analysis and generation	1,789
evolvinglmms-lab/lmms-eval	An evaluation framework and tools for efficiently assessing the performance of large multimodal models	2,164
opengvlab/visionllm	A large language model designed to process and generate visual information	956
byungkwanlee/collavo	Develops a PyTorch implementation of an enhanced vision language model	93
umass-foundation-model/3d-llm	Developing a Large Language Model capable of processing 3D representations as inputs	979
boheumd/ma-lmm	This project develops an AI model for long-term video understanding	254
freedomintelligence/longllava	A system for scaling large language models to process and understand visual information from multiple images efficiently	183
nvlabs/prismer	A deep learning framework for training multi-modal models with vision and language capabilities	1,299
dvlab-research/lisa	A system that uses large language models to generate segmentation masks for images based on complex queries and world knowledge	1,923
lxtgh/omg-seg	Develops an end-to-end model for multiple visual perception and reasoning tasks using a single encoder, decoder, and large language model	1,336
yiren-jian/blitext	Develops and trains models for vision-language learning with decoupled language pre-training	24
vivo-ai-lab/bluelm	Develops and releases large language models for applications such as natural language understanding and text generation	864
deepseek-ai/deepseek-vl	A multimodal AI model that enables real-world vision-language understanding applications	2,145