InternVL
Multimodal model suite
A family of large-scale vision-language models positioned as a pioneering open-source alternative to commercial multimodal models.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching the performance of GPT-4o.
6k stars
53 watching
465 forks
Language: Python
last commit: 6 days ago
Topics: gpt, gpt-4o, gpt-4v, image-classification, image-text-retrieval, llm, multi-modal, semantic-segmentation, video-classification, vision-language-model, vit-22b, vit-6b
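For a sense of how the suite is typically consumed, below is a minimal sketch of loading an InternVL checkpoint through Hugging Face `transformers` with `trust_remote_code`. The model ID, dtype, and the `chat()` call follow the project's published examples but are assumptions here; consult the repository's README for the current interface and image preprocessing helpers.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model ID; any InternVL checkpoint from the OpenGVLab hub page should work.
MODEL_ID = "OpenGVLab/InternVL2-8B"

# trust_remote_code is required because the chat interface ships as remote code.
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True, use_fast=False)

# Text-only turn (pixel_values=None). For image inputs, preprocess with the
# repository's load_image helper and pass the resulting tensor instead of None.
generation_config = dict(max_new_tokens=256, do_sample=False)
response = model.chat(tokenizer, None, "Hello, who are you?", generation_config)
print(response)
```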
Related projects:
Repository | Description | Stars |
---|---|---|
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,619 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653 |
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,080 |
internlm/internlm | Large language models for chatbot and natural language understanding applications | 6,473 |
opengvlab/internvideo | Developing video foundation models and datasets for multimodal understanding and applications | 1,413 |
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
memochou1993/gpt-ai-assistant | An AI-powered chat application using OpenAI and LINE APIs | 7,428 |
open-mmlab/mmaction2 | A comprehensive video understanding toolbox and benchmark with modular design, supporting various tasks such as action recognition, localization, and retrieval. | 4,296 |
openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning | 22,516 |
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,906 |
internlm/internlm-xcomposer | A large vision language model that can understand and generate text from visual inputs, with capabilities for long-contextual input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue. | 2,521 |
open-compass/opencompass | An LLM evaluation platform supporting various models and datasets | 4,124 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,580 |
thudm/glm-4 | Develops and releases pre-trained models for conversational AI tasks with enhanced capabilities on long text generation, multimodal interaction, and domain adaptation. | 5,277 |