InternVL

Multimodal model suite

A pioneering open-source alternative to commercial multimodal models with a family of large-scale language and vision models.

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

GitHub

6k stars
53 watching
465 forks
Language: Python
last commit: 6 days ago
gpt, gpt-4o, gpt-4v, image-classification, image-text-retrieval, llm, multi-modal, semantic-segmentation, video-classification, vision-language-model, vit-22b, vit-6b

Related projects:

Repository | Description | Stars
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,619
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,080
internlm/internlm | Large language models for chatbot and natural language understanding applications | 6,473
opengvlab/internvideo | Developing video foundation models and datasets for multimodal understanding and applications | 1,413
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,422
memochou1993/gpt-ai-assistant | An AI-powered chat application using OpenAI and LINE APIs | 7,428
open-mmlab/mmaction2 | A comprehensive video understanding toolbox and benchmark with modular design, supporting various tasks such as action recognition, localization, and retrieval. | 4,296
openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning | 22,516
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,906
internlm/internlm-xcomposer | A large vision language model that can understand and generate text from visual inputs, with capabilities for long-contextual input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue. | 2,521
open-compass/opencompass | An LLM evaluation platform supporting various models and datasets | 4,124
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,580
thudm/glm-4 | Develops and releases pre-trained models for conversational AI tasks with enhanced capabilities on long text generation, multimodal interaction, and domain adaptation. | 5,277