InternVL
Multimodal model suite
A family of large-scale vision-language models positioned as a pioneering open-source alternative to commercial multimodal models.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching the performance of GPT-4o.
6k stars
53 watching
465 forks
Language: Python
last commit: 6 days ago
Topics: gpt, gpt-4o, gpt-4v, image-classification, image-text-retrieval, llm, multi-modal, semantic-segmentation, video-classification, vision-language-model, vit-22b, vit-6b
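For a sense of how the suite is typically consumed, below is a minimal sketch of loading an InternVL checkpoint through Hugging Face `transformers` with `trust_remote_code`. The model ID, dtype, and the `chat()` call follow the project's published examples but are assumptions here; consult the repository's README for the current interface and image preprocessing helpers.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model ID; any InternVL checkpoint from the OpenGVLab hub page should work.
MODEL_ID = "OpenGVLab/InternVL2-8B"

# trust_remote_code is required because the chat interface ships as remote code.
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True, use_fast=False)

# Text-only turn (pixel_values=None). For image inputs, preprocess with the
# repository's load_image helper and pass the resulting tensor instead of None.
generation_config = dict(max_new_tokens=256, do_sample=False)
response = model.chat(tokenizer, None, "Hello, who are you?", generation_config)
print(response)
```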
Related projects:
Repository | Description | Stars |
---|---|---|
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,619 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,653 |
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,080 |
internlm/internlm | Large language models for chatbot and natural language understanding applications | 6,473 |
opengvlab/internvideo | Developing video foundation models and datasets for multimodal understanding and applications | 1,413 |
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
memochou1993/gpt-ai-assistant | An AI-powered chat application using OpenAI and LINE APIs | 7,428 |
open-mmlab/mmaction2 | A comprehensive video understanding toolbox and benchmark with modular design, supporting various tasks such as action recognition, localization, and retrieval. | 4,296 |
openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning | 22,516 |
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,906 |
internlm/internlm-xcomposer | A large vision language model that can understand and generate text from visual inputs, with capabilities for long-contextual input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue. | 2,521 |
open-compass/opencompass | An LLM evaluation platform supporting various models and datasets | 4,124 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,754 |
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,580 |
thudm/glm-4 | Develops and releases pre-trained models for conversational AI tasks with enhanced capabilities on long text generation, multimodal interaction, and domain adaptation. | 5,277 |