InternVL

Multimodal model builder

Develops large language models capable of processing multiple data types and modalities

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

GitHub

6k stars
57 watching
493 forks
Language: Python
last commit: about 1 month ago
Topics: gpt, gpt-4o, gpt-4v, image-classification, image-text-retrieval, llm, multi-modal, semantic-segmentation, video-classification, vision-language-model, vit-22b, vit-6b

Related projects:

Repository | Description | Stars
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,870
internlm/lmdeploy | A toolkit for optimizing and serving large language models. | 4,854
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,182
internlm/internlm | A collection of large language models designed to improve reasoning and tool use capabilities in chatbots. | 6,572
opengvlab/internvideo | Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning. | 1,467
vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data. | 25,490
memochou1993/gpt-ai-assistant | An AI-powered chat application leveraging OpenAI models and LINE APIs for conversational interfaces. | 7,491
open-mmlab/mmaction2 | A comprehensive video understanding toolbox and benchmark with modular design, supporting various tasks such as action recognition, localization, and retrieval. | 4,360
openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning. | 22,644
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,948
internlm/internlm-xcomposer | A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition. | 2,616
open-compass/opencompass | An LLM evaluation platform supporting various models and datasets. | 4,295
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy. | 5,775
doubiiu/dynamicrafter | Generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,668
thudm/glm-4 | A large language model designed for multilingual and multimodal chat applications with advanced features such as long-text reasoning and high-performance inference. | 5,525