InternVL
Multimodal model builder
Develops large language models capable of processing multiple data types and modalities
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
6k stars
57 watching
493 forks
Language: Python
last commit: about 1 month ago
Topics: gpt, gpt-4o, gpt-4v, image-classification, image-text-retrieval, llm, multi-modal, semantic-segmentation, video-classification, vision-language-model, vit-22b, vit-6b
Related projects:
Repository | Description | Stars |
---|---|---|
openbmb/minicpm-v | A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,870 |
internlm/lmdeploy | A toolkit for optimizing and serving large language models | 4,854 |
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,182 |
internlm/internlm | A collection of large language models designed to improve reasoning and tool use capabilities in chatbots. | 6,572 |
opengvlab/internvideo | Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning. | 1,467 |
vision-cair/minigpt-4 | Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,490 |
memochou1993/gpt-ai-assistant | An AI-powered chat application leveraging OpenAI models and LINE APIs for conversational interfaces. | 7,491 |
open-mmlab/mmaction2 | A comprehensive video understanding toolbox and benchmark with modular design, supporting various tasks such as action recognition, localization, and retrieval. | 4,360 |
openai/gpt-2 | A repository providing code and models for research into language modeling and multitask learning | 22,644 |
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,948 |
internlm/internlm-xcomposer | A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition | 2,616 |
open-compass/opencompass | An LLM evaluation platform supporting various models and datasets | 4,295 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
doubiiu/dynamicrafter | This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,668 |
thudm/glm-4 | A large language model designed for multilingual and multimodal chat applications with advanced features such as long-text reasoning and high-performance inference. | 5,525 |