MiniCPM-V
Multimodal LLM
A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
13k stars
105 watching
889 forks
Language: Python
Last commit: about 1 month ago
Linked from 2 awesome lists
Tags: minicpm, minicpm-v, multi-modal
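For a quick sense of how the model is used, the sketch below runs a single-image query through the Hugging Face Transformers interface. This is a minimal sketch only: the checkpoint name `openbmb/MiniCPM-V-2_6`, the `trust_remote_code` chat interface, and the message format are assumptions based on the project's published usage examples and may differ in the current release.

```python
# Minimal single-image query with MiniCPM-V 2.6 (sketch; checkpoint id and
# chat() interface are assumed from the project's published usage examples).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-2_6"  # assumed Hugging Face checkpoint id

# The model ships custom modeling code, so trust_remote_code=True is required.
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
question = "What is shown in this image?"

# Multi-image and multi-turn dialogue use the same structure: each message's
# content is a list mixing PIL images and strings.
msgs = [{"role": "user", "content": [image, question]}]

answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```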
Related projects:
Repository | Description | Stars |
---|---|---|
opengvlab/internvl | A pioneering open-source alternative to commercial multimodal models with a family of large-scale language and vision models. | 6,014 |
open-mmlab/mmcv | Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,906 |
openbmb/viscpm | A family of large multimodal models supporting multimodal conversational capabilities and text-to-image generation in multiple languages. | 1,089 |
vision-cair/minigpt-4 | Enables vision-language understanding by fine-tuning large language models on visual data. | 25,422 |
dvlab-research/mgm | An open-source framework for training large language models with vision capabilities. | 3,211 |
qwenlm/qwen-vl | A large vision language model with improved image reasoning and text recognition capabilities, suitable for various multimodal tasks. | 5,079 |
opengvlab/llama-adapter | An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy. | 5,754 |
open-mmlab/mmaction2 | A comprehensive video understanding toolbox and benchmark with modular design, supporting various tasks such as action recognition, localization, and retrieval. | 4,296 |
openmv/openmv | A platform for machine vision development with programmable cameras and extensive image processing capabilities. | 2,438 |
internlm/internlm-xcomposer | A large vision language model that can understand and generate text from visual inputs, with capabilities for long-contextual input and output, high-resolution understanding, fine-grained video understanding, and multi-turn multi-image dialogue. | 2,521 |
openbmb/toolbench | A platform for training, serving, and evaluating large language models to enable tool use capability | 4,843 |
thudm/cogvlm | Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,080 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks. | 72 |
luodian/otter | A multi-modal AI model developed for improved instruction-following and in-context learning, utilizing large-scale architectures and various training datasets. | 3,563 |
cambrian-mllm/cambrian | An open-source multimodal LLM project with a vision-centric design | 1,759 |