InternVL
Multimodal model builder
Develops large language models capable of processing multiple data types and modalities
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
6k stars
57 watching
493 forks
Language: Python
last commit: 2 months ago
tags: gpt, gpt-4o, gpt-4v, image-classification, image-text-retrieval, llm, multi-modal, semantic-segmentation, video-classification, vision-language-model, vit-22b, vit-6b
Related projects:
Repository | Description | Stars |
---|---|---|
| A multimodal language model designed to understand images, videos, and text inputs and generate high-quality text outputs. | 12,870 |
| A toolkit for optimizing and serving large language models | 4,854 |
| Develops a state-of-the-art visual language model with applications in image understanding and dialogue systems. | 6,182 |
| A collection of large language models designed to improve reasoning and tool use capabilities in chatbots. | 6,572 |
| Develops general video foundation models and related datasets for multimodal understanding and generation through generative and discriminative learning. | 1,467 |
| Enabling vision-language understanding by fine-tuning large language models on visual data. | 25,490 |
| An AI-powered chat application leveraging OpenAI models and LINE APIs for conversational interfaces. | 7,491 |
| A comprehensive video understanding toolbox and benchmark with modular design, supporting various tasks such as action recognition, localization, and retrieval. | 4,360 |
| A repository providing code and models for research into language modeling and multitask learning | 22,644 |
| Provides a foundational library for computer vision research and training deep learning models with high-quality implementation of common CPU and CUDA ops. | 5,948 |
| A comprehensive multimodal system for long-term streaming video and audio interactions with capabilities including text-image comprehension and composition | 2,616 |
| An LLM evaluation platform supporting various models and datasets | 4,295 |
| An implementation of a method for fine-tuning language models to follow instructions with high efficiency and accuracy | 5,775 |
| This project generates animated videos from open-domain images by leveraging pre-trained video diffusion priors. | 2,668 |
| A large language model designed for multilingual and multimodal chat applications with advanced features such as long-text reasoning and high-performance inference. | 5,525 |