VL-ICL
A benchmarking suite for multimodal in-context learning models
Code for the paper "VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning"
31 stars
1 watching
2 forks
Language: Python
Last commit: 11 months ago
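To make the benchmark's task concrete, below is a minimal, hypothetical sketch of how a multimodal in-context learning evaluation is typically structured: a handful of image-text support examples are interleaved into a prompt, followed by a query image the model must answer. The `Example` class, `build_icl_prompt`, `evaluate`, and the dummy model are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical sketch of a multimodal in-context learning evaluation loop.
# None of these names come from the VL-ICL codebase; they only illustrate the idea.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    image_path: str  # path to the support/query image
    question: str    # textual part of the example
    answer: str      # ground-truth answer


def build_icl_prompt(support: List[Example], query: Example) -> List[dict]:
    """Interleave image references and text into one few-shot prompt sequence."""
    prompt = [{"image": ex.image_path, "text": f"Q: {ex.question} A: {ex.answer}"}
              for ex in support]
    prompt.append({"image": query.image_path, "text": f"Q: {query.question} A:"})
    return prompt


def evaluate(model: Callable[[List[dict]], str],
             support: List[Example],
             queries: List[Example]) -> float:
    """Exact-match accuracy over the query set for a given model callable."""
    correct = sum(
        model(build_icl_prompt(support, q)).strip().lower() == q.answer.lower()
        for q in queries
    )
    return correct / max(len(queries), 1)


if __name__ == "__main__":
    support = [Example("img_0.png", "How many objects are shown?", "two")]
    queries = [Example("img_1.png", "How many objects are shown?", "three")]
    dummy_model = lambda prompt: "three"  # stand-in for a real vision-language model
    print(f"accuracy: {evaluate(dummy_model, support, queries):.2f}")
```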
Related projects:

| Description | Stars |
| --- | --- |
| Evaluating and improving large multimodal models through in-context learning | 21 |
| Provides a benchmarking framework and dataset for evaluating the performance of large language models in text-to-image tasks | 30 |
| Improves safety and helpfulness of large language models by fine-tuning them using safety-critical tasks | 47 |
| A benchmark for evaluating large language models' ability to process multimodal input | 322 |
| A collection of benchmarks and implementations for testing reinforcement learning-based Volt-VAR control algorithms | 20 |
| Develops a multimodal vision-language model to enable machines to understand complex relationships between instructions and images in various tasks. | 337 |
| An evaluation benchmark for OCR capabilities in large multimodal models. | 484 |
| Evaluates and benchmarks multimodal language models' ability to process visual, acoustic, and textual inputs simultaneously. | 15 |
| A large-scale dataset for natural language processing tasks focused on Chinese scientific literature, providing tools and benchmarks for NLP research. | 582 |
| A platform for comparing and evaluating AI and machine learning algorithms at scale | 1,779 |
| A benchmark suite for unsupervised reinforcement learning agents, providing pre-trained models and scripts for testing and fine-tuning agent performance. | 335 |
| Provides pre-trained machine learning models for natural language processing tasks using Clojure and the clj-djl framework. | 0 |
| Develops and trains models for vision-language learning with decoupled language pre-training | 24 |
| This project integrates visual knowledge into large language models to improve their capabilities and reduce hallucinations. | 124 |
| Extending pretraining models to handle multiple modalities by aligning language and video representations | 751 |