Sight-Beyond-Text
Multimodal model trainer
An implementation of a multimodal training paradigm that enhances truthfulness and ethics in large language models
[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
19 stars
2 watching
1 fork
Language: Python
last commit: over 1 year ago
Topics: ai-alignment, alignment, llama2, llava, llm, mllm, vicuna, vision-language, vlm
Related projects:
Repository | Description | Stars |
---|---|---|
mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 270 |
ucsc-vlaa/vllm-safety-benchmark | A benchmark for evaluating the safety and robustness of vision-language models against adversarial attacks | 72 |
vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
ailab-cvc/seed | An implementation of a multimodal language model with capabilities for comprehension and generation | 585 |
lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
aidc-ai/ovis | An MLLM architecture designed to align visual and textual embeddings through structural alignment | 575 |
mbzuai-oryx/groundinglmm | An end-to-end trained model capable of generating natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities with fine-tuning for various tasks | 73 |
llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
vishaal27/sus-x | A training-free method for adapting large-scale vision-language models with minimal resources and no fine-tuning | 94 |
csuhan/onellm | A framework for training and fine-tuning multimodal language models on various data types | 601 |
salt-nlp/llavar | An approach that enhances visual instruction tuning for text-rich image understanding by integrating GPT-4 with multimodal datasets | 259 |
alpha-vllm/wemix-llm | An LLaMA-based multimodal language model with instruction-following and multimodal variants | 17 |
bobazooba/xllm | A library for training and fine-tuning large language models with efficiency techniques such as QLoRA and DeepSpeed | 387 |
neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 92 |