Sight-Beyond-Text

Multimodal model trainer

An implementation of a multimodal LLM training paradigm to enhance truthfulness and ethics in language models

[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

GitHub
19 stars
2 watching
1 fork
Language: Python
Last commit: over 1 year ago
Topics: ai-alignment, alignment, llama2, llava, llm, mllm, vicuna, vision-language, vlm
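
The paper's claim is that multimodal instruction tuning improves a text-only LLM's truthfulness and ethics, measured on benchmarks such as TruthfulQA. As a rough illustration of that kind of measurement, the sketch below computes TruthfulQA MC1 accuracy by comparing answer log-likelihoods under a causal LM. It is not the repo's own evaluation code: the `gpt2` checkpoint, the `Q:/A:` prompt format, and the 20-question slice are placeholder assumptions.

```python
# Minimal sketch of a TruthfulQA MC1 evaluation -- NOT the repo's own code.
# "gpt2" is a stand-in checkpoint and the "Q:/A:" prompt is an assumption;
# in practice you would load the multimodally fine-tuned LLaMA/Vicuna
# weights under evaluation.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in the checkpoint being evaluated
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

data = load_dataset("truthful_qa", "multiple_choice", split="validation")

@torch.no_grad()
def answer_logprob(question: str, answer: str) -> float:
    """Sum of log-probabilities of the answer tokens, given the question."""
    prompt = tokenizer(f"Q: {question}\nA:", return_tensors="pt").input_ids
    full = torch.cat(
        [prompt, tokenizer(" " + answer, return_tensors="pt",
                           add_special_tokens=False).input_ids], dim=1)
    # logits[t] predicts token t+1, so shift by one when indexing targets.
    logprobs = torch.log_softmax(model(full).logits[0, :-1], dim=-1)
    return sum(logprobs[pos, full[0, pos + 1]].item()
               for pos in range(prompt.shape[1] - 1, full.shape[1] - 1))

subset = data.select(range(20))  # small slice to keep the sketch fast
correct = 0
for ex in subset:
    choices = ex["mc1_targets"]["choices"]
    labels = ex["mc1_targets"]["labels"]  # exactly one choice is labeled 1
    scores = [answer_logprob(ex["question"], c) for c in choices]
    correct += labels[scores.index(max(scores))]  # 1 if top-scored is true

print(f"MC1 accuracy on {len(subset)} questions: {correct / len(subset):.2%}")
```

Running the same loop with a multimodally tuned checkpoint versus its text-only base is the comparison the paper's claim rests on.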

Related projects:

| Repository | Description | Stars |
|---|---|---|
| mlpc-ucsd/bliva | A multimodal LLM designed to handle text-rich visual questions | 270 |
| ucsc-vlaa/vllm-safety-benchmark | A benchmark for evaluating the safety and robustness of vision-language models against adversarial attacks | 72 |
| vpgtrans/vpgtrans | Transfers visual prompt generators across large language models to reduce training costs and enable customization of multimodal LLMs | 270 |
| ailab-cvc/seed | An implementation of a multimodal language model with capabilities for both comprehension and generation | 585 |
| lyuchenyang/macaw-llm | A multi-modal language model that integrates image, video, audio, and text data to improve language understanding and generation | 1,568 |
| aidc-ai/ovis | An MLLM architecture that structurally aligns visual and textual embeddings | 575 |
| mbzuai-oryx/groundinglmm | An end-to-end trained model that generates natural language responses integrated with object segmentation masks for interactive visual conversations | 797 |
| pleisto/yuren-baichuan-7b | A multi-modal large language model that integrates natural language and visual capabilities, with fine-tuning for various tasks | 73 |
| llava-vl/llava-plus-codebase | A platform for training and deploying large language and vision models that can use tools to perform tasks | 717 |
| vishaal27/sus-x | A training-free method for adapting large-scale vision-language models with minimal resources and no fine-tuning | 94 |
| csuhan/onellm | A framework for training and fine-tuning multimodal language models on various data types | 601 |
| salt-nlp/llavar | Enhances visual instruction tuning for text-rich image understanding by combining GPT-4-generated instructions with multimodal datasets | 259 |
| alpha-vllm/wemix-llm | An LLaMA-based multimodal language model with various instruction-following and multimodal variants | 17 |
| bobazooba/xllm | A library that streamlines training and fine-tuning of large language models | 387 |
| neulab/pangea | An open-source multilingual large language model designed to understand and generate content across diverse languages and cultural contexts | 92 |