trulens
Evaluation and Tracking for LLM Experiments
A tool to evaluate and track the performance of large language model (LLM) experiments.
2k stars
19 watching
189 forks
Language: Python
Last commit: 5 days ago
Linked from 1 awesome list
Tags: explainable-ml, llm, llmops, machine-learning, neural-networks
Related projects:
| Repository | Description | Stars |
|---|---|---|
| rlworkgroup/dowel | A tool for logging and tracking machine learning research progress in Python | 32 |
| replicate/keepsake | A tool for version control of machine learning experiments | 1,650 |
| qcri/llmebench | A benchmarking framework for large language models | 80 |
| polyaxon/traceml | A framework for tracking and analyzing machine learning data, including model performance, inputs, outputs, and project metadata | 504 |
| didactic-drunk/fiber_metrics.cr | A tool to track and measure runtime, memory allocation, and other performance metrics for individual fibers or methods within concurrent applications | 8 |
| neptune-ai/neptune-client | An experiment tracker for machine learning model training that lets users log and visualize their experiments in detail | 584 |
| damo-nlp-sg/m3exam | A benchmark for evaluating large language models in multiple languages and formats | 92 |
| catalyst-team/alchemy | Tools and infrastructure to log and visualize experiments in deep learning research | 50 |
| iamgroot42/mimir | Measures memorization in large language models (LLMs) to detect potential privacy issues | 121 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| alephbet/gimel | An A/B testing backend built on AWS Lambda and Redis HyperLogLog to track experiment data in a scalable, cost-effective way | 227 |
| rwdaigle/metrix | An Elixir library for logging custom application metrics in a well-structured format for downstream processing systems | 52 |
| trubrics/trubrics-sdk | An analytics platform for tracking performance and feedback of AI-powered assistants using machine learning and data analysis techniques | 133 |
| leks-forever/nllb-tuning | An experimental project for fine-tuning the NLLB language model on a specific dataset and evaluating its translation performance | 7 |
| luogen1996/lavin | An open-source implementation of a vision-language instructed large language model | 508 |