LawBench

Legal model evaluator

Evaluates the legal knowledge of large language models using a custom benchmarking framework.

Benchmarking Legal Knowledge of Large Language Models

GitHub:

267 stars
7 watching
39 forks
Language: Python
Last commit: about 1 year ago
Topics: benchmark, chatgpt, law, llm
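
As a rough illustration of what a legal-knowledge benchmark evaluation loop involves, the sketch below scores a model by exact match on multiple-choice questions. It is a minimal, hypothetical example: the `SAMPLE_TASKS` data, the `ask_model` stub, and the `evaluate` helper are assumptions made for illustration and are not LawBench's actual API or dataset format.

```python
# Hypothetical sketch of a benchmark-style evaluation loop; the task format,
# the ask_model stub, and the sample data are illustrative only and are NOT
# LawBench's actual API or dataset.
from typing import Callable

# Each item pairs a legal multiple-choice question with its gold answer.
SAMPLE_TASKS = [
    {"question": "Which body enacts statutes? (A) courts (B) the legislature", "answer": "B"},
    {"question": "A valid contract requires offer and ...? (A) acceptance (B) silence", "answer": "A"},
]

def ask_model(question: str) -> str:
    """Placeholder for a real model call (e.g. an API request); returns a letter choice."""
    return "A"  # trivial stub so the sketch runs end to end

def evaluate(tasks: list[dict], model: Callable[[str], str]) -> float:
    """Score a model by exact match between its predicted choice and the gold answer."""
    correct = sum(1 for t in tasks if model(t["question"]).strip().upper() == t["answer"])
    return correct / len(tasks)

if __name__ == "__main__":
    print(f"accuracy: {evaluate(SAMPLE_TASKS, ask_model):.2f}")
```

A real benchmark would replace the stub with calls to the model under test and report per-task metrics, but the overall structure (task set, model interface, scoring function) is the same.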

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| open-compass/vlmevalkit | A toolkit for evaluating large vision-language models on various benchmarks and datasets | 1,343 |
| freedomintelligence/mllm-bench | Evaluates and compares the performance of multimodal large language models on various tasks | 55 |
| mlabonne/llm-autoeval | A tool to automate the evaluation of large language models in Google Colab using various benchmarks and custom parameters | 558 |
| qcri/llmebench | A benchmarking framework for large language models | 80 |
| liuhc0428/law-gpt | A Chinese law-focused conversational AI model designed to provide reliable and professional legal answers | 1,054 |
| andrewzhe/lawyer-llama | An AI model trained on legal data to provide answers and explanations on Chinese law | 851 |
| obss/jury | A comprehensive toolkit for evaluating NLP experiments, offering automated metrics and efficient computation | 188 |
| iclrandd/blackstone | Develops an NLP pipeline and model for processing long-form legal text | 637 |
| open-compass/mmbench | A collection of benchmarks to evaluate the multi-modal understanding capability of large vision-language models | 163 |
| oeg-upm/lubm4obda | Evaluates Ontology-Based Data Access systems with inference and meta-knowledge benchmarking | 4 |
| openai/simple-evals | A library for evaluating language models using standardized prompts and benchmarking tests | 1,939 |
| siat-nlp/hanfei | Develops and trains a large-scale, parameterized model for legal question answering and text generation | 98 |
| maluuba/nlg-eval | A toolset for evaluating and comparing natural language generation models | 1,347 |
| openlmlab/gaokao-bench | An evaluation framework using Chinese high school examination questions to assess large language model capabilities | 551 |
| mlgroupjlu/llm-eval-survey | A repository of papers and resources for evaluating large language models | 1,433 |