llm_dataset_inference
Dataset checker
Detects whether a given text sequence was part of a large language model's training data.
Official Repository for Dataset Inference for LLMs
23 stars
1 watching
4 forks
Language: Jupyter Notebook
Last commit: 8 months ago
Related projects:
Repository | Description | Stars |
---|---|---|
| A Python package for measuring memorization in Large Language Models. | 126 |
| A command-line interface to generate textual datasets with Large Language Models | 293 |
| Measures the performance of deep learning models in various deployment scenarios. | 1,256 |
| An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification | 16 |
| A curated list of Natural Language Processing datasets used to train and evaluate NLP models. | 919 |
| A tool for fact-checking LLM outputs with self-ask using prompt chaining | 289 |
| A project providing optimized stacks for fine-tuning and inference of large language models, focusing on low-latency and high-throughput performance. | 525 |
| Pre-training large language models on scientific data for downstream applications | 12 |
| Compiles bias evaluation datasets and provides access to original data sources for large language models | 115 |
| A collection of Urdu language datasets for various NLP tasks and applications | 71 |
| Provides code samples and notebooks to download, read, and analyze Goodreads datasets for research purposes. | 252 |
| Analyzes GPS data to infer probabilistic schedules from transit vehicle movements | 10 |
| A collection of data for evaluating Chinese machine reading comprehension systems | 419 |
| A collection of curated environmental datasets from US LTER sites, designed for teaching and training in data science. | 48 |
| A tool to evaluate and track the performance of large language model (LLM) experiments | 2,233 |