LM_Memorization

Content Extractor

A tool for extracting memorized training data from large language models such as GPT-2

Training data extraction on GPT-2

GitHub

175 stars
7 watching
33 forks
Language: Python
Last commit: almost 2 years ago

Related projects:

Repository | Description | Stars
ftramer/steal-ml | An implementation of extraction attacks against Machine Learning models offered by Cloud-based services | 344
eyurtsev/kor | Extracts structured data from unstructured text using large language models | 1,629
recrm/archivetools | A collection of tools for extracting and analyzing data from web archives | 69
iamgroot42/mimir | Measures memorization in Large Language Models (LLMs) to detect potential privacy issues | 121
ir193/amextractor | A tool to extract physical memory from Android devices without kernel source code or LKM support | 11
kost/memdump | A tool to extract and display the contents of a system's physical memory | 12
eset-la/lord-of-the-strings | A tool to extract and classify relevant strings from binary files | 9
cognesy/instructor-php | A PHP library that simplifies the integration of Large Language Models into applications by providing structured data extraction and validation | 218
halpomeranz/lmg | A tool for capturing and analyzing Linux memory | 264
gamallo/galextra | A multi-language term extractor that uses morphosyntax tagging and filtering to identify multi-word terms from plain text input | 2
bfelbo/deepmoji | A deep learning model for analyzing sentiment and emotion in text based on emojis | 1,518
os6sense/defmemo | A macro that memoizes the results of functions with identical signatures | 33
monarch-initiative/ontogpt | An LLM-based tool for extracting structured information from text with ontology-based grounding | 609
knowledgecaptureanddiscovery/somef | A tool to extract relevant software information from readme files | 44
yomurb/yomu | A Ruby library for extracting text and metadata from various file formats | 499