indosum

Indonesian Summarization Benchmark

Provides a benchmark dataset and tools for training text summarization models in the Indonesian language.

GitHub

77 stars

7 watching

15 forks

Language: Python

Last commit: over 7 years ago

Linked from 1 awesome list

Tags: indonesian, indonesian-language, natural-language-processing, text-summarization

github.com/kata-ai/indosum

Backlinks from these awesome lists:

keon/awesome-nlp

Related projects:

Repository	Description	Stars
indonlp/indonlu	A comprehensive collection of natural language understanding resources and pre-trained models for the Indonesian language.	564
andriawan/andkamus	A CLI-based dictionary application written in C++ that maps Indonesian words to English words.	6
ivoputzer/cli-args-parser-kata	Implementing a CLI arguments parser to process input in various formats	5
ariya/tebakmasa	A tool to parse Indonesian date and time descriptions into Unix epoch timestamps.	11
kangfend/bahasa	A natural language processing toolkit for the Indonesian language.	19
matbahasa/talpco	A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research.	49
azishapidin/indoregion	A Laravel package providing geographical data of Indonesia's administrative regions	252
lantip/baku-tidak-baku	A repository of linguistic data for Indonesian words categorized as either standard or non-standard	29
galuhsahid/indonesian-word-embedding	Demonstrates word embedding in the Indonesian language using pre-trained Word2vec models	20
pku-yuangroup/video-bench	Evaluates and benchmarks large language models' video understanding capabilities	121
damoebius/haxebench	A benchmarking project comparing the performance of different programming languages and their compiled outputs in various formats.	52
har07/pysastrawi	A Python port of an Indonesian stemmer library, reducing inflected words to their base form.	337
pythainlp/prachathai-67k	An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification	16
sastrawi/nlp-bahasa-indonesia	A collection of NLP papers and resources for Bahasa Indonesia, including tools and software for text processing tasks such as summarization, parsing, part-of-speech tagging, stemming, and word sense disambiguation.	186
felixgithub2017/mmcu	Measures large language models' understanding across massive multitask Chinese datasets	87