indosum

Indonesian Summarization Benchmark

Provides a benchmark dataset and tools for training text summarization models in the Indonesian language.

A benchmark dataset for Indonesian text summarization.


76 stars
7 watching
15 forks
Language: Python
Last commit: over 5 years ago
Linked from 1 awesome list

indonesian, indonesian-language, natural-language-processing, text-summarization


Related projects:

Repository (stars): description

indonlp/indonlu (556): A comprehensive collection of natural language understanding resources and pre-trained models for the Indonesian language.
andriawan/andkamus (6): A CLI-based dictionary application written in C++ that maps Indonesian words to English.
ivoputzer/cli-args-parser-kata (5): Implements a CLI argument parser that processes input in various formats.
ariya/tebakmasa (10): A tool that parses Indonesian date and time descriptions into Unix epoch timestamps.
kangfend/bahasa (19): A natural language processing toolkit for the Indonesian language.
matbahasa/talpco (49): A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research.
azishapidin/indoregion (249): A Laravel package providing geographical data on Indonesia's administrative regions.
lantip/baku-tidak-baku (29): A repository of linguistic data classifying Indonesian words as standard or non-standard.
galuhsahid/indonesian-word-embedding (20): Demonstrates Indonesian word embeddings using pre-trained Word2vec models.
pku-yuangroup/video-bench (117): Evaluates and benchmarks the video understanding capabilities of large language models.
damoebius/haxebench (52): A benchmarking project comparing the performance of different programming languages and their compiled outputs in various formats.
har07/pysastrawi (336): A Python port of an Indonesian stemmer library that reduces inflected words to their base form.
pythainlp/prachathai-67k (16): An article classification dataset built from news articles scraped from Prachathai.com, with multiple benchmark models for multi-label classification.
sastrawi/nlp-bahasa-indonesia (186): A collection of NLP papers and resources for Bahasa Indonesia, including tools and software for text processing tasks such as summarization, parsing, part-of-speech tagging, stemming, and word sense disambiguation.
felixgithub2017/mmcu (87): Evaluates the semantic understanding capabilities of large Chinese language models using a multitask dataset.