indosum
Indonesian Summarization Benchmark
Provides a benchmark dataset and tools for training text summarization models in the Indonesian language.
77 stars
7 watching
15 forks
Language: Python
Last commit: almost 6 years ago
Linked from 1 awesome list
Topics: indonesian, indonesian-language, natural-language-processing, text-summarization
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A comprehensive collection of natural language understanding resources and pre-trained models for Indonesian language. | 564 |
| | A CLI-based dictionary application written in C++ that maps Indonesian to English words. | 6 |
| | Implementing a CLI arguments parser to process input in various formats | 5 |
| | A tool to parse Indonesian date and time descriptions into Unix epoch timestamps. | 11 |
| | A natural language processing toolkit for the Indonesian language. | 19 |
| | A parallel corpus of Asian languages with linguistic annotations and data formats for natural language processing research. | 49 |
| | A Laravel package providing geographical data of Indonesia's administrative regions | 252 |
| | A repository of linguistic data for Indonesian words categorized as either standard or non-standard | 29 |
| | Demonstrates word embedding in Indonesian language using pre-trained Word2vec models | 20 |
| | Evaluates and benchmarks large language models' video understanding capabilities | 121 |
| | A benchmarking project comparing the performance of different programming languages and their compiled outputs in various formats. | 52 |
| | A Python port of an Indonesian stemmer library, reducing inflected words to their base form. | 337 |
| | An article classification dataset created from news articles scraped from Prachathai.com with multiple benchmark models for multi-label classification | 16 |
| | A collection of NLP papers and resources for Bahasa Indonesia, including tools and software for text processing tasks such as summarization, parsing, part-of-speech tagging, stemming, and word sense disambiguation. | 186 |
| | Measures the understanding of massive multitask Chinese datasets using large language models | 87 |