MalayalamCultureBench: A Benchmark for Evaluating LLM Understanding of Kerala's Art, History, and Traditions
This paper introduces an evaluation benchmark covering Kerala-specific culture (art forms, festivals, oral traditions, history) and compares existing LLMs with the Mikav fine-tuned model to quantify cultural-awareness gaps.
Abstract
We introduce MalayalamCultureBench, an evaluation benchmark designed to measure how well large language models (LLMs) understand Kerala's cultural heritage. The benchmark covers art forms (Kathakali, Theyyam, Mohiniyattam), festivals (Onam, Thrissur Pooram, Vishu), oral traditions, historical events, and the state's geography and ecology. We evaluate existing multilingual LLMs alongside the Mikav fine-tuned model to quantify cultural-awareness gaps.
1. Introduction
Most existing NLP benchmarks focus on general knowledge or Western cultural contexts. To our knowledge, there is no standardised benchmark for evaluating how well language models understand regional Indian cultures. MalayalamCultureBench aims to fill this gap for Kerala.
2. Benchmark Design
2.1 Categories
- Art Forms (Kathakali, Theyyam, Mohiniyattam, Kalaripayattu, etc.)
- Festivals (Onam, Vishu, Thrissur Pooram, Attukal Pongala, etc.)
- History (Travancore, Cochin Kingdom, Kerala Renaissance, etc.)
- Oral Traditions (folklore, proverbs, folk songs)
- Geography and Ecology (Western Ghats, backwaters, biodiversity)
2.2 Question Types
- Multiple choice
- Open-ended generation
- True/false with explanation
- Fill-in-the-blank with cultural context
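
The categories in 2.1 and the question types in 2.2 suggest a simple record structure for benchmark items. The following is a minimal sketch under stated assumptions, not the benchmark's actual schema: the names `Category`, `QuestionType`, `BenchmarkItem`, and `multiple_choice_accuracy`, all field names, and the sample question are hypothetical and introduced only for illustration. Exact-match scoring is likewise an assumption and applies only to multiple-choice items; open-ended and explanation-based items would need a different scoring method.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Category(Enum):
    """The five categories listed in Section 2.1 (identifiers are hypothetical)."""
    ART_FORMS = "art_forms"
    FESTIVALS = "festivals"
    HISTORY = "history"
    ORAL_TRADITIONS = "oral_traditions"
    GEOGRAPHY_ECOLOGY = "geography_ecology"


class QuestionType(Enum):
    """The four question types listed in Section 2.2 (identifiers are hypothetical)."""
    MULTIPLE_CHOICE = "multiple_choice"
    OPEN_ENDED = "open_ended"
    TRUE_FALSE_EXPLAINED = "true_false_explained"
    FILL_IN_CONTEXT = "fill_in_context"


@dataclass
class BenchmarkItem:
    """One benchmark question; the field layout is an assumed schema."""
    item_id: str
    category: Category
    question_type: QuestionType
    prompt: str
    reference_answer: str
    options: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Multiple-choice items need options, and the answer must be one of them.
        if self.question_type is QuestionType.MULTIPLE_CHOICE:
            if len(self.options) < 2:
                raise ValueError("multiple-choice items need at least two options")
            if self.reference_answer not in self.options:
                raise ValueError("reference_answer must be one of the options")


def multiple_choice_accuracy(
    items: List[BenchmarkItem], predictions: Dict[str, str]
) -> float:
    """Case-insensitive exact-match accuracy over multiple-choice items only.

    `predictions` maps item_id to the option text chosen by a model.
    Missing predictions count as incorrect. Returns 0.0 if there are
    no multiple-choice items.
    """
    mc_items = [
        it for it in items if it.question_type is QuestionType.MULTIPLE_CHOICE
    ]
    if not mc_items:
        return 0.0
    correct = sum(
        1
        for it in mc_items
        if predictions.get(it.item_id, "").strip().lower()
        == it.reference_answer.strip().lower()
    )
    return correct / len(mc_items)


# Illustrative item only; it is not taken from the benchmark itself.
example_item = BenchmarkItem(
    item_id="fest-001",
    category=Category.FESTIVALS,
    question_type=QuestionType.MULTIPLE_CHOICE,
    prompt="Which festival is traditionally associated with the legendary king Mahabali?",
    reference_answer="Onam",
    options=["Onam", "Vishu", "Thrissur Pooram", "Attukal Pongala"],
)
```

Keeping the category and question type as explicit fields would allow scores to be broken down per category, which is one way the cultural-awareness gaps mentioned above could be reported.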