Research

benchmark · evaluation · malayalam · Draft

MalayalamCultureBench: A Benchmark for Evaluating LLM Understanding of Kerala's Art, History, and Traditions

Introduces a new evaluation benchmark covering Kerala-specific culture (art forms, festivals, oral traditions, history) and tests existing LLMs against the Mikav fine-tuned model to quantify gaps in cultural awareness.

Hrudu Shibu · 5 Jul 2026

dataset · corpus · malayalam · Draft

Building an Open Malayalam Culture Corpus: Collection, Cleaning, and Licensing of Low-Resource Heritage Data

Documents the methodology for sourcing, cleaning, and licensing Malayalam text and cultural heritage data (manuscripts, oral history, festival and art records) into an open, reusable dataset, addressing intellectual-property ownership and the challenges of working with low-resource data.

Hrudu Shibu · 5 Jul 2026

human-in-the-loop · verification · trust · Draft

Community-Verified AI: A Human-in-the-Loop Framework for Preserving Regional Cultural Knowledge

Proposes a verification methodology that pairs domain experts (cultural institutions and practitioners) with AI-generated content, so that experts check the output, reducing hallucination risk and ensuring trustworthy representation of niche cultural knowledge.

Hrudu Shibu · 5 Jul 2026

system-paper · architecture · open-source · Draft

Mikav: An Open-Source AI Copilot Bridging Cultural Heritage and Creative Entrepreneurship in Kerala

Presents the full Mikav system architecture (dataset → model → copilot → developer platform), its deployment approach, and a case study from the SparkX cohort showing real-world usage and outcomes.

Hrudu Shibu · 5 Jul 2026

fine-tuning · llm · malayalam · Draft

Fine-Tuning Open LLMs for Native Malayalam Cultural Understanding: A Comparative Study

Compares fine-tuning across open model families (Llama, Qwen, Gemma) on the Malayalam culture corpus, evaluating language fluency and cultural-knowledge accuracy against the base and baseline models.

Hrudu Shibu · 5 Jul 2026