Lex.llm

Lex.llm – Enhancing Danish Cultural and Factual Alignment in Language Models

The Lex.llm project brings together the Center for Humanities Computing at Aarhus University and Lex.dk, Denmark’s National Encyclopaedia, in a joint effort to improve the factual accuracy and cultural sensitivity of Danish language models. By combining current AI development with a foundation of curated Danish knowledge, Lex.llm aims to create a virtual assistant capable of providing accurate, context-sensitive information to diverse user communities.

Whilst large language models have achieved remarkable capabilities, they often misalign with local cultural values and factual standards, particularly in smaller language contexts such as Danish. Lex.llm addresses this challenge by fine-tuning, evaluating, and deploying a Danish conversational language model, ensuring the appropriate representation of Danish heritage, history, and societal values. Through a participatory development approach involving both knowledge editors and learners, the project seeks not only to enhance existing digital resources but also to pioneer methodologies for culturally responsible AI development.

Role of Center for Humanities Computing

The Lex.llm project is a cooperative initiative that the Center for Humanities Computing (CHC) is developing in close collaboration with Lex.dk’s editorial and research teams.

CHC is responsible for:

Reimagining how modern AI technology can help build a friendly and useful natural language interface to Lex.dk’s extensive corpus of factual and historical knowledge.
Researching how AI-generated content challenges source attribution, and developing new theories and tools that support critical source awareness among Lex.dk’s users.
Investigating the landscape of open and responsible AI technology in order to create effective tech stacks that offer an alternative to those provided by Big Tech companies.
Developing specialised Danish language models grounded in factual accuracy and cultural appropriateness.
Implementing advanced model alignment techniques, including supervised fine-tuning, reinforcement learning from AI feedback, and Retrieval-Augmented Generation (RAG).
Building an evaluation framework that integrates benchmark testing with user-centred feedback loops.
Supporting researchers, editors, and learners in data exploration, model evaluation, and participatory model refinement.

Through this work, CHC enables the project to move beyond generic LLM development and towards creating language technologies tailored to Danish linguistic, cultural, and factual standards.
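
To make the combination of Retrieval-Augmented Generation and source attribution mentioned above more concrete, the following Python sketch shows the general idea under simplifying assumptions. It is not the project's implementation: the `Article` type, the three sample entries, the `example.org` URLs, and the word-overlap scoring are all hypothetical stand-ins, and no language model is called. The sketch only retrieves matching entries and builds a prompt in which each source is numbered so that an answer could cite it.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    url: str   # hypothetical placeholder URL, not a real Lex.dk address
    text: str


# Tiny illustrative corpus; the entries are placeholders, not Lex.dk content.
ARTICLES = [
    Article("H.C. Andersen", "https://example.org/andersen",
            "H.C. Andersen was born in Odense in 1805."),
    Article("Dannebrog", "https://example.org/dannebrog",
            "Dannebrog is the national flag of Denmark."),
    Article("Aarhus", "https://example.org/aarhus",
            "Aarhus is the second-largest city in Denmark."),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score(query: str, article: Article) -> int:
    """Count the word tokens shared by the query and the article."""
    q = Counter(tokenize(query))
    a = Counter(tokenize(article.title + " " + article.text))
    return sum(min(q[t], a[t]) for t in q)


def retrieve(query: str, articles: list[Article], k: int = 2) -> list[Article]:
    """Return up to k articles with a non-zero overlap score, best first."""
    ranked = sorted(articles, key=lambda art: score(query, art), reverse=True)
    return [art for art in ranked[:k] if score(query, art) > 0]


def build_prompt(query: str, sources: list[Article]) -> str:
    """Build a prompt that lists numbered sources for the model to cite."""
    numbered = "\n".join(
        f"[{i}] {s.title} ({s.url}): {s.text}"
        for i, s in enumerate(sources, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{numbered}\n"
        f"Question: {query}\n"
    )


if __name__ == "__main__":
    question = "Where was H.C. Andersen born?"
    print(build_prompt(question, retrieve(question, ARTICLES)))
```

A production system would presumably replace the word-overlap scoring with a proper search index or embedding-based retrieval, but the numbered source list illustrates how retrieved encyclopaedia entries can remain visible to the user alongside the generated answer.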