Abdeljalil El Majjodi - Data team
Abdelaziz Bounhar - Data team

AL Atlas: Moroccan Darija Pretraining

We present a comprehensive dataset for Moroccan Darija, addressing the lack of resources for this widely spoken dialect. We detail our collection methodology, provide a thorough analysis of the data, and demonstrate performance improvements in both masked and causal language models after training on this dataset.

Abdeljalil El Majjodi - Data team
Aymane El Firdoussi - Data team
Ihssane Nedjaoui - Data team

Darija Chatbot Arena: Making LLMs Compete in the Moroccan Dialect

We introduce Darija Chatbot Arena, a platform for comparing the responses of different large language models (LLMs) on a diverse set of prompts in Darija, the Moroccan Arabic dialect.

Imane Momayiz - Data team lead
ao -
Ali Nirheche - Data team
Choukrani - Research team lead

TerjamaBench: A Cultural Benchmark for English-Darija Machine Translation

We introduce TerjamaBench, an evaluation benchmark for English-Darija machine translation.
