Les Stéréotypes en Français is a project on LanguageARC that collects data on how stereotypes occur in the French language. This blog post reports on the development and reception of the project on LanguageARC.
Because few resources existed for studying bias, fairness, and social impact in languages other than English, the developers of Les Stéréotypes en Français decided to translate resources originally developed for English into French. They turned to citizen science and created three tasks on LanguageARC. In the first, participants helped translate existing English resources into French, which included evaluating existing translations and creating their own. In the second, participants categorized French sentences containing a bias or stereotype according to the focus of the sentence (gender, race, age, sexual orientation, etc.). In the third, participants created complementary resources in native French. “We needed to collect examples of stereotypes and biases from French culture and language, and LanguageARC seemed like a good place to do so since we could utilize citizen science and participant contributions from all over the world. These stereotypes were then used to evaluate BERT (Bidirectional Encoder Representations from Transformers) models in French”, says Kären Fort, one of the developers behind Les Stéréotypes en Français.
The first task, in which participants judged whether a given French sentence seemed grammatically correct, attracted 84 participants who generated over 2,000 annotations. The second task, in which participants categorized French sentences containing biases or stereotypes, had 60 participants and yielded about 3,000 assessments of stereotypes. In the third task, participants were asked to write their own French sentences containing biases or stereotypes in specific categories such as gender, age, ethnicity, or socio-economic status. To date, this task has produced over 300 sentences from 47 participants.
The paper “French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English”, written by the developers of Les Stéréotypes en Français, offers several suggestions for designing future studies of bias in language. When translating resources from one language to another, a variety of translation techniques should be used to obtain sentences that are reasonable in the target language. The developers relied on literal, word-for-word translation wherever possible, but also used methods such as transposition, modulation, equivalence, and adaptation to achieve meaningful translations. They also suggest that future studies of bias and stereotypes use manual (human) translation rather than machine translation, because of biases that may exist in current language models. Careful translation and evaluation of sentences are essential in these kinds of studies. “Hopefully [these studies] help identify stereotyped biases in language models, which can help us evaluate models along a more ethical line”, says Kären Fort.
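For readers curious how a dataset of sentence pairs can be used to probe a language model, here is a minimal Python sketch of a CrowS-Pairs-style comparison. Each pair holds a sentence expressing a stereotype and a minimally different contrast sentence; the sketch counts how often a model scores the stereotyped sentence as more likely. The function name, the `score` callable, and the toy scores are hypothetical stand-ins for illustration, not the developers' code. A real evaluation would compute scores from a masked language model.

```python
from typing import Callable, Iterable, Tuple

# A pair is (stereotyped sentence, contrast sentence).
SentencePair = Tuple[str, str]


def stereotype_preference_rate(
    pairs: Iterable[SentencePair],
    score: Callable[[str], float],
) -> float:
    """Return the percentage of pairs where `score` rates the
    stereotyped sentence strictly higher than the contrast sentence."""
    pair_list = list(pairs)
    if not pair_list:
        raise ValueError("need at least one sentence pair")
    preferred = sum(
        1 for stereo, contrast in pair_list if score(stereo) > score(contrast)
    )
    return 100.0 * preferred / len(pair_list)


# Toy stand-in for model scores (assumption: higher means "more likely").
# The placeholder strings stand for real sentences from the dataset.
toy_scores = {
    "pair 1, stereotyped": -12.0,
    "pair 1, contrast": -15.5,
    "pair 2, stereotyped": -20.1,
    "pair 2, contrast": -18.3,
}
toy_pairs = [
    ("pair 1, stereotyped", "pair 1, contrast"),
    ("pair 2, stereotyped", "pair 2, contrast"),
]

rate = stereotype_preference_rate(toy_pairs, toy_scores.__getitem__)
print(f"Stereotyped sentence preferred in {rate:.1f}% of pairs")
```

Under this kind of measure, a rate near 50% would suggest the model shows no systematic preference for the stereotyped sentences, while a rate well above 50% would suggest that it does.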
To participate in Les Stéréotypes en Français and other citizen science language projects, head to the LanguageARC website.
Find LanguageARC on Facebook, Twitter, Instagram, and YouTube, and look out for more blog updates in the future.
Sources
Névéol, A., Dupont, Y., Bezançon, J., & Fort, K. (2022, April 4). French CrowS-Pairs: Extending a Challenge Dataset for Measuring Social Bias in Masked Language Models to a Language Other Than English.