“Gli Stereotipi in Italia” is a project on LanguageARC created to study stereotypes that occur in the Italian language. It has three different tasks entitled “Gli uomini non sanno lavare i piatti”, “Stereotipo o no?”, and “Stiamo parlando dell'Italia?”. In these tasks, participants are asked to interact with and make decisions about sentences containing stereotypes or biases against certain demographic groups in Italian.
“Gli Stereotipi in Italia” sources sentences from the CrowS-Pairs dataset, an English-language dataset created to measure different kinds of social bias against certain demographics. It has more than 1500 examples of sentences that contain stereotypes regarding race, religion, gender/sexuality, age, and more. Each sentence is paired with another one, and the two form a minimal pair, differing in only one aspect. For example, take the minimal pair "Women love to shop" and "Men love to shop". The sentence "Women love to shop" is a common gender-based stereotype, while the sentence "Men love to shop" is less commonly heard or used as a stereotype. Participants interact with and make judgements about the sentences from the CrowS-Pairs dataset to help collect data for measuring bias in language models. For this project, the dataset was translated from English to Italian, and there are plans to create other projects in different languages as well.
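For readers who like to see things in code, the minimal-pair idea can be sketched in a few lines of Python. This is only an illustration: the `MinimalPair` class, its field names, and the "gender" label are assumptions made for this example, not the actual format of the CrowS-Pairs dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimalPair:
    """Hypothetical container for one pair; not the dataset's real schema."""
    more_stereotypical: str
    less_stereotypical: str
    bias_type: str  # illustrative label, e.g. "gender"

    def differing_words(self):
        """Return the word pairs at positions where the two sentences differ."""
        a = self.more_stereotypical.split()
        b = self.less_stereotypical.split()
        # This simple comparison assumes both sentences have the same word count.
        if len(a) != len(b):
            raise ValueError("sentences must have the same number of words")
        return [(x, y) for x, y in zip(a, b) if x != y]


pair = MinimalPair("Women love to shop", "Men love to shop", "gender")
print(pair.differing_words())  # [('Women', 'Men')]
```

The only difference between the two sentences is the group being described, which is what makes the pair "minimal".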
This blog post contains a short Q&A with Julian Bezançon, a developer of the “Gli Stereotipi in Italia” and “Les Stéréotypes en Français” projects, both of which are available for participation on LanguageARC.
What was the development process of “Gli Stereotipi in Italia” like?
It was very convenient, as I had already created a project on LanguageARC last year (“Les Stéréotypes en Français”). It took me about 30 minutes to get it done.
Since both this project and “Les Stéréotypes en Français” are based around bias and stereotypes in language, did you take inspiration from the earlier project? If so, what are the similarities and differences between the two?
Both projects are linked in a way. Last year we extended an English bias dataset, and we used LanguageARC for corrections and additions to the dataset. This year, we aim to do the same for several other languages. We are working with dozens of researchers, and some of them asked for a project to be created in their language. They tell me what they want for their respective projects, but in the end they also take inspiration from “Les Stéréotypes en Français”.
Can you explain how the “CrowS-Pairs” dataset is used in this project?
We have three tasks in this project. In the first task, we ask the users to create a sentence which expresses bias, so the data from CrowS-Pairs is not used. For the second task, participants are shown a CrowS-Pairs sentence and asked to select the category of stereotype the sentence belongs to from a list of options. In the third task, we show a CrowS-Pairs sentence and ask for grammatical corrections if needed.
What do you envision participant data from “Gli Stereotipi in Italia” will be used for?
As we did with the French data from the “Les Stéréotypes en Français” project, we will collect corrections, submissions, and responses and use them to create less biased language models. We will also add the newly created sentences to our Italian dataset, alongside the sentences translated directly from the English CrowS-Pairs dataset. This is all part of our methodology to extend this dataset to languages other than English.
To participate in "Gli Stereotipi in Italia" and other citizen science language projects, head to the LanguageARC website.