Brazilian researchers have found that the algorithms behind the Google Translate service are biased when translating sentences from languages without grammatical gender. After translating several thousand sentences from 12 such languages into English, it turned out that technical professions are far less likely to be rendered as female than occupations in health care. The preprint, published on arXiv, also reports that the gender distribution across professions in the translations does not match real employment statistics.
Scientists from the Federal University of Rio Grande do Sul, led by Luis Lamb, selected 12 languages that lack grammatical gender (among them Hungarian, Finnish, Swahili, Yoruba, Armenian and Estonian) and composed sentences of the format “X is a Y”, where X is a third-person pronoun and Y is a noun denoting a profession. In all of the selected languages, the third-person pronoun is a single gender-neutral word: in Estonian, for example, both “he” and “she” are translated as “ta”, and in Hungarian as “ő”. The chosen nouns were likewise gender-neutral, covering professions such as “doctor”, “programmer” and “wedding planner”. In total, the researchers used 1,019 professions from 22 different categories. The sentences were then translated into English.
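The sentence-template setup described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function name, word lists, and the Hungarian copula pattern “ő egy …” are assumptions based on the examples given in the article.

```python
# Sketch of the study's "X is a Y" sentence templates (illustrative only;
# the authors' actual scripts and full word lists may differ).

# Gender-neutral third-person pronouns in two of the 12 source languages
PRONOUNS = {"hu": "ő", "et": "ta"}

# A few of the 1,019 profession nouns, given here in Hungarian
PROFESSIONS_HU = ["ápoló", "tudós", "mérnök"]  # nurse, scientist, engineer

def build_sentences(pronoun, professions):
    """Produce 'X is a Y' sentences using the Hungarian pattern 'ő egy ...'."""
    return [f"{pronoun} egy {p}" for p in professions]

sentences = build_sentences(PRONOUNS["hu"], PROFESSIONS_HU)
print(sentences)  # ['ő egy ápoló', 'ő egy tudós', 'ő egy mérnök']
```

Each such sentence would then be submitted to the translation service, and the pronoun chosen in the English output recorded.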
The researchers noted that Google Translate resolves the unspecified gender differently from sentence to sentence: for example, the phrase “ő egy ápoló” (“he/she is a nurse”) was translated as “she is a nurse”, while “ő egy tudós” (“he/she is a scientist”) became “he is a scientist”.
In Google Translate's output, the scientists found a clear skew in certain fields: the translator assigned representatives of technical professions the masculine gender in 71 percent of cases and the feminine in only 4 percent (the remainder were neuter). For occupations in the health-care sector, the feminine gender appeared in 23 percent of cases and the masculine in 49 percent.
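Computing such shares amounts to classifying each translated sentence by its leading pronoun and tallying the results. A minimal sketch, with hypothetical example translations standing in for the service's real output:

```python
from collections import Counter

def pronoun_gender(english_sentence):
    """Classify a translated sentence by its leading pronoun."""
    first = english_sentence.split()[0].lower()
    return {"he": "male", "she": "female", "it": "neutral"}.get(first, "other")

# Hypothetical translations for one profession category
translations = ["he is an engineer", "he is a mechanic",
                "she is a nurse", "it is a welder"]

counts = Counter(pronoun_gender(t) for t in translations)
share = {g: counts[g] / len(translations) for g in ("male", "female", "neutral")}
print(share)  # {'male': 0.5, 'female': 0.25, 'neutral': 0.25}
```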
The resulting distribution of occupations by pronoun gender was then compared with actual figures from the US Bureau of Labor Statistics. It turned out that Google Translate is indeed biased and does not reflect the real distribution of men and women across professions (at least in the US).

Of course, the racial and gender bias that arises in machine-learning algorithms is not the developers' fault but a consequence of the training data. Such biases can even be put to good use: recently, for example, scientists used gendered word representations over a large corpus of texts to study how attitudes toward women and Asians changed over time. Nevertheless, the authors of this work insist on special algorithms to reduce such bias to a minimum: the simplest option, for gender-neutral source languages, would be a random choice of pronoun in the translation.
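That simplest mitigation, randomly choosing the English pronoun when the source language gives no gender information, can be sketched as follows. This is an illustration of the idea mentioned in the article, not the authors' implementation:

```python
import random

def debias_pronoun(translation, rng=random):
    """For a gender-neutral source sentence, replace the gendered leading
    pronoun in the English translation with a coin flip between 'he' and
    'she' (sketch of the mitigation, not a production approach)."""
    words = translation.split()
    if words and words[0].lower() in {"he", "she"}:
        words[0] = rng.choice(["he", "she"])
    return " ".join(words)

print(debias_pronoun("she is a nurse"))  # 'he is a nurse' or 'she is a nurse'
```

A real system would more likely use the singular “they” or show both variants, but a random choice already removes the systematic skew in aggregate statistics.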
A method for ridding neural networks of sexism was proposed last year by American scientists: by imposing constraints on the operation of an image-recognition algorithm, bias could be reduced by almost 50 percent.