Via Libre’s paper has been selected to go to Prototype

Via Libre’s ‘A TOOL TO OVERCOME TECHNICAL BARRIERS FOR BIAS ASSESSMENT IN HUMAN LANGUAGE TECHNOLOGIES’ has been selected to go to the 6-month prototype phase.


Abstract of the paper: Automatic processing of language is becoming pervasive in our lives, often taking central roles in our decision making, like choosing the wording for our messages and mails, translating our readings, or even having full conversations with us. Word embeddings are a key component of modern natural language processing systems. They provide a representation of words that has boosted the performance of many applications, working as a semblance of meaning. Word embeddings seem to capture a semblance of the meaning of words from raw text, but, at the same time, they also distill stereotypes and societal biases which are subsequently relayed to the final applications. Such biases can be discriminatory.

It is very important to detect and mitigate those biases, to prevent discriminatory behaviors of automated processes, which can be much more harmful than in the case of humans because of their scale. There are currently many tools and techniques to detect and mitigate biases in word embeddings, but they present many barriers for the engagement of people without technical skills. As it happens, most of the experts in bias, either social scientists or people with deep knowledge of the context where bias is harmful, do not have such skills, and they cannot engage in the processes of bias detection because of the technical barriers.

We have studied the barriers in existing tools and have explored their possibilities and limitations with different kinds of users. With this exploration, we propose to develop a tool that is specially aimed to lower the technical barriers and provide the exploration power to address the requirements of experts, scientists and people in general who are willing to audit these technologies.

Over the 6-month prototyping phase, they will:

  • Provide a graphical interface inspired by some of the HCI ideas in the tool WordBias
  • Provide comparative visualizations that record the history of interactions with the prototype, allowing users to compare:
    • variations in the extremes of the bias space
    • variations across different embeddings, diachronically (e.g., newspapers)
    • combinations of different spaces (i.e. intersectionality)
  • Host the prototype on Hugging Face, to import pre-trained embeddings and to offer the tool to Hugging Face's community of NLP practitioners.
  • Offer the possibility to train word embeddings on a given corpus, and provide word-by-word metrics of the reliability of the embedding.
  • Show the following additional information about the words
    • frequency with respect to corpus size
    • concordances, context of occurrences
    • n most similar words
    • average similarity with n most similar words
  • Define metrics assessing the quality of word lists, based on their statistical properties.
  • Extend embeddings with n-grams to cover multi-word expressions.
  • Suggest mitigation strategies that involve comparing different corpora or modifying the original corpora. For example, a corpus in Spanish could be made gender neutral before training word embeddings by using the neutral gender ‘e’.
  • Assess whether methodology for exploring biases could be applied to contextual embedding methods used in large language models
  • Run usability studies to support agile development.
  • Integration with public policy practice.
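
To make the per-word information in the list above more concrete, here is a minimal sketch in Python of how frequency, nearest neighbours, average similarity and a position in a bias space could be computed. It is an illustration only, not the prototype's implementation: the toy vectors, the function names and the choice of cosine similarity against the difference of two word-list means are all assumptions made for this example.

```python
import numpy as np

# Toy 3-dimensional embeddings. The values are invented for illustration
# and do not come from any real model or from the Via Libre prototype.
EMBEDDINGS = {
    "she": np.array([1.0, 0.0, 0.2]),
    "he": np.array([-1.0, 0.0, 0.2]),
    "nurse": np.array([0.8, 0.5, 0.1]),
    "engineer": np.array([-0.7, 0.6, 0.1]),
    "doctor": np.array([0.0, 0.9, 0.1]),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def most_similar(word, emb, n=3):
    """Return the n words closest to `word`, as (word, similarity) pairs."""
    target = emb[word]
    scores = [(other, cosine(target, vec))
              for other, vec in emb.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


def average_similarity(word, emb, n=3):
    """Mean cosine similarity between `word` and its n nearest neighbours."""
    neighbours = most_similar(word, emb, n)
    return sum(score for _, score in neighbours) / len(neighbours)


def relative_frequency(word, tokens):
    """Occurrences of `word` divided by the corpus size in tokens."""
    return tokens.count(word) / len(tokens)


def bias_score(word, emb, extreme_a, extreme_b):
    """Position of `word` in a bias space defined by two word lists.

    Assumption: the bias direction is the difference between the mean
    vectors of the two lists. Positive scores lean towards extreme_a,
    negative scores towards extreme_b.
    """
    mean_a = np.mean([emb[w] for w in extreme_a], axis=0)
    mean_b = np.mean([emb[w] for w in extreme_b], axis=0)
    return cosine(emb[word], mean_a - mean_b)


if __name__ == "__main__":
    corpus = ["the", "nurse", "and", "the", "engineer"]
    for w in ("nurse", "engineer", "doctor"):
        print(w,
              "bias:", round(bias_score(w, EMBEDDINGS, ["she"], ["he"]), 3),
              "avg sim:", round(average_similarity(w, EMBEDDINGS, n=2), 3))
    print("relative frequency of 'nurse':", relative_frequency("nurse", corpus))
```

Changing the word lists passed as `extreme_a` and `extreme_b` corresponds to varying the extremes of the bias space, and running the same functions over embeddings trained on different corpora would give the kind of comparison across embeddings that the list describes.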

They will also reach out to partners in the NLP community and throughout social science networks to define and refine their tool.

To read the full paper:

To listen to the project being explained: