Democratizing NLP: this tool seeks to lower the barrier for experts and citizens alike to explore Latin American language models…and to debias the results
https://huggingface.co/spaces/vialibre/edia
Data-driven systems are increasingly being used to supplement and, in some cases, take over decision-making processes. This can compromise fundamental rights and lead to erroneous decisions. However, auditing data-driven systems often requires technical skills that many relevant actors do not have. This technical barrier has become a major obstacle to engaging experts and communities in assessing the behavior of automated systems. Technicalities not only hinder audits but also create obscurity, making it difficult for people from other domains to understand the capabilities and limitations of these tools. That lack of understanding, in turn, hinders the development of public policies that account for the impact of these technologies.
The NLP community has increasingly expressed concern about the bias and stereotypes present in models, and about their practical implications, such as in personalized job advertisements. Several studies have identified harmful associations and biases within word representations learned from corpora. These biases can lead to invisibilization and self-censorship, or act as deterrents. To address these concerns, various techniques for measuring and mitigating bias in word embeddings and language models have been proposed.
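To make this concrete, here is a minimal sketch of one widely used measurement technique: scoring words along a "gender direction" in embedding space, in the spirit of Bolukbasi et al. (2016). It assumes the gensim library and a small pretrained English GloVe model, purely for illustration; it is not this tool's implementation.

```python
# Minimal sketch: direction-based bias scores in a word embedding.
import numpy as np
import gensim.downloader as api

# Small pretrained English GloVe model (downloads on first use).
model = api.load("glove-wiki-gigaword-50")

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# The difference vector "she" - "he" approximates a gender direction.
gender_direction = model["she"] - model["he"]

for word in ["nurse", "engineer", "receptionist", "programmer"]:
    score = cosine(model[word], gender_direction)
    # Positive scores lean toward "she", negative toward "he".
    print(f"{word:>14}: {score:+.3f}")
```

Mitigation techniques then typically operate on the same geometry, for example by removing a word's component along such a direction.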
Dealing with biases in AI systems requires more than technical solutions. Fairness calls for a nuanced understanding of complex sociological constructs, often involving phenomena that are not well understood or formalized. Formalizing such constructs for computational treatment is challenging and requires domain experts, such as sociologists, linguists, physicians, or psychologists, depending on the application domain. However, fairness metrics and frameworks often rely on technical instruments that hinder understanding and exclude people without extensive technical training, such as programming skills.
This project proposes a methodology to facilitate the exploration of biases in word embeddings, with a specific focus on the needs of the Latin American region. The goal is to enable domain experts in Latin America to conduct these analyses independently, without relying on interdisciplinary teams or extensive training, which are not always available.
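Under the hood, this kind of exploration boils down to comparing words of interest against attribute word lists chosen by the user. The hypothetical sketch below shows the idea in code; the `list_bias` helper, the word lists, and the English GloVe vectors are illustrative assumptions (the tool itself focuses on Spanish-language models and requires no programming at all).

```python
# Hypothetical sketch: word-list-based bias exploration.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def list_bias(word, attributes_a, attributes_b):
    """Mean similarity to attributes_a minus mean similarity to attributes_b."""
    sim_a = np.mean([cosine(model[word], model[a]) for a in attributes_a])
    sim_b = np.mean([cosine(model[word], model[b]) for b in attributes_b])
    return sim_a - sim_b

# Attribute lists a domain expert might define, here for gender.
female_terms = ["she", "woman", "mother", "daughter"]
male_terms = ["he", "man", "father", "son"]

for word in ["nurse", "engineer", "teacher", "mechanic"]:
    # Positive: closer to the female list; negative: closer to the male list.
    print(f"{word:>10}: {list_bias(word, female_terms, male_terms):+.3f}")
```

Putting word lists, rather than code, at the center of the interaction is what makes this kind of analysis accessible to non-programmers.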
Via Libre, a non-profit civil organization based in Córdoba, Argentina, developed the tool together with the Faculty of Mathematics, Astronomy, Physics and Computing of the National University of Córdoba. It is specifically designed to lower technical barriers, giving exploratory power to experts, scientists, and anyone else interested in and willing to audit these technologies.