At the Global Launch of the Feminist AI Research Network (f<a+i>r) on January 26, 2022, Emily Denton, Research Scientist at Google, and Raesetje Sefala, Research Fellow at the Distributed AI Research Institute (DAIR), answered the question: What do benchmark datasets mean for Feminist AI, and where do we go from here in our collective work?
At our Global Launch, Emily Denton, Research Scientist at Google and co-author of the NeurIPS 2021 paper "Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research", presented their work and engaged in conversation with Raesetje Sefala of DAIR (whose talk on Spatial Apartheid is posted separately).
“Benchmark datasets play a central role in the organization of machine learning research. They coordinate researchers around shared research problems and serve as a measure of progress towards shared goals. Despite the foundational role of benchmarking practices in this field, relatively little attention has been paid to the dynamics of benchmark dataset use and reuse, within or across machine learning subcommunities. In this paper, we dig into these dynamics. We study how dataset usage patterns differ across machine learning subcommunities and across time from 2015-2020. We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets that have been introduced by researchers situated within a small number of elite institutions. Our results have implications for scientific evaluation, AI ethics, and equity/access within the field.”