
Comparing Human-Only, AI-Assisted, and AI-Led Teams on Assessing Research Reproducibility in Quantitative Social Science

Abel Brodeur, David Valenta, Alexandru Marcoci, Juan Pablo Aparicio, Derek Mikola, Bruno Barbarioli, Rohan Alexander, Lachlan Deer, Tom Stafford, Lars Vilhuber, Gunther Bensch, Mohamed Abdelhady, Yousra Abdelmoula, Ghina Abdul Baki, Tomás Aguirre, Sriraj Aiyer, Shumi Akhtar, Farida Akhtar, Melle R. Albada, Micah Altman, David Angenendt, Zahra Arjmandi Lari, Jorge Armando De León Tejada, Igor Asanov, Anastasiya-Mariya Asanov Noha, Rebecca Ashong, Tobias Auer, Francisco J. Bahamonde-Birke, Bradley J. Baker, Söhnke M. Bartram, Dongqi Bao, Lucija Batinovic, Tommaso Batistoni, Monica Beeder, Louis-Philippe Beland, Carsten Bienz, Christ Billy Aryanto, Cylcia Bolibaugh, Carl Bonander, Ramiro Bravo, Katherine Brennan, Egor Bronnikov, Stephan Bruns, Nino Buliskeria, Sara Caicedo-Silva, Andrea Calef, Solomon Caulker, Simonas Cepenas, Arthur Chatton, Zirou Chen, Ngozi Chioma Ewurum, Anda-Bianca Ciocîrlan, Felix J. Clouth, Jason Collins, Nikolai Cook, Cesar Cornejo, João Craveiro, Jing Cui, Niveditha Chalil Vayalabron, Christian Czymara, Carlos Daniel Bermúdez Jaramillo, Hannes Datta, Lien Denoo, Arshia Dhaliwal, Nency Dhameja, Elodie Djemai, Erwan Dujeancourt, Uğurcan Dündar, Thibaut Duprey, Yasmine Eissa, Youssef El Fassi, Ismail El Fassi, Keaton Ellis, Ali Elminejad, Mahmoud Elsherif, Aysil Emirmahmutoglu, Giulian Etingin-Frati, Emeka Eze, Jan Fabian Dollbaum, Jan Feld Victoria, Andres Felipe Rengifo Jaramillo, Guidon Fenig, Victoria Fernandes, Lenka Fiala, Lukas Fink, Sara Fish, Jack Fitzgerald, Rachel Joy Forshaw, Alexandre Fortier-Chouinard

Research output: Working paper › Discussion paper

Abstract

This study evaluates the effectiveness of varying levels of human and artificial intelligence (AI) integration in reproducibility assessments. We computationally reproduced quantitative results from published articles in the social sciences with 288 researchers, randomly assigned to 103 teams across three groups: human-only teams, AI-assisted teams, and teams whose task was to minimally guide an AI to conduct reproducibility checks (the "AI-led" approach). Findings reveal that human teams working independently matched the reproducibility success rates of teams using AI assistance, while both groups substantially outperformed AI-led approaches, with human teams achieving success rates 57 percentage points higher than AI-led teams. Human teams also found significantly more major errors than either AI-assisted or AI-led teams. AI-assisted teams demonstrated an advantage over the more automated approach, detecting 0.4 more major errors per team than AI-led teams, though still significantly fewer than human-only teams. Finally, both human and AI-assisted teams significantly outperformed AI-led approaches in both proposing and implementing comprehensive robustness checks.
Original language: English
Publisher: IZA Institute of Labor Economics
DOIs
Publication status: Published - 31 Jan 2025

