Generative artificial intelligence is transforming how scholars draft, revise, and publish. Yet academia lacks a systematic way to measure these shifts and risks relying on anecdotal evidence when evaluating whether AI elevates or erodes scholarly standards. This Comment draws on a pre-registered, three-day field experiment that addressed this measurement gap by tasking twenty-two early-career researchers, working with and without AI tools, with improving scholarly manuscripts for journal submission. However, the AI models used in the experiment are already outdated and outperformed by more powerful reasoning models, making the results a snapshot in time. This Comment therefore calls for recurring events that apply a consistent set of evaluation criteria, so that results can be combined in a publicly available dataset. Monitoring the quality of researcher-AI collaboration is necessary if academia wants to keep track of AI's rapid impact on research practice.
- Leonard Wendering
- Marc Ratkovic
- Niki Scaplehorn