Downes.ca ~ Stephen's Web ~ All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

Elizabeth Clark, et al., arXiv.org, Dec 25, 2022
Commentary by Stephen Downes

No doubt teachers and instructors think they will be able to spot when students use AI to write their papers and assignments. Not so fast. "How well can evaluators detect and judge machine-generated text?" Not well, it turns out. "Without training, evaluators distinguished between GPT3- and human-authored text at random chance level." Even with training, this article (16-page PDF) reports, the results are not good. Think you can do better? Try this game for yourself. Once you're satisfied, take some time to carefully review this article (35-page PDF), a comprehensive survey of threat models and detection methods for machine-generated text.
