Personally, I think grading is more of a categorization problem than a language problem, and so I would think of a large language model (LLM) as a blunt instrument for this sort of task. The proof, I think, is that the instructor has to write a clear rubric before an LLM can grade papers with any degree of reliability. And dependence on a rubric, of course, rules out the possibility that a response might be unexpectedly good, going beyond what was required in some way the instructor did not predict: a way that a categorization system would recognize but a language model would not. Anyhow, the paper David Wiley introduces is here (33-page PDF).