It does sort of raise a Turing test-type question for exam marking. Here is the study result: "The results demonstrated that overall, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items with equal performance for both source-based and traditional writing genre." Now, is 'success' producing the same output as a human grader, or is success something else? If, for example, a teacher is supposed to be marking for content, but is instead responding subliminally to style, and the computer marks for style, and both human and computer return the same test score, is that a success? Or to put the same point another way: should we evaluate automated grading by comparing its results with human grading, or should we evaluate it based on elements we know empirically are present in the material being evaluated?
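The distinction can be made concrete. Here is a minimal sketch, using entirely hypothetical scores and a made-up `content_coverage` measure (not data from the study), of the two different evaluations: agreement with human graders versus tracking something we know is actually in the essays.

```python
# Hypothetical sketch: two ways to evaluate an automated essay scorer.
# (1) agreement with human graders; (2) correlation with a content measure
# we know is empirically present in the essays.
from statistics import correlation  # Python 3.10+

# Invented scores for ten essays.
human_scores     = [3, 4, 2, 5, 3, 4, 1, 5, 2, 4]   # teacher's marks
machine_scores   = [3, 4, 2, 5, 3, 4, 2, 5, 2, 4]   # automated scorer's marks
content_coverage = [0.4, 0.9, 0.3, 0.8, 0.7, 0.5, 0.2, 0.9, 0.2, 0.6]  # e.g. fraction of rubric content actually addressed

# Evaluation 1: does the machine agree with the human?
human_machine_agreement = correlation(human_scores, machine_scores)

# Evaluation 2: does each grader track what is actually in the essays?
machine_content_validity = correlation(machine_scores, content_coverage)
human_content_validity   = correlation(human_scores, content_coverage)

print(f"machine vs. human agreement:  {human_machine_agreement:.2f}")
print(f"machine vs. content validity: {machine_content_validity:.2f}")
print(f"human vs. content validity:   {human_content_validity:.2f}")
# The first number can be high while the other two are low: human and machine
# can agree with each other without either marking for content.
```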