Automatic Grading

Posted in Business, Ethics, Philosophy, Politics, Technology, Universities & Colleges by Michael LaBossiere on April 26, 2013

When I learned that EdX had developed software that would instantly grade written work, my first reaction was one of skepticism. After all, while spell-checkers work well and grammar checkers work sort of well, it seems unlikely that software could properly evaluate written work. My second reaction was one of hope: after all, I grind through hundreds of papers each year, and automating that task would make my job much easier. This led to my third reaction, namely worry about the implications of such software.

While my knowledge of programming is mostly obsolete, I do know enough about artificial intelligence to know that the current technology is most likely not up to the task of properly grading written work such as essays. After all, while checking such things as spelling and grammar can be automated relatively easily, properly assessing a written work would seem to require robust language comprehension, something that existing artificial intelligence cannot do. Interestingly, in a letter about animals, Descartes argues that purely mechanical systems cannot engage in true language. While he was writing about animals, his view also applied to automatons and would now apply to computers. Descartes might be proven wrong someday, but I suspect that day has yet to arrive.

Of course, it would be foolish of me to take my view to be certain. After all, I am not an expert on artificial intelligence, and perhaps EdX has made an exceptional breakthrough in the field. Naturally, the rational approach is to consider what the experts have to say about the matter and to consider the available evidence.

One expert who has been critical of such software is Les Perelman. In a detailed paper, he carefully analyzes the effectiveness of the grading software. While the paper is somewhat technical, it makes a compelling case against the claim that such grading software is effective. In any case, readers can review the paper and assess his reasoning and evidence for themselves. Perelman is also well known for crafting nonsense that receives high marks from grading software. That this occurs is hardly surprising. The grading software is obviously not capable of comprehending the essay; it merely runs the text through a series of programmed evaluations. Someone who knows how a specific program works can therefore craft essays that a human reader would immediately recognize as nonsense, yet that pass the programmed evaluations with flying colors. This could be seen as a variation on the Turing test: being able to properly grade a written essay and to distinguish it from cleverly crafted nonsense would be a passing mark for the software/hardware.
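To make the point about programmed evaluations concrete, here is a toy sketch of a grader that looks only at surface features: essay length, average word length, and vocabulary variety. It is purely hypothetical and is not meant to describe how EdX's software or any other real product works; the function name `surface_score` and its point scheme are invented for illustration. Because nothing in it checks whether the sentences mean anything, a string of impressive-sounding but empty words outscores a short, sensible passage.

```python
import re


def surface_score(essay: str) -> float:
    """Hypothetical toy grader that scores only surface features.

    It rewards length, long words, and varied vocabulary, and never
    checks whether the sentences make sense.
    """
    words = re.findall(r"[A-Za-z']+", essay.lower())
    if not words:
        return 0.0
    # Up to 2 points for length (full marks at 100 words).
    length_points = min(len(words) / 50, 2.0)
    # Up to 2 points for long words (full marks at an average of 8 letters).
    avg_word_len = sum(len(w) for w in words) / len(words)
    word_points = min(avg_word_len / 4, 2.0)
    # Up to 2 points for variety: share of distinct words.
    variety_points = 2.0 * len(set(words)) / len(words)
    return round(length_points + word_points + variety_points, 2)


plain = "The dog ran. The dog sat. The dog ate."
nonsense = (
    "Epistemological hegemony perpetually transubstantiates quintessential "
    "paradigms, whereas ontological ambivalence invariably subsumes "
    "heterogeneous discourse."
)
print(surface_score(plain), surface_score(nonsense))
```

The sensible sentences score lower than the nonsense, which is exactly the weakness a person who knows the scoring rules can exploit.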

As for the matter of hope, the idea of automatic essay grading is appealing. Like many professors at teaching schools, I grade hundreds of essays each year. Unlike many professors, I get the graded work back to the students within a few days. In most cases, I am sad to say, students merely look at the grade and ignore the feedback and comments. As such, an automatic grader would reduce my workload dramatically, allowing me more time to handle my usual 6-9 committees, my duties as unit facilitator, and so on.

Also, I believe the software might encourage students to write more drafts. My students have to wait about 15-30 minutes for me to review a draft during my office hours, or as long as a day if they drop the paper off at the end of the day. But if a student could get instant feedback, they would have more time to revise the paper and hence might be more likely to do so. Or perhaps not.

As might be imagined, not all professors have my rapid turnaround time on drafts and papers (my students always seem shocked when they get their work back so quickly). In such cases, automatic grading would be even more useful: rather than waiting days, weeks or even months, a student could get instant feedback. There is also the fact that some professors do not provide any feedback beyond a grade on the work. If the software provided more than that, it could be rather useful to the students. There is also the practical point that even not-so-great software could still be better than the evaluation provided by some professors.

Of course, the usefulness of the software is contingent on how well it actually works. If it can be gamed by nonsense or does not actually assess the essays properly, then it would be little more than a gimmick. That said, even if it were limited in functionality, it could still prove useful. For example, I already use Blackboard’s SafeAssign to check papers for plagiarism. While it does yield false positives and can miss some cases of plagiarism, it is still a useful tool. Likewise, the grading software might serve as a useful tool for drafts and for a preliminary evaluation. However, I am still skeptical about the ability of software to assess written work properly.

My final reaction was concern about the implications of the software. It might be suspected that I would be worried that such software could put me out of a job, but that is not my main worry. While I would obviously not want to be unemployed because I was replaced by some code, I am well aware of the nature of technological advance and that automation can make certain jobs obsolete. If a program could do my job as well as I can, it would be unreasonable of me to insist that I be kept on the payroll just because firing me would be bad for me personally. After all, the university is not there to give me a job.

My main concern is not that I would be replaced by an automated equivalent or something better (that is, replaced because the task no longer requires a human); my main concern is that I would be replaced by something inferior mainly to save money. In more general terms, my worry is not that progress will make the professorship obsolete, but that the grading software will be used to cut costs by providing students with something inferior (most likely without informing students of this fact).

It might be countered that such grading software could be combined with massive online courses to produce fully automated education factories that could provide education to people who could not otherwise afford it. To use an analogy, the old model for universities would be a fine (or less fine) restaurant with chefs, and the new model would be the fast food joint with food technicians.

I will admit that this does have considerable appeal. After all, bringing education to people at a low cost would have numerous advantages, such as allowing people who could not otherwise afford an education to acquire one.

Of course, there is still the obvious concern that the software would be used to sell an inferior product at the price of the premium product, and also the concern that education could become a degree mill in which students just click their way to a diploma.

Having been in higher education for quite some time, I can attest to the desire to make education more like a business. Being able to automate education like a factory would certainly appeal to some (such as certain politicians and the folks who would sell or license the software and hardware). As might be expected, while I do believe that certain things can be automated (like grading true/false tests), education does not seem well suited to the factory model.

Another obvious concern is that automated education might not democratize education by giving everyone low-cost access to higher education. It might very well create even more extreme inequality than exists today. That is, the premier institutions would have human professors providing high-quality education while the other schools, such as state schools, would have automated classes providing education to the masses. While this sounds like a science-fiction scenario, it is well within the realm of possibility. I can attest from my own experience that there is a push to standardize and automate education, and the education factory is not many steps away from the model being strongly pushed today. This is not to say that the education factory will arrive soon, or even at all. But it is likely enough that it is worth being concerned about.

3 Responses

  1. T. J. Babson said, on April 28, 2013 at 9:45 am

    If professors can buy software to grade papers, it is only a matter of time before students will be able to buy software to write papers.

    • Michael LaBossiere said, on April 30, 2013 at 6:18 am

      If the software is cheap enough, it will replace the human essay sellers and some plagiarism. Of course, there is already software for generating postmodern junk.

  2. biomass2 said, on April 28, 2013 at 1:33 pm

    Summly. Summarizing content is a major achievement and a financially rewarding effort for young Mr. D’Aloisio. But let’s give this a bit more context. Descartes, if he could sit in on House committee meetings, listen to talk radio, watch 24-7 news coverage or surf the internet, might want to reassess his argument. To reword Mike’s summary, “Human beings ‘cannot engage in true language.’”

    Mechanical analysis of the coherence and development of ideas in an article or essay is probably an achievable gleam in someone’s eye. But developing a program that can distinguish between bullshit and substance is, I fear, a “consummation devoutly to be wished”. Perhaps we’ll achieve it in a galaxy far, far away.

