A Review of “A Machine Learning Approach For Identification of Thesis and Conclusion Statements in Student Essays”
Posted: June 6, 2013
I’ve been quite interested in the idea of machines grading papers ever since I read the New York Times article Dan posted in the group library: “New test for Computers: Grading Essays at the College level.” For now I am just going to concern myself with the article in my title, but I am working on a much larger piece, combining several scholarly articles and a few editorials, on an educational issue that I feel will become increasingly relevant as technology expands: grading machines.
This article is interesting for several reasons, but mostly because it tests human markers against machine markers, which is, after all, the most important comparison when determining the efficacy and usefulness of these machines. Can these machines pick out the things that make a piece of writing effective? The article offers a definition of effective writing, which I find adequate but incomplete: “The literature in the teaching of writing suggests that invention, arrangement and revision in essay writing must be developed in order to produce effective writing. Stated in practical terms, students at all levels, elementary school through post-secondary education, can benefit from practice applications that give them an opportunity to work on discourse structure in essay writing.” I think we can mostly agree that a machine that can fulfill these requirements, while imperfect, is headed in the right direction.
So how well do the machines in this experiment perform these functions? First, it is important to look at what the machines are being asked to do. In a broad sense, they are being asked to identify the thesis and conclusion statements in a few hundred student essays. But the greater goal is to have them outperform a positional algorithm; this would be evidence that the machines can not only recognize the specific examples input into them, but can also apply knowledge based on those examples.
The positional algorithm pertains to how a computer marks an essay based on the length and position of words and paragraphs: “Essay length is highly correlated with human or machine scores (i.e., the longer the essay, the higher the score). Similarly, the position of the text in an essay is highly related to particular discourse elements. Therefore, we computed a positional label for the thesis and conclusion discourse categories. The method outlined in Table II was used for computing baselines reported in a later section” (462). The rules for computing the positional baselines are as follows, where P = paragraph:
For thesis statements: (1) # of P = 3 or more: select all text in P 1, excluding the first sentence. (2) # of P = 2 or more: select all text in the first P. (3) # of P = 1: select nothing.
For conclusion statements: (1) # of P = 3 or more: select all text in the final P. (2) # of P = 2 or more: select all text in the final P. (3) # of P = 1: select nothing.
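To make the baseline concrete, here is a minimal Python sketch of the rules above as I read them. It is not the authors' code. I am assuming the rules are checked in order (so an essay with three or more paragraphs never reaches rule 2), that paragraphs are separated by blank lines, and that sentences end with ".", "!" or "?"; the function names and the naive splitting helpers are my own inventions for illustration.

```python
import re


def split_paragraphs(essay):
    """Hypothetical helper: treat blank lines as paragraph breaks."""
    return [p.strip() for p in re.split(r"\n\s*\n", essay) if p.strip()]


def split_sentences(paragraph):
    """Hypothetical helper: naive split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def positional_thesis(essay):
    """Positional guess at the thesis text, following the rules in order."""
    paragraphs = split_paragraphs(essay)
    if len(paragraphs) >= 3:
        # Rule 1: all of the first paragraph except its first sentence.
        return " ".join(split_sentences(paragraphs[0])[1:])
    if len(paragraphs) == 2:
        # Rule 2: all of the first paragraph.
        return paragraphs[0]
    # Rule 3: a one-paragraph (or empty) essay gets no label.
    return ""


def positional_conclusion(essay):
    """Positional guess at the conclusion text."""
    paragraphs = split_paragraphs(essay)
    if len(paragraphs) >= 2:
        # Rules 1 and 2 select the same thing: the whole final paragraph.
        return paragraphs[-1]
    # Rule 3: a one-paragraph (or empty) essay gets no label.
    return ""


if __name__ == "__main__":
    essay = (
        "Intro hook. My thesis is X. It matters.\n\n"
        "Body text here.\n\n"
        "In sum, X holds."
    )
    print(positional_thesis(essay))
    print(positional_conclusion(essay))
```

Nothing in this sketch looks at what the sentences actually say, which is the point of using it as a baseline: a system that learns from annotated essays has to beat a guess made from position alone.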
The Results: “the performance of both discourse-based systems exceeds that of the positional algorithm, with the exceptions of the topic-independent system, PIC, for identification of thesis statements” (465).
“For identification of conclusion statements, the topic-dependent and topic independent systems have overall higher agreement than for thesis statements” (465).
“Thesis statements are more difficult to model as is apparent when we compare system performance for thesis and conclusion statements” (465).
“Overall, the results in this study indicate that it is worth continuing research using machine learning approaches for this task, since they clearly outperform the positional baseline algorithm” (465).
With one exception (the topic-independent system on thesis statements), the machines are better at identifying conclusion and thesis statements than the positional baseline algorithm, but they are not as effective as the human markers. However, the machines can do this much faster than the human markers, providing almost instant feedback. What we see here, I think, is that machines are helpful when we want to identify specific discourse elements and conventions of writing, e.g. grammar, thesis and conclusion statements, and punctuation. Machines handle the mechanical aspects of writing quite well. What I have been finding, however, is that machines are notoriously poor at dealing with the creative aspects of writing, including the deliberate subversion of writing rules.
My larger blog post will focus on a synthesis of the creative and mechanical aspects of writing and the pros and cons of machine grading that go along with them. Specifically, I will look at how a machine might deal with some of the more unusual pieces of writing the unessay is likely to produce. Can a machine ever be relied upon to mark something that bends the rules of writing for a purpose?
Burstein, Jill, and Daniel Marcu. “A Machine Learning Approach for Identification of Thesis and Conclusion Statements in Student Essays.” Computers and the Humanities 37.4 (2003): 455–467. JSTOR. Web. 31 May 2013.