This week I set out to test a hypothesis. In one of my distance education courses, I regularly get final exam scores that could pass for pant sizes. I have a few reasons to suspect that the exam itself is not to blame. First, it consists of multiple-choice questions that tend toward definitions and general queries about “what” rather than “why” or “how.” Second, the exam questions come directly from the learning objectives, so there are no surprises. Third, if the students did nothing but study their assignments thoroughly, they would have enough knowledge to score well above the long-term class average. My hypothesis is that students do poorly because the class is easy to put on the back burner. When the exam comes around, they find themselves cramming a term’s worth of learning into a few days.
Part of the reason the class is easy to ignore is that the assignments can be completed with a perfunctory browse through the textbook. In my defense, there isn’t much I can do to fix the assignments. Someone above my pay grade would have to start up the machinery of course designers, contracts, and printing services. In defense of the course author, I’m not entirely sure how to fix the assignments either. If a student were so inclined (and some have been), the assignments could be effective learning tools.
Another problem is that students tend to find the right part of the textbook and paraphrase it. Even if I suspect that they don’t understand what they’ve written, I have few clues about what to remedy. The result is that students earn high grades on their assignments. If they place any weight at all on those numbers, I fear they seriously overestimate their learning and seriously underestimate the amount of work they need to put into the class.
So, back to testing my hypothesis: I decided to compare students’ averages on assignments with their final exam scores. I reasoned that a systematic relationship would indicate that assignment scores reflected learning, and therefore that the exam was just too difficult. (Because all of the questions came undisguised from the learning objectives, I ruled out the possibility that a lack of relationship would mean the exam didn’t actually test the course material.)
I also went one step further and compared the results from this course (let’s call it the paraphrasing course) with another where the assignments required problem-solving and would presumably be more effective as learning tools (let’s call that the problem-solving course).
My first impression is that the paraphrasing course results look like a shotgun blast, and the problem-solving course results look more systematic. An unsophisticated application of Excel’s line fitting gives an R² of 0.67 for the problem-solving course, suggesting that about 67% of the variation in exam scores can be accounted for if assignment grades reflect knowledge gained. For the paraphrasing course, R² is only 0.27, so only about 27% of the variation can be explained that way.
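The R² that Excel’s trendline reports measures how much of the spread in exam scores a straight line in assignment average accounts for. Here is a minimal Python sketch of the same calculation, assuming each course’s data sit in two equal-length lists (assignment averages and exam scores, in the same student order). The function name fit_line is made up for illustration; it is not an Excel feature.

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares fit of exam = slope * assignment + intercept.

    x: assignment averages, y: exam scores (same student order).
    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least three students")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares of exam scores
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined when every score in a list is identical")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares: the spread the fitted line leaves unexplained.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared
```

For a straight-line fit with an intercept, this R² equals the square of the ordinary (Pearson) correlation between the two lists, so it should agree with the number on Excel’s linear trendline as long as the intercept isn’t forced to zero.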
I’m hesitant to call the hypothesis confirmed yet, because the results don’t really pass the thumb test. In the thumb test, you cover various data points with your thumb to see if your first impression holds. For example, if you cover the lowest exam score in the paraphrasing course with your thumb, the distribution could look a little more systematic, albeit with a high standard deviation. If you cover the two lowest exam scores in the problem-solving course, the distribution looks a little less so. There is probably a statistically sound version of the thumb test (something that measures how much the fit depends on any particular point or set of points, and gives low scores if the fit is quite sensitive), but googling “thumb test” hasn’t turned it up yet.
From looking at the results, I’ve decided that I would consider a course wildly successful if the grades on a reasonably set exam were systematically higher than the grades on reasonably set assignments. That would mean the students had learned from the errors they made on their assignments and were able to build on that knowledge.