Assessment

Time: The final frontier

Timefleet Academy logo: a winged hourglass made of ammonites

A logo begging for a t-shirt

Here it is: the final incarnation of my design project for Design and Development of Educational Technology, the Timefleet Academy. It’s a tool to assist undergraduate students of historical geology with remembering events in Earth history, and how those events fit into the Geological Time Scale. Much of their work consists of memorizing a long list of complicated happenings. While memorizing is not exactly at the top of Bloom’s Taxonomy (it’s exactly at the bottom, in fact), it is necessary. One could approach this task by reading the textbook over and over, and hoping something will stick, but I think there’s a better way.

I envision a tool with three key features:

  • A timeline that incorporates the Geological Time Scale, and “zooms” to show events that occur over widely varying timescales
  • The ability to add events from a pre-existing library onto a custom timeline
  • Assessments to help students focus their efforts effectively

Here’s an introduction to the problem, and a sketch of my solution. If your sensors start to detect something familiar about this enterprise then you’re as much of a nerd as I am.

Timefleet Academy is based on the constructionist idea that building is good for learning. Making a representation of something (in this case, Earth history) is a way of distilling its essential features. That means analyzing what those features are, how they are related, and expressing them explicitly. Ultimately this translates to the intuitive notion that it is best to approach a complex topic by breaking it into small digestible pieces.

Geological Time Scale

This is what you get to memorize.

As challenging as the Geological Time Scale is to memorize, it does lend itself to “chunking” because the Time Scale comes already subdivided. Even better, those subdivisions are designed to reflect meaningful stages (and therefore meaningful groupings of events) in Earth history.

There is an official convention regarding the colours in the Geological Time Scale (so no, it wasn’t my choice to put red, fuchsia, and salmon next to each other), and I’ve used it on the interface for two reasons. One is that it’s employed on diagrams and geological maps, so students might as well become familiar with it. The other is that students can take advantage of colour association as a memory tool.

Assessments

Assessments are a key difference between Timefleet Academy and other “zoomable” timelines that already exist. The assessments would come in two forms.

1. Self-assessment checklists

These allow users to document their progress through the list of resources attached to individual events. This might seem like a trivial housekeeping matter, but mentally constructing a map of what resources have been used costs cognitive capital. Answering the question “Have I been here already?” has a non-zero cognitive load, and one that doesn’t move the user toward the goal of learning historical geology.

2. Drag-and-drop drills

The second kind of assessment involves drill-type exercises where users drag and drop objects representing events, geological time periods, and dates, to place them in the right order. The algorithm governing how drills are set would take into account the following (a rough sketch follows the list):

  • The user’s previous errors: It would allow for more practice in those areas.
  • Changes in the user’s skill level: It would adjust by making tasks more or less challenging. For example, the difficulty level could be increased by going from arranging events in chronological order to arranging them chronologically and situating them in the correct spots on the Geological Time Scale. Difficulty could also be increased by placing time limits on the exercise, requiring that the user apply acquired knowledge rather than looking up the information.
  • The context of events: If drills tend to focus on the same group of events, the result could be overly contextualized knowledge. In other words, if the student were repeatedly drilled on the order of events A, B, and C separately from the order of events D, E, and F, and were then asked to put A, B, and E in the right order, there could be a problem.
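
As a rough illustration of how those three considerations might be combined, here is a minimal Python sketch. The data shapes (event ids, a per-event error tally, a single skill score) and the difficulty thresholds are assumptions made for the example, not part of the actual design.

```python
import random

def build_drill(events, error_counts, skill_level, n_items=6):
    """Assemble one drag-and-drop drill (all data shapes are hypothetical).

    events       -- ids of events on the student's timeline
    error_counts -- dict mapping event id to number of past errors
    skill_level  -- 0.0 (novice) .. 1.0 (expert)
    """
    # Half the items target past trouble spots, ranked by error count.
    troubled = sorted(events, key=lambda e: error_counts.get(e, 0), reverse=True)
    focus = troubled[: n_items // 2]
    # The rest are drawn at random from elsewhere on the timeline, so the
    # student never drills the same small cluster of events in isolation.
    remainder = [e for e in events if e not in focus]
    mixed = random.sample(remainder, min(n_items - len(focus), len(remainder)))
    # Difficulty scales with skill: placement tasks and a time limit kick in
    # at assumed thresholds (0.5 and 0.7 are placeholders).
    task = "place_on_time_scale" if skill_level >= 0.5 else "order_only"
    time_limit = 90 if skill_level >= 0.7 else None  # seconds
    return {"items": focus + mixed, "task": task, "time_limit": time_limit}
```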

The feedback from drills would consist of correct answers and errors being indicated at the end of each exercise, and a marker placed on the timeline to indicate where (when) errors have occurred. Students would earn points toward a promotion within Timefleet Academy for completing drills, and for correct answers.

Who wouldn’t want a cool new uniform?

How do you know if it works?

1. Did learning outcomes improve?

This could be tested by comparing the performance of a group of students who used the tool to that of a control group who didn’t. Performance measures could be results from a multiple choice exam. They could also be scores derived from an interview with each student, where he or she is asked questions to gauge not only how well events are recalled, but also whether he or she can explain the larger context of an event, including causal relationships. It would be interesting to compare exam and interview scores for students within each group to see how closely the results of a recall test track the results of a test focused on understanding.
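
One concrete way to make that comparison is to correlate each student’s exam score with his or her interview score. A minimal sketch in plain Python, with no real data involved:

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Correlation between two paired lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# pearson_r(exam_scores, interview_scores), computed separately for the
# tool group and the control group, would show how closely a recall-style
# exam tracks an understanding-focused interview.
```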

For the group of students who have access to the tool, it would be important to have a measure of how they used it, and how often. For example, did they use it once and lose interest? Did they use it for organizing events but not do drills? Or did they work at it regularly, adding events and testing themselves throughout? Without this information, it would be difficult to know how to interpret differences (or a lack of differences) in performance between the two groups.

2. Do they want to use it?

This is an important indicator of whether students perceive that the tool is helpful, but also of their experience interacting with it. Students could be surveyed about which parts of the tool were useful and which weren’t, and asked for feedback about what changes would make it better. (The option to print out parts of the timeline, maybe?) They could be asked specific questions about aspects of the interface, such as whether their drill results were displayed effectively, whether the controls were easy to use, etc. It might be useful to ask them if they would use the tool again, either in its current form, or if it were redesigned to take into account their feedback.

Timefleet in the bigger picture

Writing a test

All set to pass the test of time

Timefleet Academy is ostensibly a tool to aid in memorizing the details of Earth history, but it actually does something more than that. It introduces students to a systematic way of learning: by identifying key features within an ocean of details, organizing those features, and then testing their knowledge.

The point system rewards students for testing their knowledge regardless of whether they get all of the answers right. The message is twofold: testing one’s knowledge is valuable because it provides information about what to do next; and testing one’s knowledge counts as progress toward a goal even if you don’t get the right answers every time. Maybe it’s threefold: if you do enough tests, eventually you get a cape, and a shirt with stars on it.


Building assessments into a timeline tool for historical geology

In my last post I wrote about the challenges faced by undergraduate students in introductory historical geology. They are required to know an overwhelming breadth and depth of information about the history of the Earth, from 4.5 billion years ago to present. They must learn not only what events occurred, but also the name of the interval of the Geological Time Scale in which they occurred. This is a very difficult task! The Geological Time Scale itself is a challenge to memorize, and the events that fit on it often involve processes, locations, and organisms that students have never heard of. If you want to see a case of cognitive overload, just talk to a historical geology student.

My proposed solution was a scalable timeline. A regular old timeline is helpful for organizing events in chronological order, and it could be modified to include the divisions of the Geological Time Scale. However, a regular old timeline is simply not up to the task of displaying the relevant timescales of geological events, which vary over at least six orders of magnitude. It is also not up to the job of displaying the sheer number of events that students must know about. A scalable timeline would solve those problems by allowing students to zoom in and out to view different timescales, and by changing which events are shown depending on the scale. It would work just like Google Maps, where the type and amount of geographic information that is displayed depends on the map scale.
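
To make the Google Maps analogy concrete, here is a minimal sketch of how the tool might decide which events to draw at a given zoom level. The field names ('time_ma', 'importance') and the label budget are illustrative assumptions, not a real schema.

```python
def visible_events(events, older_ma, younger_ma, max_labels=20):
    """Pick the events to draw for the current zoom window.

    Each event is a dict with 'name', 'time_ma' (age in millions of years),
    and 'importance' (1 = always show; larger numbers = only when zoomed in).
    """
    in_window = [e for e in events
                 if younger_ma <= e["time_ma"] <= older_ma]
    # Like a map renderer, draw the most important events first; as the
    # window narrows, fewer events compete for the label budget, so the
    # minor ones start to appear.
    in_window.sort(key=lambda e: e["importance"])
    return in_window[:max_labels]
```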

Doesn’t that exist already?

My first round of Google searches didn’t turn anything up, but more recently round two hit paydirt… sort of. Timeglider is a tool for making “zoomable” timelines, and allows the user to embed media. It also has the catch phrase “It’s like Google Maps but for time,” which made me wonder if my last post was re-inventing the wheel.

ChronoZoom was designed with Big History in mind, which is consistent with the range of timescales that I would need. I experimented with this tool a little, and discovered that users can build timelines by adding exhibits, which appear as nodes on the timeline. Users can zoom in on an exhibit and access images, videos, etc.

If I had to choose, I’d use ChronoZoom because it’s free, and because students could create their own timelines and incorporate timelines or exhibits that I’ve made. Both Timeglider and ChronoZoom would help students organize information, and ChronoZoom already has a Geological Time Scale, but there are still features missing. One of those features is adaptive formative assessments that are responsive to students’ choices about what is important to learn.

Learning goals

There is a larger narrative in geological history, involving intricate feedbacks and cause-and-effect relationships, but very little of that richness is apparent until students have done a lot of memorization. My timeline tool would assist students in the following learning goals:

  • Memorize the Geological Time Scale and the dates of key event boundaries.
  • Memorize key events in Earth history.
  • Place individual geological events in the larger context of Earth history.

These learning goals fit right at the bottom of Bloom’s Taxonomy, but that doesn’t mean they aren’t important to accomplish. Students can’t move on to understanding why things happened without first having a good feeling for the events that took place. It’s like taking a photo with the lens cap on: you just don’t get the picture.

And why assessments?

This tool is intended to help students organize and visualize the information they must remember, but they still have to practice remembering it in order for it to stick. Formative assessments would give students that practice, and students could use the feedback from those assessments to gauge their knowledge and direct their study to the greatest advantage.

How it would work

The assessments would address events on a timeline that the students construct for themselves (My Timeline) by selecting from many hundreds of events on a Master Timeline. The figure below is a mock-up of what My Timeline would look like when the scale is limited to a relatively narrow 140 million year window. When students select events, related resources (videos, images, etc.) would also become accessible through My Timeline.

Timeline interface

A mock-up of My Timeline. A and B are pop-up windows designed to show students which resources they have used. C is access to practice exercises, and D is how the tool would show students where they need more work.

Students would benefit from two kinds of assessments:

Completion checklists and charts

The problem with having abundant resources is keeping track of which ones you’ve already looked at. Checklists and charts would show students which resources they have used. A mouse-over of a particular event would pop up a small window (A in the image above) with the date (or range of dates) of the event and a pie chart with sections representing the number of resources that are available for that event. A mouse-over on the pie chart would pop up a hyperlinked list of those resources (B). Students would choose whether to check off a particular resource once they are satisfied that they have what they need from it, or perhaps flag it if they find it especially helpful. If a resource is relevant for more than one event, and shows up on multiple checklists, then checks and flags would appear for all instances.
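
One simple way to get that behaviour is to have every event hold a reference to the same underlying resource object, so a check or flag set from any checklist is visible everywhere that resource appears. A minimal sketch; the class and field names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """A video, image, or reading attached to one or more events."""
    title: str
    url: str
    checked: bool = False   # student is done with it
    flagged: bool = False   # student found it especially helpful

@dataclass
class TimelineEvent:
    name: str
    age_ma: float                                   # age in millions of years
    resources: list = field(default_factory=list)   # shared Resource objects

# Two events can point at the same Resource object, so checking it off
# from either event's checklist marks it everywhere it appears.
overview = Resource("End-Permian extinction overview", "https://example.org/v1")
extinction = TimelineEvent("End-Permian mass extinction", 252.0, [overview])
traps = TimelineEvent("Siberian Traps eruptions", 252.0, [overview])
overview.checked = True
assert extinction.resources[0].checked and traps.resources[0].checked
```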

Drag-and-drop exercises

Some of my students construct elaborate sets of flashcards so they can arrange events or geological time intervals spatially. Why not save them the trouble of making flashcards?

Students could opt to practice remembering by visiting the Timefleet Academy (C). They would do exercises such as:

  • Dragging coloured blocks labeled with Geological Time Scale divisions to put them in the right order
  • Dragging events to either put them in the correct chronological order (lower difficulty) or to position them in the correct location on the timeline (higher difficulty)
  • Dragging dates from a bank of options onto the Geological Time Scale or onto specific events (very difficult)

Upon completion of each drag-and-drop exercise, students would see which parts of their responses were correct. Problem areas (for example, a geological time period in the wrong order) would be marked on My Timeline with a white outline (D) so students could review those events in the appropriate context. White outlines could be cleared directly by the student, or by successfully completing Timefleet Academy exercises involving those components.
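
For the lower-difficulty ordering task, grading could be as simple as comparing each submitted position against the correct one and outlining whatever doesn’t match. A minimal sketch with hypothetical names:

```python
def grade_ordering(submitted, correct):
    """Return the items that sit in the wrong position; these are the
    ones that would get a white outline (D) on My Timeline."""
    return {item for item, expected in zip(submitted, correct)
            if item != expected}

# Example: the correct order of the first three Paleozoic periods is
# Cambrian, Ordovician, Silurian.
errors = grade_ordering(
    ["Cambrian", "Silurian", "Ordovician"],
    ["Cambrian", "Ordovician", "Silurian"],
)
# errors == {"Silurian", "Ordovician"}; both stay outlined until the student
# clears them or gets them right in a later exercise.
```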

Drag-and-drop exercises would include some randomly selected content, as well as items that the student has had difficulty with in the past. The difficulty of the exercises could be scaled to respond to increasing skill, either by varying the type of drag-and-drop task, or by placing time limits on the exercise. Because a student could become very familiar with one stretch of geologic time without knowing others very well, the tool would need to gauge skill level separately for different parts of the timeline and respond accordingly.

A bit of motivation

Students would earn points for doing Timefleet Academy exercises. To reward persistence, they would earn points for completing the exercises, in addition to points for correct responses. Points would accumulate toward a progression through Timefleet Academy ranks, beginning with Time Cadet, and culminating in Time Overlord (and who wouldn’t want to be a Time Overlord?). Progressive ranks could be illustrated with an avatar that changes appearance, or a badging system. As much as I’d like to show you some avatars and badges, I am flat out of creativity, so I will leave it to your imagination for now.
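
Behind the scenes, the rank progression could be nothing more than a lookup table of point thresholds. In this sketch only Time Cadet and Time Overlord come from the design; the intermediate ranks and all of the thresholds are invented placeholders.

```python
# Only "Time Cadet" and "Time Overlord" are part of the design; the middle
# ranks and every point threshold below are made-up placeholders.
RANKS = [(0, "Time Cadet"),
         (250, "Time Ensign"),
         (750, "Time Captain"),
         (2000, "Time Overlord")]

def rank_for(points):
    """Return the highest rank whose threshold has been reached."""
    current = RANKS[0][1]
    for threshold, name in RANKS:
        if points >= threshold:
            current = name
    return current
```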


When good grades are bad information

Assignment grades versus exam grades

This week I set out to test a hypothesis. In one of my distance education courses, I regularly get final exam scores that could pass for pant sizes. I have a few reasons to suspect that the exam itself is not to blame. First, it consists of multiple-choice questions that tend toward definitions, and general queries about “what,” rather than “why” or “how.” Second, the exam questions come directly from the learning objectives, so there are no surprises. Third, if the students did nothing but study their assignments thoroughly, they would have enough knowledge to score well above the long-term class average. My hypothesis is that students do poorly because the class is easy to put on the back burner. When the exam comes around, they find themselves cramming a term’s worth of learning into a few days.

Part of the reason the class is easy to ignore is that the assignments can be accomplished with a perfunctory browsing of the textbook. In my defense, there isn’t much I can do about fixing the assignments.  Someone above my pay grade would have to start the machinery of course designers, contracts, and printing services. In defense of the course author, I’m not entirely sure how to fix the assignments. If a student were so inclined (and some have been), the assignments could be effective learning tools.

Another problem is that students tend to paraphrase the right part of the textbook.  Even if I suspect that they don’t understand what they’ve written, I have few clues about what to remedy.  The final result is that students earn high grades on their assignments. If they place any weight at all on those numbers, I fear they seriously overestimate their learning, and seriously underestimate the amount of work they need to put into the class.

So, back to testing my hypothesis: I decided to compare students’ averages on assignments with their final exam scores. I reasoned that a systematic relationship would indicate that assignment scores reflected learning, and therefore the exam was just too difficult. (Because all of the questions came undisguised from the learning objectives, I eliminated the possibility that a lack of relationship would mean the exam didn’t actually test on the course material.)

I also went one step further, and compared the results from this course (let’s call it the paraphrasing course) with another where assignments required problem-solving, and would presumably be more effective as learning tools (let’s call that the problem-solving course).

My first impression is that the paraphrasing course results look like a shotgun blast, while the problem-solving course results look more systematic. An unsophisticated application of Excel’s line fitting backs that up: if assignment grades reflect knowledge gained, they account for 67% of the variance in exam scores for the problem-solving course, but only 27% for the paraphrasing course.

I’m hesitant to call the hypothesis confirmed yet, because the results don’t really pass the thumb test. In the thumb test you cover various data with your thumb to see if your first impression holds. For example, if you cover the lowest exam score in the paraphrasing course with your thumb, the distribution could look a little more systematic, albeit with a high standard deviation. If you cover the two lowest exam scores in the problem-solving course, the distribution looks a little less so. There is probably a statistically sound version of the thumb test (something that measures how much the fit depends on any particular point or set of points, and gives low scores if the fit is quite sensitive) but googling “thumb test” hasn’t turned it up yet.
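
As it turns out, there is a statistically sound version of the thumb test: influence diagnostics such as Cook’s distance measure how much a fit depends on individual points. A quick, informal equivalent is to recompute R² after leaving each point out in turn, as in this sketch (it assumes the scores are available as NumPy arrays):

```python
import numpy as np

def r_squared(x, y):
    """R^2 of an ordinary least-squares line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

def thumb_test(x, y):
    """Recompute R^2 with each point left out in turn.

    A big swing when one point is removed means the fit leans heavily on
    that point: the numerical version of covering it with your thumb.
    """
    full = r_squared(x, y)
    swings = [r_squared(np.delete(x, i), np.delete(y, i)) - full
              for i in range(len(x))]
    return full, swings

# Usage: thumb_test(np.array(assignment_averages), np.array(exam_scores))
```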

From looking at the results, I’ve decided that I would consider a course to be wildly successful if the grades on a reasonably set exam were systematically higher than the grades on reasonably set assignments— it would mean that the students learned something from the errors they made on their assignments, and were able to build on that knowledge.

 

