next up previous contents
Next: Tests as proctored drill Up: Exams Previous: The instructional functions of   Contents

Selection of problems

Ideally, a test would consist of highly valid, highly discriminating questions having varying degrees of difficulty. Some questions would discriminate between As and Bs, others between Bs and Cs, and others between Cs and Ds, etc. A test consisting only of difficult questions will be of little use in separating the Cs from the Ds or the Ds from the Fs.
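How well a question separates one grade band from another can be estimated numerically. As a sketch (not from the text): classical test theory's discrimination index compares how often the top- and bottom-scoring groups of students answer an item correctly; the function name, the 27% group fraction, and the data below are all illustrative assumptions.

```python
def discrimination_index(item_correct, totals, frac=0.27):
    """Discrimination index D = p_upper - p_lower.

    item_correct[i] is 1 if student i answered this item correctly;
    totals[i] is student i's overall exam score.  frac is the fraction
    of students placed in each of the upper and lower groups (27% is a
    conventional choice, assumed here).
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = max(1, int(frac * len(totals)))
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical item: answered correctly only by the higher-scoring half.
totals = list(range(10))                  # overall scores, students 0..9
item_correct = [0] * 5 + [1] * 5          # item results for those students
print(discrimination_index(item_correct, totals))
```

An item with D near 1 separates strong from weak students sharply; an item with D near 0 (e.g., one everybody gets right, or wrong) separates nobody, which is the point made above about tests built only from difficult questions.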

We have no direct measure of true competence, so instead we assume that some set of problems has some validity on average. We can then measure how the results on a given problem correlate with the overall average, reweight the problems accordingly, and repeat the process until it stabilizes. (Unfortunately, naive reweighting heuristics tend to stabilize with all of the weight on a single problem.)
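The loop described above can be sketched as follows. This is an illustrative guess at one such heuristic, not the text's own: each problem is weighted by its correlation with the current weighted total, the weights are renormalized, and the process repeats. The data is made up.

```python
def correlation(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def reweight(scores, iterations=20):
    """scores[i][j] is student i's score on problem j, in [0, 1]."""
    m = len(scores[0])
    weights = [1.0 / m] * m       # start by trusting every problem equally
    for _ in range(iterations):
        totals = [sum(w * s for w, s in zip(weights, row)) for row in scores]
        cols = list(zip(*scores))
        # Reward each problem for agreeing with the weighted total.
        raw = [max(correlation(col, totals), 0.0) for col in cols]
        weights = [r / sum(raw) for r in raw]
    return weights

# Hypothetical exam data: four students, four problems.
scores = [[0.9, 0.8, 1.0, 0.2],
          [0.6, 0.7, 0.5, 0.9],
          [0.3, 0.2, 0.4, 0.1],
          [0.8, 0.9, 0.7, 0.5]]
print(reweight(scores))
```

Whether the fixed point is sensible depends entirely on the heuristic; as the parenthetical above warns, naive variants can drift until a single problem carries all the weight.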

Note that difficulty, validity, and discrimination are not absolute attributes. Rather, they are relative to a given set of actual scores.

Many questions that appear on exams are really a game of ``guess what I want for an answer''.4.3 Such questions should have little or no validity, but often the better students are better at inferring what an instructor has in mind. Also, cultural factors play a significant role in such inferences. For instance, if I ask, ``What is the worst possible page-replacement policy?'', Asian students will usually give me the worst of the policies discussed in class. Students from the US and Europe will usually say ``Soonest to next use'', because they've learned that longest-to-next-use is optimal (under certain circumstances).
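The page-replacement claim can be checked with a small simulation. The sketch below (my own illustration, including the reference string and function names) counts page faults under two victim-selection rules: evicting the page whose next use is farthest away (Belady's optimal policy, i.e., longest-to-next-use) and evicting the page whose next use is soonest.

```python
def simulate(refs, frames, pick_victim):
    """Count page faults for a given victim-selection rule.

    pick_victim(resident, future) returns the resident page to evict,
    where future is the remainder of the reference string.
    """
    resident, faults = [], 0
    for i, page in enumerate(refs):
        if page in resident:
            continue                      # hit: no fault
        faults += 1
        if len(resident) < frames:
            resident.append(page)         # free frame available
        else:
            resident.remove(pick_victim(resident, refs[i + 1:]))
            resident.append(page)
    return faults

def next_use(page, future):
    """Distance to the page's next reference (past the end if never used)."""
    return future.index(page) if page in future else len(future) + 1

# Belady's optimal rule: evict the page needed farthest in the future.
longest_to_next_use = lambda res, fut: max(res, key=lambda p: next_use(p, fut))
# The rule students name as worst: evict the page needed soonest.
soonest_to_next_use = lambda res, fut: min(res, key=lambda p: next_use(p, fut))

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]   # an illustrative reference string
print(simulate(refs, 3, longest_to_next_use))  # few faults
print(simulate(refs, 3, soonest_to_next_use))  # faults on every reference
```

On this string with three frames, soonest-to-next-use faults on all twelve references, since it always evicts exactly the page needed next; longest-to-next-use faults only seven times.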

See http://kcc.cc.sd.us/WebPage/hb/stuperf.html for general tips on assessment. For insightful advice regarding the construction of exam questions for mathematics and science, see [R22] and [R23]. For further information on multiple-choice questions, see http://www.aicpa.org/members/div/mcs/mulchoic.htm.

For on-line testing (a.k.a. computer-aided assessment), there is a tendency to emphasize multiple-choice questions, which facilitate automatic scoring. Like many others, the authors of [R23] contend that multiple-choice questions can test only rote memorization of factual material, but obviously one can test arithmetic skills with multiple-choice questions. The authors of [] have, however, constructed a very interesting tool (TestTool) that can automatically score answers that involve certain restricted diagrams, e.g., the outcome of inserting a node in an AVL tree.

See http://www.caacentre.ac.uk/resources/ for resources on computer-aided assessment (CAA).


Tom Payne 2003-09-04