Thursday, May 21, 2009

Constructing Evaluation Instruments

The posting below gives some excellent advice on constructing evaluation instruments and their uses in testing and grading. The author is Stanford C. Ericksen.

Testing and Grading

Fair play is the first and final requirement in matters of testing and grading. Students will accept pressures for hard work but object strenuously, and rightly so, to signs of unfairness in a teacher's assessment of their efforts. Being an expert in an area of subject matter and having the speaking skills required for teaching are quite different dimensions of professional competence than are the abilities to construct discriminating examinations and to assign valid grades. Improvement on the part of instructors in the areas of testing and grading is nearly always in order.

An important distinction must be made between evaluation and grading. Evaluation is information provided to the student about particular aspects of what was said or done during the effort to learn, to solve a problem, or to organize and integrate facts and concepts. As they move into unknown intellectual territory, students must have guideposts to confirm that they are moving in the right direction. The qualitative comments about particular aspects of a term paper are far more constructive aids for the specifics of learning and remembering than is the grade on the cover page. Evaluation, therefore, is indispensable to students for gaining understanding and for fixing what is learned in memory. A grade, on the other hand, is a gross index which typically comes too late for the student to take corrective measures about the specifics of learning.

A few guidelines about testing and grading can help instructors to: (1) strengthen the process of instruction, (2) clarify the diagnostic value of testing, (3) make a fair assessment of what each student knows, and (4) report this achievement through grades.

Testing as a Tool for Instruction

Students tend to concentrate their study effort in preparation for an exam, and they structure this effort in anticipation of the nature of the questions they will be asked. If students anticipate the need to know unassimilated facts, they will concentrate on memorizing information; if they expect to be asked to integrate, extend, and evaluate information, they will try to prepare themselves along those lines. The management of testing is an opportunity for the instructor to underline the essential elements making up the course.

As a matter of fact, a program for the orientation and training of beginning college teachers could well be geared to the interdependence among: the objectives of a course, the sequence of topics (and their classroom presentation), and the manner in which this can be assessed by means of tests, papers, and special projects. I recall a science professor whose overriding goal was "to teach students to think like a scientist thinks" but whose tests were almost solely measures of how well students memorized. He changed his exams to emphasize integration of material, and everyone felt better about the course.

The Diagnostic Use of Tests

Placement testing is commonly used at the department and college level, but within our own courses we can also make effective use of similar testing for making a grade-free diagnostic appraisal of what information is already known by the students or is not known but should be. Diagnostic testing is an excellent instructional tool because when a student says, in effect, "I don't see why the question was scored that way," an inquiry is started toward unscrambling the false connections. In this close-up look, the teacher may note a pattern of mistakes showing a misunderstanding of a particular rule, procedure, or principle. It may also appear that a student has the right answer but for the wrong reasons.

A diagnostic test is a sort of intellectual X-ray showing the strengths and weaknesses in a student's inventory of information, understanding, and skill. The evaluative emphasis is on the responses to individual test items, on information prerequisite for understanding the larger concepts and procedures in this particular course of study.

When students realize the significance to themselves of grade-free probing, they are more likely to open up and reveal low points in their preparation profile, anxieties, misconceptions and deficiencies in knowing how to do certain tasks. A sprinkling of short, diagnostic quizzes early in the term suggests to students that the teacher cares about how they are doing and is taking corrective steps to help them along - an excellent climate for starting the semester.

Assessing Achievement

Although test scores in any setting are affected by students' aptitude, study skills, motivation, background preparation, and the influence of the teacher, our classroom examinations should be designed primarily to measure subject-matter achievement. To this end, the teacher and student seek the same wavelength within an assigned domain of knowledge. A frustrated student expressed a contrary state of affairs quite clearly, "I don't like to play the professor's game: I've got a secret, see if you can guess what it is."

Effective classroom instruction is central to student learning, but students are short-changed if the examinations are trivial, irrelevant, confusing, or tangential to the substance of the course. College teaching is not complete without an accurate and fair assessment of students' achievement during the term and at its conclusion.

Objective Tests

Objective (machine-scorable) tests are almost mandatory in large classes, but constructing such instruments is a demanding task. Although it is tempting for teachers to make use of commercially available examinations, to pull old tests from the file, or to overuse test items taken from a teacher's manual, students are best served when their instructors develop exams tailored to their specific course and based on sound principles.

Two basic concepts need to guide the development of classroom examinations:

1. Validity refers to whether an instrument measures what it is supposed to measure. A valid test, therefore, samples what students should have learned from your course offering. It measures here-and-now achievement rather than, for example, how well a student reads or how much information the student had gained outside the course. Test items about minutiae and footnote information are temptingly easy to put together but lack the validity of questions that elicit a student's understanding of key concepts, important factual data, and relevant procedures. A valid test is an unambiguous reflection of what is worth knowing and remembering.

2. Reliability refers to the consistency of an instrument's results. A good short quiz is better than a poorly constructed long test but, assuming equal quality of items, a 50-item test is more reliable (stable, consistent) than a 10-item quiz. The random errors due to ambiguous wording, idiosyncratic interpretations, distractions, and other flaws are more likely to cancel out in the longer test, resulting in a more dependable total score. Thus, the easiest way to reduce the unreliability in the measuring instrument is simply to increase the length of the test.
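The link between test length and reliability can be made concrete with the Spearman-Brown prophecy formula, a standard psychometric result that the author does not name but that fits the point made here. The sketch below is only illustrative: the starting reliability of 0.50 for a 10-item quiz is a made-up figure, and the formula assumes the added items are of the same quality as the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor
    using items of comparable quality (Spearman-Brown prophecy formula)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point: a 10-item quiz with reliability 0.50.
quiz_reliability = 0.50
for n_items in (10, 20, 50):
    predicted = spearman_brown(quiz_reliability, n_items / 10)
    print(f"{n_items:>2} items: predicted reliability {predicted:.2f}")
```

Under these assumptions the predicted reliability rises with length but with diminishing returns, which is why longer tests help only when the extra items are as good as the original ones.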

Objective tests come in many forms, but the multiple-choice format carries most of the burden. When carefully worded, multiple-choice items can probe a student's understanding of factual information, skills and procedures, concrete and abstract concepts, and the implications from different scales of values. (True-false items are altogether too constrained to be effective discriminators for most college courses.)

To strengthen the quality of the set of items used, a complete item analysis should be made of each new test. This test-of-the-test is mainly to determine and adjust the difficulty level of each item. It is normal to find that many of our carefully conceived questions turn out to be too easy or too difficult or just seem to ride along as excess baggage. Such items use valuable testing time but add little to the discriminating power of the test. They don't help to separate the top group of students from the bottom group of achievers.
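A basic item analysis of this kind can be sketched in a few lines. The example below is one common way to do it, not necessarily the procedure the author had in mind: difficulty is taken as the proportion of students answering an item correctly, and discrimination as the difference in that proportion between the top and bottom scoring groups. The 27 percent group size is a widely used convention, and the score matrix is invented for illustration.

```python
def item_analysis(responses, group_fraction=0.27):
    """responses: one list per student of 0/1 item scores (1 = correct).
    Returns one (difficulty, discrimination) pair per item, where
    difficulty is the proportion correct (higher means easier)."""
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n_students * group_fraction))
    top, bottom = ranked[:group_size], ranked[-group_size:]
    results = []
    for i in range(n_items):
        difficulty = sum(student[i] for student in responses) / n_students
        p_top = sum(student[i] for student in top) / group_size
        p_bottom = sum(student[i] for student in bottom) / group_size
        results.append((difficulty, p_top - p_bottom))
    return results


# Hypothetical scores: six students (rows) on four items (columns).
scores = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
for number, (difficulty, discrimination) in enumerate(item_analysis(scores), start=1):
    print(f"item {number}: difficulty {difficulty:.2f}, discrimination {discrimination:+.2f}")
```

In this invented data the first item is answered correctly by everyone and the last by no one, so both have zero discrimination: they take up testing time without separating stronger from weaker students.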

Because ambiguity of meaning is a persistent problem, the wording of test items is critical. Careful editing of the draft exam includes close attention to such pitfalls as cluing the right answer, overlapping correct alternatives, or asking for a positive answer to a negative question. Good test items are parsimonious in meaning and simple in wording. It is surprising how quickly excess words can lead to double meaning or obscure the correct answer. It is appropriate, however, to expand the stem - the lead-in statement of the multiple-choice question - by using a relevant quotation or making reference to a particular body of factual data.

Score the test in a straightforward manner, e.g., in terms of the number of right answers. Trying to adjust (punish) for guessing may, in effect, simply open further sources of variability. Combining raw scores from different performance measures, i.e., tests, term papers, class participation, special projects, etc., can easily distort your original intention. The statistical solution is to convert the different measures to a common scale through the use of some type of standard-score scale.
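The distortion from adding raw scores, and the standard-score remedy, can be illustrated with z-scores (each score minus the class mean, divided by the class standard deviation). This is a minimal sketch: the four students, their scores, and the equal weights are all hypothetical, and z-scores are only one of several standard-score scales an instructor might choose.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    mu, sigma = mean(raw), pstdev(raw)
    if sigma == 0:
        # Everyone scored the same, so the measure carries no information.
        return [0.0 for _ in raw]
    return [(x - mu) / sigma for x in raw]


# Hypothetical class of four students, two measures on different scales.
exam = [40, 50, 60, 70]    # out of 100, widely spread
paper = [18, 17, 19, 18]   # out of 20, tightly bunched
weights = {"exam": 0.5, "paper": 0.5}  # illustrative equal weights

raw_total = [e + p for e, p in zip(exam, paper)]
z_total = [
    weights["exam"] * ze + weights["paper"] * zp
    for ze, zp in zip(z_scores(exam), z_scores(paper))
]

for i, (raw, z) in enumerate(zip(raw_total, z_total), start=1):
    print(f"student {i}: raw total {raw}, weighted z total {z:+.2f}")
```

In the raw totals the widely spread exam swamps the paper, so the ranking simply follows the exam. After conversion both measures contribute as the weights intend, and the student with the best paper moves to the top.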

Subjective Evaluation

The distinctive value of essay exams or term papers is the freedom they offer for students to probe and develop the personal meaning of ideas and to express these thoughts in their own words. To organize an integrated chain of thought, to elaborate on findings, and to communicate ideas to others are stronger tests of achievement than is the recognition or recall of isolated units of information.

1. Essay Exams. In an essay examination, the student is staring at a blank page and generating, from within, a complicated sequence of responses aimed at organizing a meaningful unit of knowledge. This ability to recall is a more demanding test of memory than simply to recognize something. An essay examination elicits the ability to retrieve information but with little help from presently given cues. The perceptive teacher (reader) can evaluate the strong and weak points in a written argument even when the student's perception of a question differs from the teacher's. Evaluative permissiveness can, of course, go only so far.

A steady and unwavering evaluative state of mind is difficult to sustain when reading page after page through a set of exams. Three procedural controls help to reduce the evaluating drift: (1) turn under the front (name) page to forestall confounding effects from those students we particularly like or dislike; (2) read one question at a time through the entire set of exam booklets; (3) shuffle the order of the booklets periodically to balance the inevitable effects of reader fatigue or an emerging tilt toward one pattern of answers.

2. Term Papers. In some respects, the term paper is the essence of what a student has gained from the course. It sets forth what the individual student has learned and how the student has pulled together all the information for comprehension and understanding. This, in turn, serves to keep the knowledge available in long-term memory.

A written handout is a useful guide regarding the due date, length, use of references, comments about style, and any other restrictions or suggestions about the assignment. It may, for example, be helpful to remind students about the difference between describing versus analyzing events and ideas. The heavy task of reading these papers is counterbalanced, somewhat, by the satisfaction of reading the better papers - some of which can be truly exciting.

Grading a stack of exams and papers is a time-consuming and pressured task because, throughout, the matter of fair play is squarely on the back of the reader. By way of evaluation, the teacher should indicate in some detail the rationale for assigning the gross grade, making specific reference to identified parts of the exam or paper. The instructional value of essay exams and term papers is practically wiped out if the student receives nothing back other than the grade.

Grading

Faculty standards for A-grade performance define the meaning of excellence within the university. We must guard the criteria of achievement, since everyone pays the price of academic inflation when these standards are lowered. Students work hard for grades because "making the grade" is personally rewarding and is an important basis for special awards, admission to advanced training, and employment prospects. With such payoff potential it is unfair for a teacher to be casual or careless in assigning this index of achievement. Judgments about professional competence must take into account the quality of a teacher's procedures for testing and grading.

There are two basic options available to instructors for grading student achievement:

1. Norm-referenced grading, more commonly referred to as grading-on-the-curve, sets the scale of achievement by the average level of class performance. Students basically compete against one another in this approach.

2. Criterion-referenced grading has the teacher measuring the students against some absolute standard with respect to what they are expected to learn. The competition here is between the student and mastery of a finite body of knowledge.

In practice, these two approaches overlap and merge since a teacher's judgment about levels of achievement is influenced by the levels of student performance with which one is accustomed at a given school. Also, the departmental culture enters into the picture, because a teacher's procedures and standards for testing and grading are expected to fall in line with the traditions or policies of the home department.

The danger in grading-on-the-curve is its diminishment of the teacher's responsibility for evaluating the students' level of understanding against his or her preset criteria of subject-matter achievement. The final examination, for example, is a revealing statement sampling the information and skills the teacher believes should be carried from the course.

With criterion-referenced grading, there is the danger that the instructor may set the expected level of achievement unrealistically high or low, with the result that students perceive the exam as inappropriate and unfair.
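The contrast between the two grading options can be sketched as follows. Everything specific here is an assumption made for illustration: the letter cutoffs, the z-score bands used for the curve, and the class scores are hypothetical, and a real course would set its own.

```python
from statistics import mean, pstdev


def criterion_grade(score, cutoffs=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    """Grade against fixed, preset standards (hypothetical cutoffs)."""
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    return "F"


def norm_grade(score, class_scores):
    """Grade relative to class performance (hypothetical z-score bands)."""
    mu, sigma = mean(class_scores), pstdev(class_scores)
    z = 0.0 if sigma == 0 else (score - mu) / sigma
    for minimum_z, letter in ((1.5, "A"), (0.5, "B"), (-0.5, "C"), (-1.5, "D")):
        if z >= minimum_z:
            return letter
    return "F"


# Hypothetical results from a class that found the exam difficult.
class_scores = [52, 58, 61, 65, 74]
for s in class_scores:
    print(f"score {s}: criterion {criterion_grade(s)}, curve {norm_grade(s, class_scores)}")
```

With these made-up numbers the same paper earns a different letter under each scheme, which shows both dangers described above: the curve rewards relative standing regardless of mastery, while fixed cutoffs penalize the whole class if the standard was set unrealistically high.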

Grades serve the academic purpose of showing intellectual achievement in a limited domain defined by books, teachers, laboratories, and the like. They are not designed to predict success in the off-campus setting where special weight may be given to information, aptitudes, and personal characteristics extending beyond the boundaries of teachers and their courses. Only indirectly or on occasion, do grades reflect a student's tolerance for stress, independent decision-making, congeniality in human relations, ability to cope with unexpected problems, and the like. Teachers can best sustain the credibility of the grading system by making their assessments reflect as fairly as possible how well each student has achieved the stated objectives of the course.

Monday, May 04, 2009

Guide to Your First Semester of College Teaching

from Chapter 12, Common Problems, in the book: On Course - A Week-by-Week Guide to Your First Semester of College Teaching, by James M. Lang

Q: How do I handle rude student behavior in my classroom-talking, laughing, getting up and down during class?

"No experience of new faculty as teachers," writes Robert Boice, "is so dramatic and traumatizing as facing unruly, uninvolved students-especially in the large, introductory courses traditionally assigned to newcomers" (81). Undoubtedly true; equally troublesome, with the omnipresence of laptops and wireless-enabled classrooms, are students spending class time shopping for shoes online, rather than taking notes (see following question).

Two major points here. First, rude student behavior often comes about because of what's happening at the front of the classroom. If students are talking and reading the student newspaper during the lecture, sending e-mails, or IMing their friends, your lectures may be boring. If students are chit-chatting with each other during the discussion, you may not be asking interesting questions. A well-taught class is the best preventive measure you can take to counter what Boice calls "student incivilities." His research on this issue suggests that newcomers face student incivilities at much higher rates than highly rated teachers with years of classroom experience (81-98). Fortunately for you, you are doing the work right now to become a highly rated teacher, and following the prescriptions of this book-and other preparatory work you do for your first semester-will be the best measure you can take against poor behavior.

However, students, like the rest of the population, can be just rude idiots, so sometimes your best teaching efforts won't be enough to eliminate such behaviors. You won't always know about students surfing the internet in class, but you will certainly know about noisy and rude students. When that happens, you can either shame such students by calling them on the behavior in front of their peers, or you can find ways to discuss their behavior with them in private. My non-confrontational personality, coupled with a dozen years of teaching and raising children, has convinced me that the latter route is the better one for correcting poor behavior. When identifiable students are acting uncivilly in your classroom (however you may define such activity), you can stop them after class and give them the standard lines you would expect to give-that such behavior makes it difficult for you and other students to concentrate, and so on. You can also ask them to come see you in your office, and discuss it there, if you think they require a more serious dressing down. A third method that I have used is to append a P.S. to one of my final comments on their papers, addressing the behavior and asking them to improve it or to come see me in my office. Calling them on the behavior privately like this has always worked for me. If you try this and it doesn't have the desired effect, check with your chair; seriously persistent and disruptive behavior should be observed by a senior faculty member or administrator so that you won't suffer for it in your teaching evaluations (and they may be able to intervene with the students).

Q: Students have their laptops, cell phones, PDAs, and what-have-you on in my classroom, and whenever I step out into the seats I can see that half of them are shopping for shoes or downloading music or text messaging their friends. Some students have cell phones going off in class. What can I do about this?

A: This is probably the most annoying problem we will all face in the future, so best to consider it now and decide how you want to handle it. The solutions seem to me different for different-sized classes. In small classes, of twenty or thirty students or fewer, you need to have a strong physical presence in the classroom. You should be using interactive teaching methods in classes that size, of course, and such methods give you an opportunity to move out into the seats, work your way around the classroom, and let students know that at any given moment you will be standing behind them, seeing whatever they have on their desk or laptop. Do not isolate yourself in the front of the classroom; you command the entire space of the classroom, and you need to make yourself felt at every desk. You don't need to be in constant motion, of course; student awareness of your mobility will go a long way towards keeping them on task.

In larger lecture classrooms and auditoria, you can still do some of this, but the problems will be worse here. So you have two choices, and neither of them is ideal: learn to live with a certain amount of technological distraction, or ban the technologies that are disrupting your classroom. If you choose option one, it doesn't mean you should do nothing. At the very least, you should discuss the inappropriate use of technology in the classroom at the beginning of the semester, and perhaps even include on the syllabus a technology warning like the one cited by Michael Bugeja in a Chronicle essay on this subject:

If your cellular phone is heard by the class you are responsible for completing one of two options: 1. Before the end of the class period you will sing a verse and chorus of any song of your choice or, 2. You will lead the next class period through a 10-minute discussion on a topic to be determined by the end of the class. (To the extent that there are multiple individuals in violation, duets will be accepted).

Whether you use humor in such a warning or not, including an admonition on the syllabus gives you an excuse to discuss the use of technology with students in the classroom, and to help the conscientious (but perhaps clueless) students see how to comport themselves more appropriately.

However, if you are teaching in a large wireless classroom, facing a sea of laptops, and you are convinced that the vast majority of the students are not listening to your scintillating words, then don't hesitate to ban laptops, either outright or for specific parts of the session. No student has a constitutional right to bring a laptop to class, so you have every right to forbid them (you might announce that you will make special provisions for students with disabilities, however). Don't feel bad about it; students have been taking notes with pencil and paper for many hundreds of years; it won't kill them. A less stringent option would be to allow or encourage laptops for specific activities in class-asking students to join you in reviewing a website or program you have scouted for them in advance, or working with them on a program or problem-but then asking them to close the laptops for the fifteen-minute lecture module you have planned for the end of the class, when you will be summarizing the main idea of the day.

Remember-you are in charge. As Bugeja concludes at the end of his essay on inappropriate technology in the classroom, "despite digital distractions, large classes, decreased budgets, and fewer tenured colleagues, professors still are responsible for turning students on to learning. To do so, we just may have to turn off the technology."

Q: Students are not coming to class, or they come late. Do I leave those choices up to them, since they are adults, or do I become an enforcer and start each class with a daily quiz?

An article on poor attendance in college and university courses, which appeared in the spring of 2007 on insidehighered.com and provoked a massive outpouring of responses, offered a bleak picture of this issue. The article included the following statistics on attendance and tardiness patterns:
A 2005 survey of first-year undergraduate students by the Higher Education Research Institute at the University of California at Los Angeles showed that while a majority of college students spend 11 or more hours in class per week, 33 percent reported skipping class and 63 percent said they come to class late "occasionally" or "frequently." A similar survey showed that the proportion of students who report coming late to class has jumped from 48 percent in 1966 to 61 percent in 2006 - evidence, one could argue, of a growing indifference to class in general.

I'm going to start sounding like a bit of a skipping CD here, but the first principle is to ensure that you are creating a classroom experience which students could not duplicate by copying someone else's lecture notes, or by listening to a recording of your lecture. Students, in other words, should play a role in the classroom. If you are giving students a role to play-through discussions, group work, in-class writing, problem-solving, and so on-then you have every right to say that the success of the course depends upon the presence of the students, and to require that presence. If you are standing in front of a podium and lecturing for fifty minutes, then I'm with the tardy and missing students on this one-why should they come to class, when they can get the same material more efficiently, and in the comfort of their dorm rooms, from other means?

As long as you are offering a class worth attending, which depends upon students for its success, then you should not hesitate to drop the hammer on late and absent students. Take whatever measures seem appropriate to you, from locking the door at the start of class to giving daily quizzes at the opening of class, from calling tardy students on the carpet as they walk in the door to lowering the final grade of students who miss more than three classes. Consult the article on insidehighered.com for more ideas on combating this problem, and especially the responses that follow.

Q: I have trouble remembering the names of my own children; the prospect of remembering the names of several sets of twenty or thirty or forty undergraduates each year just seems impossible. Can I call on them as "red baseball cap" or "kid who plagiarized" or "crewcut" just to keep things simple for me?

A: I did know a teacher who managed this successfully, actually. At the beginning of the semester he hit upon some aspect of each student's appearance or mannerisms, gave the student a nickname linked to it, and then referred to him or her in that manner in class. He pulled it off because he was eccentric and had a great sense of humor, and did not use offensive or embarrassing nicknames (e.g., no one was nicknamed "baldie" or anything). The potential ways in which this practice could go bad are so numerous, though, that I really wouldn't recommend it.

Mary McKinney, a clinical psychologist who counsels academics on career issues, addressed this problem in an essay for the online academic news site insidehighered.com, and described there more than a dozen techniques for learning the names of students-her list is worth consulting, and is available online for free (see below for reference). The one that I like best, number twelve, may be the simplest. Every time a student asks a question or speaks in class, ask them for their name; repeat the name somehow in the answer-"Jane asks an important question here . . ."-and if that question or your response to it comes up in class again, associate it once again with her name: "You'll remember that Jane asked us this question last week . . ." The more times you repeat the name, the more likely you will be to remember it. This technique has the bonus benefit of affirming the importance of student contributions in your classroom, making visible to them how their ideas are woven into the fabric of the lectures and discussions. Classes of fifty or more obviously do not require you to learn everyone's name, but don't abandon names altogether. Learn any names you can, but don't fret about not having comprehensive coverage.

- University of Oregon, Teaching Effectiveness Program