
Sunday, November 8, 2009

Why Should GLOBAL ITEM SCORES Not Be Used for Summative Decisions? PART II


Please consider the following 2 reasons why the global items may not be the best option for those summative decisions:


1. ACCURACY AND FAIRNESS: After your students have spent 45 hours in your course over the semester, does their rating of 1 item seem to accurately capture the sum total of all of their experiences in your classroom? There is no doubt that the item furnishes information about your performance and the course, but should it be used for summative, super-important decisions about your career? Is a single item score of 0-4 a fair and reasonable basis from which to infer overall performance? Could you accurately and fairly rate the effectiveness of your administrative assistant, department chair, or dean with just 1 item?


2. VALIDITY AND RELIABILITY: Think of student rating scales like the S.A.T. One item from the S.A.T. doesn't provide a valid and reliable score of verbal ability any more than 1 or 2 global items on a rating scale provide a valid and reliable measure of teaching performance. Although the former measures knowledge and the latter measures attitudes or opinions, the psychometric problem is virtually identical. Using such data would be tantamount to giving high schoolers 1 or 2 verbal and quantitative items from the S.A.T. and making individual college admission decisions on those scores. Those scores are technically unreliable: 1 or 2 items are far less reliable than a summary score derived from 5 subscales or 35 items. Reliability coefficients should be in the .80s-.90s for individual decisions, although coefficients as low as the .60s are acceptable for group-based decisions in research. Typically, a single item does not yield a coefficient in the acceptable range.
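For readers who like to see the numbers, the Spearman-Brown prophecy formula is the standard psychometric way to estimate how much reliability grows as items are added. The short Python sketch below is purely illustrative: the single-item reliability of .30 is a hypothetical value, not a figure taken from any actual rating scale.

# Spearman-Brown prophecy formula: predicted reliability of a k-item composite,
# given the (hypothetical) reliability of a single item.
def spearman_brown(single_item_reliability, k):
    r = single_item_reliability
    return (k * r) / (1 + (k - 1) * r)

single_item = 0.30  # hypothetical reliability of 1 global item
for k in (1, 2, 5, 35):
    print(f"{k:>2} items -> predicted reliability {spearman_brown(single_item, k):.2f}")
# Roughly: 1 item ~ .30, 2 items ~ .46, 5 items ~ .68, 35 items ~ .94 -- only the
# longer composite reaches the .80s-.90s needed for individual decisions.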


The third reason, PROFESSIONAL AND LEGAL STANDARDS, will be covered in the next blog. It is the most important of the 3 reasons. Don’t miss it.


COPYRIGHT © 2009 Ronald A. Berk, LLC

Sunday, November 1, 2009

What’s Wrong with Low Response Rates from Online Student Evaluations?


There have been cries, yelps, screams, shrieks, screeches, shrills, howls, and other sounds reverberating throughout the academic hallways and byways about low response rates from online administrations of student rating scales. Traditionally, the rates for in-class paper-based administrations following appropriate standardized procedures have been 80%+.

SAMPLING BIAS
First, let’s be Jack Nicholson-in-A Few Good Men “crystal” clear about the problem with low response rates. When response rates dip below 80%, you’re in deeeep trouble! WAIT! That’s not the ending to that sentence. Where did it go? Oh, here it is: sampling bias increases, so that the summarized ratings will present an unrepresentative, biased picture of teaching performance in a given course. Such ratings could be inflated or deflated due to the bias. This is evil, especially when student ratings constitute the only source of evidence on teaching performance, which is the case at many institutions. The results may be useless for either formative or summative decisions.
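To make the bias concrete, here is a small, purely hypothetical Python simulation: 100 students with "true" course ratings on a 0-4 scale, where more satisfied students are assumed to be somewhat more likely to respond online. Every number in it (class size, rating distribution, response probabilities) is an assumption chosen for illustration, not data from any study.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class of 100 students with "true" overall ratings on a 0-4 scale.
true_ratings = rng.normal(loc=2.8, scale=0.8, size=100).clip(0, 4)

# Illustrative assumption: more satisfied students are more likely to respond online,
# yielding a response rate well under the 80% benchmark.
response_prob = 0.15 + 0.20 * (true_ratings / 4)
responded = rng.random(100) < response_prob

print(f"Response rate:     {responded.mean():.0%}")
print(f"Full-class mean:   {true_ratings.mean():.2f}")
print(f"Respondents' mean: {true_ratings[responded].mean():.2f}")
# The gap between the last two means is the sampling bias. Flip the assumption
# (dissatisfied students more likely to respond) and the bias reverses direction.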

VALIDITY PROBLEM
The intractable problem is that there is no way to detect the direction or degree of the bias. This low degree of rating score validity makes it extremely difficult to interpret ratings based on the limited sample of students who chose (i.e., self-selected) to complete the scales. Keep in mind that these inaccurate ratings may still yield a high degree of internal consistency reliability, which can be misleading.
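Extending the hypothetical simulation above, the sketch below generates responses to an imaginary 8-item scale, lets only a biased subset of students "respond," and computes Cronbach's alpha for those respondents. Again, every number is an assumption for illustration; the only point is that a respectable reliability coefficient can coexist with a badly biased mean.

import numpy as np

rng = np.random.default_rng(1)
n_students, n_items = 200, 8

# Hypothetical latent "satisfaction" trait driving 8 correlated rating items (0-4 scale).
trait = rng.normal(0.0, 1.0, n_students)
items = np.clip(np.round(2.0 + 0.8 * trait[:, None]
                         + rng.normal(0.0, 0.6, (n_students, n_items))), 0, 4)

# Illustrative self-selection: more satisfied students are more likely to respond.
scaled = (trait - trait.min()) / (trait.max() - trait.min())
responded = rng.random(n_students) < (0.15 + 0.25 * scaled)
sample = items[responded]

def cronbach_alpha(x):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

print(f"Response rate:      {responded.mean():.0%}")
print(f"Full-class mean:    {items.mean():.2f}   Respondents' mean: {sample.mean():.2f}")
print(f"Respondents' alpha: {cronbach_alpha(sample):.2f}")
# Alpha can land comfortably in the .80s-.90s even though the respondents' mean is
# biased upward: internal consistency says nothing about representativeness.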


REASONS FOR NONRESPONSE
The response rate for online administrations can be half that of paper-based administrations, or even lower. This is a frequent objection to online ratings reported in faculty surveys, and the fear of low response rates has deterred some institutions from adopting an online system. The research on this topic indicates the following possible reasons for nonresponse: student apathy, perceived lack of anonymity, inconvenience, inaccessibility, technical problems, time for completion, and perceived lack of importance (Ballantyne, 2000; Dommeyer, Baum & Hanna, 2002; Sorenson & Reiner, 2003).


That’s the problem. Now how do we fix it? Several institutions have tested a variety of strategies to increase response rates, strategies that address many of the aforementioned reasons and have been suggested by faculty AND students. My next blog will present the Top 10 most effective strategies. Stick around as the plot thickens.


COPYRIGHT © 2009 Ronald A. Berk, LLC