Please consider the following 2 reasons why the global items may not be the best option for those summative decisions:
1. ACCURACY AND FAIRNESS: After your students have spent 45 hours in your course over the semester, does their rating on 1 item accurately capture the sum total of their experiences in your classroom? There is no doubt that the item furnishes information about your performance and the course, but should it be used for summative, super-important decisions about your career? Is a single item score on a 0-4 scale a fair and reasonable basis from which to infer overall performance? Could you accurately and fairly rate your administrative assistant, department chair, or dean with 1 item that truly captures his or her effectiveness?
2. VALIDITY AND RELIABILITY: Think of student rating scales as you would the S.A.T. One item from the S.A.T. doesn't provide a valid and reliable score of verbal ability any more than 1 or 2 global items on a rating scale provide a valid and reliable measure of teaching performance. Although the former measures knowledge and the latter measures attitudes or opinions, the psychometric problem is virtually identical. Relying on global items would be tantamount to giving high schoolers 1 or 2 verbal and quantitative items from the S.A.T. and making individual college admission decisions on those scores. Such scores would be psychometrically unreliable. Because random error in any one response tends to cancel out when many items are combined, scores from 1 or 2 items are extremely unreliable compared to a summary score derived from 5 subscales or 35 items. Reliability coefficients should be in the .80s to .90s for individual decisions, although coefficients as low as the .60s are acceptable for group-based decisions in research. Typically, a single item does not yield a coefficient in the acceptable range.
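To make the point about item count concrete, here is a minimal sketch using the Spearman-Brown prophecy formula, a standard classical test theory result that projects the reliability of a composite of k items from the reliability of a single item. The single-item reliability of .30 below is a hypothetical value chosen only for illustration, not a figure from any actual rating scale, and the formula assumes the items are parallel (equally reliable and measuring the same thing), which real rating-scale items only approximate. The function names are likewise just illustrative.

```python
import math


def spearman_brown(single_item_r: float, n_items: int) -> float:
    """Projected reliability of a composite of n_items parallel items,
    given the reliability of one item (Spearman-Brown prophecy formula)."""
    if not 0.0 <= single_item_r <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * single_item_r / (1 + (n_items - 1) * single_item_r)


def items_needed(single_item_r: float, target_r: float) -> int:
    """Smallest number of parallel items whose projected composite
    reliability reaches target_r (Spearman-Brown solved for length)."""
    if not (0.0 < single_item_r < 1.0 and 0.0 < target_r < 1.0):
        raise ValueError("reliabilities must be strictly between 0 and 1")
    k = target_r * (1 - single_item_r) / (single_item_r * (1 - target_r))
    return max(1, math.ceil(k))


# HYPOTHETICAL single-item reliability, chosen only for illustration.
r_one = 0.30
for k in (1, 2, 35):
    print(f"{k:>2} item(s): projected reliability = {spearman_brown(r_one, k):.2f}")
print("items needed for .80:", items_needed(r_one, 0.80))
```

Under that assumed starting value, 2 items project to only about .46, at least 10 items are needed to clear .80, and a 35-item composite projects to about .94, squarely in the range recommended for individual decisions.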
The third reason, PROFESSIONAL AND LEGAL STANDARDS, will be covered in the next blog post. It is the most important of the 3 reasons. Don’t miss it.
COPYRIGHT © 2009 Ronald A. Berk, LLC