BERK'S BLOG: From the keyboard of the "Humor Professor"

My blogs reflect my research interests and reflections on issues in teaching, PowerPoint, social media, faculty evaluation, student assessment, time management, and humor in teaching/training and in the workplace. Occasional top 10 lists may also appear on timely topics. They are intended for your professional use and entertainment. If they are seen by family members or pets, I am not responsible for the consequences. If they're not meaningful to you, let me know. ENJOY!


Tuesday, June 1, 2010

A BerksNotes® GUIDE TO INTERPRETING STUDENT RATING RESULTS: Total Scale Score vs. Global Item Scores

HOW DOES TOTAL SCORE COMPARE TO GLOBAL ITEM SCORES?
On many scales, global items are included at the end. These items ask students to provide a summary rating of the instructor and/or course. They take on a variety of formats, but the purpose is the same. For example, using anchors ranging from Excellent to Poor, the following items might be given:

What was the overall quality of your instructor’s teaching?
What was the overall value of this course?

OR

using Strongly Agree to Strongly Disagree,

This is the worst instructor on the planet.
This course sucks.

LIMITATIONS: The response to each of these items would be interpreted the same way as the response to any other item on the scale. The problem is that these items are not diagnostic for teaching improvement. They provide only an overall rating.

So what’s the problem? Individual item responses, either percentage responses to anchors or item means/medians, are usually unreliable. When those responses are used to suggest areas for improvement, they serve as a guide. No major career-shattering decisions are being made. If global item responses are used for summative decisions by your department chair or the promotion committee, there is a lot more at stake.

RECOMMENDATION: Subscale or total scale scores that summarize the quality of teaching or the value of the course are usually more reliable, based on a collection of items measuring those characteristics, than just a single item. It is recommended that those scores be used in lieu of global item scores whenever possible for any summative decisions about teaching performance.

My final blog in this bloated series will briefly describe criterion- and norm-referenced interpretations of the scores previously defined.

COPYRIGHT © 2010 Ronald A. Berk, LLC

Saturday, May 29, 2010

A BerksNotes® GUIDE TO INTERPRETING STUDENT RATING RESULTS: Total Scale Level

WHAT ARE TOTAL SCALE SCORES?
The highest level of score summary is the total scale score across all items. It’s like the total score on a test, except there are no right and wrong answers on a scale. If the scale consists of 36 items, each scored 0–3, the following results might be reported:

Total scale score range = 0–108; Midpoint = 54
        Mean/Median = 96.43/101, where N = 97

The continuum for interpretation would be the following:

Extremely                                Extremely
Unfavorable         Neutral              Favorable
0______________________54______________________108
                                           ^ ^
                                 Mean = 96.43 Mdn = 101

INTERPRETATION OF TOTAL SCORES: This score gives a global, or composite, rating that is only as high as the ratings in each of its component parts (anchors, items, and subscales). It represents one overall index of teaching performance, from Extremely Unfavorable to Extremely Favorable. In this example, the score is very favorable. However, the total score is usually a little less reliable and less informative than the subscale scores.

COMPARISON OF ITEM, SUBSCALE, AND TOTAL SCORE SCALES
A comparison of the quantitative scales at the previous levels for a total scale is shown here:

Score Level              Ex Unfav       Neutral           Ex Fav
Item                     0 ______ 1 _____ 1.5 _____ 2 ______ 3
Subscale 1 (8 items)     0 ______________ 12 _______________ 24
Subscale 2 (5 items)     0 ______________ 7.5 ______________ 15
Subscale 3 (4 items)     0 ______________ 6 ________________ 12
Subscale 4 (13 items)    0 ______________ 19.5 _____________ 39
Subscale 5 (6 items)     0 ______________ 9 ________________ 18
Total Scale (36 items)   0 ______________ 54 _______________ 108
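
If you want to rebuild the comparison above for your own scale, here is a minimal sketch in Python. It assumes every item is scored 0–3 and uses the subscale sizes from the example; the function and variable names are hypothetical and do not come from any rating-system software.

```python
# Illustrative sketch only: names are hypothetical, sizes come from the example above.
ITEM_MAX = 3  # assumption: every item is scored 0-3

# Number of items in each subscale, as listed in the comparison
SUBSCALE_ITEMS = {
    "Subscale 1": 8,
    "Subscale 2": 5,
    "Subscale 3": 4,
    "Subscale 4": 13,
    "Subscale 5": 6,
}


def score_range(n_items, item_max=ITEM_MAX):
    """Return (minimum, midpoint, maximum) for a sum of n_items items scored 0..item_max."""
    maximum = n_items * item_max
    return 0, maximum / 2, maximum


def build_table(subscales=SUBSCALE_ITEMS, item_max=ITEM_MAX):
    """Collect the score range for a single item, each subscale, and the total scale."""
    rows = {"Item": score_range(1, item_max)}
    for name, n_items in subscales.items():
        rows[name] = score_range(n_items, item_max)
    rows["Total Scale"] = score_range(sum(subscales.values()), item_max)
    return rows


if __name__ == "__main__":
    for level, (low, mid, high) in build_table().items():
        print(f"{level:<12} {low:>3} {mid:>5g} {high:>4}")
```

The midpoint at each level is simply half the maximum, which is why the Neutral point moves from 1.5 for an item to 54 for the 36-item total.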

These levels of score reporting and interpretation are based on a single course. This information has the greatest value to you, your department chair, and the curriculum committee evaluating the course.

How does the total score compare to the global item scores given at the end of the scale? Which score should you use? Is there really any difference?

COPYRIGHT © 2010 Ronald A. Berk, LLC

Monday, November 9, 2009

Why Should GLOBAL ITEM SCORES Not Be Used for Summative Decisions? PART III

As a continuation of the last blog on the first two reasons for not using global items for summative decisions about faculty, this blog describes the third and most important reason:

3. PROFESSIONAL AND LEGAL STANDARDS: One or 2 global rating item scores alone are totally inadequate for major summative decisions about faculty performance. That administrative practice violates national testing/scaling practices according to the Standards for Educational and Psychological Testing and EEOC Uniform Guidelines on Employee Selection Procedures, plus the rulings from a large corpus of court cases on this topic. Essentially, it's ILLEGAL to make such personnel decisions about faculty on that basis. Clearly, these are PERSONNEL decisions about us, not instructional or curriculum decisions. In the case of employee decisions like these, 1 or 2 items do not reflect an accurate assessment of the instructor's job behaviors. A total scale score based on, for example, 35 items defining effective teaching behaviors, or subscale scores on specific areas of teaching competency would satisfy those criteria. A long history of court cases on personnel decisions indicates that the instrument used for personnel decisions must be based on a comprehensive job analysis of the job’s tasks related to a person’s knowledge, skills, and abilities (KSAs). The behaviors listed as items on the total scale satisfy that standard for teaching effectiveness.

Although administrators have used global items in some form for decisions about faculty teaching performance for quite some time, those practices should stop. They have lawsuit written all over them. As noted above, important, possibly career-changing, individual personnel decisions are held to the highest standards professionally and legally, as they should be. If the instructor being violated is a minority or female, be prepared for an EEOC offensive. If you know an administrator who is engaging in such practices, the recommendation is “cease and desist.”

What’s the alternative? Use the total scale score or subscale scores for different areas of performance in conjunction with other measures, such as peer evaluations, self-ratings, and a dozen other possible sources of evidence (for further details, see my October 25, 2009 blog and Thirteen Strategies… Stylus Publisher link in right margin).

Please let me know your thoughts and observations on my recommendations in these blogs.

COPYRIGHT © 2009 Ronald A. Berk, LLC

Sunday, November 8, 2009

Why Should GLOBAL ITEM SCORES Not Be Used for Summative Decisions? PART II

Please consider the following 2 reasons why the global items may not be the best option for those summative decisions:

1. ACCURACY AND FAIRNESS: After your students have spent 45 hours in your course over the semester, does their rating of 1 item seem to accurately capture the sum total of all of the experiences in your classroom? There is no doubt that the item furnishes information about your performance and the course, but should it be used for summative, super-important decisions about your career? Is 1 item score of 0-4 a fair and reasonable basis for inferring overall performance? Could you accurately and fairly rate the performance of your administrative assistant, department chair, or dean with 1 item to truly evaluate his or her degree of effectiveness?

2. VALIDITY AND RELIABILITY: Think of student rating scales like the S.A.T. One item from the S.A.T. doesn't provide a valid and reliable score of verbal ability any more than 1 or 2 global items on a rating scale provide a valid and reliable measure of teaching performance. Although the former measures knowledge and the latter measures attitudes or opinions, the psychometric problem is virtually identical. Such data would be tantamount to giving high schoolers 1 or 2 verbal and quantitative items from the S.A.T. and making individual college admission decisions on those scores. Those scores are technically unreliable. One or 2 items are extremely unreliable compared to a summary score derived from 5 subscales or 35 items. Reliability coefficients should be in the .80s-.90s for individual decisions, although coefficients as low as the .60s are acceptable for group-based decisions in research. Typically, a single item does not yield a coefficient in the acceptable range.
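
For a rough feel for why the number of items matters so much, here is a small sketch using the Spearman-Brown formula, a standard psychometric projection that the discussion above does not name explicitly. It assumes the items are parallel (equally reliable and measuring the same thing), and the single-item reliability of .25 is a hypothetical value picked only for illustration, not a figure from any actual rating scale.

```python
def spearman_brown(single_item_reliability, n_items):
    """Projected reliability of a score built from n_items parallel items.

    Spearman-Brown formula: r_k = k * r / (1 + (k - 1) * r),
    where r is the reliability of one item and k is the number of items.
    """
    r, k = single_item_reliability, n_items
    return k * r / (1 + (k - 1) * r)


# Hypothetical single-item reliability, assumed for illustration only
ASSUMED_ITEM_RELIABILITY = 0.25

if __name__ == "__main__":
    for k in (1, 2, 35):
        projected = spearman_brown(ASSUMED_ITEM_RELIABILITY, k)
        print(f"{k:>2} item(s): projected reliability = {projected:.2f}")
```

Under those assumptions, 2 items project to a coefficient of .40, while 35 items project to about .92. Real items are rarely perfectly parallel, so treat the numbers as a sketch of the trend, not as estimates for your scale.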

The third reason, PROFESSIONAL AND LEGAL STANDARDS, will be covered in the next blog. It is the most important of the 3 reasons. Don’t miss it.

COPYRIGHT © 2009 Ronald A. Berk, LLC

Saturday, November 7, 2009

Should GLOBAL ITEM SCORES from Student Evaluations Be Used for Important Faculty Decisions? PART I

I know what you’re thinking: “I thought you were done with this student rating stuff. Get off it already.” I know, but I rethought my thought after receiving my weekly report that more people read my blogs this past week than at any other time over the past 3 months. Maybe I hit a note; maybe not. Anyway, these topics keep popping up on listservs and in workshops.

But that’s not REEEAALLY what you were thinking. It is: “What in the world is a global item?” For those of you who are not of this world and are just passing through, as I am, here is a profile of the global item:

1. It provides a general, broad-stroke indication of teaching performance. It's intended to be an omnibus item, representing the collective judgments on all other items. But it isn't. Only the total scale score does that.
2. It doesn’t address specific teaching and course characteristics.
3. It usually appears at the end of the rating scale and should not be summed with the scores of all other items.

Using a "Strongly Agree-Strongly Disagree" anchor response scale, a couple of examples are given below:

Overall, my instructor is a dirtbag.
Overall, I learned squat in this course.

Of course, you know I’m kidding. Better items are:

Overall, my instructor is a moron.
Overall, this course is putrid.

Usually, there are 1 to 3 items. Frequently, administrators, such as your department chair, associate dean, or emperor or empress, will be encouraged to use the ratings on those items to provide a simple, quick-and-dirty measure of your teaching performance. Those ratings, in lieu of the total scale or subscale scores, are used in conjunction with other information to arrive at summative decisions regarding merit pay, contract renewal for full-time and adjunct faculty, and promotion and tenure recommendations. Are these important decisions about your career and life? You bet!!

Do you want those decisions to be rendered on the basis of 1 or 2 items? “Sure, why not?” Are you kidding me? There are several logical, psychometric, and legal reasons why global items should NOT be used for summative decisions. They will be described in my next blog. Stay tuned for more fun from RatingWorld.

COPYRIGHT © 2009 Ronald A. Berk, LLC