HOW DOES TOTAL SCORE COMPARE TO GLOBAL ITEM SCORES?
On many scales, global items are included at the end. These items ask students to provide a summary rating of the instructor and/or course. They take on a variety of formats, but the purpose is the same. For example, using anchors ranging from Excellent to Poor, the following items might be given:
What was the overall quality of your instructor’s teaching?
What was the overall value of this course?
OR
using Strongly Agree to Strongly Disagree,
This is the worst instructor on the planet.
This course sucks.
LIMITATIONS: Responses to these items would be interpreted the same as responses to any other item on the scale. The problem is that these items are not diagnostic for teaching improvement; they provide only an overall rating.
So what’s the problem? Individual item responses, whether percentage distributions across anchors or item means/medians, are usually unreliable. When those responses are used to suggest areas for improvement, they serve only as a guide; no major career-shattering decisions are being made. But if global item responses are used for summative decisions by your department chair or the promotion committee, there is a lot more at stake.
RECOMMENDATION: Subscale or total scale scores that summarize the quality of teaching or the value of the course, because they are based on a collection of items measuring those characteristics, are usually more reliable than any single item. It is recommended that those scores be used in lieu of global item scores whenever possible for any summative decisions about teaching performance.
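The reliability advantage of a multi-item score over a single global item can be illustrated with the Spearman-Brown prophecy formula from classical test theory. This is a sketch under idealized assumptions (parallel items), and the single-item reliability of .30 below is a hypothetical value, not a figure from any real rating form:

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Projected reliability of a composite of n_items parallel items,
    each with the given single-item reliability (Spearman-Brown)."""
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)

# Hypothetical single-item reliability of .30:
print(spearman_brown(0.30, 1))   # one global item: stays at .30
print(spearman_brown(0.30, 8))   # an 8-item subscale: roughly .77
print(spearman_brown(0.30, 36))  # a 36-item total scale: roughly .94
```

Under this idealized model, a subscale or total scale projects to far higher reliability than any one item, which is why single global items are a shaky basis for summative decisions.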
My final blog in this bloated series will briefly describe criterion- and norm-referenced interpretations of the scores previously defined.
COPYRIGHT © 2010 Ronald A. Berk, LLC
WHAT ARE TOTAL SCALE SCORES?
The highest level of score summary is the total scale score across all items. It’s like the total score on a test, except there are no right and wrong answers on a scale. If the scale consists of 36 items, each scored 0–3, the following results might be reported:
Total scale score range = 0–108; Midpoint = 54
Mean/Median = 96.43/101, where N = 97
The continuum for interpretation would be the following:
Extremely                                        Extremely
Unfavorable              Neutral                  Favorable
0________________________54________________________108
                                      Mean = 96.43
                                      Mdn  = 101
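The summary numbers above follow directly from the scale's structure. Here is a minimal sketch of the arithmetic, using the 36-item, 0–3 example (the reported mean and median are the figures quoted above, not computed from raw data):

```python
n_items, min_anchor, max_anchor = 36, 0, 3

total_min = n_items * min_anchor          # 0
total_max = n_items * max_anchor          # 108
midpoint = (total_min + total_max) / 2    # 54, the "Neutral" point

mean, median = 96.43, 101  # reported statistics, N = 97 raters

# Express the mean as a position on the Unfavorable-to-Favorable continuum:
position = (mean - total_min) / (total_max - total_min)
print(f"Mean falls {position:.0%} of the way toward Extremely Favorable")
```

A mean roughly 89% of the way up the continuum is what justifies calling this score "very favorable."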
INTERPRETATION OF TOTAL SCORES: This score gives a global, or composite, rating that is only as high as the ratings in each of its component parts (anchors, items, and subscales). It represents one overall index of teaching performance, from Extremely Unfavorable to Extremely Favorable. In this example, the score is very favorable. However, the total score is usually a little less reliable and less informative than the subscale scores.
COMPARISON OF ITEM, SUBSCALE, AND TOTAL SCORE SCALES
A comparison of the quantitative scales at the previous levels for a total scale is shown here:
Score Level              Ex Unfav       Neutral       Ex Fav
Item                     0 _______1_____1.5_____2_______3
Subscale 1 (8 items)     0 ______________12_____________24
Subscale 2 (5 items)     0 _____________7.5_____________15
Subscale 3 (4 items)     0 _______________6_____________12
Subscale 4 (13 items)    0 ____________19.5_____________39
Subscale 5 (6 items)     0 _______________9_____________18
Total Scale (36 items)   0 ______________54_____________108
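Every endpoint and neutral value in the comparison above comes from the same arithmetic: the number of items times the maximum anchor, with the neutral point at half of that. A quick sketch reproducing the table's numbers (subscale names are just labels from the table):

```python
# Items per subscale, from the comparison above; anchors scored 0-3.
subscales = {"Subscale 1": 8, "Subscale 2": 5, "Subscale 3": 4,
             "Subscale 4": 13, "Subscale 5": 6}
max_anchor = 3

for name, n in subscales.items():
    # Maximum score and neutral midpoint for each subscale
    print(f"{name:<11} range = 0-{n * max_anchor:<4} neutral = {n * max_anchor / 2}")

total_items = sum(subscales.values())                       # 36
print(f"Total scale range = 0-{total_items * max_anchor}")  # 0-108
```

The same two numbers per row are all a reader needs to locate any reported subscale mean on its own continuum.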
These levels of score reporting and interpretation are based on a single course. This information has the greatest value to you, your department chair, and the curriculum committee evaluating the course.
How does the total score compare to the global item scores given at the end of the scale? Which score should you use? Is there really any difference?
APPLICATION TO DIFFERENT FORMS
Although each of you is using a different rating form with different numbers of items and scores, those differences do not matter in score interpretation. Whether you’re using a commercial package, such as IDEA, SIR II, PICES, or CIEQ DU SOLEIL, or a “homegrown scale,” there are only so many score reporting possibilities for any form in Likert-type format. So my suggestions are generic and should be applicable to your form. I encourage you to consult the guidelines or manual for your reporting system for more specific information.
FIVE BASIC CATEGORIES OF RESULTS
There are 5 possible categories of results reported for most student rating forms:
1. anchor distribution of percentages
2. item statistics (mean and/or median)
3. subscale statistics (mean and/or median)
4. total scale statistics (mean and/or median)
5. summary of comments to open-ended questions
Your report form may not provide all of the above, but it should certainly give you at least categories 2 and 4.
WHAT DO FACULTY NEED?
That's a lot of information. You could use all of those results; however, categories 1 and 2, in particular, provide the most valuable diagnostic information for revising teaching or course materials to benefit your next course-load of students. These are called formative decisions about teaching. Category 5 can explain the reasons for the ratings in 1 and 2.
WHAT DO ADMINISTRATORS NEED?
Summative decisions about annual contract renewal, merit pay, or promotion and tenure review by department chairs, associate deans, and others can be based on categories 3 and 4 and possibly the global item scores.
This blog series will focus primarily on faculty needs. My next blog will examine the 1st level of interpretation: ANCHOR-WORLD!