BERK'S BLOG: From the keyboard of the "Humor Professor": summative decisions

Monday, May 17, 2010

A BerksNotes® GUIDE TO STUDENT RATING SCORE INTERPRETATION: Overview

APPLICATION TO DIFFERENT FORMS
Although each of you is using a different rating form with different numbers of items and scores, those differences do not matter in score interpretation. Whether you’re using a commercial package, such as IDEA, SIR II, PICES, or CIEQ DU SOLEIL, or a “homegrown scale,” there are only so many score reporting possibilities for any form in Likert-type format. So my suggestions are generic and should be applicable to your form. I encourage you to consult the guidelines or manual for your reporting system for more specific information.

FIVE BASIC CATEGORIES OF RESULTS
There are 5 possible categories of results reported for most student rating forms:

1. anchor distribution of percentages
2. item statistics (mean and/or median)
3. subscale statistics (mean and/or median)
4. total scale statistics (mean and/or median)
5. summary of comments to open-ended questions

Your report form may not provide all of the above, but it should certainly give you at least 2 and 4.
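
For those who like to see the arithmetic, here is a minimal Python sketch of categories 1 and 2 for a single item. It assumes a 5-point scale coded 1 (Strongly Disagree) to 5 (Strongly Agree); the function names and the ratings are made up for illustration and are not taken from any commercial package.

```python
from collections import Counter
from statistics import mean, median

# Assumed coding: 1 = Strongly Disagree ... 5 = Strongly Agree
ANCHORS = [1, 2, 3, 4, 5]

def anchor_percentages(responses):
    """Category 1: percentage of students who chose each anchor."""
    counts = Counter(responses)
    n = len(responses)
    return {anchor: round(100 * counts[anchor] / n, 1) for anchor in ANCHORS}

def item_stats(responses):
    """Category 2: mean and median rating for one item."""
    return {"mean": round(mean(responses), 2), "median": median(responses)}

# Hypothetical ratings from 10 students on one item
item_1 = [5, 4, 4, 5, 3, 5, 4, 2, 5, 4]

print(anchor_percentages(item_1))
print(item_stats(item_1))
```

With these invented ratings, 80% of the class picked one of the two "agree" anchors, which is the kind of pile-up at the high end you should expect on most items.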

WHAT DO FACULTY NEED?
That's a lot of information. You could use all of those results; however, 1 and 2, in particular, provide the most valuable diagnostic info for revising teaching or course materials to benefit your next course-load of students. Decisions based on that info are called formative decisions about teaching. Category 5 can explain the reasons for the ratings in 1 and 2.

WHAT DO ADMINISTRATORS NEED?
Summative decisions about annual contract renewal, merit pay, or promotion and tenure review by department chairs, associate deans, etc. can be based on 3 and 4 and possibly the global item scores.

This blog series will focus primarily on the faculty needs. My next blog will examine the 1st level of interpretation: ANCHOR-WORLD!

COPYRIGHT © 2010 Ronald A. Berk, LLC

Saturday, November 7, 2009

Should GLOBAL ITEM SCORES from Student Evaluations Be Used for Important Faculty Decisions? PART I

I know what you’re thinking: “I thought you were done with this student rating stuff. Get off it already.” I know, but I rethought my thought after receiving my weekly report that more people read my blogs this past week than at any other time over the past 3 months. Maybe I hit a note; maybe not. Anyway, these topics keep popping up on listservs and workshops.

But that’s not REEEAALLY what you were thinking. It is: “What in the world is a global item?” For those of you who are not of this world and are just passing through, as I am, here is a profile of the global item:

1. It provides a general, broad-stroke indication of teaching performance. It's intended to be an omnibus item, representing the collective judgments on all other items. But it isn't. Only the total scale score does that.
2. It doesn’t address specific teaching and course characteristics.
3. It usually appears at the end of the rating scale and should not be summed with the scores of all other items.

Using a "Strongly Agree-Strongly Disagree" anchor response scale, a couple of examples are given below:

Overall, my instructor is a dirtbag.
Overall, I learned squat in this course.

Of course, you know I’m kidding. Better items are:

Overall, my instructor is a moron.
Overall, this course is putrid.

Usually, there are 1 to 3 items. Frequently, administrators, such as your department chair, associate dean, or emperor or empress, will be encouraged to use the ratings on those items to provide a simple, quick-and-dirty measure of your teaching performance. Those ratings, in lieu of the total scale or subscale scores, are used in conjunction with other information to arrive at summative decisions regarding merit pay, contract renewal for full-time and adjunct faculty, and promotion and tenure recommendations. Are these important decisions about your career and life? You bet!!

Do you want those decisions to be rendered on the basis of 1 or 2 items? “Sure, why not?” Are you kidding me? There are several logical, psychometric, and legal reasons why global items should NOT be used for summative decisions. They will be described in my next blog. Stay tuned for more fun from RatingWorld.

COPYRIGHT © 2009 Ronald A. Berk, LLC

Sunday, October 25, 2009

What Scores Should Be Reported from Student Ratings of Faculty?

Recently, I was involved in a spirited marathon discussion with a bunch of colleagues on technical issues related to student ratings of teaching performance. One big topic was: How do you report results for formative and summative decisions? I thought some of my bloggees might be interested in the options available. These options with report form examples appear in my Thirteen Strategies... book (see Stylus link to right).

In order to answer the question, you don't need to administer multiple rating forms. There are a lot of options with the results from just one form. It is possible to "have your cake.." with one form for both formative and summative decisions up to a point. The trick is how the results are analyzed and reported for each decision maker.

Psychometrically speaking, I recommend the following:
1. A structured scale with 4-6 subscales measuring separate constructs such as Class Organization, Teaching Methods, Evaluation Techniques, and so on. The faculty evaluation lit reports several major constructs based on factor analyses. These core teaching behaviors should be generic enough to apply to most courses and disciplines.
2. A separate section devoted to course-specific items each instructor might want to add should be included. This optional section might contain up to 10 items.
3. One to three global items may be included as well, although the reliabilities of individual items are typically much lower than those of item aggregates, such as subscale or total scale scores.
4. An unstructured section containing 2-5 stimulus questions to which students can comment is also important. Loads of online administrations reveal that students spend considerable time typing buckets of comments. Frequently those comments explain the responses to some of the structured item ratings. Both forms of evaluation are valuable and furnish complementary information on teaching performance.
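
On the reliability point in 3: the reliability of an item aggregate is commonly estimated with coefficient (Cronbach's) alpha, which needs at least two items, so it cannot be computed for a lone global item. Below is a minimal Python sketch of the standard alpha formula for one subscale. The function name and the tiny data set are hypothetical, chosen only so the result is easy to check by hand.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Coefficient alpha for a set of items.

    rows: one list of item ratings per student, all the same length.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*rows))                 # one tuple per item
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(r) for r in rows])   # variance of summed scores
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 2-item subscale where both items rank the 3 students identically
ratings = [[1, 1], [2, 2], [3, 3]]
print(round(cronbach_alpha(ratings), 3))
```

Because the two made-up items agree perfectly, alpha comes out at its ceiling of 1. Real subscales will land lower, and the more the items disagree, the lower it goes.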

Analysis-wise, the above structure permits results at the following levels:
1. anchor distribution of percentages
2. item statistics such as mean and median (almost all distributions are negatively skewed)
3. subscale statistics
4. total scale statistics
5. summary of comments by stimulus question

That's a lot of information. Faculty would benefit from 1-5. 1 and 2, in particular, provide valuable diagnostic info to revise teaching or course materials that will benefit their next course-load of students. It is formative feedback only in that sense. Other formative methods administered during the course should be considered. You already know about those options.
Summative decisions by department chairs, associate deans, etc. can be based on 3 and 4 and possibly the global item scores.
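
A rough Python sketch of levels 3 and 4 follows. The item ids, the two subscales, and the ratings are all invented. It also assumes the global item is reported on its own rather than added into the total scale score, which is a design choice of this sketch and not something any particular reporting system is known to do.

```python
from statistics import mean, median

# Hypothetical form layout: item ids grouped by subscale
SUBSCALES = {
    "Class Organization": ["q1", "q2"],
    "Teaching Methods": ["q3", "q4"],
}
GLOBAL_ITEMS = ["g1"]  # assumed to be reported separately

def score_student(ratings):
    """Return (subscale sums, total scale score) for one student's form."""
    sub = {name: sum(ratings[i] for i in items)
           for name, items in SUBSCALES.items()}
    total = sum(sub.values())  # global items deliberately left out
    return sub, total

def summarize(all_ratings):
    """Mean and median for each subscale, the total scale, and each global item."""
    scored = [score_student(r) for r in all_ratings]
    report = {}
    for name in SUBSCALES:
        values = [sub[name] for sub, _ in scored]
        report[name] = (mean(values), median(values))
    totals = [total for _, total in scored]
    report["Total"] = (mean(totals), median(totals))
    for g in GLOBAL_ITEMS:
        values = [r[g] for r in all_ratings]
        report[g] = (mean(values), median(values))
    return report

# Hypothetical ratings from 3 students
students = [
    {"q1": 5, "q2": 4, "q3": 4, "q4": 5, "g1": 5},
    {"q1": 4, "q2": 4, "q3": 3, "q4": 4, "g1": 4},
    {"q1": 5, "q2": 5, "q3": 5, "q4": 4, "g1": 5},
]

for label, (avg, mid) in summarize(students).items():
    print(f"{label}: mean={avg:.2f}, median={mid}")
```

The same set of completed forms feeds both reports. Only the level of aggregation changes for each decision maker.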

The above strategy is certainly not new, but it is the simplest to get the biggest bang from your student rating scale. Of course, it is only 1 of 14 sources of evidence you might use in measuring teaching performance. Multiple sources of evidence should be involved in summative (personnel) decisions about faculty contract renewal, merit pay, and promotion and tenure. After all, faculty careers are on the line.

If you're grappling with this issue, I hope these suggestions may be helpful.
COPYRIGHT © 2009 Ronald A. Berk, LLC