
Monday, September 27, 2010

“A FRACTURED, SEMI-FACTUAL HISTORY OF STUDENT RATINGS OF TEACHING: Finale!”


Epilogue

Well, there it is. I bet you’re thinking: “History, schmistory. What was that all about?” I’m sure your eyeballs hurt from rolling so many times, not to mention that one time when your contacts blew out. Despite this cutesy romp through “Student-Ratings World” and a staggering 873 books and thousands of articles, monographs, conference presentations, blogs, etc. on the topic, some behaviors remain the same. For example, even today, the mere mention of teaching evaluation to many college professors triggers mental images of the shower scene from Psycho, with those bloodcurdling screams. They’re thinking, “Why not just beat me now, rather than wait to see my student ratings again?” Hmmm. Kind of sounds like a prehistoric concept to me (a little "Meso-Pummel" déjà vu).

Despite the progress made with deans, department heads, and faculty moving toward multiple sources of evidence for formative and summative decisions, student ratings are still virtually synonymous with teaching evaluation in the United States, which is now located in Canada. They are the most influential measure of performance used in promotion and tenure decisions at institutions that emphasize teaching effectiveness. This popularity notwithstanding, maybe the ubiquitous student rating scale will fare differently in the next "Meso-Cutback Era" by 2020! I hope I can update this schmistory for you then.

References
Arreola, R. A. (2007). Developing a comprehensive faculty evaluation system (3rd ed.). San Francisco: Jossey-Bass.
Berk, R. A. (2006). Thirteen strategies to measure college teaching. Sterling, VA: Stylus.
Knapper, C., & Cranton, P. (Eds.). (2001). Fresh approaches to the evaluation of teaching (New Directions for Teaching and Learning, No. 88). San Francisco: Jossey-Bass.
Me, I. M. (2003). Prehistoric teaching techniques in cave classrooms. Rock & a Hard Place Educational Review, 3(4), 10–11.
Me, I. M. (2005). Naming institutions of higher education and buildings after filthy rich donors with spouses who are dead or older. Pretentious Academic Quarterly, 14(4), 326–329.
Me, I. M., & You, W. U. V. (2005). Student clubbing methods to ensure teaching accountability. Journal of Punching & Pummeling Evaluation, 18(6), 170–183.
Seldin, P. (Ed.). (2006). Evaluating faculty performance. San Francisco: Jossey-Bass.

I gratefully acknowledge the valuable feedback of Raoul Arreola, Mike Theall, Bill Pallett, and another student-ratings expert for reviewing the skimpy facts reported in this blog series. To ensure the anonymity of one of the reviewers, I have volunteered him for the Federal Witness Protection Program or the USA cable TV series In Plain Sight. I forget which.

COPYRIGHT © 2010 Ronald A. Berk, LLC 

Monday, September 20, 2010

“A FRACTURED, SEMI-FACTUAL HISTORY OF STUDENT RATINGS OF TEACHING: Meso-Meta Era (1980s)!”


A HISTORY OF STUDENT RATINGS: Meso-Meta Era (1980s)
The 1980s were really booooring! The research continued on a larger scale, and statistical reviews of the studies (a.k.a. meta-analyses) were conducted by such authors as Cohen (1980, 1981), d’Apollonia and Abrami (1997), and Feldman (1989). Of course, this period had to be labeled the Meso-Meta Era.

Book-wise, Peter Seldin of Pace University in upstate Saskatchewan published his first of thousands of books on the topic, Successful Faculty Evaluation Programs (1980). Ken Doyle produced his second book on the topic, Evaluating Teaching (1981), four years later (Are you still awake?).

The administration of student ratings metastasized throughout academe. By 1988, their use by college deans spiked to 80%, with still only a paltry 14% of deans gathering evidence on the technical aspects of their scales.

That takes us to—guess what? The next to last era in this blog series. Whew.

The next blog covers the 1990s with the major contributions by names you will know as gas prices spiked during the “Meso-Unleaded Era.”

COPYRIGHT © 2010 Ronald A. Berk, LLC

Tuesday, September 7, 2010

“A FRACTURED, SEMI-FACTUAL HISTORY OF STUDENT RATINGS OF TEACHING: The State of the Art!”


State-of-the-Art of Student Ratings
There is more research on student ratings than on any other topic in higher education. More than 2,500 publications and presentations have been cited over the past 90 years. Those ratings have dominated as the primary and, frequently, the only measure of teaching effectiveness at colleges and universities for the past five decades. In fact, the evaluation of teaching has been in a metaphorical cul-de-sac with student ratings as the universal barometer of teaching performance. And, if you’ve ever been in a cul-de-sac or a metaphor, you know what that’s like. OMGosh, it can be Stephen Kingish terrifying.

Surveys over the past decade have found that 86% of U.S. liberal arts college deans and 97% of department chairs use student ratings for summative decisions about faculty. Only recently has there been a trend toward augmenting those ratings with other sources of evidence and better metaphors (Arreola, 2007; Berk, 2006; Knapper & Cranton, 2001; Seldin, 2006).

So how in the ivory tower did we get to this point? Let’s trace the major historical events. Hold on to your online administration response rates. Here we go.

A History of Student Ratings
This history covers a timeline of approximately 100 billion years, give or take a day or two, ranging from the age of dinosaurs to the age of Conan O’Brien’s new cable TV show. Obviously, it’s impossible to squish every event that occurred during that period into this series. Instead, that span is partitioned into six major eras within which salient student-ratings activities are highlighted. A blog will be devoted to each of those eras.

References

Arreola, R. A. (2007). Developing a comprehensive faculty evaluation system (3rd ed.). San Francisco: Jossey-Bass.
Berk, R. A. (2006). Thirteen strategies to measure college teaching. Sterling, VA: Stylus.
Knapper, C., & Cranton, P. (Eds.). (2001). Fresh approaches to the evaluation of teaching (New Directions for Teaching and Learning, No. 88). San Francisco: Jossey-Bass.
Seldin, P. (Ed.). (2006). Evaluating faculty performance. San Francisco: Jossey-Bass.

My 1st era blog will tackle prehistoric student ratings of the “Meso-Pummel Era.” How did cave men and women measure teaching performance? Their methods were a bit crude, but effective.

COPYRIGHT © 2010 Ronald A. Berk, LLC

Friday, March 5, 2010

WHAT CAN EDUCATORS LEARN ABOUT SCORING FROM THE 2010 OLYMPIC GAMES?


Based on your Olympic scoring observations and yesterday’s blog, what does any of that have to do with education? NOTHING! Kidding. Here are a few thoughts.

EDUCATIONAL IMPLICATIONS
The process for improving the objectivity of scoring and grading in classroom assessment, evaluation of teaching performance, and program evaluation has been fraught with many of the same challenges, just maybe not a cauldron, but certainly a large bucket.

Student Assessment. The methods have ranged from multiple-choice tests to essay tests to performance tests to student portfolios (to OSCEs with standardized patients in medicine and nursing). The increased precision of the judgmental scoring systems with explicit rubrics has reduced bias and improved the validity and reliability of the scores to measure learning outcomes.

Faculty Evaluation. Among 14 potential sources of evidence to evaluate faculty teaching, almost all are based on the judgments of students and “informed” professionals. There are more than 10 possible types of bias that can affect those judgments. However, the triangulation of multiple sources can compensate to some degree for the fallibility of each measure (a toy illustration follows below). This approach can also be generalized to promotion and tenure reviews based on the faculty portfolio.
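
As a back-of-the-envelope illustration of that triangulation idea (and not the author's model or any particular published system), one simple approach is to standardize each source of evidence and combine the z-scores with explicit weights. The sketch below is Python; the source names, numbers, and weights are invented for the example.

```python
from statistics import mean, stdev

def weighted_composite(scores_by_source: dict, weights: dict, faculty_index: int) -> float:
    """Combine several fallible measures into one weighted composite z-score."""
    total = 0.0
    for source, weight in weights.items():
        scores = scores_by_source[source]
        # Standardize so sources reported on different scales are comparable
        z = (scores[faculty_index] - mean(scores)) / stdev(scores)
        total += weight * z  # the weight reflects the judged credibility of each source
    return total

# Illustrative data only: three sources of evidence for three faculty members
sources = {
    "student_ratings": [4.2, 3.8, 4.5],
    "peer_observation": [3.9, 4.1, 4.4],
    "self_evaluation": [4.6, 4.0, 4.2],
}
weights = {"student_ratings": 0.5, "peer_observation": 0.3, "self_evaluation": 0.2}
print(weighted_composite(sources, weights, faculty_index=0))
```

The point is not the particular weights, which are a judgment call, but that no single fallible measure carries the whole decision.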

Program Evaluation. Any of the preceding measures plus a collection of other quantitative and qualitative sources are used to determine program effectiveness based on specific program outcomes. Tests, scales, questionnaires, and interview schedules provide complementary evidence on impact.

STATE-OF-THE-ART
Unfortunately, let’s face it: Complete measurement objectivity has eluded educators since Gaul was divided into 4.5 parts. It’s an intractable problem in evaluating faculty, students, and programs. Psychometrically, the state-of-the-art of behavioral and social assessment indicates it is just as fallible as Olympic scoring, but it’s the best that we have.

If we could just balance these scoring systems with a speed-teaching, speed-learning, or speed-mentoring, time-based approach, we could come closer to what we observed in the Olympic events. Also, there may be a lower risk of injury and wipe-outs in the classroom.

What do you think? Any ideas on any of the above are welcome. If you come up with the solution, you could receive a medal! However, I’m not sure of the scoring rubric for that medal, but I know it will be fair.

COPYRIGHT © 2010 Ronald A. Berk, LLC

Thursday, December 10, 2009

BLOGGUS INTERRUPTUS: A NONRANDOM THOUGHT ON FACULTY EVALUATION FOR ACCREDITATION


I’m sorry to interrupt my blog series on “How to Put Pizazz into Your PowerPoint Conference Presentations,” but an article was just published that might be useful to some of you.

Are you struggling with issues related to student ratings of teaching, peer evaluation, teaching portfolios, and which gifts to buy for your family for the holidays? Well, have I got a deal for you. I have two recently published articles on a faculty evaluation model that might interest you. They are an extension of the model described in my Thirteen Strategies to Measure College Teaching book (see Stylus Publishing link in right margin), applied specifically to formative and summative decisions about faculty. The model can be used to evaluate teaching performance and professionalism. Here they are:

Berk, R. A. (2009a). Beyond student ratings: “A whole new world, a new fantastic point of view.” Teaching Excellence, 20(1).

Berk, R. A. (2009f). Using the 360° multisource feedback model to evaluate teaching and professionalism. Medical Teacher, 31, 1073–1080.

The 1st article above is a brief description of the model; the 2nd is a full-blown presentation of the model and lit review. Although the latter was written for professors and administrators in medical schools, all of the characteristics are generalizable to any discipline, department, school, or kingdom. The model is simple, straightforward, and easily applied to impress even accreditation reviewers of your evaluation plan in your self-study. An abstract of the article is given below:

This MT article provides an overview of the salient characteristics, research, and practices of the 360° MSF models in management/industry and clinical medicine. Drawing on that foundation, the model was adapted to the specific decisions rendered to evaluate faculty teaching performance and professional behaviors. What remains unchanged in every application is the original spirit of the model and its primary function:
Multisource Ratings→Quality Feedback→Action Plan to Improve→Improved Performance.
Although the ratings were intended for formative decisions, in many cases they have also ended up being used for summative decisions. All of these applications of the 360° MSF model have advantages and disadvantages. In fact, it is possible to distill several persistent and, perhaps, intractable psychometric issues in executing these models. The top 10 issues are described. Although much has been learned during the 80-year history of scaling, 60-year history of faculty evaluation, and 50-year history of the 360° MSF model in management/industry, a lot of work is still necessary to realize the true meaning of “best practices” in evaluating teaching and professionalism.
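
Purely as a schematic reading of the feedback loop named in the abstract (multisource ratings, quality feedback, action plan, improved performance), and emphatically not as an implementation of the published 360° MSF model, a toy version in Python might look like this; the sources, numbers, and threshold are all invented for illustration.

```python
from typing import Dict, List

def quality_feedback(ratings_by_source: Dict[str, List[int]]) -> Dict[str, float]:
    """Step 2: reduce raw multisource ratings to per-source summary feedback."""
    return {source: sum(r) / len(r) for source, r in ratings_by_source.items()}

def action_plan(feedback: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Step 3: flag areas whose summary falls below a purely hypothetical threshold."""
    return [f"Improve area rated by {source}" for source, m in feedback.items() if m < threshold]

# Step 1: multisource ratings (students, peers, self) -- illustrative numbers only
ratings = {"students": [4, 5, 3, 4], "peers": [3, 3, 4], "self": [5]}
print(action_plan(quality_feedback(ratings)))
# Step 4, improved performance, happens in the classroom rather than in the code.
```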

You can download the articles from my Website (www.ronberk.com) under Publications. They are intended for your own use and research purposes. Please do not distribute to family members or farm animals. The latter may eat them and get sick. Enjoy!

COPYRIGHT © 2009 Ronald A. Berk, LLC

Sunday, October 25, 2009

What Scores Should Be Reported from Student Ratings of Faculty?


Recently, I was involved in a spirited marathon discussion with a bunch of colleagues on technical issues related to student ratings of teaching performance. One big topic was: How do you report results for formative and summative decisions? I thought some of my bloggees might be interested in the options available. These options with report form examples appear in my Thirteen Strategies... book (see Stylus link to right).

To answer the question, you don't need to administer multiple rating forms. There are a lot of options with the results from just one form. It is possible to "have your cake and eat it too" with one form for both formative and summative decisions, up to a point. The trick is how the results are analyzed and reported for each decision maker.

Psychometrically speaking, I recommend the following (a brief code sketch of this structure follows the list):
1. A structured scale with 4–6 subscales measuring separate constructs, such as Class Organization, Teaching Methods, Evaluation Techniques, and so on. The faculty evaluation literature reports several major constructs based on factor analyses. These core teaching behaviors should be generic enough to apply to most courses and disciplines.
2. A separate, optional section for course-specific items that each instructor might want to add. This section might contain up to 10 items.
3. One to three global items may be included as well, although the reliabilities of individual items are typically much lower than those of item aggregates, such as subscale or total scale scores.
4. An unstructured section containing 2–5 stimulus questions to which students can respond is also important. Loads of online administrations reveal that students spend considerable time typing buckets of comments. Frequently those comments explain the responses to some of the structured items. Both forms of evaluation are valuable and furnish complementary information on teaching performance.
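
For anyone assembling such a form electronically, here is a minimal sketch, in Python, of how the four sections above might be represented. The subscale names, item codes, and stimulus questions are purely illustrative placeholders, not a prescribed instrument.

```python
from dataclasses import dataclass, field

@dataclass
class RatingScale:
    # 1. Structured subscales, each measuring a separate construct
    subscales: dict = field(default_factory=lambda: {
        "Class Organization": ["org_01", "org_02", "org_03"],
        "Teaching Methods": ["tm_01", "tm_02", "tm_03", "tm_04"],
        "Evaluation Techniques": ["ev_01", "ev_02", "ev_03"],
    })
    # 2. Optional course-specific items added by the instructor (up to ~10)
    course_specific_items: list = field(default_factory=list)
    # 3. One to three global items
    global_items: list = field(default_factory=lambda: ["overall_instructor"])
    # 4. Unstructured stimulus questions for open-ended comments
    stimulus_questions: list = field(default_factory=lambda: [
        "What helped your learning the most in this course?",
        "What would you change about this course?",
    ])

# Example: an instructor appends two course-specific items to the generic core
scale = RatingScale(course_specific_items=["cs_01", "cs_02"])
```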

Analysis-wise, the above structure permits results at the following levels (a short computational sketch follows the list):
1. anchor distribution of percentages
2. item statistics such as the mean and median (almost all distributions are negatively skewed)
3. subscale statistics
4. total scale statistics
5. summary of comments by stimulus question
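
As a companion to those five levels, here is a minimal computational sketch in Python. It assumes the responses sit in a pandas DataFrame with one row per student and one 1–5 column per item, and that a dictionary maps subscale names to their item columns; every name and number is hypothetical, and this is an illustration rather than a production reporting system.

```python
import pandas as pd

def summarize(ratings: pd.DataFrame, subscales: dict, comments: dict) -> dict:
    """Produce the five reporting levels from one administration of the scale."""
    items = [item for cols in subscales.values() for item in cols]
    report = {}
    # 1. Anchor distributions: percentage of responses at each scale point, per item
    report["anchor_pct"] = {
        i: ratings[i].value_counts(normalize=True).sort_index() * 100 for i in items
    }
    # 2. Item statistics: mean and median (the median is safer when distributions
    #    are negatively skewed, as most rating distributions are)
    report["item_stats"] = ratings[items].agg(["mean", "median"]).T
    # 3. Subscale statistics: each student's subscale mean, then summarized
    report["subscale_stats"] = {
        name: ratings[cols].mean(axis=1).agg(["mean", "median"])
        for name, cols in subscales.items()
    }
    # 4. Total scale statistics across all structured items
    report["total_stats"] = ratings[items].mean(axis=1).agg(["mean", "median"])
    # 5. Comments grouped by stimulus question (here, simply counted)
    report["comment_counts"] = {q: len(texts) for q, texts in comments.items()}
    return report

# Tiny illustrative run: 3 students, 4 items in 2 hypothetical subscales
demo = pd.DataFrame({"org_01": [5, 4, 3], "org_02": [4, 4, 5],
                     "tm_01": [3, 2, 4], "tm_02": [5, 5, 4]})
print(summarize(demo, {"Organization": ["org_01", "org_02"], "Methods": ["tm_01", "tm_02"]},
                {"What helped most?": ["The examples", "Office hours"]}))
```

Faculty reports could draw on all five entries, while administrators might see only the subscale and total statistics, in line with the split described next.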

That's a lot of information. Faculty would benefit from all five levels; levels 1 and 2, in particular, provide valuable diagnostic information for revising teaching or course materials, which will benefit the next course-load of students. It is formative feedback only in that sense. Other formative methods administered during the course should also be considered. You already know about those options.
Summative decisions by department chairs, associate deans, and other administrators can be based on levels 3 and 4 and possibly the global item scores.

The above strategy is certainly not new, but it is the simplest way to get the biggest bang from your student rating scale. Of course, it is only 1 of 14 sources of evidence you might use in measuring teaching performance. Multiple sources of evidence should be involved in summative (personnel) decisions about faculty contract renewal, merit pay, and promotion and tenure. After all, faculty careers are on the line.

If you're grappling with this issue, I hope these suggestions are helpful.
COPYRIGHT © 2009 Ronald A. Berk, LLC