BERK'S BLOG: From the keyboard of the "Humor Professor"

My blogs reflect my research interests and reflections on issues in teaching, PowerPoint, social media, faculty evaluation, student assessment, time management, and humor in teaching/training and in the workplace. Occasional top 10 lists may also appear on timely topics. They are intended for your professional use and entertainment. If they are seen by family members or pets, I am not responsible for the consequences. If they're not meaningful to you, let me know. ENJOY!

Thursday, May 27, 2010

A BerksNotes® GUIDE TO INTERPRETING STUDENT RATING RESULTS: Subscale Level

WHAT ARE SUBSCALE SCORES?
This is the first level at which item scores can be summed. If the items on the total scale are grouped into clusters related to topics such as instructional methods, evaluation methods, and course content, the item scores within each cluster can be summed to produce subscale scores. These scores should be used for decision making only if adequate validity and reliability evidence supports that internal scale structure. Since each subscale contains a different number of items, the score range will also be different. This range must be reported to interpret the results.

The summary of subscale score results is derived from the item score results. The statistics are the same. We are just aggregating or summarizing the item-level data (0–3) into subscale item clusters. For example, here are results for three selected subscales:

Instructional Methods (IM)—13 items
     Subscale score range = 0–39; Midpoint = 19.5
     Mean/Median = 34.79/37.00, where N = 97

Evaluation Methods (EM)—5 items
     Subscale score range = 0–15; Midpoint = 7.5
     Mean/Median = 13.40/15.00, where N = 97

Course Content (CC)—8 items
     Subscale score range = 0–24; Midpoint = 12
     Mean/Median = 21.97/24.00, where N = 97

COMPUTATION OF SUBSCALE SCORES: The interpretation of subscale results is analogous to the item results; only the numbers are BIGGER. For example, instead of a 0–3 range and a midpoint of 1.5 for an item, each subscale has a range and midpoint based on its respective number of items. So, for the Instructional Methods (IM) subscale with 13 items, a "0" (SD) response to every item produces a sum of 0 for the subscale, and a "3" (SA) response to all 13 items yields a sum of 39.
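For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the range and midpoint calculation described above. The function names (subscale_range, subscale_score) are my own illustrative choices, and the sketch assumes the 0–3 item coding used in these blogs.

```python
def subscale_range(n_items, min_code=0, max_code=3):
    """Return (lowest, highest, midpoint) for a subscale of n_items items.

    Assumes every item is coded from min_code to max_code (0-3 here).
    """
    low = n_items * min_code
    high = n_items * max_code
    midpoint = (low + high) / 2
    return low, high, midpoint


def subscale_score(item_scores):
    """Sum one student's item scores to get that student's subscale score."""
    return sum(item_scores)


# The three subscales listed earlier: IM = 13 items, EM = 5, CC = 8.
for name, n_items in [("IM", 13), ("EM", 5), ("CC", 8)]:
    low, high, mid = subscale_range(n_items)
    print(f"{name}: range = {low}-{high}, midpoint = {mid}")
```

A student who answers SA to all 13 IM items gets subscale_score([3] * 13), which is the upper limit of 39.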

INTERPRETATION OF SUBSCALE MEANS AND MEDIANS: The zero-base for all score interpretations is easy to remember: the worst, most unfavorable rating on any item, subscale, or total scale is "0." What changes is the upper score limit for the most favorable rating on each subscale because the number of items changes. Again, for the IM subscale, the mean and median can be referenced to the upper limit of 39 and also the midpoint of 19.5 to locate the position on the continuum, as indicated below:

Extremely                                                     Mean       Extremely
Unfavorable                       Neutral                34.79      Favorable
       0___________________19.5____________________39
                                                                             Mdn
                                                                              37
The mean/median ratings on the IM subscale are very favorable.
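Locating a score on the continuum can also be sketched in a few lines of Python. This is only an illustration: the helper name locate_on_continuum and the "fraction of upper limit" figure are my own assumptions, added as one convenient way to compare subscales with different numbers of items. They are not part of any standard rating report.

```python
def locate_on_continuum(score, n_items, max_code=3):
    """Reference a subscale mean or median to its midpoint and upper limit.

    Assumes zero-based item coding, so the lower limit is always 0.
    """
    upper = n_items * max_code
    midpoint = upper / 2
    return {
        "upper": upper,
        "midpoint": midpoint,
        "above_midpoint": score > midpoint,
        # Hypothetical convenience figure: share of the maximum possible score.
        "fraction_of_upper": score / upper,
    }


# IM subscale results reported above: mean = 34.79, median = 37.00, 13 items.
im_mean = locate_on_continuum(34.79, 13)
im_median = locate_on_continuum(37.00, 13)
print(im_mean)
print(im_median)
```

Both values sit well above the midpoint of 19.5 and close to the upper limit of 39, which is what "very favorable" means here.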

The subscale results can pinpoint areas of strength and weakness. They may be used by your department chair or the promotion review committee to identify your teaching strengths across different courses. Subscale scores cannot direct you toward particular aspects of teaching that can be improved or changed. The item and anchor results described previously are intended to provide that detailed level of direction.

Finally, the next blog will examine total scores on the scale. What additional info do they provide beyond what we already know?

COPYRIGHT © 2010 Ronald A. Berk, LLC

Tuesday, May 25, 2010

A BerksNotes® GUIDE TO INTERPRETING STUDENT RATING RESULTS: Item Level—Part 2

SHOULD YOU USE THE ITEM MEAN OR MEDIAN? As mentioned previously, when the anchor distribution is negatively skewed, the mean will typically be lower than the median, as it is for all three items displayed in the previous blog. The reason is that the mean is drawn toward the few extremely low ratings of SD. The mean is sensitive to extreme scores. (Statisticians’ Concern: For as long as statisticians can remember, the mean has always had an attraction for extreme scores. Granted, this affinity for outliers is not normal. However, statisticians have tolerated this abnormal relationship for years, but also felt compelled to create another index that is not so easily swayed: the median. You may now resume this paragraph already in progress.) Depending on the degree of skew, the resulting bias in interpreting the mean can range from negligible to substantial.

DIRECTION AND DEGREE OF ITEM MEAN BIAS: The problem is that the mean misrepresents the actual ratings in a negatively skewed distribution by portraying lower class ratings than actually occurred. This bias MAKES THE INSTRUCTOR APPEAR WORSE in teaching performance, on all of the items, than the students’ ratings indicate. This is not very desirable, especially if these results are used for summative decisions by your department chair or associate dean.

Although means are reported on most commercially published scales, it is strongly recommended that MEDIANS BE REPORTED ALONG WITH THE MEANS. Although the median is less discriminating as an index, it is more accurate, more representative, and less biased than the mean for markedly skewed distributions. The lower the degree of skew, the more similar the two measures will be. In a perfectly normal distribution, the mean and median are identical. However, keep in mind that ratings of faculty, administrators, courses, programs, and fast food are typically skewed. Therein lies the importance of picking the right index.

BOTTOM LINE RECOMMENDATION: Use both mean and median.
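To see the pull of a few low ratings on the mean, here is a small Python sketch. The 20 ratings are made up for illustration (they are NOT from any report in these blogs); they are simply piled up on SA with a couple of stragglers at SD and D, which is the negatively skewed shape described above.

```python
import statistics

# Hypothetical class of 20 students on one item, coded SD=0, D=1, A=2, SA=3.
# One SD, one D, four A, fourteen SA: a negatively skewed distribution.
ratings = [0] * 1 + [1] * 1 + [2] * 4 + [3] * 14

item_mean = statistics.mean(ratings)
item_median = statistics.median(ratings)

print(f"Mean   = {item_mean:.2f}")
print(f"Median = {item_median:.2f}")
print("Mean is lower than median:", item_mean < item_median)
```

The two low ratings drag the mean below the median, while the median stays put at SA. Reporting both lets the reader see how much the skew matters.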

INTERPRETATION: PROFILE OF STRENGTHS AND WEAKNESSES: Since the item means/medians are based on the total N for the class, they can be compared. They display a profile of strengths and weaknesses related to the different teaching behaviors and course characteristics. On a 0−3 scale, means/medians above 1.5 indicate strengths; those below 1.5 denote weaknesses. The means/medians in conjunction with the anchor percentages provide meaningful diagnostic information on areas that might need attention. Again, this report is intended for the instructor's use primarily, although the results on course characteristics may have curricular implications.
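The strengths-and-weaknesses rule can be written as a tiny Python sketch. The item labels and values below are hypothetical, and the "at midpoint" label for a value of exactly 1.5 is my own assumption, since the rule above only covers values above or below 1.5.

```python
def classify(value, midpoint=1.5):
    """Label an item mean or median relative to the scale midpoint."""
    if value > midpoint:
        return "strength"
    if value < midpoint:
        return "weakness"
    return "at midpoint"  # assumption: the rule above does not cover this case


# Hypothetical item medians on a 0-3 scale (not from an actual report).
item_medians = {"Item A": 3.0, "Item B": 1.0, "Item C": 1.5}

profile = {item: classify(value) for item, value in item_medians.items()}
for item, label in profile.items():
    print(f"{item}: {item_medians[item]:.2f} -> {label}")
```

In practice you would read these labels alongside the anchor percentages rather than on their own.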

Next, the meaning and uses of subscale and total scale scores will be discussed. They are basically summaries of the item scores. Hope you’re finding this stuff helpful. If not, let me know.


Sunday, May 23, 2010

A BerksNotes® GUIDE TO INTERPRETING STUDENT RATING RESULTS: Item Level—Part 1

WHAT ARE ITEM SCORES?
The next level is the item, where a statistic such as a mean or median is reported. Since most anchor distributions are negatively skewed and answers are on a ranked, or ordinal, scale, the median is the most appropriate measure of central tendency. However, given the range of distributions that can occur, you may see both the mean and median on your report form.

“WAIT!! Back up. How did you get from responses of SD, D, etc. to means and medians?” Great question! Glad you’re on the ball. First, you have to convert the “verbal” anchors into “numbers.”

(MEASUREMENT ALERT: Keep in mind that we started with a “qualitative scale” of verbal expressions of how students feel about each behavior and now we’re converting the words into a “quantitative scale” for the convenience of performing analysis of those feelings. Actually, this conversion involves an arbitrary numerical coding scheme.)

CREATE A ZERO-BASED NUMERICAL SCORE SCALE: For simplicity and interpretability, a zero-based scale is recommended, so that the most negative anchor, such as SD, would be coded as “0.” Then the other anchors would be coded in 1-point increments above 0. Zero-based scoring was originally recommended by Likert (1932), who created this scaling method.

Higher values are assigned to the more desirable, or positive, ratings. SA or Strongly Agreeing with a desirable teaching behavior or course characteristic is weighted with the highest value of 3. An example of this coding for a 4-point, agree–disagree scale is shown below:

SD    D     A     SA
0     1     2     3

The score range for this single item is 0 to 3. (Note: These score points will vary with the number of anchors and the base number on different scales. Yours may be one of these. Sometimes the number 1 is used as the base instead of 0. Although the number scale may be different, the final interpretation will be similar.)
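The anchor-to-number conversion is simple enough to sketch in Python. The function name build_codes is an illustrative choice of mine; the base argument covers the note above about scales that start at 1 instead of 0.

```python
def build_codes(anchors, base=0):
    """Map verbal anchors, ordered from most negative to most positive,
    to consecutive integers starting at `base`."""
    return {anchor: base + position for position, anchor in enumerate(anchors)}


anchors = ["SD", "D", "A", "SA"]

zero_based = build_codes(anchors)          # SD=0 ... SA=3
one_based = build_codes(anchors, base=1)   # SD=1 ... SA=4

# Convert a few hypothetical student responses into numbers.
responses = ["SA", "A", "SA", "D"]
scores = [zero_based[r] for r in responses]
print(zero_based)
print(one_based)
print(scores)
```

Remember that the coding is arbitrary. Switching the base shifts every score by the same amount, so the picture of the ratings does not change.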

COMPUTATION OF ITEM MEANS AND MEDIANS: If you hate stat, this section may make you hurl. Skip it. (SIDEBAR: Over 30 years of teaching stat, I had lots of student hurlers.) For you interested nonhurlers, here are the simple computational definitions:

MEAN = the sum of all students’ scores on an item, divided by the number of students or N. This is the average score for an item, within the range of 0–3 for this example.

MEDIAN = the middle score, after all students’ scores are ranked from high to low. (With an even number of students, it is the average of the two middle scores.)
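For the nonhurlers who want to check the definitions by machine, here is a minimal Python sketch using the standard statistics module. The anchor counts are hypothetical (20 made-up students, not the class in the report), chosen so that N is even and the two middle scores differ.

```python
import statistics

# Hypothetical anchor counts for one item: code -> number of students.
# SD=0: 1 student, D=1: 2 students, A=2: 7 students, SA=3: 10 students.
anchor_counts = {0: 1, 1: 2, 2: 7, 3: 10}

# Expand the counts into one score per student.
scores = [code for code, count in anchor_counts.items() for _ in range(count)]

n = len(scores)
item_mean = sum(scores) / n               # sum of scores divided by N
item_median = statistics.median(scores)   # middle score after ranking

print(f"N = {n}, Mean = {item_mean:.2f}, Median = {item_median:.2f}")
```

With N = 20, the two middle scores are an A (2) and an SA (3), so the median lands halfway between them.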

An example report, based on the anchor data shown in the previous blog, is shown below:

              SD      D       A       SA      N     Mean   Median
Statement 1   1.0%    3.1%    37.5%   58.6%   96    2.52   3.00
Statement 2   1.0%    3.1%    24.0%   71.9%   96    2.65   3.00
Statement 3   1.1%    1.1%    28.9%   68.9%   90    2.57   3.00

The median score of 3 means the typical student in the middle of the distribution rated those behaviors as SA. The means were slightly lower with ratings between A and SA. Those are very respectable scores. Of course, they are consistent with the anchor percentage distribution, where the highest percentages are concentrated on the A and SA anchors.
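If all you have is the percentage row from the report, you can still find the median by accumulating percentages from the lowest anchor until you pass the halfway point. The Python sketch below is one way to do that; the function name median_from_percentages is my own, and it works from half of the row total rather than a flat 50 because rounded percentages do not always sum to exactly 100.

```python
import math


def median_from_percentages(percentages):
    """Median anchor code from percentages ordered by code 0, 1, 2, ...

    If exactly half of the responses fall at or below an anchor, the median
    is midway between that anchor and the next anchor that has responses.
    """
    total = sum(percentages)
    if total <= 0:
        raise ValueError("percentages must sum to a positive value")
    half = total / 2
    cumulative = 0.0
    for code, pct in enumerate(percentages):
        cumulative += pct
        if math.isclose(cumulative, half):
            next_code = next(
                c for c in range(code + 1, len(percentages)) if percentages[c] > 0
            )
            return (code + next_code) / 2
        if cumulative > half:
            return float(code)


# Percentage rows (SD, D, A, SA) for the three statements in the report above.
report = {
    "Statement 1": [1.0, 3.1, 37.5, 58.6],
    "Statement 2": [1.0, 3.1, 24.0, 71.9],
    "Statement 3": [1.1, 1.1, 28.9, 68.9],
}
medians = {name: median_from_percentages(row) for name, row in report.items()}
print(medians)
```

Since more than half of the students chose SA on every statement, the halfway point is not reached until the SA anchor, which is why each median is 3.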

So which index should you use? Mean? Median? Or both? Ah ha! The statistical plot thickens. Stay tuned…
