Basic Math/Weighted avgs when basis is different
Expert: Josh - 5/19/2006
Question-------------------------
Followup To
Question -
Thank you, Josh. Your response has prompted three questions from me regarding "normalizing" the scores before multiplying them by the decimal weights.
1.) Could we just as easily assign a value for the Y/N questions from 1 to 5? Using your picture it would look like this:
__1__2__3__4__5
1_____________5
If so, how would we apply this to your example? My reason for wanting to keep the 1-5 scale and "adjust" the Y/N scale is that it would be easier to do that than have to "map" hundreds of survey responses in the database fro the 1-5 scale to the 0-4 scale you proposed. The Y/N questions didn't have an assigned scale other than the percentage of "yes" responses previously, so if we must adjust them, why not make the Y/N fit the 1-5 and leave the 1-5 "as is?" In other words, only have to "map" the Y/N questions.
2.) By using your example, the best, or "perfect" composite score for the store comes out to be a "4". While this is intuitive for students thinking in terms of Grade Point Averages (GPA) on 4-point scale, the managers of most tire stores are usually a little less collegiate; most would appreciate the familiarity of having their store's "top box" score related in terms of a percentage scale that goes from 0% to 100%.
On that scale, a "perfect score" would be "100%". Wouldn't it be just as correct to "normalize" the 1-5 scale scores by setting them in terms of percentages? If so, could we simply divide the score by "5" to arrive at a decimal equivalent? For example, if a store had an average score on Q1 of 4.2, we divide 4.2 by 5 (4.2/5=84) to get 84%? Then we can simply multiply .84 by our weight of .05 for Q1 to arrive at a weighted average score of .042, right?
Then I would assume we simply do the same for each of the other 1-5 scale questions, then obtain weighted average score for each of the Yes/No questions, and then add them all up to arrive at our "top box" score which is now expressed in terms of a 0-100% scale that everyone is used to.
For example, if Q2 were a Y/N question and Q2's average score were 92%, and the weight of Q2 were .05, we would multiply .92*.05=.046, and then add the value of .042 from Q1 above to arrive at a sum of .088. Of course, we would continue through all the rest of the questions, but up to this point we could say that 8.8% would be the store's cumulative "top box" score, at least until we finished the iterations for the other 7 of 9 questions, correct?
3.) If my understanding of "normalizing" is way off base, how WOULD we normalize the 1-5 scale questions to a percentage so we wouldn't have to change anything about the Y/N questions' scores, and still report the final composite "top box" score for each store in the familiar 0-100% scale?
Thanks in advance for your help!
Jeff
Josh:
I am trying to come up with a weighted average scoring system for survey questions on customer satisfaction that would allow separate retail tire store outlets compare their “top box” scores against each other. Each store's customers are asked a series of questions relating to their recent experience at the particular store they visited. The responses for the individual questions of each survey are averaged and the result appears on a “store total” line.
Three of the 9 total questions ask the customer to rate their level of satisfaction on a scale of 1-5 with 5 being the best. The other 6 questions are Yes/No questions where a “yes” response indicates satisfaction and a “no” indicates dissatisfaction. You might say that the “basis” for the two types of questions is different. For the 1-5 ratings a “store total” is displayed as a number such as 4.4 on that 1-5 scale. For the Yes/No questions the percentage of “yes” responses is displayed as 90%, for example.
Some questions relate to factors that are more important to customer satisfaction than others, and therefore warrant a higher weight than other factors. But because of the nature of the total average scores for each survey question, a curious phenomenon has revealed itself to me. Even though the percentage value of the weights (which add up to 100) is higher (.20) for two of the Yes/No questions (Q8 & Q9), two of the 1-5 rating questions (Q1 & Q3) actually produce a weighted average score that is higher in the line that gets added together to calculate the overall sum. It seems to me that this inadvertently places more value on Q1 & Q3 than intended, because a one-point drop in an average score affects the final sum, or “top box” score than a 10% drop in the average of Q8 & Q9!
Here is a table I plucked from Excel that helps illustrate the problem:
Question Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9
Question Type 1-5 Y/N 1-5 Y/N Y/N 1-5 Y/N Y/N Y/N
Optimum Score Possible 5 100.0% 5 100.0% 100.0% 5 100.0% 100.0% 100.0%
Weight 5.00% 5.00% 5.00% 5.00% 10.00% 20.00% 10.00% 20.00% 20.00% 100.00%
Weighted Average Score 0.25 0.05 0.25 0.05 0.1 1 0.1 0.2 0.2 220.00
(Note that the last column contains the sum across the rows. The weighted average score's sum has been multiplied by 100 to make it easier to interpret minute differences between scores for the different tire store locations.)
In the table above the highest possible score is depicted in the row labeled “Optimum Score Possible.” This represents a “perfect score.” The question is this; is it mathematically incorrect to mix the 1-5 ratings with the Y/N responses? Is there something inherently wrong with multiplying different types of question “basis” by a weighting factor and then summing the “Weighted Average Score” row? (I realize that to make use of comparing scores between different stores under this scenario requires us to establish the value of 220 (2.2*100) as the benchmark “perfect score.”) In weighted average theory must all scores use the same “basis;” that is, must the question responses all be expressed as a value between 1-5 OR as a percent?
I have played around with an idea to make the “benchmark perfect score equal to 100%. In the table below I have changed the weights so that for the optimum scores possible, the “Weighted Average Scores” in the last row now add up to 100%. Notice that I changed the values in the weight row to accomplish this, but they now no longer add up to 100%. Doesn't this violate some mathematical rule for weighted averages? Even so, the weights of the individual questions now behave more in line with my original intention: to have Q6, Q8, and Q9 represent a higher value in measuring overall customer satisfaction. Plus, it puts the scores on a measurement scale that most people can easily relate to – 0% to 100%. This measurement scale doesn't require the end user to “readjust” their thinking to a measurement scale where 220 represents the highest score.
Question Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9
Question Type 1-5 Y/N 1-5 Y/N Y/N 1-5 Y/N Y/N Y/N
Optimum Score Possible 5 100.0% 5 100.0% 100.0% 5 100.0% 100.0% 100.0%
Weight 1.00% 5.00% 1.00% 5.00% 10.00% 4.00% 10.00% 20.00% 20.00% 76.00%
Weighted Average Score 0.05 0.05 0.05 0.05 0.1 0.2 0.1 0.2 0.2 100.00
I realize this is a pretty detailed example, but could you please respond with your comments and suggestions? Which of my two example tables is more correct? Is there a better way?
My problem isn't in making Excel do what I need; it is with the mathematics behind it!
Thanks!
Jeff
P.S. When I copied and pasted my text from Word, my Excel table wrapped. To help you make sense of the data, my example tables should be 6 rows deep by 11 columns wide, including the labels at the left and 2 total columns at the right for rows 4 & 6. The 5th row is a blank row to separate rows 4 & 6 for clarity and readability in the source document.
If you need me to I'll be happy to attach the document to an e-mail reply to you so you can see it better. Just e-mail me and I'll respond.
Answer -
Jeff,
Let me try answering your questions, then I'll make some suggestions.
Q: Is it mathematically incorrect to mix the 1-5 ratings with the Y/N responses?
A: Numbers and percentages are not really compatible. I think it is best to write everything in decimal.
Q: Is there something inherently wrong with multiplying different types of question ¡§basis¡¨ by a weighting factor and then summing the ¡§Weighted Average Score¡¨ row?
A: Since 1% is equivalent to 0.01, when you assign an arbitrary value "100%=1.0" to "Yes", you have implicitly (already) assigned a lower weight to an affirmative response. Straight-away, you are saying that the most favourable answer in a "Y/N" question is worth only 20% as much as a judgement on a five point scale.
This is probably not what you want. As an alternative, before you actually reach the weighted averaging stage, you can normalize all the scores, so that each question starts on an equal footing.
By "normalizing", I mean having some sort of consistency on the range of values that may be attained. To keep things simple, you may wish to assign a decimal value of 4 to a "Yes" and a value of 0 to a "No". ...[#1]
Accordingly, for questions which require a five point judgement, the values "1,2,3,4 and 5" obtained from the survey should be mapped to "0,1,2,3 and 4" respectively. ...[#2]
Let me draw a picture so you see the difference:
BEFORE (you used)
___1__2__3__4__5
0__1
AFTER (now you use)
0__1__2__3__4
0___________4
Remark on weights:
It doesn't really matter what weight you actually assign to each question. Use integers if it makes things easier. Make this process intuitive, forget about what they add up to. We'll take care of that later.
Let w(i) be the preliminary weight assigned to question i, where i=1,2,3,4,5,6,7,8,9.
Here is a hypothetical example (I'm just picking numbers off the top of my head. In practice, you select appropriate weights which reflect the importance of each criterion):
Suppose the raw weights are chosen as follows:
w(1)=100,
w(2)=200,
w(3)=150,
w(4)=400,
w(5)=1000,
w(6)=75,
w(7)=125,
w(8)=100,
w(9)=50,
Next, normalize these weights.
Define sum_w = w(1)+w(2)+w(3)+w(4)+w(5)+w(6)+w(7)+w(8)+w(9). It so happens that the raw weights add up to sum_w = 2200. Important thing is that you are not constrained in anyway how you pick those weights in the beginning.
For the purpose of weighted averaging, we define the actual weights as
W(i)=w(i)/sum_w.
In this example, we get
W(1)=w(1)/sum_w=100/2200=0.04545455;
W(2)=w(2)/sum_w=200/2200=0.09090909;
W(3)=0.06818182
W(4)=0.18181818
W(5)=0.45454545
W(6)=0.03409091
W(7)=0.05681818
W(8)=0.04545455
W(9)=0.02272727
These are effectively percentages (if you multiply by 100)
Finally, the weighted average score is given by SUM s(i)*W(i) over i=1,2,...9
Here, s(i) denotes the score given for each response to question i, using the normalization given in [#1] and [#2] from earlier.
That is, you calculate
s(1)*W(1)+s(2)*W(2)+s(3)*W(3)+s(4)*W(4)+...+s(9)*W(9) ...[#3]
Notice that s(i) is bounded between 0 and 4 by design, irrespective of the type of question involved.
Having done all these, finding the perfect score is trivial. Just set all the "s" to the highest possible score. i.e., s(1)=s(2)=s(3)=...=s(9)=4 Then, compute [#3].
AnswerRe: Q1
I chose 0 for "No" and 4 for "Yes" for a reason. If 1 and 5 are assigned to "No" and "Yes", a negative response will contribute a non-zero value (1/(5-1) after normalization) to the total score. This would bias the result. ("No" means "no", it should count for nothing; rather than making the rating more favorable).
It takes very little effort to convert a column from [1,2,3,4,5] to [0,1,2,3,4]. Suppose the former occupies cell A1. You can write the formula =A1-1 in an Excel spreadsheet (say, in cell AA1). Based on this simple formula, you can perform the necessary mapping on the entire column in no time (just copy and paste). Once you have the range values [0,1,2,3,4] computed in column AA, you copy and paste column AA "by value" back to column A. (To safeguard your work, make a back-up copy of the file before attempting this)
Re: Q2
Your approach is essentially the same as employing
{0,1,2,3,4}/4 OR {0, 0.25, 0.5, 0.75 1.0} for the five-point scale; and {0,1} for "yes/no". This part is fine.
It is not fundamentally different. Using my method, after everything is done, a perfect score is S(max)=4 and all other judgements S(n) are between 0 and 4. To express this as a percentage between 0% and 100%, why not compute 100*S(n)/S(max). Normalization is usually best left till last. It leaves less scope for error, as we don't have to fiddle with the weights. When you give percentages, the question that comes to mind is what is the quantity relative to? We often aren't prepared to answer this while we are in the middle of some calculation. You see, it just leads to more complication.
Re: Q2 part ii) part iii)
I would issue caution to the practice of treating 4.2/5=0.84 as a percentage. The main problem is that the 5-point scale and "yes/no" judgements are incongruous. When you use percentages, you tend to think that you are dealing with something of the same nature, but you are not.
As evident from your previous observation, the meaning of intermediate values are sometimes not well defined. Be careful with quoting cumulative scores, such interpretation can be very misleading.
Re: Q3
It's fine if you wish to associate the five-point and "yes/no" survey responses with {0,1,2,3,4}/4=[0,0.25,0.5,0.75,1.0] and {0,1} respectively.
You really need to include 0 as the lowest possible score to avoid bias. Otherwise, the negative responses will contribute to a more favorable score as small positive quantities add-up.
The algorithm for mapping from a column of raw data in the range "1-5" to the range "0-4" was given in an earlier section (see Re: Q1, paragraph 2).
You can circumvent all this following the method I originally proposed. Assuming that 0<=S(n)<=4 and S(max)=4, you simply have to calculate 100*S(n)/S(max) to obtain a final score in the range from 0 to 100.
If you truncate the number of questions from 9 to "m", you can do this (say, if m=3):
100*[s(1)*W(1)+s(2)*W(2)+s(3)*W(3)]/[S(max)*(W(1)+W(2)+W(3))]
The following is a complete example. Suppose you retain only (m=6) questions (Q1,Q2,Q3,Q4,Q6,Q7) from the survey.
Survey response:
x(1)="yes"
x(2)=4 selected from {1,2,3,4,5}
x(3)="yes"
x(4)=5 selected from {1,2,3,4,5}
x(6)=2 selected from {1,2,3,4,5}
x(7)=1 selected from {1,2,3,4,5}
Note: x(i) can be either the individual response or sample average for question i. For the latter, it would generally be some floating number (like 4.21235 rather than nice integers).
Map these to
Method 1:
s(1)=4, s(2)=3, s(3)=4, s(4)=4, s(6)=1, s(7)=0
Method 2: (alternative based on [0,1,2,3,4]/4)
s(1)=1, s(2)=0.75, s(3)=1, s(4)=1, s(6)=0.25, s(7)=0
Raw weights (Chosen to reflect importance of each question. Note: They don't have to add up to any particular value. You can express these as fraction of 1 if you like, but there's no need to.)
w(1)=100, w(2)=250, w(3)=175, w(4)=300, w(6)=50, w(7)=120
Sum of the w's: sum_w=995
Normalized weights: W(i)=w(i)/sum_w,
W(1)=0.100503
W(2)=0.251256
W(3)=0.175879
W(4)=0.301508
W(6)=0.050251
W(7)=0.120603
Finally, the average score for this store is 100*N/D, where
N=s(1)*W(1)+s(2)*W(2)+s(3)*W(3)+s(4)*W(4)+s(6)*W(6)+s(7)*W(7);
D=s(max)*[W(1)+W(2)+W(3)+W(4)+W(6)+W(7)].
For method 1: s(max)=4; D=4; N=3.115577
For method 2: s(max)=1; D=1; N=0.778894
Either way, the final score is 100*N/D = 77.8894%