About ronny fisher
Expertise
general questions on probability and statistics. please do not send intro prob/stat homework questions.

Experience
have taught probability and stats for 25 years

Education/Credentials
ba in math, phd in stats

Past/Present Clients
federal government, state AG, start-up pharma companies, engineering consulting firms, academic researchers (the list goes on).

 
   


Probability & Statistics - Analyzing a survey


Expert: ronny fisher - 9/20/2009

Question
QUESTION: Hello Ronny,

I have a temporary consulting job analyzing the results of a survey taken at a large organization. Total population is over 13,000. The pop is segmented by department (Legal, Management, and several Labs) and by Job (Manager, Technical Lead, Biologist, Chemist, ...)

The survey consisted of 30+ questions on a Likert scale or a rank-order.

There were 771 responses to most of the questions; some had just 350 or so responses.

The Likert had five options, with the usual Very Likely, ... The middle choice was "neutral"

I have several questions, if that is ok.

(1) When analyzing the Likert responses, is it legitimate to ignore the "neutral" answer?

(2) Also, in the Likert, is it legitimate to combine the two measures on each side, i.e. Add the "Very Likely" with "Likely" to get just two answers?

Thus, for each Likert question, I'd make a column chart with two columns: one for "unlikely" and one for "likely" and compare the two.

(3) These are descriptive so far and I want to make inferences on the entire population. What type of test can I do to say that the results of any one question represent the population to a 95% confidence level?

I must do three analyses on all the questions: the population as a whole, by Department, and by Job.

(4) Aside from simply redoing the descriptive statistics, is there a way I can make inferences on the Department view and Job view?

Thanks for your help very much. I just started this new assignment and I'm desperate to do a great job so they hire me for other tasks. (whine)

best,
--dave

ANSWER: dave -

regarding your Q1: it is certainly 'legitimate' to ignore the 'neutral' responses. however, whether or not one wants to do so depends on the circumstances.

for example, if 'neutral' actually means no opinion (neither pro nor con, say), one might want to merge the neutral responses with the 'pro' responses and then compare those with the 'con' responses, to see what percentage of responders actually oppose (some proposed action, for example).

or - one could have a response pattern like: # pro = 10, # neutral = 85, # con = 5.
here, ignoring the neutral responses means throwing away 85 of the responses. moreover, a comparison of just 'pro' vs 'con' would be based on a (conditional) sample size of 15 - which cannot be expected to provide much precision.
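
A rough sketch of how little precision those 15 responses give, using the same 2sqrt(p(1-p)/n) margin of error that is applied to Q3 in this answer. The function name `margin_of_error` is made up for illustration, and the counts are the hypothetical 10/85/5 pattern above.

```python
import math

def margin_of_error(successes, n):
    """Approximate 95% margin of error for a sample proportion: 2*sqrt(p(1-p)/n)."""
    p = successes / n
    return 2 * math.sqrt(p * (1 - p) / n)

pro, neutral, con = 10, 85, 5
n_conditional = pro + con          # neutrals dropped: only 15 responses remain
p_pro = pro / n_conditional        # fraction pro among those with an opinion
me = margin_of_error(pro, n_conditional)
print(f"fraction pro among non-neutral: {p_pro:.2f} +/- {me:.2f}")
```

With a margin of error of roughly a quarter, the estimate says very little about where the true fraction lies.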

in other circumstances, ignoring the neutral responses could make perfect sense. it depends on the particular situation.

re Q2: pooling the 'likely' and 'very likely' responses, etc and ending up with just 3 categories should present no problem.

re Q3: you are mixing up two concepts here. for testing, one works with a significance level, of .05, say. 95% usually pertains to a confidence interval (CI). which do you want?

for example, if you have # pro = 30 and # con = 20 and choose to ignore the neutral responses, the proportion of pro responses is 30 out of 50, or 60%. you can turn this into a 95% (conditional) CI by computing the 95% margin of error (ME) associated with this estimate. the expression for the 95% ME is

    ME = 2sqrt(0.6 x 0.4/50) = .14.

so the 95% CI for the fraction 'pro' - among those who are not neutral - is then
.6 ± .14  = (.46, .74).
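
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `proportion_ci` and the parameter `z` are illustrative, and `z=2.0` mirrors the rounded multiplier used in the answer (1.96 is the more exact value for 95% confidence).

```python
import math

def proportion_ci(successes, n, z=2.0):
    """Return (estimate, margin of error, lower, upper) for a sample proportion."""
    p = successes / n
    me = z * math.sqrt(p * (1 - p) / n)
    return p, me, p - me, p + me

# 30 pro and 20 con, neutral responses ignored
p, me, lo, hi = proportion_ci(30, 30 + 20)
print(f"p = {p:.2f}, ME = {me:.2f}, CI = ({lo:.2f}, {hi:.2f})")
```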

or - you could choose to test the hypothesis that the actual fraction pro is .5 - using a 5% level of significance.
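
One possible way to carry out such a test is sketched below. The answer does not say which test to use, so the two-sided z-test with a normal approximation is an assumption here, as is the function name `one_proportion_z_test`.

```python
import math

def one_proportion_z_test(successes, n, p0=0.5):
    """Two-sided z-test of H0: true proportion == p0 (normal approximation)."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / se0
    # standard normal CDF evaluated at |z|, via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

z, p_value = one_proportion_z_test(30, 50)   # 30 pro out of 50 non-neutral
reject = p_value < 0.05
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0 at 5%: {reject}")
```

For these counts the hypothesis of a .5 fraction is not rejected, which agrees with the interval (.46, .74) containing .5.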

re Q4: i don't see offhand how you can get individual results for each department or job other than by redoing the analysis for each (department or job) separately.

ronny

---------- FOLLOW-UP ----------

QUESTION: Thanks, Ronny, very much for your help. I greatly appreciate the time you took to help me. I'm clearer on several points.

May I ask a follow-up regarding Q3?


Yes, I think I'm mixing up two things.

What I want (I think) is, for each of the 20 or so Likert scale answers, to make a statement like:

"I am XX% confident that the responses to Question #n represents the population as a whole to an XYZ level of certainty."

If there is a more conventional way in statistics to express this thought, then please use that convention.


In other words, how confident am I that the survey responses accurately reflect the population?

The survey appellants are an IT department, within the larger organization. The IT dept. supplies tools and expertise to the other departments. They conducted a survey to see how well they are doing.


A specific example is Question #1:
"How effectively are you supported by these tools?"

Answer Options
VE:Very Effective,
EF:Effective,
NU:Neutral,
SE:Somewhat Effective,
NE:Not Effective

ToolN    #VE  #EF  #NU  #SE  #NE
Tool_A   293  341   62   40    5
Tool_B   107  297  180  110   43
Tool_C   105  226  173  122  109
Tool_D    83  191  222  136  102
Tool_E   252  324   74   67   20
Tool_F   228  363   95   40   12

These responses are sampled from the whole population. To simplify the report (and distinguish myself from the previous analyzers), I'd like to lump the responses into two groups: Satisfied with effectiveness and Not Satisfied.

So, do I analyze each tool separately? Is there a reason to compare/contrast each tool to another?

Can I simply add up the columns to get an aggregate response to all the tools?

Thank you, again, for your generous donation of time. I'm struggling and I am indebted for your help.

Interestingly, my bosses didn't give me a specific purpose; "just analyze it better than your predecessor" was the instruction. The previous guys just made a hokey thermometer chart of the responses with no commentary or estimation.

I think that the survey appellants basically want to know whether they are doing a good job in the eyes of their customers.

best,
--dave

Answer
dave -

it sounds as though you do have a bit of a challenge facing you - to
"just analyze it better than your predecessor".

regarding the accuracy of the data, one cannot put things quite the way
you propose:

"I am XX% confident that the responses to Question #n represents the
population as a whole to an XYZ level of certainty."

the understanding when collecting data is that the summary outcomes
(in your case, the frequencies associated with the various responses -
such as the 293 VE responses for tool A in the question you quote)
tend to reflect the underlying population they come from - but always
with some amount of random error.

statisticians have developed various ways of quantifying this. one is by
means of confidence intervals, which i mentioned in my previous reply.
(btw - i regret that the discussion of that point may have appeared
somewhat garbled. my cc of the answer omitted the 'percentage signs'
after '95' - apparently due to a faulty transmission mechanism by the
allexperts site. so i will write p/c for percent - as in 95p/c confidence -
rather than using the percent sign.)

so for tool A, the fraction of VE responses is 293/741 = .395 - call it .4.
(741 includes the NU responses - they can also be omitted) - and the ME for
this estimate is 2sqrt(.4x.6/741) = .036.

then with 95p/c confidence, the true proportion of such responses in the
population is between .395 - .036 = .359 (call it .36) and
.395 + .036 = .431 (call it .43).

so one can report that the 95p/c CI for the true proportion of VE
responses in the population is (.36, .43).
that gives an idea of what the actual value is - together with an
indication of the uncertainty (the ME) in the sample information.

if you discard the NU responses, this will increase the estimated
fraction for VE (and the other categories as well) - and also increase
the MEs.
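
A small sketch of the effect of dropping the NU responses, using the Tool_A counts from the question. The function name `ve_estimate` and its `drop_neutral` flag are illustrative, and the exact sample fraction is used rather than the rounded .4, so the figures may differ slightly in the last digit.

```python
import math

# Tool_A counts copied from the question
tool_a = {"VE": 293, "EF": 341, "NU": 62, "SE": 40, "NE": 5}

def ve_estimate(counts, drop_neutral=False):
    """Fraction of VE responses and its approximate 95% ME, 2*sqrt(p(1-p)/n)."""
    n = sum(v for k, v in counts.items() if not (drop_neutral and k == "NU"))
    p = counts["VE"] / n
    me = 2 * math.sqrt(p * (1 - p) / n)
    return n, p, me

with_nu = ve_estimate(tool_a)
without_nu = ve_estimate(tool_a, drop_neutral=True)
for label, (n, p, me) in (("with NU", with_nu), ("without NU", without_nu)):
    print(f"{label}: n = {n}, VE fraction = {p:.3f}, ME = {me:.3f}")
```

Dropping the 62 neutral responses shrinks the sample from 741 to 679, so both the estimated VE fraction and its margin of error go up.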

this is one way a statistician might summarize such data - and is about
as close as i can come to what you wrote in your message (quoted above).
i'll leave it to you to decide if this gives a satisfactory representation
of what you would like to say.

 do keep one thing in mind: if you do this for 20 likert scale responses -
(combining the VE and EF responses into one category and the SE and NE into
another, as you prefer to do) - you will have 2 or 3 proportions for each -
and you'll end up with 40 or 60 CIs. you may want to present the results
graphically - as a series of vertical lines on one graph (or more) - each
vertical line representing one CI. it is hard for anyone to eyeball even
40 numerically presented CIs - graphs work much better for this.
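
One way such a graph might be produced is sketched below for the six tools in Question #1. Several choices here are assumptions: 'satisfied' is taken as VE + EF, the NU responses stay in the denominator, the function names and the output file name are made up, and the plot needs matplotlib installed.

```python
import math

# Counts from the question, in the order VE, EF, NU, SE, NE
counts = {
    "Tool_A": (293, 341, 62, 40, 5),
    "Tool_B": (107, 297, 180, 110, 43),
    "Tool_C": (105, 226, 173, 122, 109),
    "Tool_D": (83, 191, 222, 136, 102),
    "Tool_E": (252, 324, 74, 67, 20),
    "Tool_F": (228, 363, 95, 40, 12),
}

def satisfied_intervals(table):
    """Per tool: (fraction satisfied, approximate 95% ME)."""
    out = {}
    for tool, (ve, ef, nu, se, ne) in table.items():
        n = ve + ef + nu + se + ne
        p = (ve + ef) / n
        out[tool] = (p, 2 * math.sqrt(p * (1 - p) / n))
    return out

def plot_intervals(intervals, path="ci_plot.png"):
    """Draw one vertical line per CI and save the figure to a file."""
    import matplotlib
    matplotlib.use("Agg")                  # render to a file, no display needed
    import matplotlib.pyplot as plt

    tools = list(intervals)
    xs = list(range(len(tools)))
    centers = [intervals[t][0] for t in tools]
    errors = [intervals[t][1] for t in tools]
    fig, ax = plt.subplots()
    ax.errorbar(xs, centers, yerr=errors, fmt="o", capsize=4)
    ax.set_xticks(xs)
    ax.set_xticklabels(tools)
    ax.set_ylabel("fraction satisfied (VE + EF)")
    ax.set_ylim(0, 1)
    fig.savefig(path)

intervals = satisfied_intervals(counts)
for tool, (p, me) in intervals.items():
    print(f"{tool}: {p:.3f} +/- {me:.3f}")
# plot_intervals(intervals)   # uncomment to write ci_plot.png
```

The same pattern extends to more questions by adding rows to `counts`, one per question or tool.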

you also ask about comparing one tool with another. a statistician cannot
answer this question in a vacuum. you've got to ask the client what they
want. surely your boss - or others in the company can express opinions
about whether or not such analyses would be of interest. you might also
ask for their take on deleting the NU responses.

about aggregating the tools into one response: that could mask important
differences in the responses. if there were two tools, say, and one had a
70-30 break for S vs NS, while the other had a 30-70 break, the combined
'tool' would show something like a 50-50 break. is that a reasonable
representation of what the data show?
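
The masking effect is easy to see with made-up numbers. In this sketch the counts are hypothetical (100 responses per tool is an assumption, chosen to match the 70-30 and 30-70 breaks), and `satisfied_fraction` is an illustrative helper.

```python
# Hypothetical (satisfied, not satisfied) counts for two tools
tool_1 = (70, 30)
tool_2 = (30, 70)

def satisfied_fraction(satisfied, not_satisfied):
    """Fraction of responses that are 'satisfied'."""
    return satisfied / (satisfied + not_satisfied)

# Adding up the columns gives the aggregate 'tool'
pooled = (tool_1[0] + tool_2[0], tool_1[1] + tool_2[1])

frac_1 = satisfied_fraction(*tool_1)
frac_2 = satisfied_fraction(*tool_2)
frac_pooled = satisfied_fraction(*pooled)
print(frac_1, frac_2, frac_pooled)
```

The pooled figure sits halfway between two very different tools and describes neither of them.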

good luck with your assignment.

ronny

