Maybe you can help me with the following data. I'm not sure how to analyze them. Here is the context:
I have 4 different mutant lines of Arabidopsis in a petri plate and want to test the effect of 4 treatments (4 different concentrations of salt in the media) on them.
Each plate will be treated with one of the treatments. And I have 4 plates for each treatment (i.e. 4 repetitions of that treatment)
I am scoring the appearance of the seedlings in the plate after treatment, with two levels for this variable: green/not green
and I am doing the score at 5, 6, 7 and 10 days after treatment.
Are the following thoughts right?
I think the factors are the treatment, the plate and the line. Is the time point of scoring a covariate or a further factor? I think rather the latter.
The variable is a categorical (here nominal) one: green/not green
Regarding the experimental design with scorings at different time points:
do I have repeated measurements or related measurements? Because actually I have an accumulating variable, since
the "not green" cotyledons remain "not green" and they count toward the next scoring date.
So I am not sure if I should do a GLM or something else (maybe survival analysis?). I am using SPSS.
I thank you SO MUCH! in advance for your time.
Answer mara -
i have an additional reference for you that may be of help.
there is a book called 'Survival Analysis - a Self-Learning Text', 2nd Ed,
by D. Kleinbaum and M. Klein, published by Springer, that discusses data
such as yours [interval censored]. on pp 286-294 [see esp. pp 289-294],
they show how to use logistic regression to analyze such data by
incorporating certain dummy variables into the analysis.
i can send you a pdf copy of these pages, but unfortunately, the allexperts
website does not allow us to send pdf attachments. if you want me to send it,
please provide an email address.
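
the dummy-variable idea can be sketched as follows. this is only a minimal illustration of the usual discrete-time setup, not necessarily the exact construction used in the book: each seedling contributes one row per scoring day up to and including the day its event is first seen, with one 0/1 indicator per scoring day. the function and column names are my own assumptions.

```python
SCORING_DAYS = (5, 6, 7, 10)

def person_period_rows(seedling_id, event_day, event_observed, days=SCORING_DAYS):
    """Expand one seedling into one row per scoring day at which it was still at risk."""
    rows = []
    for day in days:
        if day > event_day:
            break  # no rows after the event (or after the last scoring day)
        row = {"seedling": seedling_id, "day": day}
        for d in days:
            row[f"day_{d}"] = int(d == day)  # dummy variable for this scoring day
        # event is 1 only in the row for the day the event was first scored
        row["event"] = int(event_observed and day == event_day)
        rows.append(row)
    return rows

# hypothetical seedlings: one with the event first seen on day 7, one censored at day 10
rows_event = person_period_rows("s1", 7, True)
rows_censored = person_period_rows("s2", 10, False)
```

a logistic regression of `event` on the day dummies plus the treatment and line covariates, fitted to the stacked rows, then estimates the chance of the event at each scoring day among seedlings that have not yet had it.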
ronny
PREVIOUS REPLY COPIED BELOW
-----------------------------------------------------------------------------------------------
one way to think of your experiment is that it yields survival [or failure time] data with covariates, where the time scale for the 'lifetimes' - the times by which a 'not green' response is first observed - is discretized or grouped and also right censored. the covariates are the factors you mention. if a particular seedling is still green on the 10th day, its time is right censored [at 10].
in this view, the times seedlings are observed are not levels of another factor because the same seedling is observed
at the different times. usually factorial experiments involve independent responses at the different combinations of
factor levels - which would not be the case here.
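
to make the survival view concrete, here is a small sketch that turns one seedling's four scores into a single (day, event observed) record. it assumes the scores are coded 1 once the response has been observed and 0 before that; the function name and the coding are my own choices for illustration.

```python
SCORING_DAYS = (5, 6, 7, 10)

def to_survival_record(scores, days=SCORING_DAYS):
    """Return (day, event_observed) for one seedling's scores (1 = response observed)."""
    for day, score in zip(days, scores):
        if score == 1:
            return day, True  # first scoring day on which the response was seen
    return days[-1], False    # never seen: right censored at the last scoring day

record_event = to_survival_record((0, 0, 1, 1))
record_censored = to_survival_record((0, 0, 0, 0))
```

note that the recorded day is only the first scoring day at which the response was seen; the change itself happened somewhere between that day and the previous scoring.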
there is a way to introduce correlations among observations in factorial experiments - by introducing random effects - for the seedlings in this case - treating them as another factor. so then each seedling is one level of the factor
'seedling'. also treating time as a factor, you would then have correlated binary responses at the different time levels for a given seedling.
usually one uses a log-linear or logistic model for binary responses - where a random effect for seedling would introduce correlations among the binary responses for a given seedling. but that would not capture the monotone nature of the binary responses here: once a seedling is 'not green', it remains 'not green'.
you can think of the binary responses for a seedling as a vector such as (0,0,1,1), with 1 coding 'not green' - which would represent a seedling that was 'not green' on day 7 but still green on days 5 and 6. there are only 5 possible response vectors here, since any coordinate to the right of a '1' response must also be a '1'. with ordinary [non-monotone] correlated binary responses, there would be
16 possible response vectors - since each of the 4 coordinates could be a 0 or a 1, regardless of what values the other coordinates have.
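
a quick way to check the counting is to enumerate the vectors directly. the sketch below lists all 0/1 vectors of length 4 and keeps those in which no 0 follows a 1; the names are my own.

```python
from itertools import product

def is_monotone(vector):
    """True if no 0 follows a 1, i.e. the response never reverts."""
    return all(a <= b for a, b in zip(vector, vector[1:]))

all_vectors = list(product((0, 1), repeat=4))
monotone_vectors = [v for v in all_vectors if is_monotone(v)]

print(len(all_vectors), "unrestricted vectors")
print(len(monotone_vectors), "monotone vectors:")
for v in monotone_vectors:
    print(v)
```

the monotone vectors are the all-zero vector [never observed by day 10] plus one vector for each scoring day on which the response could first appear.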
so i will take the view that you have lifetime data as mentioned above. there are then [at least] two ways to analyze such data:
one way is to assume some sort of parametric model for the 'lifetimes', such as an exponential or weibull distribution, letting one or more parameters of the distribution depend on a linear combination of the covariates. this leads to a
so-called generalized linear model.
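
as a very small illustration of the parametric route, the sketch below fits the simplest case: an exponential lifetime with a separate rate for each group. with right censoring, the maximum likelihood estimate of an exponential rate is the number of observed events divided by the total time at risk. the data are made up, the group labels are hypothetical, and the scoring days are treated as if they were exact event times, which ignores the grouping of the time scale.

```python
from collections import defaultdict

def exponential_rate_by_group(records):
    """records: iterable of (group, time, event_observed); returns the MLE rate per group."""
    events = defaultdict(int)
    exposure = defaultdict(float)
    for group, time, event_observed in records:
        events[group] += int(event_observed)  # count observed (uncensored) events
        exposure[group] += time               # censored seedlings still add time at risk
    return {group: events[group] / exposure[group] for group in exposure}

# made-up records: (salt level, day, event observed?)
records = [
    ("low", 10, False), ("low", 7, True), ("low", 10, False),
    ("high", 5, True), ("high", 6, True), ("high", 10, False),
]
rates = exponential_rate_by_group(records)
print(rates)
```

a full analysis would let the log of the rate depend on treatment, line and plate together rather than fitting each group separately, and would use software that handles the grouped times properly.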
another way is to use a cox proportional hazards model for the lifetimes, which also involves a parameter that depends on a linear combination of the covariates.
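
to show what the cox model actually maximizes, here is a minimal sketch of the log partial likelihood for a single numeric covariate, using the breslow approximation for tied event times [ties are unavoidable with only four scoring days]. the data and names are made up for illustration; in practice you would let a statistics package do the fitting.

```python
import math

def breslow_log_partial_likelihood(beta, records):
    """records: list of (time, event_observed, x) with one numeric covariate x."""
    event_times = sorted({time for time, event, _ in records if event})
    total = 0.0
    for t in event_times:
        # risk set: everyone whose event or censoring time is at or after t
        risk_sum = sum(math.exp(beta * x) for time, _, x in records if time >= t)
        # covariate values of the seedlings whose event was scored at t
        tied = [x for time, event, x in records if event and time == t]
        total += sum(beta * x for x in tied) - len(tied) * math.log(risk_sum)
    return total

# made-up data: x = 1 for a hypothetical high-salt plate, 0 for a low-salt plate
data = [(5, True, 1), (6, True, 1), (10, False, 1),
        (7, True, 0), (10, False, 0), (10, False, 0)]
ll_zero = breslow_log_partial_likelihood(0.0, data)
ll_one = breslow_log_partial_likelihood(1.0, data)
```

the fitted coefficient is the value of `beta` that makes this quantity as large as possible; comparing `ll_zero` with `ll_one` shows which of the two values fits these made-up data better.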
there is a very informative book on the subject of modeling lifetime data - by kalbfleisch and prentice:
"the statistical analysis of failure time data", 2nd edition, wiley [2002]. it is fairly up to date and discusses discrete or discretized failure times as well as the [more common] case of continuous failure times. among other things, it discusses software that can be used to analyze such data [pp 402-403], including spss. from what they say, spss cannot be used to do parametric models, but it does do proportional hazards [cox] models. [sas can do either kind of analysis.] they also cite other references that discuss such software packages in more detail.
your experiment sounds very interesting from a statistical perspective. good luck with it.