ElMaestro
★★★

Denmark,
2018-02-27 10:02
(2221 d 05:41 ago)

Posting: # 18469
Views: 5,695
 

 LSMeans [General Statistics]

Sorry about this,

I am a bit backward.

Once again I would like to raise the wonderful topic of LSMeans.
They are a SAS invention and I am still not sure I have truly understood what the hell they really are. Or more correctly, I am very sure I have no clue whatsoever about them.
Now, before someone throws a handful of web pages at me, I want to assure you that I have read them already. I am posting this because I still don't get it after all that reading, e.g. here, here, and here.

The recent question about baseline adjustment and covariates in this forum made me try to understand LSMeans again. So I have been sleepless since then, of course. Shouldn't have done it. But that's hindsight.

For example: "When covariates are present in the model, the LSMEANS statement produces means which are adjusted for the average value of the specified covariate(s)."
I have no friggin clue what that really means.

Can someone, without using the package lsmeans in R or other automated tools, explain to me in slow motion what LSMeans really are and how exactly you would go about deriving LSMeans from scratch in a concrete dataset given a model? Feel free to use the code in the thread mentioned above as a starting point; this would ease my understanding.

Many thanks.


Edit: Category changed; see also this post #1. [Helmut]

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2018-02-28 10:49
(2220 d 04:54 ago)

@ ElMaestro
Posting: # 18479
Views: 4,972
 

 Understanding (!) LSMeans

Oh my dear!

Understanding LSmeans!
Abandon all hope, ye who enter here.
[Dante Alighieri, The Divine Comedy: Lasciate ogni speranza, voi ch'entrate]
IMHO this is only for hardcore statisticians. Not for me :blind:.

Perhaps this vignette of the R package emmeans, written by Russell Lenth, helps to get at least a rough idea.

Regards,

Detlew
ElMaestro
★★★

Denmark,
2018-03-04 11:03
(2216 d 04:40 ago)

@ ElMaestro
Posting: # 18494
Views: 4,926
 

 Huge gap in my understanding

Hi all,

Here's where I am so far:
  1. When we deal with the standard BE model, the LSMean difference of T and R equals the treatment effect difference, regardless of contrast coding. We thus do not need to worry about LSMeans to construct the 90% (or whatever) CI around the geometric LSMean ratio. We just extract the treatment effect difference from the effect vector (= the model coefficients).
  2. If we introduce a covariate, however, the LSMean difference may not be the same as the treatment effect difference. This implies that the LSMean difference is not necessarily the maximum likelihood difference. This to me is a big, big worry; I consider it a huge gap in my understanding. Why would that be so, and what would be the factual advantage (whether practical or theoretical) of a CI built around something which is not the maximum likelihood treatment difference? I did not extensively play around with contrast coding. Yet. See the sketch below.
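
To make this concrete, here is a minimal base-R sketch with made-up data (the names treat and covar and all effect sizes are placeholders, nothing from a real trial):

set.seed(42)
n     <- c(A = 6, B = 14)                    # deliberate imbalance
treat <- factor(rep(names(n), n))
covar <- rnorm(sum(n), mean = ifelse(treat == "A", 20, 30)) # covariate tied to group
y     <- 5 + 2 * (treat == "B") + 0.5 * covar + rnorm(sum(n))

fit <- lm(y ~ treat + covar)

# (1) raw difference of the observed group means, contaminated by the covariate:
diff(tapply(y, treat, mean))

# (2) treatment effect from the fit, i.e. the least-squares / ML estimate:
coef(fit)["treatB"]

# (3) LSMean difference "by hand": predict both groups at the same mean covariate:
nd  <- data.frame(treat = factor(c("A", "B")), covar = mean(covar))
lsm <- predict(fit, nd)
unname(lsm[2] - lsm[1])

In this simple additive sketch (3) reproduces (2) and only the raw difference (1) deviates, so whatever discrepancy I stumbled over must involve more than a single covariate entering additively, or my contrast coding.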

Help, what is going on here? I hope someone will explain and discuss, and go a little beyond applying some contrasts.
Many thanks.

Pass or fail!
ElMaestro
martin
★★  

Austria,
2018-03-06 15:40
(2214 d 00:03 ago)

@ ElMaestro
Posting: # 18498
Views: 4,760
 

 Huge gap in my understanding

Dear ElMaestro,

You may have a look at the R package lsmeans for more details, and this R code (credits to Alex) may help in understanding.

library(lsmeans)

### covariance example (from Montgomery, Design (8th ed.), p. 656)
print(fiber)

### ANCOVA model: strength explained by the covariate (diameter) and the factor (machine)
fiber.lm <- lm(strength ~ diameter + machine, data = fiber)
summary(fiber.lm)

# means versus ls-means: predict every observation with its own machine,
# but with diameter replaced by the overall mean diameter
fiber$pred <- predict(fiber.lm, list(machine = fiber$machine,
                                     diameter = rep(mean(fiber$diameter), nrow(fiber))))
aggregate(fiber$strength, by = list(machine = fiber$machine), mean) # observed means
aggregate(fiber$pred,     by = list(machine = fiber$machine), mean) # ls-means

# ls-means via the R package lsmeans, for comparison
fiber.lsm <- lsmeans(fiber.lm, "machine")
fiber.lsm
ElMaestro
★★★

Denmark,
2018-03-06 18:30
(2213 d 21:14 ago)

@ martin
Posting: # 18499
Views: 5,102
 

 Huge gap in my understanding

Dear Martin,

thanks for trying to help.
I am using the lsmeans package, of course. I read the documentation.

From your post and the code in it I am unfortunately none the wiser. That is not your fault, of course, but it is regrettable at my end nonetheless. :-) :-)

Could you explain the relation between my questions and the example given?
It isn't so much how to get LSMeans per se that concerns me (use the package and get them; as simple as that, full stop).
Rather, what concerns me is the idea of LSMean differences not being maximum likelihood differences, and, on the basis of that, the rationale, if any, of using the LSMean differences instead of maximum likelihood differences (treatment effects from the model fit).

Pass or fail!
ElMaestro
martin
★★  

Austria,
2018-03-06 20:58
(2213 d 18:46 ago)

@ ElMaestro
Posting: # 18500
Views: 4,641
 

 Huge gap in my understanding

Dear ElMaestro,

It is likely best to consider “lsmeans” as predictions of a population value based on estimates from a model, where the latter can be derived via maximum likelihood estimation.

“Lsmeans” allow making predictions at the average value of a covariate (i.e. adjusted for the average value)
- see the example above, where predictions (i.e. “lsmeans” in SAS speak) for strength grouped by machine were provided conditional on diameter equal to 24.133 (i.e. adjusted for the average of diameter).

Predictions (i.e. lsmeans in SAS speak) can also be obtained for any value of diameter (e.g. diameter = 100) if this would make sense in a specific situation (e.g. consider a covariate modelling a time course where you are interested in predictions at weeks 1, 2, or 3 at which no observations are available).
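
As a small sketch with fiber.lm from the example above (the value 100 is arbitrary and only for illustration; with the observed diameters near 24 this is pure extrapolation):

# predictions for each machine, conditional on diameter = 100
nd <- data.frame(machine = factor(c("A", "B", "C")), diameter = 100)
predict(fiber.lm, nd)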

Best regards & hope this helps

Martin

PS.: A very nice summary is given here: https://cran.r-project.org/web/packages/doBy/vignettes/LSmeans.pdf

PPS.: In balanced multi-way designs or unbalanced one-way designs I would expect identical values for observed means and “lsmeans”.
ElMaestro
★★★

Denmark,
2018-03-06 21:53
(2213 d 17:51 ago)

@ martin
Posting: # 18501
Views: 4,760
 

 Huge gap in my understanding

Dear Martin,

it is really kind of you to try to answer, but I am afraid I must have completely failed at explaining my question, because your post addresses a question I did not ask and does not, I think, address the issue I hoped to get an answer to. I apologise for not being able to express myself clearly.
I will try and rephrase.

We get a model with factors and a covariate and we derive treatments effects (for treatment as a fixed factor). Let us say it has two levels, A and B.
If we look at the difference in treatment means for A and B, that difference may differ from the difference in LSMeans for A and B. That is a potential worry, or a confusion, on which I would like a comment. The reason is that the model means or treatment effects are maximum likelihood estimates; so if the difference in maximum likelihood treatment estimates is not the same as the LSMean difference, then the LSMean difference is not a maximum likelihood difference. This is, in a nutshell, the cosmic mindf%cker that I am asking about.
Would we ever, regardless of how LSMeans are otherwise defined (blah blah LSMEANS statement produces means which are adjusted for the average value of the specified covariate(s) blah etc. etc.), prefer them over maximum likelihood differences?
One thing is 'adjusted for' and 'more relevant in case of imbalance' and whatnot. But that does not in itself explain why anyone would ever deviate from a conclusion based on the maximum likelihood difference, regardless of imbalance or any other model phenomenon, am I wrong? The whole point of a linear model, and of most other models, is basically the maximum likelihood. At least in my little world.
In a nutshell: if the most likely (= maximum likelihood by way of least squares) difference of A and B is 5 and the LSMean difference is 10, why would I ever prefer the less likely difference of 10? Or, more generally, why would I minimise sums of squares to generate maximum likelihood treatment differences that I am not using, and instead minimise sums of squares to generate another type of treatment differences that sounds fancier and sexier but is less likely? More realistic is still not equal to more likely. :confused:

I am curious to hear your view on exactly and solely this aspect of the LSMean difference not being equal to maximum likelihood differences. I hope the question is now clearer. Again, I apologise for not being able to express myself well. Many thanks in advance. :-)

Pass or fail!
ElMaestro
martin
★★  

Austria,
2018-03-07 09:42
(2213 d 06:01 ago)

@ ElMaestro
Posting: # 18503
Views: 4,624
 

 Huge gap in my understanding

Dear ElMaestro,

I think the statement “LSMean difference not being equal to maximum likelihood differences” should be re-worded as “LSMean difference not being equal to maximum likelihood differences in case of unbalanced experiments”.

In other words, the use of “lsmeans” is motivated by the need to account for imbalances in experiments, as maximum likelihood estimation is conditional on the data observed.

Consider an experiment analyzed with a model of the form

lm(y ~ treat + covariate)

where treat is a treatment factor and covariate is an additional continuous covariate. With this model you can get estimates conditional on treat and the covariate (i.e. E(Y|treat, covariate)) based on maximum likelihood estimation.

However, we are frequently interested in estimates for E(Y|treat), which cannot formally be obtained from the model directly.

- If the experiment is balanced, I expect that “lsmeans” estimates are identical to the average of the observations grouped by treat (i.e. “lsmeans” estimates are identical to MLE estimates).

- If the experiment is unbalanced, a corresponding estimate for E(Y|treat) can be found by using the average of fitted values conditional on the mean of the covariate across all levels of treat (i.e. “lsmeans” estimates are not identical to MLE estimates); see the sketch below.
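
As a sketch of the second case, by hand rather than via the package, again using fiber.lm from the example above:

# "lsmeans" by hand: predictions for each machine at the overall mean diameter
nd <- data.frame(machine = factor(levels(fiber$machine)),
                 diameter = mean(fiber$diameter))
cbind(nd, lsmean = predict(fiber.lm, nd))

# observed group means, for contrast (they ignore the covariate imbalance)
aggregate(strength ~ machine, data = fiber, FUN = mean)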

You could also consider a simulation study:

1) Specify a data generating process of the form: Y ~ treat + covariate + epsilon
2) Introduce a sampling strategy leading to an unbalanced sample data set
3) Get estimates (MLE and “lsmeans”) from the sample data and compare them with the true value for E(Y|treat).

I would expect that the “lsmeans” estimate is closer to the true value (i.e. the population value for E(Y|treat)) than model estimates obtained by maximum likelihood estimation. You can do this exercise also for a balanced experiment, which should give identical estimates via “lsmeans” and MLE.
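
A minimal sketch of such a simulation in base R (all parameter values are made up; the “lsmeans” are computed by hand as predictions at a fixed covariate value):

set.seed(123)
b0 <- 10; bT <- 2; bX <- 0.5   # made-up coefficients of the data generating process
muX <- 20                      # population mean of the covariate
# 1) truth: E(Y|A) = b0 + bX*muX = 20 and E(Y|B) = b0 + bT + bX*muX = 22

# 2) sampling strategy producing imbalance: A drawn at low, B at high covariate values
treat     <- factor(rep(c("A", "B"), c(8, 24)))
covariate <- rnorm(32, mean = ifelse(treat == "A", 16, 21), sd = 3)
y <- b0 + bT * (treat == "B") + bX * covariate + rnorm(32)

# 3) estimates of E(Y|treat)
fit <- lm(y ~ treat + covariate)
tapply(y, treat, mean)                 # raw group means: biased by the sampling
nd <- data.frame(treat = factor(c("A", "B")), covariate = muX)
predict(fit, nd)                       # "lsmeans"-style estimates

In expectation the raw group means land at 18 and 22.5, while the covariate-adjusted predictions target the true 20 and 22 (in practice the sample mean of the covariate would be used in place of muX).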

Best regards & hope this helps

Martin

PS.: We can also discuss this offline; Mr. Schütz has my contact information.