ElMaestro Hero Denmark, 2018-02-27 09:02 Posting: # 18469

Sorry about this, I am a bit backward. Once again I would like to raise the wonderful topic of LSMeans. They are a SAS invention and I am still not sure I have truly understood what the hell they really are. Or more correctly, I am very sure I have no clue whatsoever about them.

Now, before someone throws a handful of web pages at me, I want to assure you, I have read them already. I am posting this because I still don't get it after all that reading, e.g. here, here, and here. The recent question about baseline adjustment and covariates in this forum made me try to understand LSMeans again. So I have been sleepless since then, of course. Shouldn't have done it. But that's hindsight.

For example: "When covariates are present in the model, the LSMEANS statement produces means which are adjusted for the average value of the specified covariate(s)." I have no friggin clue what that really means.

Can someone, without using package lsmeans in R or automated tools, explain to me in slow motion what LSMeans really are and how exactly you would go about deriving LSMeans from scratch in a concrete dataset given a model? Feel free to use the code in the thread mentioned above as a starting point; this would ease my understanding.

Muchas gracias.

Edit: Category changed; see also this post #1. [Helmut]

if (3) 4
Best regards, ElMaestro
"(...) targeted cancer therapies will benefit fewer than 2 percent of the cancer patients they’re aimed at. That reality is often lost on consumers, who are being fed a steady diet of winning anecdotes about miracle cures." New York Times (ed.), June 9, 2018.
d_labes Hero Berlin, Germany, 2018-02-28 09:49 @ ElMaestro Posting: # 18479

Oh my dear! Understanding LSmeans! Abandon all hope, ye who enter here. [Dante Alighieri, The Divine Comedy: Lasciate ogni speranza, voi ch'entrate]

IMHO this is only for hardcore statisticians. Not for me. Perhaps this vignette of the R package emmeans, written by Russell Lenth, helps to get a rough idea.

Regards, Detlew
ElMaestro Hero Denmark, 2018-03-04 10:03 @ ElMaestro Posting: # 18494

Hi all,

Here's where I am so far:

1. When we deal with the standard BE model, the LSMean difference of T and R equals the treatment effect difference, regardless of contrast coding. We thus do not need to worry about LSMeans to construct the 90% (or whatever %) CI around the geometric LSMean ratio. We just extract the treatment effect difference from the effect vector (= the model coefficients).
2. If we introduce a covariate, however, the LSMean difference may not be the same as the treatment effect difference. This implies that the LSMean difference is not necessarily the maximum likelihood difference.

This to me is a big, big worry. I consider this a huge gap in my understanding. Why would that be so, and what would be the factual advantage (whether practical or theoretical) of a CI built around something which is not the maximum likelihood treatment difference?

I did not extensively play around with contrast coding. Yet.

Hilfe, what is going on here? I hope someone will explain and discuss and go a little beyond applying some contrasts.

Many thanks.

Best regards, ElMaestro
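One way to probe points 1 and 2 numerically is a small base-R sketch that does not use the lsmeans package. The data and the names `treat`, `x` and `y` are made up purely for illustration, and R's default treatment contrasts are assumed. Under these assumptions, in the additive model `y ~ treat + x` the difference of the two predictions at the mean covariate coincides with the treatment coefficient, while both differ from the difference of the raw means. Once a treatment-by-covariate interaction is added, the `treatT` coefficient is the T - R difference at x = 0 and need not match the difference at mean(x).

```r
# Made-up, deliberately unbalanced toy data (illustrative only)
dat <- data.frame(
  treat = factor(c("R", "R", "R", "R", "T", "T", "T", "T", "T", "T")),
  x     = c(1, 2, 3, 4,   3, 4, 5, 6, 7, 8),
  y     = c(2.1, 2.9, 4.2, 4.8,   5.0, 6.3, 6.9, 8.2, 8.8, 10.1)
)

# Reference grid: both treatments, covariate fixed at its overall mean
grid <- data.frame(
  treat = factor(c("R", "T"), levels = levels(dat$treat)),
  x     = mean(dat$x)
)

# Difference of the raw (unadjusted) treatment means, T - R
raw_means <- sapply(split(dat$y, dat$treat), mean)
diff_raw  <- unname(raw_means["T"] - raw_means["R"])

# Additive model: treatment coefficient vs. difference of adjusted predictions
fit_add      <- lm(y ~ treat + x, data = dat)
pred_add     <- predict(fit_add, newdata = grid)
diff_coef    <- unname(coef(fit_add)["treatT"])
diff_lsm_add <- unname(pred_add[2] - pred_add[1])

# Interaction model: the treatT coefficient now refers to x = 0
fit_int      <- lm(y ~ treat * x, data = dat)
pred_int     <- predict(fit_int, newdata = grid)
coef_int     <- unname(coef(fit_int)["treatT"])
diff_lsm_int <- unname(pred_int[2] - pred_int[1])

print(c(raw = diff_raw, coef = diff_coef, lsm = diff_lsm_add))
print(c(coef_interaction = coef_int, lsm_interaction = diff_lsm_int))
```

In this toy example the mismatch in the interaction model is a matter of where along the covariate axis the difference is evaluated; both numbers come from the same fitted coefficients.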
martin Senior Austria, 2018-03-06 14:40 (edited by martin on 2018-03-06 14:59) @ ElMaestro Posting: # 18498

Dear ElMaestro,

You may have a look at the R package lsmeans for more details; this R code (credits to Alex) may help in understanding:

library(lsmeans)
ElMaestro Hero Denmark, 2018-03-06 17:30 @ martin Posting: # 18499

Dear Martin,

thanks for trying to help. I am using the lsmeans package, of course. I read the documentation. From your post and the code in it I am unfortunately none the wiser. That is not your fault, of course, but it is regrettable at my end nonetheless.

Could you explain the relation between my questions and the example given? It isn't so much how to get LSMeans per se that concerns me the most (use the package and get them, as simple as that, full stop). Rather, what concerns me is the idea of LSMean differences not being maximum likelihood differences, and on the basis of that the rationale, if any, for using the LSMean differences instead of maximum likelihood differences (treatment effects from the model fit).

Best regards, ElMaestro
martin Senior Austria, 2018-03-06 19:58 @ ElMaestro Posting: # 18500

Dear ElMaestro,

It is probably best to consider “lsmeans” as a predictor for a population value based on estimates from a model, where the latter can be derived via maximum likelihood estimation. “Lsmeans” allow making predictions for an average value of a covariate (i.e. adjusted for the average value). See the example above, where predictions (i.e. “lsmeans” in SAS speak) for strength grouped by Machine were provided conditional on diameter equal to 24.133 (i.e. adjusted for the average of diameter).

Predictions (i.e. lsmeans in SAS speak) can also be obtained for any value of diameter (e.g. diameter=100) if this would make sense in a specific situation (e.g. consider a covariate modeling a time course where you are interested in predictions at weeks 1, 2, or 3 at which no observations are available).

Best regards & hope this helps
Martin

PS.: A very nice summary is given here: https://cran.r-project.org/web/packages/doBy/vignettes/LSmeans.pdf

PPS.: In balanced multi-way designs or unbalanced one-way designs I would expect identical values for observed means and "lsmeans".
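The "prediction at the average covariate" idea can be sketched in base R without the lsmeans package. The data below are typed in by hand and are an assumption: they are chosen so that the mean diameter is 24.133, as in the example referred to above, but they should not be read as the exact data behind the code in the earlier post. The column names `machine`, `diameter` and `strength` are likewise illustrative. The sketch gets the adjusted means twice, once with `predict()` on a reference grid and once by hand from the group means and the pooled within-group slope.

```r
# Illustrative data, typed in by hand (assumed, not taken from the post)
dat <- data.frame(
  machine  = factor(rep(c("A", "B", "C"), each = 5)),
  diameter = c(20, 25, 24, 25, 32,  22, 28, 22, 30, 28,  21, 23, 26, 21, 15),
  strength = c(36, 41, 39, 42, 49,  40, 48, 39, 45, 44,  35, 37, 42, 34, 32)
)
fit <- lm(strength ~ diameter + machine, data = dat)

# Route 1: predict for each machine with diameter fixed at its overall mean
grid <- data.frame(
  machine  = factor(levels(dat$machine), levels = levels(dat$machine)),
  diameter = mean(dat$diameter)
)
adj_means <- setNames(predict(fit, newdata = grid), levels(dat$machine))

# Route 2: by hand, from group means and the pooled within-group slope
raw_means <- sapply(split(dat$strength, dat$machine), mean)
x_means   <- sapply(split(dat$diameter, dat$machine), mean)
g         <- as.character(dat$machine)
sxy <- sum((dat$diameter - x_means[g]) * (dat$strength - raw_means[g]))
sxx <- sum((dat$diameter - x_means[g])^2)
b   <- sxy / sxx                                  # common slope for diameter
by_hand <- raw_means - b * (x_means - mean(dat$diameter))

print(rbind(raw = raw_means, adjusted = adj_means, by_hand = by_hand))

# The same model can predict at any other covariate value, here 30 (arbitrary)
at_30 <- predict(fit, newdata = transform(grid, diameter = 30))
```

The by-hand route shifts each observed group mean along the fitted slope from that group's own average diameter to the overall average, which is one reading of "adjusted for the average value of the covariate".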
ElMaestro Hero Denmark, 2018-03-06 20:53 @ martin Posting: # 18501

Dear Martin,

it is really kind of you to try to answer, but I am afraid I must have completely failed at explaining my question, because your post addresses a question I did not ask and it does not, I think, address the issue I hoped to get an answer for. I apologise for not being able to express myself clearly. I will try and rephrase.

We get a model with factors and a covariate and we derive treatment effects (for treatment as a fixed factor). Let us say it has two levels, A and B. If we look at the difference in treatment means for A and B, that difference may be different from the difference in LSMeans for A and B. That is a potential worry to me, or a confusion, which I would like a comment on. The reason is that the model means or treatment effects are maximum likelihood estimates, so if the difference in maximum likelihood treatment estimates is not the same as the LSMean difference, then it means the LSMean difference is not a maximum likelihood difference. This is in a nutshell the cosmic mindf%cker that I am asking about.

Would we ever, regardless of how LSMeans are otherwise defined (blah blah LSMEANS statement produces means which are adjusted for the average value of the specified covariate(s) blah etc etc etc), prefer them over maximum likelihood differences? One thing is 'adjusted for' and 'more relevant in case of imbalance' and what not. But that does not in itself explain why anyone would ever deviate from a conclusion based on the maximum likelihood difference regardless of imbalance or any other model phenomenon, am I wrong? The whole point of a linear model and most other models, basically, is the maximum likelihood. At least in my little world.

In a nutshell: If the most likely (= maximum likelihood by way of least squares) difference of A and B is 5 and the LSMean difference is 10, why would I ever prefer the less likely difference of 10???

Or more generally: why would I ever minimise sums of squares to generate maximum likelihood treatment differences that I am not using, and instead minimise sums of squares to generate another type of treatment difference that sounds fancier and sexier but is less likely??? More realistic is still not equal to more likely.

I am curious to hear your view on exactly and solely this aspect of LSMean differences not being equal to maximum likelihood differences. I hope it is now reworded. Again, I apologise for not being well able to express myself.

Many thanks in advance.

Best regards, ElMaestro
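One observation that may bear on this question: as usually computed, an LSMean is a linear combination of the fitted model coefficients, so it is built from the same least-squares fit rather than from a second, separate minimisation. A minimal base-R sketch follows, with made-up data and hypothetical names `treat`, `x` and `y`. It forms the LSMeans as `L %*% coef(fit)`, where each row of `L` describes one treatment at the mean covariate, and derives a standard error and a 90% CI for the difference from `vcov(fit)`.

```r
# Made-up toy data (illustrative only): two treatments A and B, one covariate x
dat <- data.frame(
  treat = factor(rep(c("A", "B"), times = c(4, 6))),
  x     = c(1, 2, 3, 4,   3, 4, 5, 6, 7, 8),
  y     = c(2.1, 2.9, 4.2, 4.8,   5.0, 6.3, 6.9, 8.2, 8.8, 10.1)
)
fit  <- lm(y ~ treat + x, data = dat)
beta <- coef(fit)                       # least-squares coefficient estimates

# One row per treatment, covariate held at its overall mean
grid <- data.frame(
  treat = factor(c("A", "B"), levels = levels(dat$treat)),
  x     = mean(dat$x)
)
L <- model.matrix(~ treat + x, data = grid)   # columns line up with names(beta)

lsm     <- drop(L %*% beta)             # LSMeans as linear combinations of beta
k       <- L[2, ] - L[1, ]              # contrast B - A, here c(0, 1, 0)
lsm_dif <- sum(k * beta)                # LSMean difference B - A
se_dif  <- sqrt(drop(t(k) %*% vcov(fit) %*% k))
ci90    <- lsm_dif + c(-1, 1) * qt(0.95, df.residual(fit)) * se_dif

print(lsm)
print(c(difference = lsm_dif, lower = ci90[1], upper = ci90[2]))
```

Changing the covariate value in `grid` changes the rows of `L`, and with them the individual LSMeans, but the coefficient vector `beta` stays the one obtained from the single model fit.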
martin Senior Austria, 2018-03-07 08:42 @ ElMaestro Posting: # 18503

Dear ElMaestro,

I think the statement “LSMean difference not being equal to maximum likelihood differences” should be reworded as “LSMean difference not being equal to maximum likelihood differences in case of unbalanced experiments”. In other words, the use of “lsmeans” is motivated by the need to account for imbalance in experiments, as ML estimation is conditional on the data observed.

Consider an experiment analyzed with a model of the form lm(y ~ treat + covariate), where treat is a treatment factor and covariate is an additional continuous covariate. With this model you can get estimates conditional on treat and the covariate (i.e. E(Y | treat, covariate)) based on maximum likelihood estimation. However, we are frequently interested in estimates for E(Y | treat), which cannot be obtained directly from the model.

- If the experiment is balanced, I expect that “lsmeans” estimates are identical to the average of the observations grouped by treat (i.e. “lsmeans” estimates are identical to MLE estimates).
- If the experiment is unbalanced, a corresponding estimate for E(Y | treat) can be found by using the fitted values conditional on the mean of the covariate across all levels of treat (i.e. “lsmeans” estimates are not identical to MLE estimates).

Consider perhaps a simulation study:

1) Specify a data generating process of the form: Y ~ treat + covariate + epsilon
2) Introduce a sampling strategy leading to an unbalanced sample data set
3) Get estimates (MLE and “lsmeans”) from the sample data and compare them with the true value for E(Y | treat).

I would expect that the “lsmeans” estimate is closer to the true value (i.e. the population value for E(Y | treat)) than model estimates obtained by maximum likelihood. You can do this exercise also for a balanced experiment, which should give identical estimates via “lsmeans” and MLE.

Best regards & hope this helps
Martin

PS.: We can discuss this also offline; Mr. Schütz has my contact information.
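The three simulation steps could be sketched in base R roughly as follows. Everything concrete here is an assumption made for illustration: the parameter values, the seed, the variable names, and in particular the reading of "unbalanced" as "the covariate distribution differs between treatment groups in the sample". The true E(Y | treat) is evaluated at an assumed population covariate mean of 50, and the sketch compares the plain group averages with the means adjusted to the overall covariate mean.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n     <- 500                       # subjects per treatment (assumed)
mu    <- c(A = 10, B = 12)         # assumed treatment intercepts
beta  <- 0.5                       # assumed covariate slope
mu_x  <- 50                        # assumed population mean of the covariate
truth <- mu + beta * mu_x          # true E(Y | treat) under these assumptions

# Steps 1 and 2: generate data; the sampling shifts the covariate upwards
# in group A and downwards in group B (the assumed form of imbalance)
treat     <- factor(rep(c("A", "B"), each = n))
covariate <- rnorm(2 * n, mean = rep(c(mu_x + 5, mu_x - 5), each = n), sd = 10)
y         <- mu[as.character(treat)] + beta * covariate + rnorm(2 * n, sd = 2)
dat       <- data.frame(y = unname(y), treat, covariate)

# Step 3a: plain averages of the observations grouped by treat
raw <- sapply(split(dat$y, dat$treat), mean)

# Step 3b: adjusted means, i.e. predictions at the overall covariate mean
fit  <- lm(y ~ treat + covariate, data = dat)
grid <- data.frame(
  treat     = factor(c("A", "B"), levels = levels(dat$treat)),
  covariate = mean(dat$covariate)
)
adj <- setNames(predict(fit, newdata = grid), c("A", "B"))

print(rbind(truth = truth, raw = raw, lsmeans = adj))
```

For the balanced counterpart, the two covariate means in `rnorm()` can be set to the same value; the raw and adjusted means should then agree closely, up to sampling noise in the simulated covariate.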