## Huge gap in my understanding [General Statistics]

Dear ElMaestro,

I think the statement “LSMean difference not being equal to maximum likelihood differences” should be re-worded as “LSMean difference not being equal to maximum likelihood differences in case of unbalanced experiments”.

In other words, the use of “lsmeans” is motivated by the need to account for imbalance in experiments, as maximum likelihood estimation is conditional on the data observed.

Consider an experiment analyzed with a model of the form

lm(y ~ treat + covariate)

where treat is a treatment factor and covariate is an additional continuous covariate. With this model you can get estimates conditional on treat and the covariate (i.e. E(Y|treat,covariate)) based on maximum likelihood estimation.

However, we are frequently interested in estimates for E(Y|treat), which the model does not provide directly.
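As a minimal sketch of what such a conditional estimate looks like in R: the data below are made up purely for illustration (the treatment labels "R" and "T", the coefficients, and the covariate value 55 are all assumptions, not taken from any real study). For a linear model with normal errors, the least-squares coefficients returned by `lm()` coincide with the maximum likelihood estimates.

```r
# Hypothetical toy data; all names and numbers are illustrative assumptions
set.seed(1)
n         <- 20
treat     <- factor(rep(c("R", "T"), each = n / 2))
covariate <- rnorm(n, mean = 50, sd = 5)
y         <- 10 + 2 * (treat == "T") + 0.3 * covariate + rnorm(n)
dat       <- data.frame(y, treat, covariate)

fit <- lm(y ~ treat + covariate, data = dat)

# Estimated E(Y | treat, covariate) for both treatments at covariate = 55
newdat <- data.frame(treat     = factor(c("R", "T"), levels = levels(dat$treat)),
                     covariate = 55)
pred <- predict(fit, newdata = newdat)
print(pred)
```

Each prediction is tied to one specific covariate value; the model by itself does not say which value to plug in when only treat is of interest.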

- If the experiment is balanced, I expect that “lsmeans” estimates are identical to the average of the observations grouped by treat (i.e. “lsmeans” estimates are identical to MLE estimates).

- If the experiment is unbalanced, a corresponding estimate for E(Y|treat) can be obtained from the fitted values for each level of treat, evaluated at the mean of the covariate across all levels of treat (i.e. “lsmeans” estimates are not identical to MLE estimates).

For illustration, consider a simulation study:
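The two cases above can be sketched by computing the adjusted means by hand. This is only an illustration under assumed, made-up data (unequal group sizes and a covariate whose mean differs between treatments); the raw per-treatment averages are taken here as the comparison, which is one reading of “MLE estimates” in this context.

```r
# Hypothetical unbalanced data; all names and numbers are illustrative assumptions
set.seed(2)
treat     <- factor(rep(c("R", "T"), times = c(6, 14)))   # unequal group sizes
covariate <- rnorm(20, mean = ifelse(treat == "T", 55, 48), sd = 4)
y         <- 10 + 2 * (treat == "T") + 0.3 * covariate + rnorm(20)
dat       <- data.frame(y, treat, covariate)

fit <- lm(y ~ treat + covariate, data = dat)

# Raw averages of the observations grouped by treat
raw_means <- sapply(split(dat$y, dat$treat), mean)

# Adjusted means: fitted value per treatment at the overall covariate mean
grid <- data.frame(treat     = factor(levels(dat$treat), levels = levels(dat$treat)),
                   covariate = mean(dat$covariate))
adj_means <- setNames(predict(fit, newdata = grid), levels(dat$treat))

print(rbind(raw = raw_means, adjusted = adj_means))
```

In this model the adjusted mean for a treatment equals its raw mean minus the estimated slope times the difference between that treatment's covariate mean and the overall covariate mean, so the two only coincide when the covariate means agree across treatments. If the emmeans package (the successor to lsmeans) is installed, `emmeans(fit, ~ treat)` should give the same adjusted means, since by default it evaluates numeric covariates at their mean.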

1) Specify a data generating process of the form: Y ~ treat + covariate + epsilon
2) Introduce a sampling strategy leading to an unbalanced sample data set
3) Get estimates (MLE and “lsmeans”) from the sample data and compare them with the true value for E(Y|treat).

I would expect that the “lsmeans” estimate is closer to the true value (i.e. population value for E(Y|treat)) than the model estimates obtained by maximum likelihood. You can do this exercise also for a balanced experiment, which should give identical estimates via “lsmeans” and MLE.
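A rough sketch of such a simulation in R follows. Everything in it is an assumption made for illustration: the data generating process (intercept 10, treatment effect 2, slope 0.3, population covariate mean 50), the helper `simulate_once()`, and the sampling scheme, in which the unbalanced scenario uses group sizes 8 and 16 with covariate means 46 and 52. These shifts are chosen so that the pooled covariate mean still matches the population mean of 50; otherwise the adjusted means would inherit that offset. Raw per-treatment averages again stand in for the unadjusted estimates.

```r
set.seed(42)

# One simulated experiment: returns raw and adjusted means per treatment
simulate_once <- function(n_R, n_T, mu_R, mu_T) {
  n         <- n_R + n_T
  treat     <- factor(rep(c("R", "T"), times = c(n_R, n_T)))
  covariate <- rnorm(n, mean = ifelse(treat == "T", mu_T, mu_R), sd = 5)
  y         <- 10 + 2 * (treat == "T") + 0.3 * covariate + rnorm(n)
  dat       <- data.frame(y, treat, covariate)
  fit       <- lm(y ~ treat + covariate, data = dat)
  grid      <- data.frame(treat     = factor(levels(treat), levels = levels(treat)),
                          covariate = mean(covariate))
  out <- c(sapply(split(y, treat), mean), predict(fit, newdata = grid))
  names(out) <- c("raw_R", "raw_T", "adj_R", "adj_T")
  out
}

# True population values of E(Y | treat) under the assumed process
truth <- c(R = 10 + 0.3 * 50, T = 12 + 0.3 * 50)

unbal <- replicate(1000, simulate_once(8, 16, 46, 52))   # unbalanced sampling
bal   <- replicate(1000, simulate_once(12, 12, 50, 50))  # balanced sampling

# Average deviation from the true values over all simulated experiments
bias <- function(sims) rowMeans(sims) - rep(truth, 2)
print(round(rbind(unbalanced = bias(unbal), balanced = bias(bal)), 3))
```

Under these assumptions the raw means in the unbalanced scenario should be off by roughly the slope times the covariate shift, while the adjusted means should be centred on the true values. In the balanced scenario both should agree on average; within a single sample they coincide exactly only when the per-treatment covariate means happen to be equal.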

Best regards & hope this helps

Martin

PS: We can also discuss this offline; Mr. Schütz has my contact information.