Bioequivalence and Bioavailability Forum • Least Square Means (LSM) for incomplete data

yicaoting
★

NanKing, China,
2011-10-07 20:27
(5007 d 04:25 ago)

Posting: # 7445
Views: 8,640

Least Square Means (LSM) for incomplete data [Software]

I started a query on Least Square Means (LSM) for unequal sequence several days ago.

In that posting, the method of calculating LSM for unequal sequence data has been pointed out by ElMaestro, thanks to ElMaestro. Although new question on SE for LSM of R and T arises, it is less importance for estimated 90%CI of PE. I have tried my best to calculate SE to obtain the same results as WNL or SAS, but I failed. It's beyond my ability. So, let's paused the game of SE calculation.

BTW: Before I start this question, I have already carefully learned this for several times.

Today, before I try to manually calculate LSM for incomplete data, I first calculate it with WNL and SAS, the results puzzled me again.

My dataset is Chow and Liu's famous data:
Design and Analysis of Bioavailability and Bioequivalence Studies, Third Edition, Page 71.

Dataset 1: Chow and Liu's famous data with no modification, let's call it full data
Dataset 2: Chow and Liu's famous data, delete Subject # 24's data both in period 1 (R=55.175) and period 2 (T=74.575), let's call it unbalanced data
Dataset 3: Chow and Liu's famous data, delete Subject # 24's data only in period 2(T=74.575), let's call it incomplete data

For convenience, all my analysis used original data without Ln() transformation.

For dataset 2, the results of LSM of R and T and 90%CI of (R-T) and identical between WNL and SAS:

LSM_R: 83.9525  (SE and 90%CI are different between WNL and SAS)

LSM_T: 80.6005  (SE and 90%CI are different between WNL and SAS)



R-T PE: 3.3520

90%CI PE: -3.0919 to 9.7958

For dataset 3, results from WNL:

LSM_R: 82.5594 (WNL)  vs  82.5594 (SAS)

LSM_T: 79.6926 (WNL)  vs  79.2074 (SAS)



R-T PE: 2.8668 with SE=3.7492 (WNL, both are diff from dataset 2's result)  vs   

        3.3520 with SE=3.7448 (SAS, both are same as dataset 2's result)

90%CI PE: -3.5855 to 9.3190 (WNL, diff from dataset 2's result)  vs  

          -3.0919 to 9.7958 (SAS, same as dataset 2's result)

Obviously, the results are different. So my question are:
1) which is reliable?
2) for dataset3, how to manually calc LSM_T to obtain WNL's 79.6926 or SAS's 79.2074, I tried several methods, all were failed.
3) for dataset3, how to manually obtain WNL's R-T PE's SE=3.7492?

Chow and Liu's data is:

Sub      Period     Sequence     Formulation      AUC

1          1          RT          Referenc      74.675

4          1          RT          Referenc      96.4

5          1          RT          Referenc      101.95

6          1          RT          Referenc      79.05

11         1          RT          Referenc      79.05

12         1          RT          Referenc      85.95

15         1          RT          Referenc      69.725

16         1          RT          Referenc      86.275

19         1          RT          Referenc      112.675

20         1          RT          Referenc      99.525

23         1          RT          Referenc      89.425

24         1          RT          Referenc      55.175

1          2          RT          Test          73.675

4          2          RT          Test          93.25

5          2          RT          Test          102.125

6          2          RT          Test          69.45

11         2          RT          Test          69.025

12         2          RT          Test          68.7

15         2          RT          Test          59.425

16         2          RT          Test          76.125

19         2          RT          Test          114.875

20         2          RT          Test          116.25

23         2          RT          Test          64.175

24         2          RT          Test          74.575

2          1          TR          Test          74.825

3          1          TR          Test          86.875

7          1          TR          Test          81.675

8          1          TR          Test          92.7

9          1          TR          Test          50.45

10         1          TR          Test          66.125

13         1          TR          Test          122.45

14         1          TR          Test          99.075

17         1          TR          Test          86.35

18         1          TR          Test          49.925

21         1          TR          Test          42.7

22         1          TR          Test          91.725

2          2          TR          Referenc      37.35

3          2          TR          Referenc      51.925

7          2          TR          Referenc      72.175

8          2          TR          Referenc      77.5

9          2          TR          Referenc      71.875

10         2          TR          Referenc      94.025

13         2          TR          Referenc      124.975

14         2          TR          Referenc      85.225

17         2          TR          Referenc      95.925

18         2          TR          Referenc      67.1

21         2          TR          Referenc      59.425

22         2          TR          Referenc      114.05

Thank you for your kind help. :ok:

ElMaestro
★★★

Denmark,
2011-10-07 23:00
(5007 d 01:51 ago)

@ yicaoting
Posting: # 7446
Views: 7,140

Least Square Means (LSM) for incomplete data

Post reply

Hi Yicaoting,

that's an interesting post.

❝ For dataset 3, results from WNL:

❝ LSM_R: 82.5594 (WNL) vs 82.5594 (SAS)

❝ LSM_T: 79.6926 (WNL) vs 79.2074 (SAS)

❝ Obviously, the results are different. So my question are:

❝ 1) which is reliable?

❝ 2) for dataset3, how to manually calc LSM_T to obtain WNL's 79.6926 or SAS's 79.2074, I tried several methods, all were failed.

❝ 3) for dataset3, how to manually obtain WNL's R-T PE's SE=3.7492?

Hrmmmmmmmfff... very good questions. I don't have a lot of insight.
It is common in the linear BE model to disregard all data from any subject that has a missing value. That's why SAS treats dataset2 and dataset3 equally. I get the same result in R with the function call lm.

However, there is -at least in theory- an alternative when one value is missing for a period in one (or more) subject(s) and that is to try a maximum likelihood approach where you specify subject as random in the mixed model and trt+seq+per all fixed. When I do that in R, I actually can reproduce your values from WNL (but I do not have WNL on my machine so cannot play around). It could thus be that WNL actually uses a mixed model to obtain the estimates? Someone, go read the manual?! If this is indeed the case then I am pretty sure you can't obtain easily the treatment effects (they shouldn't be called LSM's if obtained by REML). At least from a theoretical perspective one can argue that the REML-based estimates are more credible, I think. But am no expert at all.

I am still struggling with the SE's. Will get back to you if I manage to figure out something.

EM.

Helmut
★★★

Vienna, Austria,
2011-10-07 23:35
(5007 d 01:16 ago)

@ ElMaestro
Posting: # 7447
Views: 6,980

random vs. fixed

Post reply

Hi Yicaoting & ElMaestro!

❝ ❝ For dataset 3, results from WNL:

❝ ❝ LSM_R: 82.5594 (WNL) vs 82.5594 (SAS)

❝ ❝ LSM_T: 79.6926 (WNL) vs 79.2074 (SAS)

❝ Hrmmmmmmmfff...

Yessir.

❝ However, there is -at least in theory- an alternative when one value is missing for a period in one (or more) subject(s) and that is to try a maximum likelihood approach where you specify subject as random in the mixed model and trt+seq+per all fixed. When I do that in R, I actually can reproduce your values from WNL (but I do not have WNL on my machine so cannot play around). It could thus be that WNL actually uses a mixed model to obtain the estimates?

Right guess. Phoenix/WinNonlin’s default in BE is:

fixed:  Sequence+Formulation+Period

Random: Subject(Sequence)

Therefore we get REML estimates. If we delete the random effect and specify the model as ‘all fixed’
Sequence+Formulation+Period+Subject(Sequence)
WNL will spit out exactly SAS’ results (LSM_T, SE, Diff, and CI) for dataset 3.

❝ Someone, go read the manual?!

Wasn’t necessary. ;-)

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

ElMaestro
★★★

Denmark,
2011-10-08 01:18
(5006 d 23:34 ago)

@ Helmut
Posting: # 7448
Views: 6,913

random vs. fixed

Post reply

Thanks HS,

❝ Right guess. Phoenix/WinNonlin’s default in BE is:

❝ fixed: Sequence+Formulation+Period

❝ Random: Subject(Sequence)

What Yicaoting then needs is a kind of formula which can render the SE of a mean difference and which allow missing values, hence an maxlikelihood-based solution.

I googled a bit around now and I cannot find it. I am sure this is straightforward given a varcovar matrix for anyone equipped with a brain marginally larger than a walnut, but because of my genetic makeup (intelligence-wise I can only compete with certain amoeba) I am just not capable of deriving it.

EM.

Helmut
★★★

Vienna, Austria,
2011-10-08 16:48
(5006 d 08:03 ago)

@ ElMaestro
Posting: # 7450
Views: 6,855

random vs. fixed

Post reply

Dear ElMaestro!

❝ […] a kind of formula which can render the SE of a mean difference and which allow missing values, hence an maxlikelihood-based solution.

❝

❝ I googled a bit around now and I cannot find it. I am sure this is straightforward given a varcovar matrix […]

Do you think that’s possible? RMLE is an iterative process. Even if you find a formula based on the variance-covariance-matrix, you would need to fire up some software to fill it up with numbers first. IMHO no ‘manual’ calculation possible here.

❝ […] intelligence-wise I can only compete with certain amoeba […]

I play in the same league than you.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

ElMaestro
★★★

Denmark,
2011-10-08 17:26
(5006 d 07:26 ago)

@ Helmut
Posting: # 7452
Views: 6,861

random vs. fixed

Post reply

Dear HS,

❝ Do you think that’s possible? RMLE is an iterative process. Even if you find a formula based on the variance-covariance-matrix, you would need to fire up some software to fill it up with numbers first. IMHO no ‘manual’ calculation possible here.

Yes, I think it is possible.
VBA works with double precision reals if the user wants (dim X as double etc) which is definitely good enough. I am rather sure that it is be possible to set up the entire thing in VBA/Excel although I would of course any day prefer a dedicated stats software. There are some practicalities around it, though:
1. VB is interpreted, not truly compiled and executed as binary. This makes it quite slow for complex operations.
2. Setting up the algos in VBA would definitely be more than just a standard day at the office. Need support from underlying matrix routines.

I made some months ago a matrix library from scratch in C and set up the algos for REML estimation and it actually worked fine. Was about 9000 lines IIRC, so it would be the same in VBA. My code was based on doubles, too, and achieved the same as R using d_labes's model specifications (see the legendary freedom thread). We need also to bear in mind that the math operations (plus, minus, log, sine etc) in themselves are never proven wrongly implemented in any VB operation; the wrong implementations come when people use the basic functions in an inappropriate manner to calculate something slightly more complex like standard deviations.

Who's got 1000 hours spare time for this little task?

—
Pass or fail!
ElMaestro

Helmut
★★★

Vienna, Austria,
2011-10-08 18:02
(5006 d 06:50 ago)

@ ElMaestro
Posting: # 7453
Views: 6,872

random vs. fixed

Post reply

Dear ElMaestro!

❝ ❝ RMLE is an iterative process. Even if you find a formula based on the variance-covariance-matrix, you would need to fire up some software to fill it up with numbers first. IMHO no ‘manual’ calculation possible here.

❝

❝ Yes, I think it is possible.

Cough; see my emphases above. ;-)

❝ I made some months ago a matrix library from scratch in C […]. Was about 9000 lines IIRC […]

Helluva fun!

❝ see the legendary freedom thread

❝ Who's got 1000 hours spare time for this little task?

Save your energy for NLYW.
Even if you come up with a DLL, remember the Q&A:

Results obtained by alternative, validated statistical programs are also acceptable except spreadsheets because outputs of spreadsheets are not suitable for secondary assessment.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes