Oiinkie
The Netherlands,
2011-12-06 15:30
Posting: # 7760

Sample size calculations: SAS vs PowerTOST/FARTSSIE [Power / Sample Size]

Dear all,

As also mentioned in other threads in this forum, sample sizes (and power) calculated with SAS frequently differ from those calculated with freely available software specialized for (2×2) BE, e.g., PowerTOST for R and FARTSSIE for Excel (which I definitely trust more; SAS is like a black box).

At the moment we are having a discussion with a CRO about sample size calculations and which software to use. I have recommended PowerTOST and FARTSSIE, but they want to stick with SAS (because “they have always used it” :confused:). This would be fine with me as long as all calculations matched, but they do not.

How can I convince the CRO to also look into the possibility of using other software besides SAS, and that sample sizes calculated with SAS are not (always) the "correct" approximations? And how should I approach different outcomes (SAS vs. others)?

Thanks in advance!

Best regards,

Oiinkie

Helmut
Vienna, Austria,
2011-12-06 15:49
@ Oiinkie
Posting: # 7761

Sample size calculations: SAS vs PowerTOST/FARTSSIE

Dear Oiinkie!

❝ At the moment, we are having a discussion with a CRO on sample size calculations and which software to use. I have recommended PowerTOST and FARTSSIE, but they want to stick with SAS (because “they have always used it” :confused:).


That’s the type of argument I definitely like the most. In line with “We have used bloodletting for 2000 years – can’t be wrong.” Quackery!

❝ This would be fine with me as long as all calculations match, but they do not.


Shouldn’t be such a big difference? See here and there.

❝ How can I convince the CRO to also look into the possibilities of using other software besides SAS and that sample sizes calculated with SAS are not (always) the "correct" approximations?


Well, you are the boss. :-D

❝ And, what to do with (how to approach) different outcomes (SAS vs others)?


Stick with the best. @Detlew: Is Proc Power still ‘experimental’ and/or are Owen’s functions documented in the meantime?

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Oiinkie
The Netherlands,
2011-12-07 16:57
@ Helmut
Posting: # 7763

Sample size calculations: SAS vs PowerTOST/FARTSSIE

Dear HS,

Thank you for your reply!

❝ That’s the type of argument I definitely like the most. In line with “We have used bloodletting for 2000 years – can’t be wrong.” Quackery!


Indeed, such an argument gives me the shivers...:vomit:

❝ Shouldn’t be such a big difference? See here and there.


As mentioned in the posts you refer to, differences should be marginal. In this case, however, the differences are quite large in my opinion. Let me give you an example. For a two-stage crossover design, both the CRO and I calculated sample sizes with the following parameters:
alpha = 0.049
PE = 0.8969
CV = 0.1149

With both PowerTOST and FARTSSIE (and even StudySize) I arrive at a sample size of 16 (power = 0.8474), while the CRO arrives at a sample size of 14 (power = 0.820).

According to the file I received from the CRO, the Power procedure (Equivalence Test for Mean Ratio) was used with the distribution set to lognormal and the method to exact.

Can you explain these differences? Did the CRO use an incorrect procedure/syntax/option or anything like that? Or do you think these differences are actually small?
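(For a quick cross-check without R, SAS, or Excel: the figures above can be approximated with the noncentral t distribution. Below is a sketch in Python with scipy; this is the common noncentral-t approximation of TOST power, not the exact Owen's Q method PowerTOST uses by default, so the last digits may differ slightly.)

```python
# Cross-check of the 2x2 crossover figures with the noncentral-t
# approximation of TOST power (a sketch; PowerTOST's exact method
# uses Owen's Q instead, so tiny differences are expected).
from math import log, sqrt
from scipy import stats

def power_tost_2x2(alpha, cv, theta0, n):
    sigma_w = sqrt(log(cv**2 + 1.0))     # within-subject SD, log scale
    se = sigma_w * sqrt(2.0 / n)         # SE of the treatment difference
    df = n - 2                           # 2x2 crossover
    tval = stats.t.ppf(1.0 - alpha, df)
    d1 = (log(theta0) - log(0.80)) / se  # noncentrality vs lower BE margin
    d2 = (log(theta0) - log(1.25)) / se  # noncentrality vs upper BE margin
    return stats.nct.cdf(-tval, df, d2) - stats.nct.cdf(tval, df, d1)

print(power_tost_2x2(0.049, 0.1149, 0.8969, 16))  # close to 0.8474
print(power_tost_2x2(0.049, 0.1149, 0.8969, 14))  # below the 0.80 target
```

With n = 16 this reproduces the 0.8474 figure, while n = 14 falls short of the 80% target in the 2×2 crossover, which is why the free tools ask for two more subjects.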

❝ Well, you are the boss. :-D


:-D:cool::ok::smoke:

Thanks in advance!

Best regards,

Oiinkie

d_labes
Berlin, Germany,
2011-12-12 16:38
@ Oiinkie
Posting: # 7772

The power not to know vs PowerTOST/FARTSSIE/...

Dear Oiinkie, dear Helmut,

❝ ... For a two-stage crossover design both the CRO and I have calculated sample sizes with the following parameters:

❝ alpha = 0.049

❝ PE = 0.8969

❝ CV = 0.1149

❝ ...

❝ With both PowerTOST and FARTSSIE (and even StudySize) I come to a sample size of 16 (power=0.8474), while the CRO gives a sample size of 14 (power=0.820).

❝ According to the file I received from the CRO, the Power procedure (Equivalence Test for Mean Ratio) was used with distribution set as lognormal and method as exact.


Which statement (module) in Proc Power did your CRO actually use? twosamplemeans or pairedmeans?

Neither is correct! twosamplemeans is for a parallel-group design and pairedmeans for a "paired design", i.e., a study with the paired t-test as the basic statistical method. The latter can be used for a 2×2 crossover, but only under the assumption that no period effects occur.
This is described in one of the examples (Example 67.3, Simple AB/BA Crossover Designs) in the help file of Proc Power.

It seems to me your CRO used the former, because only then do I obtain (within SAS Proc Power) their figures for sample size and power. You can check this with PowerTOST if you choose design="parallel".
> sampleN.TOST(alpha=0.049,CV=0.1149,theta0=0.8969,design="parallel")

+++++++++++ Equivalence test - TOST +++++++++++
            Sample size estimation
-----------------------------------------------
Study design:  2 parallel groups
log-transformed data (multiplicative model)

alpha = 0.049, target power = 0.8
BE margins        = 0.8 ... 1.25
Null (true) ratio = 0.8969,  CV = 0.1149

Sample size
(n is sample size per group)
 n     power
14   0.820209

But note my emphasis: this is a parallel-group design, and n is the sample size per group!

The difference between the paired-means case and the procedure that takes period effects into account lies mainly in the degrees of freedom: n-1 for the "paired" design versus n-2 for the 2×2 crossover. This gives different results between PowerTOST (with design="2x2") and SAS Proc Power with pairedmeans. Regarding the sample size the differences are usually small (around +2 subjects), but they exist and can be larger in extreme cases.
> sampleN.TOST(alpha=0.049,CV=0.1149,theta0=0.8969,design="paired")

+++++++++++ Equivalence test - TOST +++++++++++
            Sample size estimation
-----------------------------------------------
Study design:  paired values
log-transformed data (multiplicative model)

alpha = 0.049, target power = 0.8
BE margins        = 0.8 ... 1.25
Null (true) ratio = 0.8969,  CV = 0.1149

Sample size
 n     power
14   0.800631

Don't try this on your own; it's my un-validated extended code for PowerTOST :cool:. All of you out there: drop me an e-mail if this is helpful for you and I will incorporate the code in the next release. Guaranteed.
The above result is identical to SAS Proc Power with the statement pairedmeans.
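The degrees-of-freedom effect can also be seen in isolation with a small sketch (Python with scipy, noncentral-t approximation of TOST power; keeping the standard error fixed and switching only the df between n-1 and n-2 is a simplification for illustration, not PowerTOST's exact Owen's Q computation):

```python
# Illustrate the df difference only: same SE, df = n-1 ("paired")
# vs df = n-2 (2x2 crossover). Noncentral-t approximation of TOST
# power; a sketch, not an exact Owen's Q computation.
from math import log, sqrt
from scipy import stats

def tost_power(alpha, cv, theta0, n, df):
    sigma_w = sqrt(log(cv**2 + 1.0))
    se = sigma_w * sqrt(2.0 / n)         # same SE in both cases
    tval = stats.t.ppf(1.0 - alpha, df)
    d1 = (log(theta0) - log(0.80)) / se
    d2 = (log(theta0) - log(1.25)) / se
    return stats.nct.cdf(-tval, df, d2) - stats.nct.cdf(tval, df, d1)

args = (0.049, 0.1149, 0.8969, 14)
print(tost_power(*args, df=13))  # "paired" (n-1): close to 0.80
print(tost_power(*args, df=12))  # 2x2 crossover (n-2): slightly lower
```

With n = 14 the extra degree of freedom is just enough to reach the 80% target in the paired case, while the 2×2 crossover falls slightly short and ends up needing 16 subjects.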

Regarding the statement:

❝ but they want to stick with SAS (because “they have always used it”)

I can only add: before version 9.1 (rolled out around 2003, I think) there was no possibility within SAS for sample size estimation for a 2×2 crossover besides the ugly Analyst application (which had to be licensed extra, i.e., extra money), which in turn could only handle the paired-values case and the parallel-group design, and used the undocumented OwenQ function.
Less than 10 years of SAS experience is not "always". It takes somewhat longer to get to know SAS, definitely!

@Helmut: OwenQ is still not to be found in the function dictionary of SAS 9.2 TS2M0, which I have to use (released last year, if I remember correctly). Meanwhile they call Proc Power "production" :-D.

BTW, Oiinkie: where does your 'unusual' alpha come from? A two-stage design with nominal alphas according to Haybittle-Peto? If so, how do you justify their use?

Regards,

Detlew
Helmut
Vienna, Austria,
2011-12-12 18:05
@ d_labes
Posting: # 7773

Paired designs not that uncommon

Dear Detlew!

❝ […] It's my un-validated extended code for PowerTOST :cool:. All you out there: Drop me an E-mail if this is helpful for you and I will incorporate the code in the next release. Guaranteed.


Yes, pleeze! It would be useful for studies where a multiple-dose profile is compared to single dose. I haven't seen (and would not like to see) one as a crossover (aka a logistical nightmare).

❝ OwenQ is up to now not findable in the function dictionary of SAS 9.2 TS2M0, which I have to use (released last year if I remember correctly). They call Proc Power meanwhile production :-D.


Fascinating. So it’s still some kind of “Jack-in-the-(black)-box”.

@All: Can we use an undocumented (!) function (of any software), even if we validate it against published datasets or other software?

ElMaestro
Denmark,
2011-12-13 18:04
@ Helmut
Posting: # 7776

Paired designs not that uncommon

Dear Helmut,

❝ @All: Can we use an undocumented (!) function (of any software), even if we validate it against published datasets or other software?


From a QA perspective, "such things" (and I am just trying to speak broadly here) are often about qualification and/or validation. So we now have another card on the table: whether a function is documented or not.

I guess we need to find out what it implies to document, qualify, and validate a function in some stats software. Once we have that in place we can discuss whether a script that uses this software needs to be validated, qualified (and please let's mess things up as much as possible by introducing flavours of IQ, OQ, PQ), or just documented, and how.
Does anyone know of a regulator who knows the difference between a function and a script?

"Aaaah, they use SAS. SAS is validated. This dossier will get my nod."

Pass or fail!
ElMaestro
Helmut
Vienna, Austria,
2011-12-13 19:10
@ ElMaestro
Posting: # 7777

Off topic: software validation

Hi ElMaestro!

❝ From a QA perspective, "such things" (and I am just trying to speak broadly here) are often about qualification and/or validation. So, we now have another card on the table which is whether a function is documented or not.


I think the order (from the user’s perspective) is: documented ⇒ qualified ⇒ validated (and hopefully valid).
If we don’t have any documentation on how to use a SW’s function, how could we test it?
BTW, @Detlew: How did you discover the existence of Owen’s Q in SAS?

❝ I guess we need to find out what it implies to document, qualify and validate a function in some stats software.


A good starter is the PIC/S Guidance. But where does it end? The core routines of SAS and WinNonlin were written in the mid-1960s. I wouldn't bet that all of them have been 'touched' ever since. Even fancy R uses compiled FORTRAN libraries (of similar age?). Remember the noncentral t (based on a C translation of algorithm AS 243 from 1989) mentioned in this post, or, even worse, that one.

❝ Once we have that in place we can discuss if a script that uses this software needs to be validated, qualified (and please let's mess as much up as possible by introducing flavours of IQ, OQ, PQ) or just documented and how.


Yes, how much is possible and what are the efforts? Example: Phoenix's validation suite costs more than the software itself. If you test all components (NCA, BE, IVIVC, PK, NLME/PopPK, …) the estimated runtime is about 40 hours. 'Validation' should be done every time the OS is altered (not only for service packs). This would include hotfixes/patches, which are distributed by M$ monthly. Your productivity would go down by 25%. In a larger company hotfixes are centrally distributed, which means that you can't even retreat to another machine in the meantime. Anyway, I would call everything you obtain from a SW vendor (however extensive it might be) not validation but qualification. It's up to us to find the flaws. :angry:

❝ Anyone know of a regulator who knows the difference between a function and a script?


Cough. :lookaround: (at least one)

ElMaestro
Denmark,
2011-12-13 21:30
@ Helmut
Posting: # 7778

Off topic: software validation

Hi again,

❝ Yes, how much is possible and what are the efforts? Example: Phoenix's validation suite costs more than the software itself. If you test all components (NCA, BE, IVIVC, PK, NLME/PopPK, …) the estimated runtime is about 40 hours. 'Validation' should be done every time the OS is altered (not only for service packs). This would include hotfixes/patches, which are distributed by M$ monthly. Your productivity would go down by 25%. In a larger company hotfixes are centrally distributed, which means that you can't even retreat to another machine in the meantime. Anyway, I would call everything you obtain from a SW vendor (however extensive it might be) not validation but qualification. It's up to us to find the flaws. :angry:


I once attended a course on WNL modeling near Paris where 20 computers were available to the attendees. WNL was installed but malfunctioned on each and every one of them. The course director was rather perplexed. It turned out that a conflict between some hardware driver and WNL was the problem, which meant the conflicting hardware had to be deactivated on all units (it was not essential; IIRC it was a token-ring I/O driver. Yup, back in those days!). So, to rule out unpleasant regulatory interrogations, I guess all validations should be done in the presence and in the absence of any other piece of software or running exe/driver on the target machine, plus all their possible interactions.

"But you didn't validate while Minesweeper, MS Excel and the Adobe Suite were running and while your printer driver and RSS reader were turned off. I'll give you a 483, sir."
❝ Cough. :lookaround: (at least one)


Still active?

Helmut
Vienna, Austria,
2011-12-14 15:14
@ ElMaestro
Posting: # 7780

Off topic: software validation

Dear ElMaestro!

❝ "But you didn't validate while Minesweeper, MS Excel and the Adobe Suite were running and while your printer driver and RSS reader were turned off. I'll give you a 483, sir."


I do know people who think that if they have a box of matchsticks, at least the tips are in the wrong end of the box…
If I recall correctly, the infamous 'Xcalibur' (chromatography software) was only certified by Thermo Finnigan on XP Pro together with Office 2k in the en-US localization. This led to nice 'island PCs' in a lot of labs, and to analysts throwing their keyboards out of the window when they hit the wrong key yet another time (WYSINWYG).

❝ ❝ Cough. :lookaround: (at least one)


❝ Still active?


Yes.

d_labes
Berlin, Germany,
2011-12-14 16:34
@ Helmut
Posting: # 7781

Discovery of Owen's Q in SAS

Hi Helmut!

❝ BTW, @Detlew: How did you discover the existence of Owen’s Q in SAS?


As mentioned above, the Analyst application in SAS before V9.1 allowed sample size estimation for the paired design and the parallel-group design. Within that application (designed as some sort of "point and click") the user could look behind the scenes and get the code SAS really used.

And, oh wonder, there were code lines like
   ...
   df=n-1;
   t1=tinv(1-alpha,df);
   P1=OwenQ(t1,d1,0,R,df);
   P2=OwenQ(-t1,d2,0,R,df);
   power=P2-P1;
   ...

Remembering having heard the name Owen some time ago, and after getting to know the formulas behind those AlGore-rhythms, it was not too hard (but nevertheless cost some sweat) for an amateur like me to figure out what the arguments of OwenQ were :cool:.
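For the curious: the call pattern above can be reproduced outside SAS. Owen's Q is the integral of Φ(t·x/√df − δ) against the chi-distribution density with df degrees of freedom, and the TOST power is P2 − P1 exactly as in the snippet. Below is a sketch in Python; the function names and the brute-force numerical integration are mine, not SAS's internals (d1, d2 are the noncentrality parameters against the two BE margins and R is the upper integration limit).

```python
# A sketch of Owen's Q and the exact TOST power, mirroring the
# OwenQ call pattern from the Analyst code above. Function and
# variable names are mine; the chi-density integral form of
# Owen's Q is integrated numerically with scipy.
from math import log, sqrt
from scipy import stats, integrate

def owens_q(t, delta, a, b, df):
    # Q_df(t, delta; a, b) = Int_a^b Phi(t*x/sqrt(df) - delta) * chi_df(x) dx
    f = lambda x: stats.norm.cdf(t * x / sqrt(df) - delta) * stats.chi.pdf(x, df)
    return integrate.quad(f, a, b)[0]

def exact_power(alpha, cv, theta0, n, design="2x2"):
    sigma_w = sqrt(log(cv**2 + 1.0))            # within-subject SD, log scale
    se = sigma_w * sqrt(2.0 / n)                # SE of the treatment difference
    df = n - 2 if design == "2x2" else n - 1    # "paired": df = n - 1
    t1 = stats.t.ppf(1.0 - alpha, df)
    d1 = (log(theta0) - log(0.80)) / se         # noncentrality, lower margin
    d2 = (log(theta0) - log(1.25)) / se         # noncentrality, upper margin
    R = sqrt(df) * (log(1.25) - log(0.80)) / (2.0 * t1 * se)
    P1 = owens_q(t1, d1, 0.0, R, df)
    P2 = owens_q(-t1, d2, 0.0, R, df)
    return P2 - P1

print(exact_power(0.049, 0.1149, 0.8969, 16, "2x2"))     # ~0.8474
print(exact_power(0.049, 0.1149, 0.8969, 14, "paired"))  # ~0.800631
```

Run against the example from this thread, the sketch reproduces both the 2×2 result (n = 16, power ≈ 0.8474) and the paired result (n = 14, power ≈ 0.800631) quoted above.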

Regarding your validation question:
I would say that, in principle, what we do with such undocumented functions, features, or whatever could be validated, if we can show that the results are what is expected.

I have seen the use of undocumented functions, features, and data structures all around the software industry, especially undocumented features of the Windows operating system. Maybe the well-known stability of programs under Microsoft (measured as blue-screen probability) was partly due to that.

The problem with such undocumented features/functions is that they may change in future versions, maintenance releases, or even in hot-/bug-fixes without any notice. In the best case our use of them will then throw errors (or blue screens). But according to McMurphy's laws I expect that we will get wrong results without noticing, and be surprised :surprised: if others do.

I had decided to use OwenQ because at that time I had no other way to do my sample size planning correctly within "The validated power ...". Thus the alternatives were "use or die".
Meanwhile R and the author of PowerTOST came to the rescue :-D.
Now Owen's Q function is documented and the implementation code can be inspected by everybody, but is it validated? What does that mean?
Of course the author of PowerTOST has checked the results of using it in sample size estimation as far as possible (regarding time, money, and other resources like literature tables, access to other power/sample-size software, and so on). Really! Nevertheless, during the life cycle of PowerTOST cases have arisen (extreme with respect to the usual use; see the NEWS file of PowerTOST) where the implementation led to implausible or erroneous results. What does this mean with respect to validation? Does it mean we have to validate every one of our application cases because we don't know whether it is an extreme one? If yes, how? Extreme cases are usually not tabulated. Other software may use other algorithms/approximations which themselves call for validation/qualification. Here the cat bites its own tail: two pieces of software, each giving 2 + 2 = 5 (a well-known result), validate one another.

Meanwhile I think the whole software-validation fairy tale is a chimera invented by the Greeks with nothing to do (the Greeks are nowadays responsible for everything) :cool:.

Validated - Has been tried once before.
Guernsey McPearson's Drug Development Dictionary

Regards,

Detlew
Helmut
Vienna, Austria,
2011-12-19 21:05
@ d_labes
Posting: # 7794

Paired designs in PowerTOST 0.9-0

Dear Detlew!

❝ Drop me an E-mail if this is helpful for you and I will incorporate the code in the next release. Guaranteed.


THX for including it already in PowerTOST 0.9-0 (2011-12-15)!
It's nice that you included more examples in the help files, and also 'expected' results. I could reproduce all of them with my installation. :cool:

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz