Helmut
★★★

Vienna, Austria,
2020-05-15 12:31
(523 d 11:34 ago)

Posting: # 21440
Views: 4,226

## Problems with PASS [Software]

Dear all,

a colleague received a sample size estimation from a CRO, performed in PASS 15.0.5, for a fully replicated study (TRRT|RTTR), CV 0.50, T/R 0.95, power 0.80, ABE (unscaled). The result was N=54 (power 0.8053).
Since he is a fan of PowerTOST he tried

    library(PowerTOST)
    sampleN.TOST(CV = 0.5, theta0 = 0.95, targetpower = 0.80, design = "2x2x4")

and got

    +++++++++++ Equivalence test - TOST +++++++++++
                Sample size estimation
    -----------------------------------------------
    Study design: 2x2x4 (4 period full replicate)
    log-transformed data (multiplicative model)

    alpha = 0.05, target power = 0.8
    BE margins = 0.8 ... 1.25
    True ratio = 0.95,  CV = 0.5

    Sample size (total)
     n     power
    50   0.812806

Lower sample size than with PASS (and higher power; with one dropout it will still be >0.80).
He suspected that PASS does not use the exact method (the default in most functions of PowerTOST) but one of the approximations, and tried the noncentral t (method = "nct") as well as the shifted central t (method = "shifted"):

    sampleN.TOST(CV = 0.5, theta0 = 0.95, targetpower = 0.80, design = "2x2x4",
                 method = "nct", print = FALSE, details = FALSE)[7:8]
      Sample size Achieved power
    1          50      0.8128063
    sampleN.TOST(CV = 0.5, theta0 = 0.95, targetpower = 0.80, design = "2x2x4",
                 method = "shifted", print = FALSE, details = FALSE)[7:8]
      Sample size Achieved power
    1          50      0.8120118

Then he was worried and sent me an email together with the output of PASS…

We know that in a 2×2×4 design power is approximately equal to a 2×2×2 design with ½ of its sample size because the number of treatments is the same and the differing degrees of freedom play a lesser role. In this case: 98 / 2 = 49 → 50. This approach is used in package bear.
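The rule of thumb above is plain arithmetic. A minimal sketch in base R, where `halve_to_balanced()` is a made-up helper name (not part of PowerTOST or bear) and the 2×2×2 sample size is taken as given:

```r
# Hypothetical helper: halve a 2x2x2 sample size and round up so that
# the total is a multiple of the number of sequences (balanced design).
halve_to_balanced <- function(n_2x2x2, n_seq = 2) {
  n_seq * ceiling((n_2x2x2 / 2) / n_seq)
}

halve_to_balanced(98)  # 98 / 2 = 49, rounded up to 50
halve_to_balanced(58)  # 58 / 2 = 29, rounded up to 30
```

This is only a crude approximation; it ignores the different degrees of freedom of the two designs.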
Since I’m not aware of reference tables for replicate designs evaluated for ABE, I tried simulations (see this post for the code) and got for simulating statistics

      Sample size Achieved power
    1          50        0.81228

and for simulating subjects

      Sample size Achieved power
    1          50        0.81231
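For readers who want to see what "simulating statistics" can look like, here is a rough base-R sketch. It is not the code from the linked post. It assumes a balanced 2×2×4 design with df = 3n − 4 and a design constant of 1 (both are assumptions about the usual model without carryover), and the function name is invented for illustration.

```r
# Simulate the TOST decision from its statistics (point estimate and
# residual variance) instead of simulating individual subjects.
# Assumptions: balanced 2x2x4, df = 3n - 4, design constant bk = 1.
sim_power_2x2x4 <- function(n, cv, theta0, alpha = 0.05, nsims = 1e5,
                            theta1 = 0.8, theta2 = 1.25, seed = 123456) {
  set.seed(seed)
  se  <- sqrt(log(cv^2 + 1))              # residual SD on the log scale
  df  <- 3 * n - 4
  sem <- se * sqrt(1 / n)                 # standard error of the T-R difference
  pe  <- rnorm(nsims, mean = log(theta0), sd = sem)  # simulated point estimates
  s2  <- se^2 * rchisq(nsims, df) / df               # simulated residual variances
  hw  <- qt(1 - alpha, df) * sqrt(s2 / n)            # half-width of the 90% CI
  # BE is concluded if the CI lies entirely within the acceptance range
  mean(pe - hw >= log(theta1) & pe + hw <= log(theta2))
}

sim_power_2x2x4(n = 50, cv = 0.5, theta0 = 0.95)
```

With 100,000 simulations the result should land close to the powers reported above, within simulation error.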

What the heck? The output of PASS gives a list of references (I numbered the list):
1. Chow, S.C. and Liu, J.P. 1999. Design and Analysis of Bioavailability and Bioequivalence Studies. Marcel Dekker. New York.
2. Chow, S.C.; Shao, J.; Wang, H. 2003. Sample Size Calculations in Clinical Research. Marcel Dekker. New York.
3. Chen, K.W.; Chow, S.C.; and Li, G. 1997. 'A Note on Sample Size Determination for Bioequivalence Studies with Higher-Order Crossover Designs.' Journal of Pharmacokinetics and Biopharmaceutics, Volume 25, No. 6, pages 753-765.
#2 is known for many typos; hence, I ignored it.
#3 contains sample size tables and was therefore a good candidate. Surprise: with increasing CV, sample sizes were generally larger than expected. Unfortunately the tables don’t go beyond 40%. However, in Table VII 38 subjects are given, whereas I got 34. The underlying ABE model is not specified; the authors refer to #1. OK, Chapter 9 it is. Gotcha: carryover in the model! Stephen Senn devoted a good part of his book about crossover studies to arguing against it. Not only is carryover scientifically questionable, none of the guidelines recommends such models. BTW, #1 also contains tables where the sample sizes are (consequently) too large.

Given all that, I recommend to
• neither use PASS (at least for ABE in replicate designs)
• nor the sample size tables in #1 and #3.
I will download the trial version of PASS2020 to assess it further.*

PS: If you are with a CRO you might be tempted to sell the sponsor large studies. That might backfire, as in this case where the sponsor knows PowerTOST.

• v20.0.1
Splendid.
Sample size (power) for ABE {0.8000|1.2500}, CV 0.50, ratio 0.95, target power 0.80, α 0.15 (!)
Seems that in PASS for the replicate designs the shifted central t-distribution is implemented and for the 2×2×2 the noncentral t (closest match of power). Nothing given in the manual.
TRRT|RTTR
PASS          : 32 (0.8003)
sampleN.TOST(): 30 (0.8100)
TT|RR|TR|RT
PASS          : 232 (0.8019)
sampleN.TOST(): 232 (0.8020)
TTRR|RRTT|TRRT|RTTR
PASS          : 32 (0.8301)
sampleN.TOST(): 32 (0.8302)
TR|RT
PASS          : 58 (0.8005)
sampleN.TOST(): 58 (0.8005)

Even if we consider the crude relationship of the 2-sequence full replicate to the 2×2×2: 58 / 2 = 29 → 30 < 32.

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2020-05-15 13:01
(523 d 11:03 ago)

@ Helmut
Posting: # 21441
Views: 3,345

## Problems with PASS

Hi Hötzi,

I could be wrong but...

PASS may be assuming this is a case of two sample t-test-like-scenario, where it does not take into consideration that the actual model causes additional reduction of df's.

You may be able to approximately reproduce PASS' result if you can override the df's in PowerTOST. I would still any day say PowerTOST is more right than PASS, of course.

Pass or fail!
ElMaestro
Helmut
★★★

Vienna, Austria,
2020-05-15 13:42
(523 d 10:23 ago)

@ ElMaestro
Posting: # 21442
Views: 3,341

## Problems with PASS

Hi ElMaestro,

» I could be wrong but...

We all may be but…

» PASS may be assuming this is a case of two sample t-test-like-scenario,…

It does (according to the manual and given in the output).

» … where it does not take into consideration that the actual model causes additional reduction of df's.

Yep, that’s the point! It gives me an answer to a question I did not ask (a model nobody has used for ages). Since I played around with the trial version, I can now say that it is a black 📦.
At least for the replicate designs the carryover model is hardcoded, not specified in the manual, and there is no way to change that.
I just sent an email to NCSS’ support asking for a clarification.

» You may be able to approximately reproduce PASS' result if you can override the df's in PowerTOST.

Maybe.

» I would still anyday say PowerTOST is more right than PASS, of course.

I like this one:

Of course, PASS passes IQ and OQ. Remember what you once wrote?

Dif-tor heh smusma 🖖
Helmut Schütz

ElMaestro
★★★

Denmark,
2020-05-15 19:02
(523 d 05:03 ago)

@ Helmut
Posting: # 21443
Views: 3,267

## Problems with PASS

Hi again,

» Yep, that’s the point!

Let's face it: ElMaestro is always right. Except when he is wrong.
You better get used to this very basic law of science.

» Remember what you once wrote?

Hehe, it is a bit vague in my memory. Prions and loads of paint thinner have taken their toll on my brain.
It must be inspired by something someone said, but I don't recall well. Perhaps from something I read in a Borland Delphi manual or something?! That would be the right place for a garbage discussion anyway.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2020-05-15 19:45
(523 d 04:20 ago)

@ Helmut
Posting: # 21444
Views: 3,270

## Problems with PASS

Dear Helmut, dear ElMaestro,

» » I could be wrong but...
»
» We all may be but…

Me too. Especially me.

Just my 2 cents.
Differences between PASS and PowerTOST w.r.t replicate cross-over designs
• PASS uses degrees of freedom from a cross-over ANOVA having a carry-over term.
PowerTOST uses only the usual terms tmt, period, sequence and subject.
• In PASS the design constant for the 2x2x4 design is 1.1, instead of the 1.0 used in PowerTOST
• The power is calculated approximately in PASS via the shifted central t-distribution
• For log-transformed data the approximation CV ~ se is used in PASS (!).
se is the standard error of the residuals.

The last feature in particular is responsible for the differences in the sample sizes obtained from PASS compared to PowerTOST. My recommendations supporting Helmut's recommendations above:
Don't use the sample sizes for replicate designs obtained by PASS because
• They rely on statistics not used in the evaluation of the study
• They rely on crude approximations, especially on CV ~ se which may work for small CVs but is beyond repair for higher values
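To make the points above tangible, here is a hedged base-R sketch of a power calculation via the shifted central t-distribution, with the degrees of freedom, design constant and standard error exposed as arguments so they can be overridden. The defaults (df = 3n − 4, bk = 1, se = sqrt(log(CV² + 1))) are assumptions meant to mimic the usual model without carryover; the function is illustrative and is neither PASS's nor PowerTOST's actual code.

```r
# Approximate TOST power via the shifted central t-distribution.
# df, bk and se can be overridden to explore other model assumptions.
power_shifted <- function(n, cv, theta0, alpha = 0.05,
                          theta1 = 0.8, theta2 = 1.25,
                          bk = 1, df = 3 * n - 4,
                          se = sqrt(log(cv^2 + 1))) {
  sem    <- se * sqrt(bk / n)                    # standard error of the difference
  tval   <- qt(1 - alpha, df)
  delta1 <- (log(theta0) - log(theta1)) / sem
  delta2 <- (log(theta0) - log(theta2)) / sem
  max(0, pt(-tval - delta2, df) - pt(tval - delta1, df))
}

# Usual assumptions for a 2x2x4 design; about 0.812, in line with the
# shifted result quoted in the first post
power_shifted(n = 50, cv = 0.5, theta0 = 0.95)

# Design constant 1.1 and CV plugged in as se, as described above
# (the carryover df are not known here and left at the default)
power_shifted(n = 50, cv = 0.5, theta0 = 0.95, bk = 1.1, se = 0.5)
```

The second call gives a lower power for the same n, which goes in the direction of the larger sample sizes reported by PASS. It is not expected to reproduce PASS exactly.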

Regards,

Detlew
Helmut
★★★

Vienna, Austria,
2020-05-15 23:20
(523 d 00:45 ago)

@ d_labes
Posting: # 21445
Views: 3,283

## Problems with PASS

Dear Detlew,

your first three points explain why sample sizes are generally larger. The last why differences increase with the CV.

      CV      SE
    0.05 0.05003
    0.10 0.10025
    0.20 0.20202
    0.30 0.30688
    0.40 0.41655
    0.50 0.53294
    0.60 0.65828
    0.70 0.79518
    0.80 0.94683
    0.90 1.11710
    1.00 1.31083
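The second column appears to follow sqrt(exp(x²) − 1), i.e. the CV one effectively works with when the nominal CV is plugged in as the residual standard error on the log scale. That reading is an inference from the numbers, not something stated in the post. A base-R sketch that recomputes the table under this assumption (PowerTOST offers `se2CV()` for the same conversion):

```r
# Nominal CVs as listed in the table
cv <- c(0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)

# If the nominal CV is treated as the standard error on the log scale,
# the implied coefficient of variation is sqrt(exp(se^2) - 1)
implied <- sqrt(exp(cv^2) - 1)

print(data.frame(CV = cv, SE = round(implied, 5)), row.names = FALSE)
```

The gap between the two columns is negligible at CV 0.1 but exceeds 0.03 at CV 0.5, which fits the observation that the differences increase with the CV.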

Dif-tor heh smusma 🖖
Helmut Schütz

Helmut
★★★

Vienna, Austria,
2020-05-17 14:34
(521 d 09:31 ago)

@ Helmut
Posting: # 21447
Views: 3,156

## PASS <2000?

Dear all,

due to Detlew’s detective work (THX!) I see it clearer now. Seems that it was a bug in earlier versions.

“Equivalence Tests for the Ratio of Two Means in a Higher-Order Cross-Over Design (Log-Normal Data)” were added in PASS v14. Although nothing is stated about an improvement/update in later versions, according to the online manual (identical to the one which came with the PASS 2020 Trial) I could reproduce the examples with the internal function power.PASS() of PowerTOST. Code upon request.
All examples for ABE {0.8000|1.2500}, α 0.05.

Example 1 – Finding Power: ABB|BAA (Manual 545-9)

     CV  N reported shifted    nct  exact PT.shifted PT.nct PT.exact
    0.4 10   0.0000  0.0000 0.0000 0.0299     0.0000 0.0000   0.0285
    0.4 20   0.3051  0.3051 0.3111 0.3120     0.3060 0.3118   0.3126
    0.4 30   0.5858  0.5858 0.5887 0.5887     0.5861 0.5889   0.5889
    0.4 40   0.7483  0.7483 0.7501 0.7501     0.7484 0.7503   0.7503
    0.4 60   0.9035  0.9035 0.9045 0.9045     0.9035 0.9045   0.9045
    0.4 80   0.9627  0.9627 0.9633 0.9633     0.9627 0.9634   0.9634

The default in power.PASS() is the approximation by the shifted central t-distribution (method = "shifted"), although the noncentral t (method = "nct") and the exact method by Owen’s Q (method = "exact") are implemented as well. Columns starting with PT give results obtained by power.TOST().
Confirmed that PASS uses the shifted t, which I solely used in the other examples. Good agreement with power.TOST().

Example 2 – Finding Sample Size: ABB|BAA (Manual 545-11)

     CV target  Power reported N1   pwr1 N2   pwr2 PT.N.shifted PT.pwr.shifted PT.N.exact PT.pwr.exact
    0.4    0.8 0.8026       45 45 0.8024 46 0.8119           46         0.8119         46       0.8134
    0.4    0.9 0.9035       60 60 0.9035 60 0.9035           60         0.9035         60       0.9045

Good agreement (N1), though in practice one would round up to N2 to get balanced sequences, as all sample size functions of PowerTOST do.

Example 3 – Validation using Chen et al. (1997): AA|BB|AB|BA (Manual 545-12)

        SE  CV target  Power reported N1   pwr1 N2   pwr2 PT.N.shifted PT.pwr.shifted PT.N.exact PT.pwr.exact
    0.1003 0.1    0.8 0.8106       16 16 0.8106 16 0.8106           16         0.8151         16       0.8239
    0.1003 0.1    0.9 0.9085       20 20 0.9085 20 0.9085           20         0.9104         20       0.9192

Good agreement again. Note that in order to reproduce the results of Chen et al. we have to work with the standard error of residuals, although the paper states the CV. Here, at 0.1002505, it is close to the CV of 0.1, but see also there.

Now the troublesome one of the OP.
Example 4; PASS 15.0.5: ABBA|BAAB (SE instead of CV)

        SE  CV target  Power reported N1   pwr1 N2   pwr2 PT.N.shifted PT.pwr.shifted PT.N.exact PT.pwr.exact
    0.5329 0.5    0.8 0.8053       54 55 0.8053 56 0.8123           50          0.812         50       0.8128

I could not reproduce it exactly (the different design constants and dfs due to carry-over also come into play), but it explains what is going on in this earlier version of PASS and the discrepancy to sampleN.TOST().

Now what we can expect* in PASS ~~2000~~ 2020 (and perhaps in a version >15):
Example 5 = 4; PASS 2000 (use CV)

     CV target N1  pwr1 N2  pwr2 PT.N.shifted PT.pwr.shifted PT.N.exact PT.pwr.exact
    0.5    0.8 49 0.804 50 0.812           50          0.812         50       0.8128

Seemingly OK.

Conclusion: If you use PASS, update to ~~v2000~~ v2020. If you are a sponsor receiving a sample size estimation in an earlier version, demand an update (or use PowerTOST).

• Expect, yes. Will you get these values? No.
Still not corrected in PASS2020.

Dif-tor heh smusma 🖖
Helmut Schütz

d_labes
★★★

Berlin, Germany,
2020-05-17 17:01
(521 d 07:04 ago)

@ Helmut
Posting: # 21448
Views: 3,150

## PASS 2020

Really PASS 2000 or PASS 2020?

Regards,

Detlew
Helmut
★★★

Vienna, Austria,
2020-05-17 17:34
(521 d 06:31 ago)

@ d_labes
Posting: # 21449
Views: 3,143

## PASS 2020!

Dear Detlew,

» Really PASS 2000 or PASS 2020?

F**k! 2020, of course. Freudian? IIRC, I once had Chris Hintze’s “NCSS 2000”.

Dif-tor heh smusma 🖖
Helmut Schütz

Helmut
★★★

Vienna, Austria,
2020-05-21 18:43
(517 d 05:22 ago)

@ Helmut
Posting: # 21456
Views: 2,881

## PASS 2020: Outcome

Dear all,

in the following my observations/conclusion about PASS2020 v20.0.1 (released 2020-02-10). I checked only the sample size procedures relevant for ABE. CV 0.1–0.4 (Δ 0.02), 0.5, 0.75, 1.0; θ0 0.85–1.00 (Δ 0.05); AR {0.8000|1.2500}; target power 0.8 and 0.9. I compared the results of PASS with the exact method of PowerTOST and the SAS-code for the noncentral t-distribution given by Jones & Kenward (2000) ported to R. Not surprisingly in all of my 1,152 scenarios the exact method agreed with the noncentral t. PASS not so much…
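For orientation, the noncentral t approximation referred to above can be sketched in a few lines of base R. This is not the Jones & Kenward code and not PowerTOST's implementation; function names, the starting value and the step size of the search are illustrative choices, and the degrees of freedom and design constants (n − 2 and 2 for the 2×2×2, 3n − 4 and 1 for the 2×2×4) are assumptions for the usual models without carryover.

```r
# Approximate TOST power via the noncentral t-distribution.
power_nct <- function(n, cv, theta0, df, bk, alpha = 0.05,
                      theta1 = 0.8, theta2 = 1.25) {
  sem    <- sqrt(log(cv^2 + 1)) * sqrt(bk / n)
  tval   <- qt(1 - alpha, df)
  delta1 <- (log(theta0) - log(theta1)) / sem   # noncentrality, lower limit
  delta2 <- (log(theta0) - log(theta2)) / sem   # noncentrality, upper limit
  # pt() may warn about precision in extreme tails; harmless for this sketch
  p <- suppressWarnings(pt(-tval, df, ncp = delta2) - pt(tval, df, ncp = delta1))
  max(0, p)
}

# Smallest even total sample size of a 2x2x2 design reaching the target power
n_2x2x2 <- function(cv, theta0, target = 0.8, alpha = 0.05) {
  n <- 4
  while (power_nct(n, cv, theta0, df = n - 2, bk = 2, alpha = alpha) < target) {
    n <- n + 2
  }
  n
}

n_2x2x2(cv = 0.5, theta0 = 0.95)                            # 2x2x2 sample size
power_nct(50, cv = 0.5, theta0 = 0.95, df = 3 * 50 - 4, bk = 1)  # 2x2x4, n = 50
```

For CV 0.5 and θ0 0.95 the search should return the 98 subjects mentioned in the first post for the 2×2×2, and the second call should come out close to the 0.8128 reported there for the 2×2×4.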

Paired samples

The design for ratios is not directly accessible in PASS (only for differences). Novices (aka “Push-the-button statisticians”) might not know how to set it up based on logs and conclude that it is not possible.
PASS reports the sample size / group. So far so good. Nobody would start a study with unequal group sizes (which give at least the desired power); one would round up to the next even number anyhow. Still, there are many cases where 2×n of PASS is larger than the already even (total) sample sizes by the exact method and the noncentral t. With a few exceptions (4 of my 128 scenarios) the sample sizes were larger than necessary (x̃ +2.11%, range –0.07 to +33.3%). In the most common area of CV 0.2–0.3, θ0 0.95, power 0.8: x̃ +6.27%. Why? Dunno.

2×2×2

Accessible twice. Under the and . OK, why not. However, the results differ: For θ0 0.85, CV 1, power 0.8, I got in the former 2,334 (unless I ask for the exact sample size) and in the latter only 2,333. Likely most people use the latter and round up to next even number to get balanced sequences. Looks stupid if the output is part of the SAP.

2×2×4 (TRTR|RTRT, TRRT|RTTR, TTRR|RRTT)

Accessible under the . I’m not happy with the terminology of replicate studies used in PASS. Though Chen et al. (1997) used “Higher-Order”, that’s rather unusual. Generally Higher-Order refers to more than two treatments.
Acc. to Chinese Whispers a sponsor had lengthy discussions with a “statistician” of a CRO using PASS. Since only ABBA|BAAB is given in Design setup and the manual, he insisted on using this one. Well, that’s uncommon. All regulatory agencies give TRTR|RTRT in their guidelines…

1. If you perform the study as TRRT|RTTR, statistically all is good but likely you have to deal with questions from assessors (who are rarely statisticians).
2. If you perform the study as ABAB|BABA to make regulators happy, of course you could use the sample size estimated in PASS because
• there are actually three 4-period 2-sequence replicate designs, namely ABAB|BABA, ABBA|BAAB and AABB|BBAA and
• all of them have the same design constants and degrees of freedom.
• Hence, in three hypothetical studies with the same effects one would observe exactly the same point estimates and residual variance.
• But: Imagine a picky assessor discovering the output of PASS in the SAP stating ABBA|BAAB, whilst the study was performed as ABAB|BABA. Questions on the way, again.

As we observed before, the 2×2×4 is beyond repair. I can only assume that the SE instead of the CV is used. Possibly there are still problems with the dfs and design constant to calculate the SEM. In 95 of my 128 scenarios the sample size was too large. If I assess only studies with n≥12, x̃ +4.35%, range ±0 to +33.3%. Example: θ0 0.95, CV 0.2, power 0.9: PASS estimates 16 though only 12 are needed.

2×2×3 (TRT|RTR, TRR|RTT)

Terminology again. Following Chen et al. only the “Three-Period, Two-Sequence Dual ABB|BAA” is given. Sigh. More or less OK. Following the EMA’s Q&A and assessing only studies with at least 24 subjects: x̃ ±0%, range –0.65 to +7.69%.

2×4×4 (TRTR|RTRT|TRRT|RTTR, TRRT|RTTR|TTRR|RRTT)

I would not use any of those due to confounded effects. For completeness only.
Generally OK. Assessing only scenarios with n≥12: x̃ ±0%, range ±0% to +33.3%.

2×4×2 (TR|RT|TT|RR, Balaam’s design)

Generally OK. Assessing only scenarios with n≥24: x̃ ±0%, range –0.54% to +5.26%.

Higher-Order Designs (Latin Squares and Williams’ designs) for ratios are not implemented.
Reference-scaling not implemented.

We must not forget that any interventional trial carries some degree of risk. Hence, ICH E9 “Statistical Principles for Clinical Trials” stated already in 1998

The number of subjects in a clinical trial should always be large enough to provide a reliable answer to the questions addressed.

Large enough, not larger

Dif-tor heh smusma 🖖
Helmut Schütz

mittyri
★★

Russia,
2020-05-24 22:12
(514 d 01:52 ago)

@ Helmut
Posting: # 21459
Views: 2,665

## Customer satisfaction

Dear Helmut,

I would give you one month NCSS business analyst salary if I were NCSS top manager. But I am not, sorry

what I've found on their marketing page regarding customer satisfaction:
PASS 13 is fantastic! Better than my new dishwasher and microwave combined.

Kind regards,
Mittyri