Hi Artem, concerning your question in the other thread: ❝ I need to calculate an additional parameter in the ANOVA: the Cohort factor. Oh, the hobby of the Russian «Экспертами» ("experts")… ❝ Then in the case of the EMA the model specification is: ❝ sequence+subject(sequence)+period+treatment+cohort ❝ Am I right? I’m afraid not. The EMA does not specify a model. In the BE GL we find only: 4.1.1 Study design ❝ And how can the model specification be constructed for agencies recommending a mixed-effects model (FDA, Health Canada)? You find the FDA’s models under the FOI, and some members of the forum have a letter with the same wording. The FDA suggests three models (group instead of cohort):
Since Russia follows in the EMA’s footsteps, treat subjects as fixed instead of random.ᶜ The decision scheme (i.e., whether data can be pooled or analysis of the largest group is recommended) is applicable as well. It should be noted that in rare cases (e.g., extremely unbalanced sequences) the fixed effects model gives no solution and the mixed effects model has to be used.
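A minimal sketch of the three models written as R model formulas, with all effects fixed as suggested above. The terms follow the MHRA wording and the Phoenix set-up quoted further down in this thread; the column names (logY, group, sequence, subject, period, treatment) are assumptions. Subject IDs are assumed to be unique across groups and sequences, so that subject alone represents Subject(Sequence × Group); Period(Group) is written as group:period.

```r
# Hypothetical column names; all effects fixed (EMA-style).
# Model 1: with the Group-by-Treatment interaction (used for the pretest)
model1 <- logY ~ group + sequence + group:sequence + subject +
                 group:period + treatment + group:treatment
# Model 2: as model 1 but without the Group-by-Treatment interaction
model2 <- logY ~ group + sequence + group:sequence + subject +
                 group:period + treatment
# Model 3: the conventional 2x2x2 model (one group or pooled data)
model3 <- logY ~ sequence + subject + period + treatment

# Terms that distinguish the models
setdiff(labels(terms(model1)), labels(terms(model2)))  # "group:treatment"
labels(terms(model3))
```

Since subject is fixed, lm() will flag some between-subject columns as aliased (NA coefficients) when these formulas are fitted; that is expected and does not affect the within-subject tests.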
Hi Helmut! Your opinion is very important for Russian BEBAC amateurs, so I expect your approach will be 'carved in Russian stone'. It would be great if we could reach some consensus regarding models with a group term (until our experts change their minds or, probably, the rest of the world is convinced by the Russian experts). ❝ The nasty thing is that the Group-by-Treatment interaction test has low power (therefore, testing at the 0.1 level). You should expect a false positive rate at the level of the test and trash some of your studies due to lacking power. Could you please clarify this point? I have seen the power problem many times for the Sequence term in the simple model and for the Group-by-Treatment interaction in FDA model I. Is it possible to prove that with sims? Or has somebody done this work analytically? PS: I suspect a lot of fun with replicate designs. Your model specification with group (from Österreich with love) works well even there, but that doesn't mean that this model is applicable to replicate designs (as we discussed elsewhere). — Kind regards, Mittyri
Hi mittyri, ❝ Your opinion is very important for Russian BEBAC amateurs, so I expect your approach will be 'carved in Russian stone' If they are following the forum (are they?), I want to make one point clear: I do not advocate routinely using the group procedures of the FDA! I would say that the EMA accepts without reservation that the group effect “cannot be reasonably assumed to have an effect on the response variable.” ❝ ❝ The nasty thing is that the Group-by-Treatment interaction test has low power (therefore, testing at the 0.1 level). You should expect a false positive rate at the level of the test … ❝ ❝ Could you please clarify this point? I have seen the power problem many times for the Sequence term in the simple model … Senn¹ (who always strongly argued against testing the sequence – or better, unequal carryover – effect!) writes: Because the power of the test is low, being based on between-patient difference, a high nominal level of significance (usually 10%) is used. An interesting statement by the EMA² concerning the treatment-by-covariate interaction: The primary analysis should include only the covariates pre-specified in the protocol and no treatment by covariate interaction terms. […] Tests for interactions often lack statistical power and the absence of statistical evidence of an interaction is not evidence that there is no clinically relevant interaction. Conversely, an interaction cannot be considered as relevant on the sole basis of a significant test for interaction. Assessment of interaction terms based on statistical significance tests is therefore of little value [sic]. (my emphases) ❝ … and for the Group-by-Treatment interaction in FDA model I. Is it possible to prove that with sims? Or has somebody done this work analytically? Don’t know. I’m in contact with a Canadian CRO to collect empirical evidence (like D’Angelo et al.³ did for carryover).
We will include only studies where groups were separated by just a couple of days and all of the FDA’s criteria for pooling were fulfilled. A great deal of work, but seemingly ~⅒ of studies show a significant group-by-treatment interaction. ❝ I suspect a lot of fun with replicate designs. Your model specification with group […] works well even there, but that doesn't mean that this model is applicable to replicate designs (as we discussed elsewhere). Yep.
Hi Hötzi and Mittyri, this thread is interesting and confusing to me. May I ask or comment for clarification: M: "Is it possible to prove that with sims?" What is it you want to prove? Can you formulate it plain and simple? Sims are totally possible, I just need to figure out the equations, as well as have a purpose. H: "It should be noted that in rare cases (e.g., extremely unbalanced sequences) the fixed effects model gives no solution and the mixed effects model has to be used." A realistic linear model will have a single analytical solution unless you make a specification error. Imbalance would not affect that; please describe where/how you came across a fit which failed with lm. M+H: The FDA are also fitting subject as fixed even when using the random statement in PROC GLM. Some of them just have not realised it. H: "(...) seemingly ~⅒ of studies show a significant group-by-treatment interaction." This is expected by chance. You apply a 10% significance level. By chance 10% will then be significant. (And by the way: which denominator in F did you apply, within or between?) — Pass or fail! ElMaestro
Hi ElMaestro, ❝ M: "Is it possible to prove that with sims?" What is it you want to prove? Can you formulate it plain and simple? Sims are totally possible, I just need to figure out the equations, as well as have a purpose. Not M, but answering anyway. The idea behind the Group-by-Treatment interaction is that the T/R in one group is different from the other (i.e., we have collinearity with a “hidden” variable). Therefore, simulate a group of subjects with T/R 0.95 and another one with T/R 0.95⁻¹ (CV ad libitum). Merge them to get a “study”. Run model 1 and check the p-value of the Group-by-Treatment interaction. With the simple model you should expect T/R = 1. ❝ H: "It should be noted that in rare cases (e.g., extremely unbalanced sequences) the fixed effects model gives no solution and the mixed effects model has to be used." A realistic linear model will have a single analytical solution unless you make a specification error. Imbalance would not affect that; please describe where/how you came across a fit which failed with lm. I had one data set where the fixed effects model in Phoenix/WinNonlin showed me the finger. Same in JMP (“poor man’s SAS”). Have to check again. ❝ M+H: The FDA are also fitting subject as fixed even when using the random statement in PROC GLM. Some of them just have not realised it. True. ❝ H: "(...) seemingly ~10% of studies show a significant group-by-treatment interaction." This is expected by chance. You apply a 10% significance level. By chance 10% will then be significant. Exactly. That’s the idea of assessing real studies. If there were a true Group-by-Treatment interaction (i.e., not chance alone), we would expect significant results in >10% of studies. This is what I have so far (I hope that the Canadians will come up with another ~100).
Numerator DF = Groups – 1
Denominator DF = Subjects – 2 × Groups 
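Where the denominator comes from, as a worked DF count for model 1 with all effects fixed. This is a sketch assuming complete 2×2×2 data with N subjects in G groups (2N observations); Group, Sequence and Group × Sequence are between-subject terms and are contained in the N – 1 DF of Subject(Sequence × Group):

$$
\begin{aligned}
\mathit{DF}_{\text{Group}\times\text{Treatment}} &= G - 1\\
\mathit{DF}_\text{res} &= (2N - 1) - \underbrace{(N - 1)}_{\text{Subject(Seq}\times\text{Group)}} - \underbrace{G}_{\text{Period(Group)}} - \underbrace{1}_{\text{Treatment}} - \underbrace{(G - 1)}_{\text{Group}\times\text{Treatment}}\\
&= N - 2G
\end{aligned}
$$

For two groups of 24 subjects this gives 1 and 44 DF.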
Hi Hötzi, ❝ The idea behind the Group-by-Treatment interaction is that the T/R in one group is different from the other (i.e., we have collinearity with a “hidden” variable). Therefore, simulate a group of subjects with T/R 0.95 and another one with T/R 0.95⁻¹ (CV ad libitum). Merge them to get a “study”. Run model 1 and check the p-value of the Group-by-Treatment interaction. With the simple model you should expect T/R = 1. Thanks for this. This sounds reasonable (T/R = 1, assuming equal group sizes). Could you tell me how you got your F-test denominator? I am sure you are right, but I don't know where it came from. For an interaction of a between- and a within-factor I think the rule of thumb (which is also a wee bit hard to define) is to test against the within, which in this case would be the model residual. — Pass or fail! ElMaestro
Hi ElMaestro, ❝ Could you tell me how you got your F-test denominator? I am sure you are right, but I don't know where it came from. For an interaction of a between- and a within-factor I think the rule of thumb (which is also a wee bit hard to define) is to test against the within, which in this case would be the model residual. Yep. Below an example of model 1 in Phoenix/WinNonlin. Two groups (n = 24 each), all effects fixed. Partial sums of squares: N = Σn_G = 48; G = 2; numerator DF = G – 1 = 1; denominator DF = N – 2G = 44; F = 0.0131109/0.0126246 = 1.0385149; round(pf(1.0385149, 1, 44, lower.tail=FALSE), 5) ✔ — Diftor heh smusma 🖖🏼 Long live Ukraine! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
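The F-test above can be reproduced in base R from the two partial mean squares. This is a sketch; the mean squares are shown rounded, so F agrees with 1.0385149 only to about five significant digits. With one numerator DF the F-test equals a two-sided t-test (F = t²), which gives a handy cross-check.

```r
N      <- 48          # total number of subjects
G      <- 2           # number of groups
ms_GxT <- 0.0131109   # mean square of the Group-by-Treatment interaction
ms_res <- 0.0126246   # residual (within-subject) mean square of model 1
df_num <- G - 1       # numerator DF
df_den <- N - 2 * G   # denominator DF = residual DF of model 1

F_GxT <- ms_GxT / ms_res
p_GxT <- pf(F_GxT, df_num, df_den, lower.tail = FALSE)

# p-value from the F value as quoted in the post
p_quoted <- pf(1.0385149, 1, 44, lower.tail = FALSE)

# Cross-check: F(1, df) is the square of a t(df) variable
p_t <- 2 * pt(-sqrt(F_GxT), df_den)

round(c(F = F_GxT, p = p_GxT, p_quoted = p_quoted, p_t = p_t), 5)
```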
Hi Helmut and ElMaestro, Helmut answered the question directed to me more accurately than I could. ❝ The idea behind the Group-by-Treatment interaction is that the T/R in one group is different from the other (i.e., we have collinearity with a “hidden” variable). Therefore, simulate a group of subjects with T/R 0.95 and another one with T/R 0.95⁻¹ (CV ad libitum). Merge them to get a “study”. Run model 1 and check the p-value of the Group-by-Treatment interaction. With the simple model you should expect T/R = 1. This seems reasonable, but I do not see why the power is low. If our hypothesis is that the power is low, we need to reject the H0 that the power is high, in other words, prove that the sensitivity of this term to deviations is low. By the way, if the power of this term is low, some other should be high, right? Which one? ❝ ❝ M+H: The FDA are also fitting subject as fixed even when using the random statement in PROC GLM. Some of them just have not realised it. ❝ ❝ True. AFAIK PHX knows only one model where subject is fitted as a fixed term when placed in the variance structure: the conventional model. In all other cases LinMix will switch to mixed modeling. ❝ ❝ (and by the way: Which denominator in F did you apply; within or between?) ❝ ❝ Numerator DF = Groups – 1 ❝ Denominator DF = Subjects – 2 × Groups Yes, the results are the same for complete data (mixed vs. GLM). — Kind regards, Mittyri
Hi mittyri, ❝ ❝ The idea behind the Group-by-Treatment interaction is that the T/R in one group is different from the other (i.e., we have collinearity with a “hidden” variable). Therefore, simulate a group of subjects with T/R 0.95 and another one with T/R 0.95⁻¹ (CV ad libitum). Merge them to get a “study”. Run model 1 and check the p-value of the Group-by-Treatment interaction. With the simple model you should expect T/R = 1. ❝ ❝ This seems reasonable, but I do not see why the power is low. Good question. Next question? I performed simulations (100,000 2×2×2 studies each for conditions a. and b.). Two groups of 16 subjects each, CV 30%, no period and sequence effects:
a. T/R in group 1 0.95, T/R in group 2 0.95⁻¹;
b. T/R in both groups 1.00 (i.e., no Group-by-Treatment interaction).
32 subjects should give power 81.52% for T/R = 1. If the Group-by-Treatment interaction is not significant (p ≥ 0.1) in model 1, the respective study is evaluated by model 2 (pooled data); otherwise both groups are evaluated separately by model 3. In addition, all studies are evaluated by model 3 (pooled data). The listed PE is the geometric mean of passing studies’ PEs.
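A minimal R sketch of the core of such a simulation: the p-value of the Group-by-Treatment interaction in model 1 with all effects fixed. It is not the code used for the results above; the helper names are hypothetical, and it assumes log-normal data, CV_b = CV_w, balanced sequences within each group, and no period or sequence effects. The interaction is tested by comparing model 1 with model 2 (the same model without the term), so the denominator is the residual of model 1 with N – 2G DF.

```r
# Hypothetical helper: one group of a 2x2x2 crossover on the log scale.
# n must be even (balanced sequences TR/RT).
sim_group <- function(n, theta, CV, group) {
  s2 <- log(CV^2 + 1)                      # variance on the log scale
  id <- paste0(group, "-", seq_len(n))     # subject IDs unique across groups
  sq <- rep(c("TR", "RT"), each = n / 2)
  d  <- data.frame(subject  = rep(id, 2),
                   sequence = rep(sq, 2),
                   period   = rep(1:2, each = n),
                   group    = group,
                   stringsAsFactors = FALSE)
  d$treatment <- substr(d$sequence, d$period, d$period)  # "T" or "R"
  d$logY <- rep(rnorm(n, 0, sqrt(s2)), 2) +    # subject effects
            log(theta) * (d$treatment == "T") + # true T/R of this group
            rnorm(2 * n, 0, sqrt(s2))           # within-subject error
  d
}

# Merge two groups with (possibly) different true T/R into one "study".
sim_study <- function(n = 16, theta = c(0.95, 1 / 0.95), CV = 0.30) {
  d <- rbind(sim_group(n, theta[1], CV, "G1"),
             sim_group(n, theta[2], CV, "G2"))
  for (v in c("subject", "sequence", "period", "group", "treatment"))
    d[[v]] <- factor(d[[v]])
  d
}

# p-value of the Group-by-Treatment interaction (model 1 vs. model 2).
p_GxT <- function(d) {
  m1 <- lm(logY ~ group + sequence + group:sequence + subject +
             group:period + treatment + group:treatment, data = d)
  m2 <- lm(logY ~ group + sequence + group:sequence + subject +
             group:period + treatment, data = d)
  anova(m2, m1)[2, "Pr(>F)"]
}

set.seed(123456)
n_sim <- 1000   # far fewer than the 100,000 used above, for speed
p_a <- replicate(n_sim, p_GxT(sim_study(theta = c(0.95, 1 / 0.95))))
p_b <- replicate(n_sim, p_GxT(sim_study(theta = c(1, 1))))
c(a = mean(p_a < 0.1), b = mean(p_b < 0.1))  # b should be close to 0.1
```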
Lessons learned: If we test at the 10% level and there is no true Group-by-Treatment interaction, we will find a significant effect at about the level of the test – as expected (b). Hurray, false positives! On the other hand, if there is one, we will detect it (a). The percentages of studies passing in models 2 and 3 are similar. Theoretically, in model 2 it should be slightly lower than in model 3 (one degree of freedom less for the treatment effect). However, overall power is seriously compromised. Slowly I get the impression that the evaluation of groups (by model 3) is not a good idea. If there is a true Group-by-Treatment interaction, why the heck should the PE (say, in the largest group) be unbiased? I would rather say that if one believes that a Group-by-Treatment interaction really exists (I don’t) and the test makes sense (I don’t), evaluation (of the largest group) by model 3 should not be performed. Consequently ~⅒ of (otherwise passing) studies would go into the waste bin. Didn’t I say that before? The distribution of p-values should be uniform. Looks good for b. Interesting shape for a. If you prefer more extreme stuff: T/R in group 1 0.90, T/R in group 2 0.90⁻¹
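A quick base-R illustration of why a test at the 0.1 level flags about one study in ten when there is no interaction. Under H0 the F-statistic of the exact, all-fixed test follows F(1, 28) for two groups of 16 subjects (DF = N – 2G = 28), so its p-values are uniform. As a shortcut (an assumption valid only for this exact F-test), F-statistics are drawn directly instead of simulating whole studies.

```r
set.seed(123456)
n_sim <- 1e5
# Null distribution of the Group-by-Treatment F-statistic (1 and 28 DF)
F_null <- rf(n_sim, df1 = 1, df2 = 28)
p <- pf(F_null, df1 = 1, df2 = 28, lower.tail = FALSE)

summary(p)       # quartiles should be close to 0.25, 0.50, 0.75
mean(p < 0.1)    # false positive rate, close to the level of the test
```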

PS: The code seems to work – at least for the pooled model 3. Comparison of powers:

power.TOST(...): 0.815152
Hi Helmut, you've done great work! Won't it be published? In your examples (simulations/practice) you showed that the TxG test is not a good idea. I was impressed by this: ❝ ❝ b. T/R in both groups 1.00 ❝ (i.e., no Group-by-Treatment interaction): ❝ ❝ If you prefer more extreme stuff: T/R in group 1 0.90, T/R in group 2 0.90⁻¹ ❝ I see that the sensitivity is really low, but I think it is not a good idea to compensate for it with low specificity (high false positive rate). Once again, thank you very much! Would you mind publishing the code for building the simulation data? — Kind regards, Mittyri
Hi mittyri,

THX!

❝ Won't it be published?

I hope so. It has been on my to-do list since last summer…

❝ In your examples (simulations/practice) you showed the TxG test is not a good idea.
❝ I was impressed by this: […]
❝
❝ I see that the sensitivity is really low, but I think it is not a good idea to compensate for it with low specificity (high false positive rate).

Right. I have no idea where this idea came from.

❝ Would you mind publishing the code for building the simulation data?

I used code developed by Martin and me many years ago to simulate 2×2 designs, followed by a simple rbind(part1, part2). I improved it to speed things up (which worked). Unfortunately I screwed up a couple of days ago (no version control, saving over). Shit. In the meantime you can (mis)use the code Detlew distributed last year to simulate replicate designs. I duplicated his functions. Start with RTRT|TRTR and set $CV_\text{wT} = CV_\text{wR} = CV_\text{bT} = CV_\text{bR}$. The code below shows the relevant changes after his example and how to sim. For the plot you have to attach package lattice. set.seed(123456)
Hi mittyri and all, R code to estimate the loss in power (two groups of equal size). Example for my simulations above, assuming that we will get a significant Group-by-Treatment interaction at the level of the test.
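A base-R sketch of such an estimate (the idea only, not the R code mentioned in the post). It uses the noncentral-t approximation of TOST power rather than the exact method of PowerTOST's power.TOST(), and it assumes that the pretest is significant with a probability equal to its level (0.1), in which case one group of N/2 subjects is evaluated by model 3; otherwise the pooled data are evaluated by model 2. Treating the pretest as independent of the TOST outcome is a further simplifying assumption.

```r
# TOST power via the noncentral t-distribution ("nct" approximation) for a
# 2x2x2 design with balanced sequences; the residual DF can be set explicitly.
power_tost_nct <- function(CV, n, df = n - 2, theta0 = 1,
                           theta1 = 0.80, theta2 = 1.25, alpha = 0.05) {
  s2w  <- log(CV^2 + 1)            # within-subject variance, log scale
  se   <- sqrt(2 * s2w / n)        # SE of the log point estimate
  tc   <- qt(1 - alpha, df)        # critical value of each one-sided test
  ncp1 <- (log(theta0) - log(theta1)) / se
  ncp2 <- (log(theta0) - log(theta2)) / se
  max(pt(-tc, df, ncp = ncp2) - pt(tc, df, ncp = ncp1), 0)
}

N     <- 32    # two groups of 16 subjects, as in the simulations above
CV    <- 0.30
a_GxT <- 0.10  # level of the Group-by-Treatment pretest

pw_pooled3 <- power_tost_nct(CV, N, df = N - 2)          # model 3, pooled
pw_model2  <- power_tost_nct(CV, N, df = N - 3)          # model 2, G = 2
pw_group3  <- power_tost_nct(CV, N / 2, df = N / 2 - 2)  # model 3, one group

# Stepwise procedure: model 2 if the pretest is not significant,
# otherwise model 3 in one group of N/2 subjects.
pw_stepwise <- (1 - a_GxT) * pw_model2 + a_GxT * pw_group3
round(c(pooled3 = pw_pooled3, model2 = pw_model2,
        group3 = pw_group3, stepwise = pw_stepwise), 4)
```

With these settings pw_pooled3 should land close to the 0.815152 quoted above for power.TOST(); any small difference stems from the approximation.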
Hi mittyri and all, the p-value of the Group-by-Treatment interaction seemingly does not depend on the interval between groups. In 51% of the studies the interval was three days or less, but in some it was substantially longer (i.e., steady-state studies where the clinic was occupied). The bubbles’ area, the linear regression, and the loess curve are scaled/weighted by the sample size.
Therefore, we should not worry. The FDA defines "greatly separated in time" as "months apart, for example". 
Hi Helmut, I suppose the problem may lie not in a 'significant separation in time' but in mistakes in IMP handling. For example, the R-IMP has a proven stability up to 30 °C and the T-IMP only up to 25 °C. Due to a "why bother" attitude the designated employee missed it. As a result the second group will be treated with a degraded T-IMP. I assume here that the order of group treatment is GR1-PER1, GR1-PER2, GR2-PER1, GR2-PER2. CROs usually interleave the groups for more effective time management. So I think with appropriate IMP handling we wouldn't observe any real (not false-positive) interaction. Please correct me if I'm wrong here. — Kind regards, Mittyri
Hi mittyri,

❝ I suppose the problem may lie not in a 'significant separation in time' but in mistakes in IMP handling.
❝ For example, the R-IMP has a proven stability up to 30 °C and the T-IMP only up to 25 °C. Due to a "why bother" attitude the designated employee missed it. As a result the second group will be treated with a degraded T-IMP. I assume here that the order of group treatment is
❝ GR1-PER1
❝ GR1-PER2
❝ GR2-PER1
❝ GR2-PER2

That's what I would call a "stacked approach". IMHO, not a good idea for single-dose studies, but it might be necessary in steady-state studies if the capacity of the clinical site is limited.

❝ CROs usually interleave the groups for more effective time management.

Yep – the "staggered approach" keeps the interval as short as possible. 60% of my data sets had an interval of less than seven days. In most of my single-dose studies the interval was one to three days.

❝ So I think with appropriate IMP handling we wouldn't observe any real (not false-positive) interaction.

Agree. 
Hi mittyri, continuing the evaluation of the data sets of this post. Background: Some studies are quite dated (the oldest performed in October 1992!). In those days a pre-specified acceptance range of 75.00–133.33% (or even 70.00–142.86%) was acceptable for Cmax. However, I evaluated all data sets with the common 80.00–125.00%. This explains why more than the expected 20% failed (no, I didn’t screw up the design). Wherever possible I avoided equal group sizes and kept one group as large as possible. Didn’t succeed sometimes. Working in a CRO, the sponsor is always right… All data sets were evaluated by model 3 for pooled data (like in the reports – I never cared about groups) and by model 1 to get the p-value of the Group-by-Treatment interaction.
On another note: If we apply model 2 without a pretest (maybe the best way to go for regulators insisting on a group term), the loss in power compared to the pooled model 3 is negligible. Reasonable, since we lose only a few residual degrees of freedom:

pooled model 3: $\mathit{DF} = n_1 + n_2 - 2$
pooled model 2: $\mathit{DF} = n_1 + n_2 - (N_\text{groups} - 1) - 2$
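The same DF bookkeeping as for model 1 shows where the difference comes from: model 2 estimates a period effect within each group, the pooled model 3 only one. This is a sketch assuming complete 2×2×2 data with n = n_1 + n_2 subjects in N_groups groups, i.e., 2n observations:

$$
\begin{aligned}
\text{model 2:}\quad \mathit{DF} &= (2n - 1) - (n - 1) - N_\text{groups} - 1 = n_1 + n_2 - (N_\text{groups} - 1) - 2\\
\text{pooled model 3:}\quad \mathit{DF} &= (2n - 1) - (n - 1) - 1 - 1 = n_1 + n_2 - 2
\end{aligned}
$$

With two groups, model 2 thus loses a single residual DF (e.g., 29 instead of 30 for 32 subjects).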
Hi ElMaestro, ❝ ❝ It should be noted that in rare cases (e.g., extremely unbalanced sequences) the fixed effects model gives no solution and the mixed effects model has to be used. ❝ ❝ A realistic linear model will have a single analytical solution unless you make a specification error. Imbalance would not affect that; please describe where/how you came across a fit which failed with lm. I was right about the failures in JMP and Phoenix WinNonlin. Sorry, I can’t disclose the data set. Naïve pooling was performed. Deficiency letter by the MHRA in summer 2016: The applicant should present estimates and 95% confidence interval for the difference between the Test and the Reference product on a ratio scale from ANOVA model, that reflects the design of the study, with terms for Group, Sequence, Sequence * Group, Subject (Sequence * Group), Period (Group), Treatment as fixed effects. Note that this is the FDA’s model 2 with fixed effects. Why the 95% CI instead of the 90% CI was required is another story. The data set (subjects fixed) did not converge in JMP. Switched to random and all was good. It was accepted by the MHRA’s assessor. Phoenix showed me the finger with the fixed effect Subject(Sequence*Group) in model 1, and execution stopped (no results at all). In model 2 I got the same warning as above but these results:
No problems in R, neither with model 1 nor with model 2.

Diving deeper into it. Originally I set up the models in Phoenix WinNonlin's Bioequivalence module, which sits on top of Linear Mixed Effects. When I send the data directly to Linear Mixed Effects (all fixed): no error, no warning, nada. CI identical to the one from R to 12 significant digits. Conclusion: Bug in Phoenix WinNonlin's Bioequivalence module. BTW: Running model 1 of my 85 data sets (5,004 subjects) in Bioequivalence takes more than ten hours and sucks up almost my entire 16 GB RAM (memory leak?). Direct execution in Linear Mixed Effects takes five minutes (max. RAM consumption 175 MB). Still much slower than R, which takes five seconds for model 1, model 2, model 3 (for each group), and model 3 (pooled).

I was wrong. Has nothing to do with unbalanced sequences and/or unequal group sizes. 
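For reproducing the R side, a sketch of model 2 in the MHRA wording with all effects fixed, fitted by lm() and reporting both the 90% and the 95% CI. The data are purely synthetic (the data set discussed above cannot be disclosed); the group sizes, variances, and true T/R are arbitrary assumptions, and make_group() is a hypothetical helper. The treatment coefficient is named treatmentT because R is the reference level.

```r
# Hypothetical, purely synthetic data: two groups (12 + 20 subjects) of a
# 2x2x2 crossover, log-transformed PK metric, true T/R = exp(0.05).
set.seed(2016)
make_group <- function(n, g) {
  id <- paste0(g, "-", seq_len(n))             # unique subject IDs
  sq <- rep(c("TR", "RT"), length.out = n)     # balanced sequences
  d  <- data.frame(subject = rep(id, 2), sequence = rep(sq, 2),
                   period = rep(1:2, each = n), group = g,
                   stringsAsFactors = FALSE)
  d$treatment <- substr(d$sequence, d$period, d$period)
  d$logY <- rep(rnorm(n, 0, 0.3), 2) +         # subject effects
            0.05 * (d$treatment == "T") +      # treatment effect
            rnorm(2 * n, 0, 0.2)               # within-subject error
  d
}
d <- rbind(make_group(12, "G1"), make_group(20, "G2"))
for (v in c("subject", "sequence", "period", "group", "treatment"))
  d[[v]] <- factor(d[[v]])

# Model 2, all fixed: Group, Sequence, Sequence*Group,
# Subject(Sequence*Group), Period(Group), Treatment.
m2 <- lm(logY ~ group + sequence + group:sequence + subject +
           group:period + treatment, data = d)

# T/R in percent with its confidence interval at a given level
ci <- function(level) {
  x <- 100 * exp(confint(m2, "treatmentT", level = level))
  setNames(as.vector(x), c("lower", "upper"))
}
res <- rbind(`90% CI` = ci(0.90), `95% CI` = ci(0.95))
res
```

Aliased between-subject columns simply show up as NA coefficients; the fit itself does not fail.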
Hi Hötzi, I am not a WNL/Phoenix user, but if your post is correct, then I imagine around 100 CROs as of today will need to change their software validation status from PQ'ed to "unknown" etc., contact the vendor, await a response, and in the meantime do everything they can to study the potential impact on data generated (which they cannot necessarily do unless they have other 'validated' software that achieves the same)? Only the software developer, who has the source, will be able to tell if there is a bug, and if there is, whether the bug affects other models, and how/when. This is not good. Man, I don't even quite know what validated means anymore. — Pass or fail! ElMaestro
Hi Helmut, Mittyri and ElMaestro! Thanks for a lot of useful information! I've made some calculations with the models from your post. BE study with 44 subjects (2 groups of 22 subjects) in Phoenix 6.4 (Formulation = Treatment, FF = Cmax/AUC). Step 1. Model Fixed Effects: Group+Sequence+Formulation+Group*Sequence+Group*Period+Group*Formulation. Model Random Effects: Subject(Group*Sequence). Partial Tests table (Dependent, Hypothesis, Numer_DF, Denom_DF, F_stat, P_value). The p-value is good, but is it normal that the DF is not an integer? Step 2. Model Fixed Effects: Group+Sequence+Formulation+Group*Sequence+Group*Period. Model Random Effects: Subject(Group*Sequence). Partial Tests table (Dependent, Hypothesis, Numer_DF, Denom_DF, F_stat, P_value). The p-value is also acceptable, but is it normal that the DF here is also not an integer? — Best Regards, Artem
Hi Artem, Welcome to the world of mixed modeling! Helmut suggested (see endnote c) switching to the model with all effects fixed. Your model is the same as suggested by the FDA (with Subject as random). So Phoenix switched to the mixed model and used Satterthwaite degrees of freedom (which need not be integers). Note that it is impossible to provide ANOVA tables (partial/sequential SS output, PHX-speaking) for mixed models. So, since there is no convention in Russian brains, you are free to use your current model or the model provided by Helmut (with all effects fixed). The latter suits me better due to my hazelnut brain. PS: are there any incomplete data (missed period) for some subjects? — Kind regards, Mittyri
Hi, Mittyri! I've tried some fixed models after posting my previous reply. The situation with the DF has improved. I also checked the data; they're fine. It was strange for me because the standard Phoenix model (Fixed: PRD, TRT, SEQ; Random: Subj(SEQ)) gives integer DF on the same dataset. Now it's clearer, so never mind. The deeper you get into statistics, the more terrible it becomes. Thanks for the help. — Best Regards, Artem
Hi Artem,

as mittyri already pointed out, you should use fixed-effects models in Phoenix/WinNonlin.

❝ Step 2.
❝ The p-value is also acceptable, …

There are no "acceptable" p-values in model 2. Any one is just fine. Only in model 1 do you check the p-value of the Group-by-Treatment interaction.
Hi Artem and all,

❝ Oh, the hobby of the Russian «Экспертами» ("experts")…

I have to correct myself. They are blindly following guidelines of the Eurasian Economic Union (Nov 2016, Dec 2015). Practically a 1:1 translation of the FDA's guidance:

Studies in multiple groups

94. If a study is conducted in two or more groups and these groups were studied at different clinical sites, or at the same site but separated by a long period of time (e.g., months), doubts arise as to whether the results obtained in these groups can be combined in a single analysis. Such situations must be discussed with the authorized body.

Does any of our Russian members know whether the above was ever accepted? This section talks about giving in the report a justification for not performing such an analysis. Has anybody ever tried to give the justification already in the protocol? If yes, what happened? If no, why not? 
Hello Helmut! ❝ Does any of our Russian members know whether the above was ever accepted? This section talks about giving in the report a justification for not performing such an analysis. Has anybody ever tried to give the justification already in the protocol? If yes, what happened? If no, why not? I think you are citing a too-recent document. It came into force on 6 May. So, strictly speaking, it was not obligatory to follow the draft in clinical trials conducted before 6 May. So I think nobody used it, and no experience has been gathered. But I would try it :) If I'm not mistaken, you wrote something about such an algorithm somewhere in the forum, but I could not find it. — Best regards Beholder
Dear Helmut, Dear Beholder, @Helmut ❝ ❝ Does any of our Russian members know whether the above was ever accepted? This section talks about giving in the report a justification for not performing such an analysis. Has anybody ever tried to give the justification already in the protocol? If yes, what happened? If no, why not? I'd call the group effect a 'penalty for carelessness'. After some heated discussions last week I understood that's what the experts are waiting for, since they do not want to back down. So some time ago (about 3 years ago) the 'group' trend appeared in their minds. After report submission the experts asked: group? group? group? At the stage of queries on the report it was almost impossible to justify the model without groups. By the way, now that this topic is very popular, the team developing the protocol should include the justification regarding the absence of the group effect in the model. Otherwise a 'group-shot' is very likely. @Beholder ❝ If I'm not mistaken, you wrote something about such an algorithm somewhere in the forum, but I could not find it. Edit: Changed to internal link; see also this post #7. [Helmut] — Kind regards, Mittyri
❝ If I'm not mistaken, you wrote something about such an algorithm somewhere in the forum, but I could not find it. Yes, I found the post regarding the EEU GL which I mentioned. Edit: Changed to internal link; see also this post #7. [Helmut] — Best regards Beholder
Dear all, My name is Mikalai, and I am responsible for the conduct of bioequivalence studies in a medium-sized private pharmaceutical company in Belarus. Due to logistic issues (a small clinical center and a highly variable drug) we have to conduct a bioequivalence study in multiple groups (two). Our competent authority requires a justification for not including the group effect in the proposed statistical model. The groups will be separated by a week at maximum. It seems that we meet the criteria set out by the FDA for using a statistical model without the group effect. Our competent authority can accept the FDA position on this issue, but we should properly reference it. Thus, where can I find this information under the FOI (link), or could someone share a copy of the letter signed by Barbara Davit outlining the requirements for ignoring the group effect in a statistical model? Any help will be appreciated. Sincerely, Mikalai Edit: I moved your post from an answer to this one, deleted your email address, and activated personal messages in your profile instead. [Helmut]
Hi Mikalai, ❝ Due to logistic issues (a small clinical center and a highly variable drug) we have to conduct a bioequivalence study in multiple groups (two). Are you aiming at reference-scaling for Cmax, i.e., performing the study in a replicate design? Even if not, opt for the “staggered approach” – not the “stacked” one (see above). ❝ Our competent authority requires a justification for not including the group effect in the proposed statistical model. IMHO, stupid – but according to the GL. ❝ The groups will be separated by a week at maximum. Very good. ❝ It seems that we meet the criteria set out by the FDA for using a statistical model without the group effect. Our competent authority can accept the FDA position on this issue, but we should properly reference it. See this presentation summarizing my current thinking. Note that (since I don’t speak Russian) my remarks given on slide 16 are only partly correct. A justification in the protocol (as you rightly mentioned) should be acceptable. In the discussion following my presentation it became clear that:
❝ Thus, where can I find this information under the FOI (link) … The FDA’s stepwise models (which I would never ever use) are given here and there (maybe there are some more; too lazy to google). However, in the second document you find Comment 9: If ALL of the following criteria are met, it may not be necessary to include Group-by-Treatment in the statistical model:
I can't share mine (chained to my table by a CDA). Maybe Detlew can share his. Note that the wording of Barbara's letter is identical to the second reference above. 
Dear Helmut, Thank you very much. Yes, we plan to use the "staggered approach". We had planned a replicate design, but a high dropout rate was observed in our last bioequivalence study (crossover design but with a long blood sampling interval), for reasons completely unrelated to the study. Thus, the company decided to put the replicate design on hold, given that the clinical center can accommodate only slightly more than 35 volunteers. We are opting for model III and will try to use the FDA requirements to justify the statistical model in the protocol. Bioequivalence studies done in accordance with international standards are relatively new for Belarus. As a result, experience, training, and lore are issues for all major players (manufacturers, regulators, clinical investigators). We also cannot run studies outside the Eurasian Economic Union. Thus, from time to time our regulators have to rely on the opinions of more experienced colleagues, mainly from the EMA and FDA. If they see that something is acceptable in Europe or the USA, they usually give us a green light. That's why we need papers or proper references. Regards, Mikalai.
Dear Helmut, You are a Key Opinion Leader in Russia (not limited to Russia, I believe). ❝ • Nobody tried model II without a pretest (this would be a much better option than the FDA’s stepwise models). Why? Dunno. Why do you think that nobody has tried? I talked to some of the people involved; they are trying it and waiting for the experts' feedback. — Kind regards, Mittyri
Hi Mittyri,

❝ You are a Key Opinion Leader in Russia (not limited to Russia, I believe).
❝
❝ • Nobody tried model II without a pretest (this would be a much better option than the FDA's stepwise models). Why? Dunno.
❝
❝ Why do you think that nobody has tried?

In Yaroslavl I specifically asked the participants. Maybe the ones you know were there but didn't want to speak up in front of the experts?

❝ I talked to some of the people involved; they are trying it and waiting for the experts' feedback.

Great. Let's keep our fingers crossed.

Also in Yaroslavl people encouraged me to publish my meta-study. Well, I'm still collecting data (Astea). They also suggested a Russian journal. I don't like that, since seemingly Russian is more ambiguous than English. Originally I thought of the Journal of Biopharmaceutical Statistics (where Pina D'Angelo's article about carryover was published). When I presented on Multi-Group Studies at the 2nd Annual Biosimilars Forum (October 2017), everybody (and this was a statistical audience from agencies, the industry, and CROs) was surprised that it is an issue at all. Nobody (!) would expect anything other than simple pooling. The consensus was that the FDA's stepwise procedure might even inflate the Type I Error and should be avoided. Given that, I guess such a manuscript would be rejected right away due to its doubtful content.
Dear Helmut! The problem will remain until the Eurasian Economic Union requirements are corrected or modified. And instead of this subject being forgotten like a nightmare, we see the opposite tendency: the new economic union drags new countries into this problem. Now Belarus has been dragged into this story... What's next? It might only stop after a scientific paper by a respected author announces the uselessness of this stuff... — "Being in minority, even a minority of one, did not make you mad"
Dear colleagues, ❝ ... What's next? ... Next is Kazakhstan, then Kyrgyzstan, then Armenia... — Best regards Beholder 
Dear Astea! ❝ ... It might only stop after a scientific paper by a respected author announces the uselessness of this stuff... Do you really believe that regulators (aka «Экспертами», the "experts") can be convinced by science? May the Lord preserve your infant faith. — Regards, Detlew
Dear d_labes! ❝ Do you really believe that regulators (aka «Экспертами», the "experts") can be convinced by science? ❝ May the Lord preserve your infant faith. "But I Tried, Didn't I? Goddamnit, at Least I Did That" – McMurphy ("One Flew Over the Cuckoo's Nest", 1975). — Best regards Beholder
Dear beholder! ❝ "But I Tried, Didn't I? Goddamnit, at Least I Did That" – McMurphy ("One Flew Over the Cuckoo's Nest", 1975). Swearing is of the devil (чёрт). — Regards, Detlew
Dear d_labes! "You may say that I am a dreamer, but I am not the only one..." Thanks for the support, Beholder! — "Being in minority, even a minority of one, did not make you mad" 
Dear Astea! First: Call me Detlew. You are also allowed to pronounce it Detluuu. Up to now only my wife was allowed to do so. ❝ "You may say that I am a dreamer, but I am not the only one..." ❝ ❝ Thanks for the support, Beholder! Excuse this old grumpy buffer. You are young. And dreaming is the privilege of youth. — Regards, Detlew
Dear Detlew! Thank you! I am not so young as to believe in fairy tales, but not so old as to stop believing in reason. I see with my own eyes how things change for the better, just because of the presence of caring scientific people. P.S. You can call me Nastia (similar to "nasty", easy to remember) — "Being in minority, even a minority of one, did not make you mad"