## What's wrong with this picture? [General Sta­tis­tics]

Dear all,

When I said it seems the analysis is doable in R I made at least 2 mistakes.
1. I forgot the step 3 mentioned in warfarin's guidance. The 90% CI analysed by unscaled BE should be inside 80-125%. This (replicate BE), if I recall one of Detlew's earlier posts correctly, cannot be done in R at the moment, at least in the sense of "reproduce" SAS output.
2. Step 2 in guidance mentioned that 95% upper CI should be calculated as:
(YT - YR)2 - θ * s2WR,
where "YT and YR are means of the ln-transformed PK endpoint" for T and R respectively, which is easy to calculate (or so I thought...)
Let's fotget the 90% CI since it's impossible to do so in R. I played a bit with R for calculation of the rest parameters as a way of learning R and here it is the code. What's wrong with this picture (see comments in red colour)?

# read data from website, variable names: subj, per, seq, treat, pk, logpk emadata  <- "https://dl.dropboxusercontent.com/u/7685360/permlink/emabe4p.csv" ema <- read.csv(file = emadata, colClasses = (c(rep("factor", 4), rep("numeric", 2)))) # remove subjects who has imcomplete periods. More elegant solution? library(dplyr) d <- filter(ema, subj != 11 & subj != 20 & subj != 24 & subj != 31 &                  subj != 42 & subj != 67 & subj != 69 & subj != 71) # create data sets containing only R1, R2, T1 and T2, respectively. # change variable name accordingly and drop unnecessary variables. r1 <- d %>% filter((seq == "TRTR" & per == 2) | (seq == "RTRT" & per == 1)) %>%             mutate(pkr1 = logpk) %>%           # rename variable             select(-c(per, treat, pk, logpk))  # drop variables r2 <- d %>% filter((seq == "TRTR" & per == 4) | (seq == "RTRT" & per == 3)) %>%             mutate(pkr2 = logpk) %>%             select(-c(per, treat, pk, logpk)) t1 <- d %>% filter((seq == "TRTR" & per == 1) | (seq == "RTRT" & per == 2)) %>%             mutate(pkt1 = logpk) %>%             select(-c(per, treat, pk, logpk)) t2 <- d %>% filter((seq == "TRTR" & per == 3) | (seq == "RTRT" & per == 4)) %>%             mutate(pkt2 = logpk) %>%             select(-c(per, treat, pk, logpk)) # merge data sets r12 <- merge(r1, r2,   by = c("seq", "subj")) t12 <- merge(t1, t2,   by = c("seq", "subj")) drt <- merge(r12, t12, by = c("seq", "subj")) # calculate mean difference between T and R, diff between R1 and R2, T1 and T2 scavbe <- mutate(drt,                  diff_tr = 0.5 * (pkt1 + pkt2 - pkr1 - pkr2),  # FDA's ilat                  diff_r  = pkr1 - pkr2,  # FDA's dlat, for swr estimation                  diff_t  = pkt1 - pkt2)  # additional variable to estimate swt # calculate mean difference by sequence --------------------- mdiff <- scavbe %>% group_by(seq) %>%             summarise(size     = n(),  # number of subjects                       mdiff_tr = mean(diff_tr),                       mdiff_r  = mean(diff_r),                       mdiff_t  = mean(diff_t)) # calculate swr and swt. same as SAS output all <- merge(scavbe, mdiff, by = "seq") sw <- all %>% mutate(ss_r = (diff_r - mdiff_r)**2,                      ss_t = (diff_t - mdiff_t)**2) %>%               summarise(s2wr = sum(ss_r) / (2*(n() - 2)),  # formula in FDA's guidance                         s2wt = sum(ss_t) / (2*(n() - 2)),                         swr = sqrt(s2wr),                         swt = sqrt(s2wt)) # 95% upper CI as (mean(T) - mean(R))**2 - theta*s2wr. Finding mean T and R # Should mean value be calculated like this? It seems too simple. But if not, how? t <- filter(d, treat == "T") %>%      mutate(mean_t = mean(logpk)) %>%      distinct(treat) %>%      select(mean_t) r <- filter(d, treat == "R") %>%      mutate(mean_r = mean(logpk)) %>%      distinct(treat) %>%      select(mean_r) # critbound value is different from sas output. # 90% CI for sigma(T)/sigma(R) are same as sas output tr <- merge(t, r) final <- merge(sw, tr) %>%          mutate(size = nrow(distinct(d, subj)),                 theta = (log(1/0.9)/0.10)**2,                 critbound = (mean_t - mean_r)**2 - theta * s2wr,  ## different from SAS.                 cilower = swt/swr/(sqrt(qf(0.05, df1 = size - 2, df2 = size -2, lower.tail = FALSE))),                 ciupper = swt/swr/(sqrt(qf(0.95, df1 = size - 2, df2 = size -2, lower.tail = FALSE)))) 
So to summarise, I can get sWR, sWT, 90% CI for variability comparison (Thanks to Detlew), but not 95% upper limit for (YT - YR)2 - θ * s2WR. I'm sure the error comes from the mean T and R. Any thought?

All the best,
Shuanghe Ing. Helmut Schütz 