What's wrong with this picture? [General Sta­tis­tics]

posted by Shuanghe  – Spain, 2015-09-28 20:57 (3570 d 04:37 ago) – Posting: # 15482
Views: 29,292

Dear all,

When I said it seems the analysis is doable in R I made at least 2 mistakes.
  1. I forgot the step 3 mentioned in warfarin's guidance. The 90% CI analysed by unscaled BE should be inside 80-125%. This (replicate BE), if I recall one of Detlew's earlier posts correctly, cannot be done in R at the moment, at least in the sense of "reproduce" SAS output.
  2. Step 2 in guidance mentioned that 95% upper CI should be calculated as:
    (YT - YR)2 - θ * s2WR,
    where "YT and YR are means of the ln-transformed PK endpoint" for T and R respectively, which is easy to calculate (or so I thought...)
Let's fotget the 90% CI since it's impossible to do so in R. I played a bit with R for calculation of the rest parameters as a way of learning R and here it is the code. What's wrong with this picture (see comments in red colour)?

# read data from website, variable names: subj, per, seq, treat, pk, logpk
emadata  <- "https://dl.dropboxusercontent.com/u/7685360/permlink/emabe4p.csv"
ema <- read.csv(file = emadata, colClasses = (c(rep("factor", 4), rep("numeric", 2))))

# remove subjects who has imcomplete periods. More elegant solution?
library(dplyr)
d <- filter(ema, subj != 11 & subj != 20 & subj != 24 & subj != 31 &
                 subj != 42 & subj != 67 & subj != 69 & subj != 71)

# create data sets containing only R1, R2, T1 and T2, respectively.
# change variable name accordingly and drop unnecessary variables.

r1 <- d %>% filter((seq == "TRTR" & per == 2) | (seq == "RTRT" & per == 1)) %>%
            mutate(pkr1 = logpk) %>%           # rename variable
            select(-c(per, treat, pk, logpk))  # drop variables

r2 <- d %>% filter((seq == "TRTR" & per == 4) | (seq == "RTRT" & per == 3)) %>%
            mutate(pkr2 = logpk) %>%
            select(-c(per, treat, pk, logpk))

t1 <- d %>% filter((seq == "TRTR" & per == 1) | (seq == "RTRT" & per == 2)) %>%
            mutate(pkt1 = logpk) %>%
            select(-c(per, treat, pk, logpk))

t2 <- d %>% filter((seq == "TRTR" & per == 3) | (seq == "RTRT" & per == 4)) %>%
            mutate(pkt2 = logpk) %>%
            select(-c(per, treat, pk, logpk))

# merge data sets
r12 <- merge(r1, r2,   by = c("seq", "subj"))
t12 <- merge(t1, t2,   by = c("seq", "subj"))
drt <- merge(r12, t12, by = c("seq", "subj"))

# calculate mean difference between T and R, diff between R1 and R2, T1 and T2
scavbe <- mutate(drt,
                 diff_tr = 0.5 * (pkt1 + pkt2 - pkr1 - pkr2),  # FDA's ilat
                 diff_r  = pkr1 - pkr2,  # FDA's dlat, for swr estimation
                 diff_t  = pkt1 - pkt2)  # additional variable to estimate swt

# calculate mean difference by sequence ---------------------
mdiff <- scavbe %>% group_by(seq) %>%
            summarise(size     = n(),  # number of subjects
                      mdiff_tr = mean(diff_tr),
                      mdiff_r  = mean(diff_r),
                      mdiff_t  = mean(diff_t))
# calculate swr and swt. same as SAS output
all <- merge(scavbe, mdiff, by = "seq")
sw <- all %>% mutate(ss_r = (diff_r - mdiff_r)**2,
                     ss_t = (diff_t - mdiff_t)**2) %>%
              summarise(s2wr = sum(ss_r) / (2*(n() - 2)),  # formula in FDA's guidance
                        s2wt = sum(ss_t) / (2*(n() - 2)),
                        swr = sqrt(s2wr),
                        swt = sqrt(s2wt))

# 95% upper CI as (mean(T) - mean(R))**2 - theta*s2wr. Finding mean T and R
# Should mean value be calculated like this? It seems too simple. But if not, how?

t <- filter(d, treat == "T") %>%
     mutate(mean_t = mean(logpk)) %>%
     distinct(treat) %>%
     select(mean_t)

r <- filter(d, treat == "R") %>%
     mutate(mean_r = mean(logpk)) %>%
     distinct(treat) %>%
     select(mean_r)

# critbound value is different from sas output.
# 90% CI for sigma(T)/sigma(R) are same as sas output

tr <- merge(t, r)
final <- merge(sw, tr) %>%
         mutate(size = nrow(distinct(d, subj)),
                theta = (log(1/0.9)/0.10)**2,
                critbound = (mean_t - mean_r)**2 - theta * s2wr,  ## different from SAS.
                cilower = swt/swr/(sqrt(qf(0.05, df1 = size - 2, df2 = size -2, lower.tail = FALSE))),
                ciupper = swt/swr/(sqrt(qf(0.95, df1 = size - 2, df2 = size -2, lower.tail = FALSE))))

So to summarise, I can get sWR, sWT, 90% CI for variability comparison (Thanks to Detlew), but not 95% upper limit for (YT - YR)2 - θ * s2WR. I'm sure the error comes from the mean T and R. Any thought?

All the best,
Shuanghe

Complete thread:

UA Flag
Activity
 Admin contact
23,428 posts in 4,929 threads, 1,686 registered users;
69 visitors (0 registered, 69 guests [including 10 identified bots]).
Forum time: 01:34 CEST (Europe/Vienna)

No matter what side of the argument you are on,
you always find people on your side
that you wish were on the other.    Thomas Berger

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5