What's wrong with this picture? [General Sta­tis­tics]

posted by Shuanghe  – Spain, 2015-09-28 20:57 (3103 d 21:00 ago) – Posting: # 15482
Views: 26,034

Dear all,

When I said it seems the analysis is doable in R I made at least 2 mistakes.
  1. I forgot the step 3 mentioned in warfarin's guidance. The 90% CI analysed by unscaled BE should be inside 80-125%. This (replicate BE), if I recall one of Detlew's earlier posts correctly, cannot be done in R at the moment, at least in the sense of "reproduce" SAS output.
  2. Step 2 in guidance mentioned that 95% upper CI should be calculated as:
    (YT - YR)2 - θ * s2WR,
    where "YT and YR are means of the ln-transformed PK endpoint" for T and R respectively, which is easy to calculate (or so I thought...)
Let's fotget the 90% CI since it's impossible to do so in R. I played a bit with R for calculation of the rest parameters as a way of learning R and here it is the code. What's wrong with this picture (see comments in red colour)?

# read data from website, variable names: subj, per, seq, treat, pk, logpk
emadata  <- "https://dl.dropboxusercontent.com/u/7685360/permlink/emabe4p.csv"
ema <- read.csv(file = emadata, colClasses = (c(rep("factor", 4), rep("numeric", 2))))

# remove subjects who has imcomplete periods. More elegant solution?
library(dplyr)
d <- filter(ema, subj != 11 & subj != 20 & subj != 24 & subj != 31 &
                 subj != 42 & subj != 67 & subj != 69 & subj != 71)

# create data sets containing only R1, R2, T1 and T2, respectively.
# change variable name accordingly and drop unnecessary variables.

r1 <- d %>% filter((seq == "TRTR" & per == 2) | (seq == "RTRT" & per == 1)) %>%
            mutate(pkr1 = logpk) %>%           # rename variable
            select(-c(per, treat, pk, logpk))  # drop variables

r2 <- d %>% filter((seq == "TRTR" & per == 4) | (seq == "RTRT" & per == 3)) %>%
            mutate(pkr2 = logpk) %>%
            select(-c(per, treat, pk, logpk))

t1 <- d %>% filter((seq == "TRTR" & per == 1) | (seq == "RTRT" & per == 2)) %>%
            mutate(pkt1 = logpk) %>%
            select(-c(per, treat, pk, logpk))

t2 <- d %>% filter((seq == "TRTR" & per == 3) | (seq == "RTRT" & per == 4)) %>%
            mutate(pkt2 = logpk) %>%
            select(-c(per, treat, pk, logpk))

# merge data sets
r12 <- merge(r1, r2,   by = c("seq", "subj"))
t12 <- merge(t1, t2,   by = c("seq", "subj"))
drt <- merge(r12, t12, by = c("seq", "subj"))

# calculate mean difference between T and R, diff between R1 and R2, T1 and T2
scavbe <- mutate(drt,
                 diff_tr = 0.5 * (pkt1 + pkt2 - pkr1 - pkr2),  # FDA's ilat
                 diff_r  = pkr1 - pkr2,  # FDA's dlat, for swr estimation
                 diff_t  = pkt1 - pkt2)  # additional variable to estimate swt

# calculate mean difference by sequence ---------------------
mdiff <- scavbe %>% group_by(seq) %>%
            summarise(size     = n(),  # number of subjects
                      mdiff_tr = mean(diff_tr),
                      mdiff_r  = mean(diff_r),
                      mdiff_t  = mean(diff_t))
# calculate swr and swt. same as SAS output
all <- merge(scavbe, mdiff, by = "seq")
sw <- all %>% mutate(ss_r = (diff_r - mdiff_r)**2,
                     ss_t = (diff_t - mdiff_t)**2) %>%
              summarise(s2wr = sum(ss_r) / (2*(n() - 2)),  # formula in FDA's guidance
                        s2wt = sum(ss_t) / (2*(n() - 2)),
                        swr = sqrt(s2wr),
                        swt = sqrt(s2wt))

# 95% upper CI as (mean(T) - mean(R))**2 - theta*s2wr. Finding mean T and R
# Should mean value be calculated like this? It seems too simple. But if not, how?

t <- filter(d, treat == "T") %>%
     mutate(mean_t = mean(logpk)) %>%
     distinct(treat) %>%
     select(mean_t)

r <- filter(d, treat == "R") %>%
     mutate(mean_r = mean(logpk)) %>%
     distinct(treat) %>%
     select(mean_r)

# critbound value is different from sas output.
# 90% CI for sigma(T)/sigma(R) are same as sas output

tr <- merge(t, r)
final <- merge(sw, tr) %>%
         mutate(size = nrow(distinct(d, subj)),
                theta = (log(1/0.9)/0.10)**2,
                critbound = (mean_t - mean_r)**2 - theta * s2wr,  ## different from SAS.
                cilower = swt/swr/(sqrt(qf(0.05, df1 = size - 2, df2 = size -2, lower.tail = FALSE))),
                ciupper = swt/swr/(sqrt(qf(0.95, df1 = size - 2, df2 = size -2, lower.tail = FALSE))))

So to summarise, I can get sWR, sWT, 90% CI for variability comparison (Thanks to Detlew), but not 95% upper limit for (YT - YR)2 - θ * s2WR. I'm sure the error comes from the mean T and R. Any thought?

All the best,
Shuanghe

Complete thread:

UA Flag
Activity
 Admin contact
22,957 posts in 4,819 threads, 1,638 registered users;
79 visitors (0 registered, 79 guests [including 6 identified bots]).
Forum time: 16:57 CET (Europe/Vienna)

Nothing shows a lack of mathematical education more
than an overly precise calculation.    Carl Friedrich Gauß

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5