Shuanghe ★★ Spain, 2014-02-10 18:05 (4086 d 03:25 ago) Posting: # 12382 Views: 26,510 |
|
Hi all,

This is going to be a really lengthy post with some ready-to-use SAS and R code for comparing dissolution profiles with the MSD method based on Tsong et al.'s article. I had the impression that the administrators don't like lengthy posts, but at least I couldn't find such a prohibition in the forum policy.

Over the years many, including myself, have asked in this forum how to apply the MSD method for comparing dissolution profiles using SAS or other software. The most relevant thread I know is this thread. It has some links to reference articles, a SAS macro, and DDSolver for Excel (and the mistrust of the last). There are three different approaches in the SAS macro in Detlew's post, but none of them is based on Tsong's article, and the macro was not specifically designed for dissolution comparison. (By that I am by no means saying that Tsong et al.'s method is the best one, or even a better one. The article is mentioned by many probably because the authors were all from the FDA, so it has a kind of "official" status.)

While the rest of the calculation is fairly easy, obtaining the 90% CI of the Mahalanobis distance by the Lagrange multiplier method is pretty mysterious to people like me who lack a deep mathematical understanding, and I believe it is the only obstacle for those who asked the "how to" questions.

There is another SAS macro by Connolly et al. in the following reference: "SAS IML Code To Calculate an Upper Confidence Limit for Multivariate Statistical Distance"*, which can be accessed via this address. This macro was specifically written for comparing dissolution profiles; in fact, Tsong's article is the only reference cited at the end (though the authors use Newton-Raphson iterations to find the upper confidence bound). The shortcoming is that the macro is lengthy, and while it might be suitable for the model-independent MSD approach, it might not be easy to modify for a model-dependent approach (comparing Weibull parameters, for example).
* If you'd like to play with the macro, please note that there are some errors in the results section where 8 data points were used. The macro itself is correct, though.

I used to be happy using a modified version of Connolly et al.'s code, with various macro parameters added to allow comparing several batches of formulations together, which the original macro cannot do. However, I recently came across another article, "In vitro dissolution profile comparison"**, also by Tsong et al., in the Encyclopedia of Biopharmaceutical Statistics, which gives the formulas for the 90% CI of the Mahalanobis distance. With those formulas the obstacle of getting the 90% CI disappears. Besides simplifying the SAS macro, they even allow an R novice such as myself to write R code to do the job. So, below is some ready-to-use SAS and R code in case anyone is interested. The data were taken from Tsong's 1996 article so that we can compare results.

** There are some printing errors, so the formulas listed there are incorrect: you have to delete the n/2 from below the square root.

========== SAS IML ==========

/* First read the data to create matrices for T and R, with each column
   representing dissolution at time tn; typically you'll have 12 rows and
   p columns for each of T and R. The rest is a step-by-step approach that
   can easily be converted into a macro if you want. PROC IML needs to be
   running. */

/* For SAS 9.2 and earlier you need to define a module for the calculation
   of the covariance; SAS 9.3 has the built-in function COV. A minimal
   version (assuming no missing values): */

START COV(m);
   d = m - m[:,];                  /* centre each column at its mean        */
   RETURN(d` * d / (nrow(m)-1));   /* p x p variance-covariance matrix      */
FINISH;

/* nt and nr are the numbers of units (typically 12; Tsong's article has
   only 6), p is the number of time points, and dg is the limit for the
   mean difference in % dissolved at each time point (typically 10; dg=15
   is used here to compare results with the article). Defining these
   numbers first makes the code easier to modify. */

nt=6; nr=6; p=8; dg=15;
By adding various parameters to convert it into a macro, and defining some global macro variables for the titles, you can get pretty final result tables with intermediate results to check against the output of DDSolver and Connolly's macro. Last month I had to do profile comparisons of > 100 batches of T and R from different projects, and for each comparison I used DDSolver, Connolly's macro, and my macro based on the steps above. The outputs were the same every time. I strongly suspect that DDSolver uses the above approach, so the identical results don't really mean much on their own, but Connolly's Newton-Raphson search also gives the same result every time. So Detlew, I know you have doubts about DDSolver, but it seems it does indeed work.

========== R Code ==========

I googled for R functions as I needed them, so hopefully I won't embarrass myself too much with the following naive R code. Since there are so many R gurus here, I'm sure someone can turn it into a simple function such as Dm90CI or something like that.

# dissolution data for T & R, one column per time point t1 t2 t3 ...
mt <- matrix(...)   # individual data elided here
mr <- matrix(...)

# as above, define these numbers first so they are easy to change
nt <- 6
nr <- 6
p  <- 8
dg <- 15

# mean dissolution profiles and their difference
tbar  <- colMeans(mt)
rbar  <- colMeans(mr)
mdiff <- tbar - rbar

# covariance of T & R, pooled covariance matrix and its inverse
st    <- cov(mt)
sr    <- cov(mr)
sp    <- (st + sr)/2   # valid for nt = nr
spinv <- solve(sp)

# Mahalanobis distance
dm <- sqrt(t(mdiff) %*% spinv %*% mdiff)

# k and fcrit as mentioned in the article
df    <- nt + nr - p - 1
k     <- nt*nr/(nt + nr) * df/((nt + nr - 2)*p)
fcrit <- qf(0.9, p, df)

# vector with each element equal to the predefined limit dg
mdg <- c(rep(dg, p))

# similarity criterion
dm_max <- sqrt(t(mdg) %*% spinv %*% mdg)

# Hotelling's T^2.
# I added it just to compare with DDSolver
H2 <- nt*nr/(nt + nr) * dm^2

# vectors on the boundary of the confidence region
f <- sqrt(drop(fcrit/(k * t(mdiff) %*% spinv %*% mdiff)))
bound1 <- mdiff * (1 + f)
bound2 <- mdiff * (1 - f)

# 90% CI of the Mahalanobis distance
ci1 <- sqrt(t(bound1) %*% spinv %*% bound1)
ci2 <- sqrt(t(bound2) %*% spinv %*% bound2)
dm_lower <- min(ci1, ci2)
dm_upper <- max(ci1, ci2)

if (dm_upper < dm_max) conclusion <- "Similar" else conclusion <- "Not similar"

Again, it should be pretty easy from here to print a final table with the results and the intermediate results. Happy coding.
—
All the best,
Shuanghe |
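Since the R code above survives only in fragments, here is a self-contained sketch of the same step-by-step calculation in plain Python for cross-checking. Everything in it is illustrative: the dissolution data are made up (not Tsong's data), only p = 2 time points are used so the matrix algebra can be hand-rolled without libraries, and fcrit is hard-coded to roughly qf(0.9, 2, 9) because the Python standard library has no F-distribution quantile.

```python
# Illustrative pure-Python version of the step-by-step MSD calculation above.
# All numbers are made up; p = 2 time points so that the covariance and its
# inverse can be hand-rolled without any third-party library.
import math

def mean_cols(m):
    """Column means of a list-of-rows matrix."""
    n = len(m)
    return [sum(row[j] for row in m) / n for j in range(len(m[0]))]

def cov2(m):
    """2 x 2 sample variance-covariance matrix of an n x 2 data matrix."""
    n, mu = len(m), mean_cols(m)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in m) / (n - 1)
             for j in range(2)] for i in range(2)]

def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def qform(v, a):
    """Quadratic form v' A v."""
    return sum(v[i] * a[i][j] * v[j] for i in range(2) for j in range(2))

# made-up % dissolved for 6 units of T and R at two time points
mt = [[42, 61], [44, 62], [41, 60], [43, 63], [45, 64], [43, 62]]
mr = [[48, 67], [50, 68], [47, 66], [49, 69], [51, 70], [49, 68]]
nt, nr, p, dg = 6, 6, 2, 10

tbar, rbar = mean_cols(mt), mean_cols(mr)
mdiff = [t - r for t, r in zip(tbar, rbar)]
st, sr = cov2(mt), cov2(mr)
sp = [[(st[i][j] + sr[i][j]) / 2 for j in range(2)] for i in range(2)]  # nt = nr
spinv = inv2(sp)

dm = math.sqrt(qform(mdiff, spinv))         # Mahalanobis distance
dm_max = math.sqrt(qform([dg] * p, spinv))  # similarity limit from dg

df = nt + nr - p - 1
k = nt * nr / (nt + nr) * df / ((nt + nr - 2) * p)
fcrit = 3.01                                # roughly qf(0.9, 2, 9); hard-coded

# 90% CI from the vectors on the boundary of the confidence region
s = math.sqrt(fcrit / (k * dm ** 2))
dm_lower, dm_upper = sorted(
    math.sqrt(qform([x * (1 + sgn * s) for x in mdiff], spinv))
    for sgn in (1, -1))

conclusion = "Similar" if dm_upper < dm_max else "Not similar"
print(round(dm, 3), round(dm_lower, 3), round(dm_upper, 3),
      round(dm_max, 3), conclusion)
```

With these toy numbers dm_upper stays below dm_max, so the profiles would be declared similar; with real data, fcrit should of course come from qf(0.9, p, df) in R or FINV in SAS rather than a hard-coded constant.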
Helmut ★★★ Vienna, Austria, 2014-02-10 18:15 (4086 d 03:15 ago) @ Shuanghe Posting: # 12384 Views: 20,758 |
|
Hi Shuanghe,

❝ This gonna be a really lengthy post […]. I had the impression that administrators don't like the lengthy post but at least I didn't find such prohibition from forum policy

My own posts have a strong tendency to be verbose as well… The limit in the database is currently set to 15,000 characters; yours has 10,131 including blanks, so there is still some headroom. THX for the codes!

Edit 2014-07-30: I expect something in the near future from the EMA (in the Q&A?). See: CHMP, Agenda of the 21-24 July 2014 meeting, page 23: CMDh request to PKWP on acceptability of Mahalanobis test for proof of comparable dissolution (biowaiver) for additional strengths of a bio-batch strength
—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
arjunroy ☆ India, 2014-12-19 11:56 (3774 d 09:34 ago) @ Shuanghe Posting: # 14117 Views: 20,121 |
|
Hi,
Edit: Full quote removed. Please delete everything from the text of the original poster which is not necessary in understanding your answer; see also this post! [Ohlbe] |
Shuanghe ★★ Spain, 2014-12-19 18:13 (3774 d 03:17 ago) @ arjunroy Posting: # 14119 Views: 20,650 |
|
Hi arjunroy,

❝ 1. R code is working fine,

So I guess you have nt = nr; otherwise the code will give a wrong result. I noticed it some time ago, so I'll modify the code when I have some time.

❝ though SAS IML v9.1.3 did not, got problem in calculating Cov matrix. Execution error noted in log despite it can be done manually.

Did you add the COV function before the run? As I mentioned in the post, SAS 9.2 and older versions have no COV function, so you have to add it manually. The example of COV I gave only works when there are no missing values, which should be the case most of the time. But if not, well, you have to adapt the code.

❝ 2. Any idea how dm-max is calculated in a 2 parameter weibull function, I am unable to verify in DDSolver output. For all kind of data it sets dm_max=11.82

❝ Any further insights will be appreciated.

If I remember correctly, that's a default value in DDSolver. You should use your own criterion instead of 11.82. I'm not aware of any guideline that clearly defines how to set the similarity limit, but in my opinion the criterion can be obtained by, e.g., fitting a suitable model to at least 2 different reference batches (more batches would be better) and determining the MSD between them (let's call it MSDR). If you have more batches, compare the MSDR pair-wise (A vs B, B vs C, A vs C, ...) and pick the biggest value. The rationale is that the difference between test and reference (the upper limit of the MSD) should not be greater than the maximum difference observed between reference batches themselves. So the maximum value in your pair-wise comparisons of the reference batches can be used as dm_max.
—
All the best,
Shuanghe |
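The pair-wise idea in the last paragraph can be sketched in a few lines. This is a hypothetical illustration, not code from the thread: the batch mean profiles and the pooled inverse covariance below are placeholder numbers, and in practice each pair of batches would get its own pooled covariance from the individual units.

```python
# Sketch of the dm_max idea above: compute the MSD between every pair of
# reference-batch mean profiles and take the largest as the similarity
# criterion. All numbers are made up for illustration.
import math
from itertools import combinations

def mahalanobis(x, y, spinv):
    """sqrt((x-y)' Spinv (x-y)) for small plain-list vectors."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[i] * spinv[i][j] * d[j]
                         for i in range(n) for j in range(n)))

# mean % dissolved of three reference batches at two time points (made up)
batches = {"A": [45.0, 62.0], "B": [48.0, 65.0], "C": [43.0, 61.0]}
spinv = [[0.5, 0.0], [0.0, 0.5]]      # placeholder pooled inverse covariance

# pair-wise MSDR: A vs B, A vs C, B vs C
msdr = {pair: mahalanobis(batches[pair[0]], batches[pair[1]], spinv)
        for pair in combinations(sorted(batches), 2)}
dm_max = max(msdr.values())           # largest inter-reference distance
```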
arjunroy ☆ India, 2014-12-22 06:42 (3771 d 14:48 ago) (edited on 2014-12-22 09:39) @ Shuanghe Posting: # 14135 Views: 19,872 |
|
❝ ❝ 1. R code is working fine,
❝
❝ So I guess you have nt=nr otherwise the code will give wrong result. I noticed it some time ago so I'll modify the code when I have some time.

Thank you

❝ ❝ though SAS IML v9.1.3 did not, got problem in calculating Cov matrix. Execution error noted in log despite it can be done manually.
❝
❝ Do you add the COV function before the run? As I mentioned in the post, for SAS 9.2 and older version there's no COV function so you should add it manually. The example of COV I gave only work for no missing value which in most of the case it should be. But if not, well, you have to adapt the code.

Here is the SAS IML v9.1.3 log error for your perusal:

proc iml;
...

Thank you so much.

Edit: Full quote removed. Please delete everything from the text of the original poster which is not necessary in understanding your answer; see also this post! [Ohlbe] |
Shuanghe ★★ Spain, 2014-12-24 12:09 (3769 d 09:21 ago) @ arjunroy Posting: # 14159 Views: 19,614 |
|
Hi arjunroy,

❝ Thank you

For the pooled variance, instead of using (st + sr)/2 it's better to use sp = ((nt-1)*st + (nr-1)*sr)/(nt+nr-2). This works regardless of whether nt = nr.

❝ Here is SAS IMLv9.1.3 log error for your perusal.
❝ ......

I use SAS 9.2 and the COV module defined as above worked. It seems version 9.1.3 doesn't support operations on matrices with different dimensions, therefore the line d = m - m[:,]; gave the error you had. Weird. You can try the following code:

START COV(m);
   d = m - REPEAT(m[:,], nrow(m), 1);
   RETURN(d` * d / (nrow(m)-1));
FINISH;

The REPEAT function creates a matrix with n rows, each row equal to the mean profile, so the matrices have the same dimensions and the subtraction works. Please report back here whether it works.
—
All the best,
Shuanghe |
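For what it's worth, the centring that the REPEAT workaround performs can be illustrated in a few lines of Python. This is illustrative only: cov() below mirrors the logic of the SAS IML module (replicate the row of column means to the data's shape, subtract, then form d'd/(n-1)), it is not a library call, and the data are made-up numbers.

```python
# What the REPEAT fix does, in plain Python: subtract the row of column
# means from every row, then form d'd/(n-1). Made-up data, no libraries.
def cov(m):
    """Sample variance-covariance matrix of an n x p list-of-rows matrix."""
    n, p = len(m), len(m[0])
    mu = [sum(row[j] for row in m) / n for j in range(p)]       # m[:,]
    d = [[row[j] - mu[j] for j in range(p)] for row in m]       # m - REPEAT(...)
    return [[sum(d[i][a] * d[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]

m = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(cov(m))   # → [[1.0, 2.0], [2.0, 4.0]] (columns perfectly correlated)
```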
arjunroy ☆ India, 2014-12-24 12:44 (3769 d 08:46 ago) @ Shuanghe Posting: # 14160 Views: 19,655 |
|
❝ You can report it back here if it works.

Thank you, it works now. I used the REPEAT function.

Hi,

SAS and R code give different results for the same data. For the above example:

                (dm_lower, dm_upper)   dm_max
SAS results:    (19.65, 33.32)         29.14
R:              (1.66, 5.148)          4.43

Your insights will be appreciated.

Best regards

Edit: Full quote removed. Please delete everything from the text of the original poster which is not necessary in understanding your answer; see also this post! Two posts merged; you can edit the first one for 24 hours. [Helmut] |
Shuanghe ★★ Spain, 2015-01-08 12:16 (3754 d 09:14 ago) @ arjunroy Posting: # 14253 Views: 19,403 |
|
Hi arjunroy,

❝ Thank you, it works now. I used Repeat function.

Good to know.

❝ SAS and R code give different results for the same data.
❝ For above said example: (dm_lower, dm_upper) dm_max
❝ SAS results: (19.65, 33.32) 29.14
❝ R (1.66, 5.148) 4.43

I've no idea how you got this. I did a quick check and both R and SAS give the same result as your SAS output when dg = 15. Check whether you have the correct data in R.
—
All the best,
Shuanghe |
wligtenberg ☆ The Netherlands, 2016-12-12 15:09 (3050 d 06:21 ago) @ Shuanghe Posting: # 16839 Views: 22,980 |
|
First of all, thank you for this very useful post! I hope you are still active, or that others can help me out with this as well.

I have tried to use your code and verify it against the article, and so far I have been unable to get the same results. I think you have a typo in the definition of your mr: you have the value 78.45 where in the article I think it is 78.54. Interestingly, with this typo you get exactly the same value as in the article on page 6.

Then I started to check whether some calculations might have been off, and I compared the mean values calculated in your script against the mean values reported in the Tsong article. The calculated means are off compared to the reported ones:

TEST MEAN       23.08 37.95 49.73 57.19 68.05 84.12 92.48 96.34
REFERENCE MEAN  46.77 60.21 67.28 71.97 78.05 84.82 89.09 91.43

I don't think this can be explained by rounding. Look at the following values: mean(mt[, 2]) = 39.61 versus 37.95 in the article. Even if I add 0.01 to all values (or subtract it) I don't get that value: mean(mt[, 2] + 0.01).

Similar thing with the covariance matrices: if I use your data (with the typo) I get results that are more similar to the Tsong article (excluding some rounding errors) than without the typo. Which is uncanny, so maybe I missed a memo where this is a known typo and you actually used the correct data (which, however, does not explain why the mean values in the article are off).

I just wanted to verify my implementation before putting it into practice. I might even create a package out of this in the end, because I think it would be nice to have an R package with a couple of methods combined. |
Shuanghe ★★ Spain, 2016-12-19 20:03 (3043 d 01:27 ago) @ wligtenberg Posting: # 16858 Views: 15,777 |
|
Hi wligtenberg,

❝ I think you have a typo in the definition of your mr. You have a value: 78.45, where in the article I think it is: 78.54.

Eagle eye! I never noticed it before. My copy of Tsong's article is old and in image format, so I copied the data from DDSolver (mentioned in my original post) instead, to avoid entering the data manually. I double-checked after you pointed out the typo, and it seems to be the only individual data point that's "wrong".

❝ And then I started to see if some calculations might have been off, and I checked the mean values calculated with the mean values in the Tsong article.
❝ ...

To summarise, you are right. The individual data and the mean values in the article do not agree (4 time points in the reference batch and 2 time points in the test batch).

❝ Similar thing with the covariance matrices, if I use your data (with typo) I get results that are more similar (excluding some rounding errors) to the Tsong article, than without the typo.
❝ Which is uncanny, so maybe I missed a memo where this is a known typo, and that you actually used the correct data.

It seems that I missed it too. I guess the only way to be sure is to ask Tsong to provide the original data. But I have a feeling that the authors of DDSolver might have done just that; otherwise it's difficult to explain why the sample data shipped with DDSolver give a much better match with this "typo". So an alternative is to contact them.

❝ (which however does not explain why the mean values are off, in the article)

My guess is that the individual data are correct (except maybe unit 4 of the reference at 30 min); otherwise we'd see a much bigger difference in the variance-covariance table. The article was printed rather sloppily, in my opinion. There are other mistakes too: the references to the tables are all wrong, one time point is missing for the T-R mean difference, and the data in the last table are never referenced...
So I guess we should not be shocked by the discrepancy between the individual and mean values. Actually, there are also some mistakes in the formulas I mentioned in the original post (in the article in the Encyclopedia of Biopharmaceutical Statistics by the same authors), so I don't know whether it was the authors or the editors who were sloppy.

❝ I just wanted to verify my implementation before I am going to put it into practice.

I believe that the method is correct, but that's just my opinion.

❝ I might even create a package out of this in the end, because I think it would be nice to have an R package with a couple of methods combined.

Good! Please let us know when it's available.

Just to be thorough, here is the roughly coded function (differing very slightly from the code above) that I sometimes use, after reading the data from an external file as a data frame:

msd <- function(test, ref, dg = 10) {
    ...
}
—
All the best,
Shuanghe |
wligtenberg ☆ The Netherlands, 2016-12-20 09:14 (3042 d 12:16 ago) (edited on 2016-12-20 10:43) @ Shuanghe Posting: # 16861 Views: 15,784 |
|
Thank you very much for your reply. It would indeed seem that the Tsong paper was not thoroughly checked.

I also started verifying my code against DDSolver, since that is used quite often. For most of my tests it went fine, except for one generated data set. All values are the same apart from the lower bound of the confidence region: in DDSolver it is -0.361857342251203, in my code (and yours) it is 0.361857342251203. The rest is all the same, and the conclusion would also be the same, but I really dislike the idea that part of the calculation might be wrong, because (as you know) quite a large part of the calculation is shared between the upper and lower bounds of the confidence region.

Here is my data set (made so you should be able to drop it into your code):
reference <- structure(c(28.7911578671327, 27.7063076528163, 28.2263135554565, ...

If you have any idea why these results differ, you would really help me out. I will see if I can figure out what happens in DDSolver.

OK, I now know why these values are different. DDSolver basically calculates the lower and upper bounds as follows:
df <- nt + nr - p - 1
k <- nt * nr / (nt + nr) * df / ((nt + nr - 2) * p)
fcrit <- qf(0.9, p, df)

dm_lower <- dm - sqrt(fcrit/k)
dm_upper <- dm + sqrt(fcrit/k)

That is quite a bit simpler; is it also correct? |
Shuanghe ★★ Spain, 2016-12-22 10:23 (3040 d 11:07 ago) @ wligtenberg Posting: # 16873 Views: 15,612 |
|
Hi,

❝ OK, I now know why these values are different.
❝ DDsolver basically calculates the lower and upper bounds as follows:
❝
❝ df <- nt + nr - p - 1
❝ k <- nt * nr / (nt + nr) * df / ((nt + nr - 2) * p)
❝ fcrit <- qf(0.9, p, df)
❝
❝ dm_lower <- dm - sqrt(fcrit/k)
❝ dm_upper <- dm + sqrt(fcrit/k)
❝
❝ That is quite a bit simpler, is it also correct?

DDSolver's source is locked, so I thought that, unless the authors let you, no one could have a peek at how exactly it works. I have no idea how you figured this out!

If they obtain the lower and upper limits as you said, then I don't think it's correct. I'm not a statistician, so my thinking might be naïve, but to me the Mahalanobis distance here is a measure of the difference in terms of "distance": when there is no difference the distance is zero, and when there is a difference the distance is always a positive number. A negative distance?
—
All the best,
Shuanghe |
wligtenberg ☆ The Netherlands, 2016-12-22 12:20 (3040 d 09:10 ago) @ Shuanghe Posting: # 16874 Views: 15,604 |
|
❝ DDSolver's source is locked, so I thought that, unless authors let you, no one can have a peek as how it works exactly. I have no idea how you figure this out but

I didn't know that they tried to keep people from looking at the code. I just opened it in LibreOffice (under Linux) and I could see the code, nothing fancy.

❝ If they obtain low and upper limits as you said then I don't think it's correct. I'm not a statistician so my thinking might be naïve. For me, here the Mahalanobis distance is a measure of the difference in terms of "distance" and when there is no difference the distance would be zero; if there are difference, the distance would always be a positive number. Negative distance?

I agree that the distance should never be negative, and that if the confidence region somehow goes below zero it should be truncated to zero. However, the interesting thing is that for the rest of the data sets I tried, that calculation comes up with the same values as your calculation. It also seems that the confidence region is always symmetrical around the Mahalanobis distance. The DDSolver result fits that observation; the result of the R-code calculation does not in this case. So even though the negative value of the lower bound might be incorrect (and should then be 0), for the rest it seems to work well. |
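A small numeric check makes the observation above concrete (the numbers are hypothetical, not from the thread's data set): the boundary-vector route from Tsong's formulas reduces algebraically to |dm ± sqrt(fcrit/k)|, so it matches the symmetric bounds whenever dm > sqrt(fcrit/k), and returns the absolute value, i.e. the sign flip seen here, when it does not.

```python
# Numeric check (hypothetical numbers): the boundary-vector CI reduces to
# |dm +/- sqrt(fcrit/k)|, i.e. the symmetric bounds up to the sign of the
# lower bound -- exactly the discrepancy discussed in this thread.
import math

def qform(v, a):
    """Quadratic form v' A v for small plain-list vectors/matrices."""
    n = len(v)
    return sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))

mdiff = [3.0, 4.0]                 # made-up mean-difference vector
spinv = [[1.0, 0.0], [0.0, 1.0]]   # identity => dm is the Euclidean norm
fcrit_over_k = 49.0                # stands in for fcrit/k (assumed value)

dm = math.sqrt(qform(mdiff, spinv))          # 5.0 here

# boundary vectors mdiff * (1 +/- sqrt(fcrit/(k*dm^2))), then their "length"
s = math.sqrt(fcrit_over_k / dm ** 2)
ci = [math.sqrt(qform([x * (1 + sgn * s) for x in mdiff], spinv))
      for sgn in (1, -1)]          # ~ [12.0, 2.0]

# symmetric bounds as quoted above
dm_upper = dm + math.sqrt(fcrit_over_k)      # agrees with ci[0]
dm_lower = dm - math.sqrt(fcrit_over_k)      # negative; ci[1] is its absolute value
```

In short: the boundary route's lower limit equals |dm - sqrt(fcrit/k)|, so both implementations agree whenever the symmetric lower bound is positive, and truncating the negative symmetric value to zero reconciles them otherwise.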
Olga ☆ Belarus, 2018-04-21 09:01 (2555 d 13:29 ago) @ Shuanghe Posting: # 18712 Views: 11,813 |
|
Hi, Shuanghe,

Thanks a lot for the R code. It's very helpful! I have a question about the maximum MSD. You calculate it from the vector m_dg <- cbind(rep(dg, p)), where dg is the same positive value (often 10%, the maximum inter-batch difference). I wonder: why only a positive value? I suppose the maximum inter-batch difference is an absolute value and may be positive or negative, or is that my mistake? I modelled the vector as m_dg <- c(10, -10, ...), and of course the results differed.

Thanks in advance! |
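For illustration, the observation above that c(10, -10, ...) gives different results can be reproduced with a toy quadratic form (made-up numbers, not from the thread): whenever the pooled inverse covariance has non-zero off-diagonal terms, the value of dg' · spinv · dg depends on the signs of the entries of the dg vector, not only on their magnitudes.

```python
# Toy illustration: with a non-diagonal inverse covariance, flipping the
# sign of one entry of the dg vector changes the quadratic form, so an
# all-positive dg and a mixed-sign dg give different dm_max values.
# Numbers are made up for illustration.
import math

def qform(v, a):
    """Quadratic form v' A v."""
    n = len(v)
    return sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))

spinv = [[1.0, 0.6], [0.6, 1.0]]   # off-diagonal terms couple the time points

same_sign  = math.sqrt(qform([10.0,  10.0], spinv))   # cross terms add
mixed_sign = math.sqrt(qform([10.0, -10.0], spinv))   # cross terms subtract
print(same_sign, mixed_sign)       # the two dm_max candidates differ
```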