Model check in R [Software]

posted by Helmut – Vienna, Austria, 2008-06-05 17:12 – Posting: # 1925

Dear Vishal!

❝ I got y=0.1314x+0.1217 (y= mx+c) equation and R2 0.9925 using excel sheet.

  1. Hopefully it’s just a typo; I get 0.1317 for the slope.
  2. R² is – almost – irrelevant.

❝ Now i want […] in excel sheet.


Have you read my last post – especially about validation?

❝ Is it possible in EXCEl?

There is no function for weighted regression built into Excel – stop searching… :sleeping:
Why do you want to stick with this awful piece of software? Just to give you an idea of how your problem would look in R:
x         <- c( 1,   2,   4,   8,  10   ) # concentrations
y         <- c( 0.2, 0.4, 0.7, 1.2, 1.4 ) # responses
yu        <- c( 0.3, 0.45,0.71,1.3, 1.35) # unknown samples
plot(x, y)                                # simple plot
model1    <- lm(y ~ x)                    # unweighted regression
model2    <- lm(y ~ x, weights = 1/x)     # w = 1/x
model3    <- lm(y ~ x, weights = 1/x^2)   # w = 1/x^2
# results section #
summary(model1)                           # parameters
confint(model1, level = 0.95)             # confidence intervals
extractAIC(model1, k = 2)                 # AIC
fitted(model1)                            # fitted values
residuals(model1)                         # residuals
((y-coef(model1)[1])/coef(model1)[2]
  -x)/x*100                               # bias (back-calculated values)
xu        <- (yu-coef(model1)[1])/
             coef(model1)[2]              # concentration of samples
xu                                        # show results
abline(coef=coef(model1))                 # unweighted
abline(coef=coef(model2), col="blue")     # 1/x
abline(coef=coef(model3), col="red")      # 1/x²


In the first three lines, input from external files is possible as well (plain text, Excel :-D, any ODBC application, a MySQL database, and many others).
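As a minimal sketch of the plain-text case (the file name calibration.csv and its column names conc / resp are only placeholders for your own file):
cal       <- read.csv("calibration.csv")  # placeholder file with columns conc, resp
x         <- cal$conc                     # concentrations
y         <- cal$resp                     # responses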
If you replace model1 in the results section with model2 and model3, respectively, you'll get the results for w = 1/x and w = 1/x².
Output looks like this:
Call:
lm(formula = y ~ x)
Residuals:
       1        2        3        4        5
-0.05333  0.01500  0.05167  0.02500 -0.03833
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 0.121667   0.040127   3.032 0.056221 . 
x           0.131667   0.006597  19.959 0.000275 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0511 on 3 degrees of freedom
Multiple R-squared: 0.9925,     Adjusted R-squared:  0.99
F-statistic: 398.4 on 1 and 3 DF,  p-value: 0.0002749
                   2.5 %    97.5 %
(Intercept) -0.006035715 0.2493690
x            0.110672524 0.1526608
[1]   2.00000 -28.29403
        1         2         3         4         5
0.2533333 0.3850000 0.6483333 1.1750000 1.4383333
          1           2           3           4           5
-0.05333333  0.01500000  0.05166667  0.02500000 -0.03833333
[1] -40.506329   5.696203   9.810127   2.373418  -2.911392
[1] 1.354430 2.493671 4.468354 8.949367 9.329114


If you want to go with a quadratic model, use
model4    <- lm(y ~ x + I(x^2))
model5    <- lm(y ~ x + I(x^2), weights = 1/x)
model6    <- lm(y ~ x + I(x^2), weights = 1/x^2)

and in the results section, e.g.
if(coef(model4)[3]<0) {
  print((-(coef(model4)[2]/2/coef(model4)[3]+
    sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
    (coef(model4)[1]-y)/coef(model4)[3]))-x)/x*100)   # bias (back-calculated values)
  xu      <- -(coef(model4)[2]/2/coef(model4)[3]+
             sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
             (coef(model4)[1]-yu)/coef(model4)[3]))   # concentration of samples
} else {
  print((-(coef(model4)[2]/2/coef(model4)[3]-
    sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
    (coef(model4)[1]-y)/coef(model4)[3]))-x)/x*100)   # bias (back-calculated values)
  xu      <- -(coef(model4)[2]/2/coef(model4)[3]-
             sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
             (coef(model4)[1]-yu)/coef(model4)[3]))   # concentration of samples
}
xu                                                    # show results

There are two possible solutions of the quadratic equation depending on the sign of the quadratic parameter (the third coefficient); if that coefficient is exactly zero, the model reduces to the linear case. Since this snippet is just quick and dirty, I leave the homework to somebody else.
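Just as a sketch of how that homework might look – the function name back.calc is mine, not part of R – one could wrap the back-calculation into a small function which picks the proper root and falls back to the linear solution if the quadratic term vanishes:
back.calc <- function(model, y) {         # solves a + b*x + c*x^2 = y for x
  cf <- coef(model)
  a  <- cf[1]; b <- cf[2]
  cc <- if (length(cf) > 2) cf[3] else 0
  if (abs(cc) < .Machine$double.eps)
    return((y - a)/b)                     # (almost) no curvature: linear case
  root <- sqrt((b/(2*cc))^2 - (a - y)/cc)
  if (cc < 0) -(b/(2*cc) + root) else -(b/(2*cc) - root)
}
back.calc(model4, yu)                     # concentrations of the unknowns
(back.calc(model4, y) - x)/x*100          # bias of back-calculated standards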

❝ If possible, which equation obtain using above data and How can i obtain this equation?


Even in Excel it’s possible to get weighted regressions running. For the formulas see e.g. Draper/Smith (Applied Regression Analysis, Wiley, 3rd ed., 1998) or Miller/Miller (Statistics and Chemometrics for Analytical Chemistry, Pearson, 5th ed., 2005). Of course you will have to struggle with a spreadsheet, which is not suitable for this kind of job (it’s like writing a letter in Adobe’s Photoshop instead of a word processor).
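To give an idea of what those textbook formulas boil down to (and of what you would have to set up in spreadsheet cells), here is a sketch of the closed-form weighted least-squares estimates in R, reusing x and y from above; for w = 1/x² it should reproduce coef(model3):
w    <- 1/x^2                             # weights, e.g. 1/x^2
xw   <- sum(w*x)/sum(w)                   # weighted mean of x
yw   <- sum(w*y)/sum(w)                   # weighted mean of y
b1   <- sum(w*(x - xw)*(y - yw))/
        sum(w*(x - xw)^2)                 # weighted slope
b0   <- yw - b1*xw                        # weighted intercept
c(intercept = b0, slope = b1)             # compare with coef(model3)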

BTW, a linear model is wrong for your example’s data.
You have to justify the model, and for your data it is simply not linear at all. A plot of the residuals (yobs - ycalc) vs. x would have shown a nice parabola – where we expect a random scatter independent of x. The residuals show a pattern [–+++–] of only 3 runs (two changes of sign). Back-calculated x are biased within the range -40.5% to +9.81%. Weighting improves this mess a little (1/x: -18.6%, +12.9%; 1/x²: -9.83%, +13.8%), but according to the residual analysis it is still the wrong model. Guidelines call for a justification of the model, because applying the wrong one biases your results.
Just compare the linear with the quadratic model y = A + Bx + Cx²: 4 runs (of a possible 5), no trend left in the residuals, back-calculated x: unweighted (-6.07%, +4.46%), 1/x (-2.61%, +4.32%), 1/x² (-3.07%, +3.36%).
According to AIC (Akaike’s Information Criterion) and the back-calculated x, the best model is the quadratic one with w = 1/x²: y = 0.001814 + 0.2070x - 0.006936x².
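A minimal sketch of that residual check (reusing the models defined above; counting runs via sign() is just one of many ways):
plot(x, residuals(model1)); abline(h = 0) # linear: parabolic pattern
plot(x, residuals(model4)); abline(h = 0) # quadratic: no trend left
sum(abs(diff(sign(residuals(model1)))) > 0) + 1   # runs of the linear model
sum(abs(diff(sign(residuals(model4)))) > 0) + 1   # runs of the quadratic model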

You can easily compare models by an F-test, e.g.
anova(model3, model6, test="F")
getting
Analysis of Variance Table
Model 1: y ~ x
Model 2: y ~ x + I(x^2)
  Res.Df        RSS Df  Sum of Sq      F  Pr(>F) 
1      3 0.00090656                               
2      2 0.00006905  1 0.00083751 24.258 0.03884 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


But even the unweighted quadratic model is by far better than any weighting scheme applied to a linear one. Your example shows the importance of analyzing residuals – not just plotting x vs. y as you suggested in your second post.
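If you want to collect the AIC of all six models in one go, a little sketch (reusing the model objects from above; extractAIC() returns the equivalent degrees of freedom and the AIC, so the second element is picked):
models <- list(lin       = model1, lin.1.x  = model2, lin.1.x2  = model3,
               quad      = model4, quad.1.x = model5, quad.1.x2 = model6)
sapply(models, function(m) extractAIC(m, k = 2)[2])  # smaller = better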

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
