Model check in R [Software]

posted by Helmut Homepage – Vienna, Austria, 2008-06-05 17:12 (6581 d 14:39 ago) – Posting: # 1925
Views: 18,931

Dear Vishal!

❝ I got y=0.1314x+0.1217 (y= mx+c) equation and R2 0.9925 using excel sheet.

  1. Hopefully it’s just a typo; I get 0.1317 for the slope.
  2. R2 is – almost – irrelevant.

❝ Now i want […] in excel sheet.


Have you read my last post – especially about validation?

❝ Is it possible in EXCEl?

There is no function for weighted regression integrated – stop searching… :sleeping:
Why do you want to stick with this awful piece of software? Just to give you an idea how your problem would look in R:
x         <- c( 1,   2,   4,   8,  10   ) # concentrations
y         <- c( 0.2, 0.4, 0.7, 1.2, 1.4 ) # responses
yu        <- c( 0.3, 0.45,0.71,1.3, 1.35) # unknown samples
plot(x, y)                                # simple plot
model1    <- lm(y ~ x)                    # unweighted regression
model2    <- lm(y ~ x, weights = 1/x)     # w = 1/x
model3    <- lm(y ~ x, weights = 1/x^2)   # w = 1/x^2
# results section #
summary(model1)                           # parameters
confint(model1, level = 0.95)             # confidence intervals
extractAIC(model1, k = 2)                 # AIC
fitted(model1)                            # fitted values
residuals(model1)                         # residuals
((y-coef(model1)[1])/coef(model1)[2]
  -x)/x*100                               # bias (back-calculated values)
xu        <- (yu-coef(model1)[1])/
             coef(model1)[2]              # concentration of samples
xu                                        # show results
abline(coef=coef(model1))                 # unweighted
abline(coef=coef(model2), col="blue")     # 1/x
abline(coef=coef(model3), col="red")      # 1/x²


In the first three lines input from external files are possible as well (plain text, Excel :-D, any ODBC-application, a MySQL-database, and many others).
If you replace model1 in the results section with model2 and model3, respectively, you'll get results for w=1/x and w=1/x2.
Output looks like this:
Call:
lm(formula = y ~ x)
Residuals:
       1        2        3        4        5
-0.05333  0.01500  0.05167  0.02500 -0.03833
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 0.121667   0.040127   3.032 0.056221 . 
x           0.131667   0.006597  19.959 0.000275 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0511 on 3 degrees of freedom
Multiple R-squared: 0.9925,     Adjusted R-squared:  0.99
F-statistic: 398.4 on 1 and 3 DF,  p-value: 0.0002749
                   2.5 %    97.5 %
(Intercept) -0.006035715 0.2493690
x            0.110672524 0.1526608
[1]   2.00000 -28.29403
        1         2         3         4         5
0.2533333 0.3850000 0.6483333 1.1750000 1.4383333
          1           2           3           4           5
-0.05333333  0.01500000  0.05166667  0.02500000 -0.03833333
[1] -40.506329   5.696203   9.810127   2.373418  -2.911392
[1] 1.354430 2.493671 4.468354 8.949367 9.329114


If you want to go with a quadratic model, use
model4    <- lm(y ~ x + I(x^2))
model5    <- lm(y ~ x + I(x^2), weights = 1/x)
model6    <- lm(y ~ x + I(x^2), weights = 1/x^2)

and in the result section e.g.
if(coef(model4)[3]<0) {
  (-(coef(model4)[2]/2/coef(model4)[3]+
  sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
  (coef(model4)[1]-y)/coef(model4)[3]))-x)/x*100
  xu      <- -(coef(model4)[2]/2/coef(model4)[3]+
             sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
             (coef(model4)[1]-yu)/coef(model4)[3]))
} else {
  (-(coef(model4)[2]/2/coef(model4)[3]-
  sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
  (coef(model4)[1]-y)/coef(model4)[3]))-x)/x*100
  xu      <- -(coef(model4)[2]/2/coef(model4)[3]-
             sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
             (coef(model4)[1]-yu)/coef(model4)[3]))
}

There are two possible solutions for the square root depending on the sign of the quadratic parameter (the third coefficient); if the value is exactly zero, the model should reduce to the linear case. Since this sniplet is just quick and dirty I leave the homework to somebody else.

❝ If possible, which equation obtain using above data and How can i obtain this equation?


Even in Excel it’s possible to get weighted regressions running. For formulas see e.g. Draper/Smith (Applied Regression Analysis, Wiley, 3rd ed, 1998) or Miller/Miller (Statistics and Chemometrics for Analytical Chemistry, Pearson, 5th ed, 2005). Of course you will have to struggle with a spreadsheet which is not suitable for this kind of job (it’s like writing a letter instead by means of a word+processor in Adobe’s Photoshop).

BTW, a linear model is wrong for your example’s data.
You have to justify the model, which for your data is simply not linear at all. A plot of residuals (ycalc - yobs) vs. x would have shown a nice parabola – where we expect a random arrangement independent from x. The residuals show a pattern [–+++–] of 3 runs (changes of sign). Back-calculated x are biased within the range -40.5% to +9.81%. Weighting improves this mess a little (1/x: -18.6%, +12.9%; 1/x2: -9.83%, +13.8%), but according to residual analyses it's the wrong model. Guidelines call for a justification, because if you apply a wrong model you willJust compare the linear with the quadratic model y=A+Bx+Cx2. 4 runs (of a possible 5), no more trend in residuals, back-calculated x: unweighted (-6.07%, +4.46%), 1/x (-2.61%, +4.32%), 1/x2 (-3.07%, +3.36%).
According to AIC (Akaike’s Information Criterion) and back-calculated x the best model is the quadratic one with w=1/x2: y=0.001814+0.2070x-0.006936x2.

You can easily compare models by an F-test, e.g.
anova(model3, model6, test="F")
getting
Analysis of Variance Table
Model 1: y ~ x
Model 2: y ~ x + I(x^2)
  Res.Df        RSS Df  Sum of Sq      F  Pr(>F) 
1      3 0.00090656                               
2      2 0.00006905  1 0.00083751 24.258 0.03884 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


But even the unweighted quadratic is by far better than any weighting scheme applied to a linear model. Your example shows the importance of analyzing residuals – not only plotting x vs. y as you have stated in your second post.

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

UA Flag
Activity
 Admin contact
23,653 posts in 4,991 threads, 1,570 registered users;
108 visitors (0 registered, 108 guests [including 31 identified bots]).
Forum time: 07:52 CEST (Europe/Vienna)

To propose that poor design can be corrected by subtle analysis techniques
is contrary to good scientific thinking.    Stuart J. Pocock

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5