ElMaestro
★★★

Denmark,
2019-07-18 11:43
(597 d 12:21 ago)

Posting: # 20388
Views: 7,614
 

 Spreadsheet failures, any recent examples? [Software]

Hi all,

does anyone have recent examples of spreadsheet functions truly misbehaving with real-life realistic data?

I am not talking about for example the cases of standard deviations or variances that went funky when the application was fed extreme data ranges, or similar waterboarding of software, which was debated 10 years ago and which led regulators to conclude they were the only ones who could be trusted to make proper use of such pernicious creations.

Realistic.
Data.
Only.
Please.

I will leave it to the reader to quantitatively define what "realistic" and "recent" actually mean and I reserve the right to challenge the definitions, obviously :-D

Of course I ask because I never thought spreadsheets were or are as bad as their reputation. The users, on the other hand, are every bit as bad as their reputation. This certainly includes myself. Most likely I am the worst. But that will be the topic of another thread :-)

Pass or fail!
ElMaestro
Ohlbe
★★★

France,
2019-07-18 12:32
(597 d 11:32 ago)

@ ElMaestro
Posting: # 20389
Views: 7,055
 

 Spreadsheet failures, any recent examples?

Dear ElMaestro,

» does anyone have recent examples of spreadsheet functions truly misbehaving with real-life realistic data?

Nope. I have examples of users using the wrong functions, or unlocked spreadsheets getting modified at some point, but that's about all. The problem is most often somewhere between the chair and the keyboard: wetware rather than software.

Well, there are some "features" of Excel you need to be aware of when using it for stats, of course. Already mentioned on the forum.

» [...] which led regulators to conclude they were the only ones who could be trusted to make proper use of such pernicious creations.

Maybe their coding skills are not any better than mine*, and they consider that they will always trust whatever they do with a spreadsheet more than anything they might obtain with R ? :-D

* sharing with the forum some experience already discussed by email with ElMaestro a couple of years ago: I installed R on my home computer. With the help of some "R for R-seholes" tutorials, it took me something like 4 hours to manage to open a simple set of data and make a simple plot out of it :PCchaos:. I spent a few more hours trying to have the colour of the plot change based on simple conditions (red if under a threshold value, green above it). I was even waking up at night with ideas on new ways to test. I never succeeded. Never used R again. I would never trust any of my results anyway.:lamer:

Regards
Ohlbe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-18 14:12
(597 d 09:52 ago)

@ Ohlbe
Posting: # 20390
Views: 7,066
 

 Spreadsheet addiction

Dear Ohlbe

» » [...] which led regulators to conclude they were the only ones who could be trusted to make proper use of such pernicious creations.
»
» Maybe their coding skills are not any better than mine*, and they consider that they will always trust whatever they do with a spreadsheet more than anything they might obtain with R ? :-D

There is a series of papers about indirect adjusted comparisons in BE published by regulators. All done in Excel.

» * […] I spent a few more hours trying to have the colour of the plot change based on simple conditions (red if under a threshold value, green above it). I was even waking up at night with ideas on new ways to test. I never succeeded.

I believe it. R is a nasty beast like SAS (© Detlew). Waking up in the middle of the night or – worse – not being able to fall asleep at all is a common side-effect. Since you will never use R again, skip this code:

op  <- par(no.readonly = TRUE)    # Save original graphics parameters
par(pty = "s")                    # I want a square plotting region
x   <- runif(n=50, min=1, max=20) # Sample from the uniform distribution
a   <- 0.5                        # Intercept
b   <- 2                          # Slope
y   <- a + b * x + rnorm(n=length(x), mean=0, sd=2) # Response + random error
th  <- 10                         # Threshold
plot(x, y, type = "n", las = 1)   # Important: type="n", will add points later
grid()                            # Nice to have one
abline(h = th, col = "red")       # Line for threshold
abline(lm(y ~ x), lwd = 2, col = "blue")    # Linear regression
points(x[y < th], y[y < th], pch = 21, cex = 1.5,
       col = "red", bg = "#FFD700AA")       # Below threshold
points(x[y >= th], y[y >= th], pch = 21, cex = 1.5,
       col = "darkgreen", bg = "#32CD32AA") # At least threshold
par(op)                           # Restore original graphics parameters


More fancy stuff wanted?

library(shape)
op  <- par(no.readonly = TRUE)
par(pty = "s")
x   <- runif(n = 50, min = 1, max = 20)
a   <- 0.5
b   <- 2
y   <- a + b * x + rnorm(n=length(x), mean=0, sd=2)
th  <- 10
xlim <- c(0, 21)
ylim <- c(0, 42)
plot(x, y, type = "n", las = 1, xlim = xlim, ylim = ylim)
below <- y[y < th]
above <- y[y >= th]
clr.below <- colorRampPalette(c("#880000", "red"))(length(below))
clr.below <- paste0(clr.below, "80")
clr.above <- colorRampPalette(c("#004000", "#00BB00"))(length(above))
clr.above <- paste0(clr.above, "80")
df.below  <- data.frame(x=x[y < th], y = below, orig = 1:length(below))
df.below  <- df.below[order(df.below$y, decreasing = TRUE), ]
df.below$rank <- length(below):1
df.below$clr <- clr.below[df.below$rank]
df.above <- data.frame(x=x[y >= th], y=above, orig=1:length(above))
df.above <- df.above[order(df.above$y, decreasing = TRUE), ]
df.above$rank <- length(above):1
df.above$clr <- clr.above[df.above$rank]
filledrectangle(wx=30, wy = th, mid = c(10, th/2),
                col = colorRampPalette(c("#FFCCCC", "#FFEEEE"))(256))
grid()
abline(lm(y ~ x), lwd = 2, col = "blue")
abline(h = th, col = "red")
box()
for (j in 1:nrow(df.below)) {
  points(x = df.below[df.below$rank == j, ]$x,
         y = df.below[df.below$rank == j, ]$y,
         pch = 21, cex = 1.5,
         col = df.below[df.below$rank == j, ]$clr,
         bg = df.below[df.below$rank == j, ]$clr)
}
for (j in 1:nrow(df.above)) {
  points(x = df.above[df.above$rank == j, ]$x,
         y = df.above[df.above$rank == j, ]$y,
         pch = 21, cex = 1.5,
         col = df.above[df.above$rank == j, ]$clr,
         bg = df.above[df.above$rank == j, ]$clr)
}
par(op)


[image]

Semitransparent colors in Excel? I don’t think that’s possible. ;-)

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Ohlbe
★★★

France,
2019-07-18 15:30
(597 d 08:34 ago)

@ Helmut
Posting: # 20391
Views: 7,018
 

 Nasty beast

Dear Helmut,

» Since you will never use [image] again, skip this code

Thx... I made multiple attempts with "if" and conditions, and got multiple different error messages. ElMaestro gave me some hints, resulting each time in a "oh yes of course" reaction and more error messages... I probably misinterpreted the hints.

Regards
Ohlbe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-18 18:26
(597 d 05:38 ago)

@ Ohlbe
Posting: # 20396
Views: 6,964
 

 Nasty beast

Dear Ohlbe,

» I made multiple attempts with "if" and conditions, and got multiple different error messages. ElMaestro gave me some hints, resulting each time in a "oh yes of course" reaction and more error messages... I probably misinterpreted the hints.

Using if(), its relatives, and loop-constructs (for(), while(), repeat()) is in many cases prone to errors – as you experienced – and can be slooow. If you have a vector of data (say y), there are various ways to access its values. Example with 25 random integers in the range 1…40:

n  <- 25
y  <- round(runif(n = n, min = 1, max = 40))
th <- 10             # threshold
n  <- length(y)      # we know but useful later
head(y, 10)          # the first 10
y[1:10]              # same
tail(y, 10)          # the last 10
y[(n - 9):n]         # same but tricky
blq <- which(y < th) # the ones below the threshold
length(blq)          # how many?
y[blq]               # show them
y[y < th]            # same

In such a simple case which() is overkill, though one can include many conditions, which [sic] makes the code easier to understand. If you have two vectors (say x, y) like in my last post, you can select values of x depending on a condition on y, e.g., x[which(y < th)].
Let’s check:

n    <- 50
x    <- runif(n = n, min = 1, max = 20)
a    <- 0.5
b    <- 2
y    <- a + b * x + rnorm(n = length(x),  mean = 0, sd = 2)
th   <- 10

fun1 <- function(x, y, th) { # clumsy
  x.th <- numeric()
  y.th <- numeric()
  bql  <- 0L
  for (k in seq_along(x)) {
    if (y[k] < th) {
      bql       <- bql + 1
      x.th[bql] <- x[k]
      y.th[bql] <- y[k]
    }
  }
  z <- data.frame(x.th, y.th)
  return(invisible(z))
}

fun2 <- function(x, y, th) { # better
  blq  <- which(y < th)
  x.th <- x[blq]
  y.th <- y[blq]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}

fun3 <- function(x, y, th) { # a little bit confusing for beginners
  x.th <- x[y < th]
  y.th <- y[y < th]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}

res1 <- fun1(x, y, th)
res2 <- fun2(x, y, th)
res3 <- fun3(x, y, th)
identical(res1, res2); identical(res1, res3) # same?
[1] TRUE
[1] TRUE

Bingo! Which one is easier to read?
What about speed?

library(microbenchmark)
res <- microbenchmark(fun1(x, y, th),
                      fun2(x, y, th),
                      fun3(x, y, th), times = 1000L)
print(res, signif = 4)
Unit: microseconds
           expr   min    lq  mean median    uq  max neval cld
 fun1(x, y, th) 173.9 181.1 196.1  186.3 189.9 1530  1000   b
 fun2(x, y, th) 163.0 170.3 183.6  175.4 179.0 3311  1000  a
 fun3(x, y, th) 163.0 169.0 185.5  174.5 178.4 3310  1000  ab

Practically the same, since the vectors are short. If we play this game with longer vectors, fun2() shows its strengths.

[image]
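The claim can be probed with a quick timing sketch (vector length, seed, and the function bodies are arbitrary choices for illustration; timings will vary by machine):

```r
set.seed(1)
n  <- 1e5
x  <- runif(n, min = 1, max = 20)
y  <- 0.5 + 2 * x + rnorm(n, mean = 0, sd = 2)
th <- 10

loopy <- function(x, y, th) {   # grows the result element by element
  x.th <- y.th <- numeric()
  blq  <- 0L
  for (k in seq_along(x)) {
    if (y[k] < th) {
      blq       <- blq + 1L
      x.th[blq] <- x[k]
      y.th[blq] <- y[k]
    }
  }
  data.frame(x.th, y.th)
}
vect <- function(x, y, th) {    # direct logical indexing
  data.frame(x.th = x[y < th], y.th = y[y < th])
}

t1 <- system.time(r1 <- loopy(x, y, th))["elapsed"]
t2 <- system.time(r2 <- vect(x, y, th))["elapsed"]
identical(r1, r2)               # TRUE: same result, very different run times
```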


If one has more conditions any external construct will suck. We have 100 random integers (1…50) and want to get the even ones between 20 and 30 in increasing order.

x <- round(runif(n = 100, min = 1, max = 50))
x
  [1] 37 29  8 34  6 13 47 22 30 44  6 14 33 18 12 37 32
 [18] 10  6 37 27  2 43 40  7  5 47  4 32 17  7 50 39 36
 [35] 38 48 34  5 21 43 34 50 29 20 33  6 45 32 28  8  1
 [52] 26 29 19 42  9 38 31 25  4  1 23 37 31  2 26 29 24
 [69] 40 43 17 16 41 17  5 17 36 16  7  5 36 30  5  8 19
 [86] 40 42 30 33 21 13 25 21 33 16  7 33 36 19 37

One-liner:

sort(x[which(x >= 20 & x <= 30 & x %% 2 == 0)])
[1] 20 22 24 26 26 28 30 30 30

Or even shorter:

sort(x[x >= 20 & x <= 30 & x %% 2 == 0])
[1] 20 22 24 26 26 28 30 30 30

Good luck coding that with a loop and a nested if() { do this } else { do that } elseif { oops }.
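For the record, the loop-and-if version might look something like this (a sketch; the seed is an arbitrary choice, and the nesting is deliberately spelled out to show how much noisier it gets):

```r
set.seed(42)
x <- round(runif(n = 100, min = 1, max = 50))
one.liner <- sort(x[x >= 20 & x <= 30 & x %% 2 == 0])

res <- numeric(0)
for (k in seq_along(x)) {
  if (x[k] >= 20) {
    if (x[k] <= 30) {
      if (x[k] %% 2 == 0) {
        res <- c(res, x[k])   # grow the result one hit at a time
      }
    }
  }
}
res <- sort(res)
identical(res, one.liner)     # TRUE: same values, several times the code
```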

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Ohlbe
★★★

France,
2019-07-19 09:32
(596 d 14:32 ago)

@ Helmut
Posting: # 20397
Views: 6,900
 

 Nasty beast

Dear Helmut,

» Using if(), its relatives, and loop-constructs (for(), while(), repeat()) in many cases are prone to errors

» Good luck coding that with a loop and a nested if() { do this } else { do that } elseif { oops }

I think that's really what pissed me off. I thought something like if... else... would be very logical and easy. Maybe because these are real words from a language I can understand. But it looks like R and I follow different kinds of logic.

When you learn a new language and start speaking it with somebody, that person will usually show some goodwill, ignore grammatical mistakes and try and understand what you mean. R made no efforts of any kind, even though what I was trying to achieve was pretty obvious :-D

» If you have a vector of data (say y)

Yeah, ElMaestro started using this kind of vocabulary too when we were exchanging emails. That got me even more confused. All I learnt at school about vectors is the geometric side of the concept - what Wikipedia apparently calls a Euclidean vector. I had some problems understanding what an arrow between two points had to do with what I was trying to achieve.
Sorry ElMaestro, we're going quite far from your original question... :offtopic!:

Regards
Ohlbe
ElMaestro
★★★

Denmark,
2019-07-19 10:32
(596 d 13:32 ago)

@ Ohlbe
Posting: # 20399
Views: 6,904
 

 Nasty beast

Hi Ohlbe,
» Sorry ElMaestro, we're going quite far from your original question... :offtopic!:

That's OK.
The occasional drift on this forum always pleases me.


As for the logic, what you are saying is absolutely !7==!3: programming involves a kind of logic entirely bound to ones and zeros, and that can be hard to grasp when the reference is our everyday language.

If it is any comfort, I can confess I usually can't read Hötzi's code right away. I usually spend a lot of time deciphering it, or give up. This is because syntactically it is condensed. I tend to use much longer lines and bigger functions to achieve the same goals. I can't maintain code otherwise, but this is just me.

Vector: Something that carries something, according to Birdsong's Internally Hosted Vocabulary (the platform is walnut-sized).
The object of this carriage may be information, like "two steps in this direction and then five steps in the perpendicular direction". Or "Apple and thereafter Syphilis".
The origin of the word is from the Iliad by Homer, song 13, in which Thetis says: "Vector bears it in triumph upon his own shoulders.", possibly expressed as such because he was in love?! Or at least this is my belief since it was said in relation to some amour.

Pass or fail!
ElMaestro
Ohlbe
★★★

France,
2019-07-19 11:38
(596 d 12:26 ago)

@ ElMaestro
Posting: # 20402
Views: 6,874
 

 Decidedly off topic

Hi ElMaestro,

» The origin of the word is from the Iliad by Homer, song 13, in which Thetis says: "Vector bears it in triumph upon his own shoulders.", possibly expressed as such because he was in love?! Or at least this is my belief since it was said in relation to some amour.

:lol2:

But if I understand the text well, this is only applicable to Trojan virus coding...

Regards
Ohlbe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-19 11:00
(596 d 13:04 ago)

@ Ohlbe
Posting: # 20401
Views: 6,854
 

 OT: R limbo 101

[image]Dear Ohlbe,

» I thought something like if... else... would be very logical and easy. Maybe because these are real words from a language I can understand.

Perfectly understandable. Of course, these constructs exist in R and we use them all the time. The point is only that direct access of values (with a condition) is simply more efficient.
If you don’t have to deal with large data sets, both are fine. Since I deal a lot with simulations (1 mio BE studies, sometimes with a high number of subjects / periods / sequences, …) I learned the hard way to avoid the ‘simple’ constructs whenever possible (granted, sometimes I knew it could be done but was too stupid).
Think about my previous example. For 1 mio calls the direct access is 15 times faster than the loop with if() within. It’s not unusual to have nested calls. For two it would be ~230 times slower and for three already ~3,500 times. That would test your patience.

» But it looks like R and I are following different kinds of logics.

Not necessarily so. See above.

» When you learn a new language and start speaking it with somebody, that person will usually show some goodwill, ignore grammatical mistakes and try and understand what you mean. R made no efforts of any kind, even though what I was trying to achieve was pretty obvious :-D

I agree. See the subject line. The man-pages of R’s functions were written by statisticians, regularly forgetting non-expert users. Sometimes the response you get is bizarre. Want to know how for() works?

?for
+

+
Fuck, get me out of here! esc
help(for)
Error: unexpected ')' in "help(for)"

What the heck? Shall I really try it without the closing bracket? Strange. OK, OK.
help(for
+

Aha, the closing bracket is missing!
)
Error: unexpected ')' in:
"
)"

You can’t be serious! Google, google, reading the R-Inferno Section 8.2.30. Oh my!
?`for`
help(`for`)

Both work as expected. I don’t know the logic behind it. Beyond me.


» » If you have a vector of data (say y)
»
» Yeah, ElMaestro started using this kind of vocabulary too when we were exchanging email. […] All I learnt at school about vectors is the geometric side of the concept - what Wikipedia apparently calls Euclidian vector.

Correct.

» I had some problems understanding what an arrow between two points had to do with what I was trying to achieve.

Absolutely nothing, of course. In R a one-dimensional array is meant. The indexing (i.e., where it starts) differs between languages.
  • 0
    C, C++, BASIC-dialects, Pascal (default; can be declared as 1), PHP, Perl, Java, JavaScript, Lisp, Python, Ruby, Scheme, …
  • 1
    S, R, Julia, FORTRAN, ALGOL, AWK, COBOL, Matlab, Mathematica, Wolfram Language, XQuery, …
In many languages there are three types of data structures: scalars (a single number, character, string, or boolean), one-dimensional arrays (R: vector), and multidimensional arrays (R: matrix and data.frame). In R there are no scalars, only vectors containing a single element.

x1   <- 1
x2   <- 1L
x3   <- TRUE
x4   <- NA
x5   <- "a"
comp <- matrix(data = c(c(x1, x2, x3, x4, x5),
                        c(is.vector(x1), is.vector(x2), is.vector(x3),
                          is.vector(x4), is.vector(x5)),
                        c(length(x1), length(x2), length(x3),
                          length(x4), length(x5)),
                        c(typeof(x1), typeof(x2), typeof(x3),
                          typeof(x4),typeof(x5))),
               nrow = 5, ncol = 4,
               dimnames = list(paste0("x", 1:5),
                               c("value", "vector?", "length", "type")))
print(as.data.frame(comp))

   value vector? length      type
x1     1    TRUE      1    double
x2     1    TRUE      1   integer
x3  TRUE    TRUE      1   logical
x4  <NA>    TRUE      1   logical
x5     a    TRUE      1 character


What makes R so powerful is yet another data type, namely the list. A list can contain any of the other types and a mixture of them (even other lists…).

char.vect <- letters[1:2]
int.vect  <- as.integer(1:3)
mydf      <- data.frame(x = 1:4, y = c(4:6, NA))
mymat     <- matrix(data = c(100:102, 2*100:102), nrow = 3, ncol = 2,
                    dimnames = list(paste0("row", 1:3), c("x", "y")))
mylist    <- list(char.vect = char.vect, int.vect = int.vect,
                  mydf = mydf, mymat = mymat)
print(mylist)

$char.vect
[1] "a" "b"

$int.vect
[1] 1 2 3

$mydf
  x  y
1 1  4
2 2  5
3 3  6
4 4 NA

$mymat
       x   y
row1 100 200
row2 101 202
row3 102 204


Elements can be accessed by their name or index.

mylist$mymat["row2", "y"]
[1] 202
mylist$mymat[2, 2]
[1] 202
mylist[[4]][2, 2]
[1] 202


The double square brackets are mandatory to extract an element of a list; single brackets return a sub-list. Hence, mylist[4][2, 2] does not work:
Error in mylist[4][2, 2] : incorrect number of dimensions
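The difference between the two bracket forms can be seen with a minimal throwaway list (names and values are made up for illustration):

```r
mylist <- list(char.vect = letters[1:2],
               mymat     = matrix(1:4, nrow = 2,
                                  dimnames = list(c("row1", "row2"),
                                                  c("x", "y"))))
sub  <- mylist[2]     # single bracket: a list of length 1 containing the matrix
elem <- mylist[[2]]   # double bracket: the matrix itself
is.list(sub)          # TRUE
is.matrix(elem)       # TRUE
elem["row2", "y"]     # 4 – indexing works only on the extracted element
```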

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Shuanghe
★★  

Spain,
2019-07-18 16:59
(597 d 07:06 ago)

@ Helmut
Posting: # 20392
Views: 7,040
 

 Spreadsheet addiction

Dear all

» There is a series of papers about indirect adjusted comparisons in BE published by regulators. All done in Excel.

As far as I can tell, Excel and R give the same results. I was interested in the method, so several weeks back I played with it in R using data from the EPARs (none of the articles gives the sample size of the BE study, so without going to the EPAR for this additional information you cannot reproduce/validate their results at all); the difference in some comparisons is 0.01%, due to the different rounding methods used in Excel and R.

However, as Ohlbe said, there are some errors in the articles. In the latest one (link), table 2 gives wrong data (I compared with the original data from the EPAR and it's clearly a mistake in the article), but the result of the comparison using those data is correct, so this might just be a typo. Some comparisons were also reversed: e.g., Table 3 describes generic 2 vs generic 4 and generic 3 vs generic 4, but in fact it was generic 4 vs 3 and 2. Apparently someone typed the data into the wrong cell in Excel, so the confidence intervals were all inverted. :-D

» Semitransparent colors in Excel? I don’t think that’s possible. ;-)

Haha, this is where you err... :-D Actually you can do that in Excel (not that you should). At least I can in my 2013. It's ugly, but semi-transparent it is. :cool:

All the best,
Shuanghe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-18 18:04
(597 d 06:01 ago)

@ Shuanghe
Posting: # 20395
Views: 7,018
 

 Spreadsheet addiction

Hi Shuanghe,

» I was interested in the method so several weeks back I played with R using data from EPAR (none of the articles gives sample size of the BE study so without going to the EPAR for this additional information you can not reproduce/validate their results at all) …

Yep. I wrote a package for this stuff. I think they were using Excel because (a) everybody has it, (b) of its flat learning curve, and (c) – most important – Luther Gwaza* gave his “template” to the others.

» … and the difference in some comparison is 0.01% due to different rounding method used in Excel and R.

If you want to reproduce Excel’s wacky “commercial rounding”:

cround <- function(x, d) ((trunc(abs(x) * 10^d + 0.5)) / 10^d) * sign(x)
x      <- c(79.994, 79.995, 125.004, 125.005)
comp   <- data.frame(x = x, R = round(x, 2), Excel = cround(x, 2))
print(comp, row.names = FALSE)
       x      R  Excel
  79.994  79.99  79.99
  79.995  80.00  80.00
 125.004 125.00 125.00
 125.005 125.00 125.01


» » Semitransparent colors in Excel? I don’t think that’s possible. ;-)
»
» Haha, this is where you err... :-D Actually you can do that in Excel (not that you should). At least I can in my 2013. It's ugly, but semi-transparent it is. :cool:

Well, my version is Excel 2000. BTW, navigate up to my OP and hit ctrl+F5 to clear the browser’s cache / reload. Is it possible in Excel to have the colour depend on the value (the lower, the darker)? :smoke:
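In R, “the lower, the darker” is essentially a one-liner with colorRampPalette(), as in the plotting code further up the thread (the values and colours here are arbitrary, just to show the idea):

```r
y   <- c(2, 5, 7, 9)                 # some values below a threshold
pal <- colorRampPalette(c("#880000", "red"))(length(y)) # dark -> light ramp
clr <- pal[rank(y)]                  # lowest value gets the darkest colour
clr                                  # one hex colour per value
```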


  • Gwaza L. Adjusted Indirect Treatment Comparisons of Bioequivalence Studies. PhD Thesis: Utrecht. 2016. [image] free resource.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Shuanghe
★★  

Spain,
2019-07-19 10:41
(596 d 13:23 ago)

@ Helmut
Posting: # 20400
Views: 6,873
 

 OT: Spreadsheet addiction

Hi Helmut,

» Yep. I wrote a package for this stuff.

Well, you're always several steps ahead of me. I just wrote some lousy functions that serve an educational purpose (for myself). My output is for 4 methods only (Z-distribution, homo- and heteroscedastic, and pragmatic), since I'm still struggling with Chow and Shao/Liu's methods. By the way, I didn't see the package on your GitHub page. Any plan to put it there so we can have a peek? :-D

» If you want to reproduce Excel’s wacky “commercial rounding”:
» ...

I doubt I'll ever use it, but nice to know that it can be done. ;-)

» Well, my version is Excel 2000. BTW, navigate up to my OP and hit ctrlF5 to clear the browser’s cache / reload. Is it possible in Excel to have the color dependent on the value (the lower, the darker)? :smoke:

No way. Now you just want to piss off those Excel lovers :lol3:

»
  • Gwaza L. Adjusted Indirect Treatment Comparisons of Bioequivalence Studies. PhD Thesis: Utrecht. 2016. [image] free resource.

Thanks. I have many articles but I didn't know about the thesis. I'll take a look at it later.

ElMaestro, apology for the off-topic posting.

All the best,
Shuanghe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-19 12:29
(596 d 11:35 ago)

@ Shuanghe
Posting: # 20403
Views: 6,859
 

 OT: Spreadsheet addiction

Hi Shuanghe,

» » Yep. I wrote a package for this stuff.
» Well, you're always several steps ahead of me.

Nope. I started on July 11th.

» My output is for 4 methods only (Z-distribution, homo- and hetero-scedastic and pragmatic) …

Mine only for homo- and heteroscedastic variances (the latter is the package’s default). I think that relying on large-sample approximations doesn’t make sense at all. The CI will always be narrower than with the other approaches, but why would one go this way? Approximations of the degrees of freedom for unequal variances were published in 1946/47 (Satterthwaite, Welch), the t-distribution in 1908 (Gosset a.k.a. Student). Going back more than two centuries (Laplace, Poisson, Gauß)? I don’t get it. BTW, I don’t like the pragmatic method.
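The Welch–Satterthwaite approximation mentioned here is easy to check against base R, which uses it as the default in t.test() (the data below are simulated; sample sizes, means, and SDs are arbitrary):

```r
set.seed(7)
x <- rnorm(12, mean = 100, sd = 10)
y <- rnorm(18, mean = 105, sd = 20)
v1 <- var(x) / length(x)                 # squared standard errors
v2 <- var(y) / length(y)
# textbook Welch–Satterthwaite degrees of freedom
df.ws <- (v1 + v2)^2 /
         (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))
df.tt <- unname(t.test(x, y)$parameter)  # Welch is t.test()'s default
all.equal(df.ws, df.tt)                  # TRUE
```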

» … since I'm still struggling with Chow and Shao/Liu's methods.

You need the raw data, right?

» […] I didn't see the package in your Github page. Any plan to put it there so we can have a peek? :-D

Paid job – chained by a CDA. If we publish sumfink, sure. Not my decision.

» » If you want to reproduce Excel’s wacky “commercial rounding”:
» » ...
» I doubt I'll very use it but nice to know that it can be done. ;-)

I would never ever use it. However, if you want to reproduce sumfink which was rounded in Excel you can only experiment till you get the same in R.

» » […] navigate up to my OP and hit ctrlF5 to clear the browser’s cache / reload. Is it possible in Excel?
»
» No way. Now you just want to piss off those Excel lovers :lol3:

Yessir! R is a language and environment for statistical computing and graphics.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
nobody
nothing

2019-07-19 13:53
(596 d 10:12 ago)

@ Helmut
Posting: # 20404
Views: 6,816
 

 OT: Spreadsheet addiction

....reaaaaalllly boring in Vienna these days, huh? :-D

Kindest regards, nobody
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-19 17:32
(596 d 06:32 ago)

@ nobody
Posting: # 20406
Views: 6,764
 

 OT: Spreadsheet addiction

Hi nobody,

» ....reaaaaalllly boring in Vienna these days, huh? :-D

OT about R is a welcome excuse.

   All paid jobs
absorb and degrade the mind.
     Aristotle

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
zizou
★    

Plzeň, Czech Republic,
2019-07-19 22:18
(596 d 01:46 ago)

@ Ohlbe
Posting: # 20408
Views: 6,746
 

 Spreadsheet etc. failures

Hello everybody and nobody.

» Nope. I have examples of users using the wrong functions, or unlocked spreadsheets getting modified at some point, but that's about all. The problem is most often somewhere between the chair and the keyboard: wetware rather than software.
»
» Well, there are some "features" of Excel you need to be aware of when using it for stats, of course. Already mentioned on the forum.
Right.

Recently, because of a "floating point error" I failed to use a spreadsheet for comparing differences of two similar values against a certain value (my source data were almost all given with two decimal places, but some were calculated with "full precision", and I wanted to easily check how many of the differences were lower than 0.05; unfortunately there was a case with the error).

See following examples:

      |          |   Default View    | Number Format with
 Cell | Formula  | in General Format | Additional Decimal Places
──────┼──────────┼───────────────────┼───────────────────────────
  A1  | 25.05    |                   |
  A2  | 25       |                   |
  A3  | =A1-A2   |       0.05        | 0.0500000000000007
  A4  | =A3=0.05 |       FALSE       |
  A5  | =A3>0.05 |       TRUE        |
Difference is higher than expected.

      |          |   Default View    | Number Format with
 Cell | Formula  | in General Format | Additional Decimal Places
──────┼──────────┼───────────────────┼───────────────────────────
  A1  | 5.05     |                   |
  A2  | 5        |                   |
  A3  | =A1-A2   |       0.05        | 0.0499999999999998
  A4  | =A3=0.05 |       FALSE       |
  A5  | =A3<0.05 |       TRUE        |
Difference is lower than expected.



Nevertheless it is not a problem of the spreadsheet itself, but only a precision issue. In R:

a=5.05
b=5
a-b
#[1] 0.05
a-b==0.05
#[1] FALSE
a-b<0.05
#[1] TRUE
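For completeness, R’s stock answer to this is all.equal(), which compares with a tolerance instead of bit-for-bit:

```r
a <- 5.05
b <- 5
(a - b) == 0.05                 # FALSE: exact comparison of doubles
isTRUE(all.equal(a - b, 0.05))  # TRUE: tolerant comparison (~1.5e-8 by default)
```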


Everyone knows, only smart machines don't.

Best regards,
zizou
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-20 00:12
(595 d 23:52 ago)

@ zizou
Posting: # 20409
Views: 6,732
 

 As designed ☺

Hi zizou,

» Recently, because of "floating point error" I failed to use spreadsheet for comparisons of differences of two similar values against certain value (my source data were almost all with two decimal places, but some of them were calculated with "full precision" and I wanted to easily check how many of differences are lower than 0.05, unfortunaly there was a case with the error).

Haha, I know this game and therefore, always round before the comparison. ;-)
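With zizou’s first example, rounding to the reported precision before comparing does the trick:

```r
a <- 25.05
b <- 25
(a - b) == 0.05           # FALSE: the difference is 0.0500000000000007…
round(a - b, 2) == 0.05   # TRUE after rounding to two decimals
```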

» Everyone knows, only smart machines don't.

Can’t be blamed. That’s a property of floating point arithmetic (see also IEEE 754). Unless you have powers of 2 (dec→bin: 1→1, 2→10, 4→100, 8→1000, …) you will always have precision issues in the last significant digits. Everything beyond the 15th (dec-) significant figure is just noise – even with 64bit software on a 64bit OS.
Try this in Excel:
Cells A1–A9: 0, 1, … , 8
Cells B1–B9: =2^A1, =2^A2, … , =2^A9
Cells C2–C9: =B2-B1, =B3-B1, …
Format to 16 decimal places. What do you get?

A little bit deeper into R:

a <- 5.05
b <- 5
c <- a - b
identical(c, 0.05)
[1] FALSE
sprintf("%.17f", c)
[1] "0.04999999999999982"
a <- 4
b <- 2
c <- a - b
identical(c, 2)
[1] TRUE
sprintf("%.17f", c)
"2.00000000000000000"
identical(c, 2L)
[1] FALSE
is.integer(c)
[1] FALSE
a <- 4L # or as.integer(4)
b <- 2L # or as.integer(2)
c <- a - b
identical(c, 2)
[1] FALSE
sprintf("%.17f", c)
"2.00000000000000000"
identical(c, 2L)
[1] TRUE


GNU Octave 5.1.1.0 (all calculations internally in 64bit double):

>> format long
>> a = 5.05
a = 5.050000000000000
>> b = 5
b = 5
>> b - a
ans = -4.999999999999982e-02
>> a = 4
a = 4
>> b = 2
b = 2
>> a - b
ans = 2


Not even software for symbolic mathematics does better. Maxima 5.35.12.1 / wxMaxima 14.12.1:

(%i1) [a:5.05, b:5, a-b];
(%o1) [5.05,5,0.04999999999999982]
(%i2) [a:4, b:2, a-b];
(%o2) [4,2,2]


Only for neRds: Rmpfr. Doesn’t help. ;-)

library(Rmpfr)
a <- mpfr(5.05, 256)
b <- mpfr(5, 256)
diff(c(b, a))
1 'mpfr' number of precision  256   bits
[1] 0.04999999999999982236431605997495353221893310546875

mpfr(a-b, 32)
1 'mpfr' number of precision  32   bits
[1] 0.05

a <- mpfr(4, 256)
b <- mpfr(2, 256)
diff(c(b, a))
1 'mpfr' number of precision  256   bits
[1] 2


Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2019-07-20 07:08
(595 d 16:57 ago)

@ Helmut
Posting: # 20411
Views: 6,696
 

 As designed ☺

Hi Helmut,

» Haha, I know this game and therefore, always round before the comparison. ;-)

or never bring yourself into a situation where you need to compare floats.

I've taken the worst way out and written functions that check whether x is similar to y plus/minus a wee fraction like 1e-8 or something. It is ugly and clumsy, but it works – and it feels every time like I am suffering defeat.
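Such a helper might look like this (the function name and the tolerance are made up; a sketch of the idea, not ElMaestro's actual code):

```r
near <- function(x, y, tol = 1e-8) abs(x - y) < tol  # tolerant comparison
near(5.05 - 5, 0.05)   # TRUE
near(0.1 + 0.2, 0.3)   # TRUE
(0.1 + 0.2) == 0.3     # FALSE with exact comparison
```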


"John Brown's got a little Endian" :-):-):-)

Pass or fail!
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-20 10:53
(595 d 13:11 ago)

@ ElMaestro
Posting: # 20412
Views: 6,688
 

 To round or not to round…

Hi ElMaestro,

» » Haha, I know this game and therefore, always round before the comparison. ;-)
»
» or never bring yourself into a situation where you need to compare floats.

Did I tell you to how many significant digits I round? :-D

» I've taken the worst way out and written functions that check if x is similar to y plus/minus a wee fraction like 1e-8 or something.

I don’t know which kind of measurements you are comparing. Sumfink ultra-precise (according to the International Bureau of Weights and Measures: t ±5×10⁻¹⁶, l ±2.1×10⁻¹¹)?

Zizou (I guess) and I were talking about concentrations. Acceptable AP according to the GLs: 20% at the LLOQ and 15% above. Hence, everything reported substantially beyond that is not relevant (shall I call it noise?). What substantially means depends on the field. Some people go crazy with 6σ, physicists are fine with just 3σ.

x       <- pi
p       <- 15
s       <- 8
mult    <- c(3, 6)
sigma   <- x * p / 100
sigma.m <- sigma * mult
lo      <- x - sigma.m
hi      <- x + sigma.m
df      <- data.frame(x = rep(x, 2), prec = p,
                      sigma = sigma, mult = mult,
                      sigma.m = sigma.m,
                      lo = lo,  hi = hi)
df      <- signif(df, s)
names(df)[2] <- "prec. (%)"
print(df, row.names = FALSE)

       x prec. (%)     sigma mult  sigma.m        lo       hi
3.141593        15 0.4712389    3 1.413717 1.7278760 4.555309
3.141593        15 0.4712389    6 2.827433 0.3141593 5.969026

Do you get the idea? The true value might be somewhere between our self-imposed limits. Therefore, I don’t give a shit whether I get 3.141 or 3.141 592 653 589 793 in an electronic file.¹ However, I insist on a CRC-checksum² to verify the data-transfer. If I deal with bloody Excel, I round to one decimal beyond what is given in the analytical report,³ being aware that it is far beyond the analytical AP. If the data-transfer of analytical results to stats was done electronically in “full numeric precision” (haha), I want to see validation for it.⁴

» It is ugly and clumsy, it works, and it feels every time like I am suffering defeat.

Well, you are the C-man here. What about printf("%.yg\n", x); where y is the desired number of significant digits? With R’s signif():

options("digits" = 15)
x    <- 12345678.999999
y    <-        0.12345678999999
prec <- 8
fmt1 <- "%33.17f"
fmt2 <- paste0("%.", prec, "g")
cat(x, "\n",
    y, "\n",
    sprintf(fmt1, x), "\u2190 fake news\n",
    sprintf(fmt1, y), "\u2190 fake news\n",
    sprintf(fmt1, signif(x, prec)), "\u2190", prec, "significant digits\n",
    sprintf(fmt1, signif(y, prec)), "\u2190", prec, "significant digits\n",
    sprintf(fmt2, x), "\u2190", "directly with", paste0("'", fmt2, "'\n"),
    sprintf(fmt2, y), "\u2190", "directly with", paste0("'", fmt2, "'\n"))

12345678.999999
 0.12345678999999
        12345678.99999899975955486 ← fake news
               0.12345678999999000 ← fake news
        12345679.00000000000000000 ← 8 significant digits
               0.12345679000000000 ← 8 significant digits
12345679 ← directly with '%.8g'
0.12345679 ← directly with '%.8g'


If you are interested whether rubbish in ≈ rubbish out, ask for a checksum and verify it. Probably better than diving into the murky waters of (likely irrelevant) rounding.


  1. I would be fine with just 3.1 for π. The RE is –1.32%, again much below what is achievable in bioanalytics.
  2. SHA-256 or higher preferred (collisions reported for SHA-1, i.e., different input give the same hash). MD5 is better than nothing.
  3. That’s the only GxP-compliant (dated/signed, released by QUA) document, right? Did you ever see result-tables with 15 significant digits?
  4. Here it gets tricky. These data are different from what is given in the analytical report. Now what? At least I expect a statement about this discrepancy in the protocols (analytical and/or statistical). Regularly I see something like “calculations were performed in full numeric precision”. How could one ever hope to verify that having only the analytical report with rounded results?
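As a sketch of the checksum point: base R can already compute MD5 via the bundled tools package (SHA-256 would need a contributed package such as digest, so only MD5 is shown here, and the file names are made up):

```r
# Verify a data-transfer by comparing checksums of the source file
# and the transferred copy; here the "transfer" is a plain file.copy().
src  <- tempfile(fileext = ".csv")
write.csv(data.frame(subj = 1:3, conc = c(1.23, 4.56, 7.89)), src,
          row.names = FALSE)
copy <- tempfile(fileext = ".csv")
file.copy(src, copy)
# TRUE if the transfer was intact, FALSE on any corruption
unname(tools::md5sum(src) == tools::md5sum(copy))
```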

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2019-07-20 17:59
(595 d 06:05 ago)

@ Helmut
Posting: # 20415
Views: 6,656
 

 To round or not to round…

Hi Hötzi,

» I don’t know which kind of measurements you are comparing.

It stinks but there really is no good solution when you have to compare.
(294.1/1000) == 0.2941
[1] FALSE
(2941/10000) == 0.2941
[1] TRUE


One can read ISO standards and manuals about internal float representation (I tried and fell asleep), but at the end of the day, if you need to compare you need to compare, and hope for the best. Some numbers just cannot be represented accurately in binary.

If you accept an xyz fraction of error on the comparison then the solution is straightforward, like

Match <- function (x, y, relative.tol)
{
 # naive relative comparison; assumes x > 0
 if (y < ((1 + relative.tol) * x))
  if (y > ((1 - relative.tol) * x))
    return(TRUE)
 return(FALSE)
}

Match(6, 6.001, 0.01)


Ugly as hell :vomit:, and will not work well if you get into extreme binary representations.

Pass or fail!
ElMaestro
mittyri
★★  

Russia,
2019-07-20 20:43
(595 d 03:21 ago)

@ ElMaestro
Posting: # 20416
Views: 6,650
 

 floating-point math is always more complex than you think it is

Hi ElMaestro,

» Match(6, 6.001, 0.01)
»
» Ugly as hell :vomit:, and will not work well if you get into extreme binary representations.

take a look at
if (Abs(x - y) <= absTol * Max(1.0f, Abs(x), Abs(y)))
the description is here

One may consider ULP, pros and cons described here
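For what it's worth, a minimal R sketch of the combined absolute/relative test above (the function name near is made up here):

```r
# Combined absolute/relative tolerance, after the formula above.
# Near zero the absolute tolerance dominates; for large inputs the
# allowed difference scales with the magnitude of x and y.
near <- function(x, y, tol = 1e-9) {
  abs(x - y) <= tol * max(1, abs(x), abs(y))
}

near(0.5 - 0.4 - 0.1, 0)   # TRUE: the 1e-17 residue is below tol
near(6, 6.001)             # FALSE with the default tol
near(1e12, 1e12 + 1)       # TRUE: tolerance scales with magnitude
```

Unlike the purely relative Match() above, this also behaves sensibly when one of the arguments is zero or negative.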

Kind regards,
Mittyri
Ohlbe
★★★

France,
2020-12-10 17:43
(86 d 06:21 ago)

@ ElMaestro
Posting: # 22125
Views: 1,825
 

 Spreadsheet failures, any recent examples?

Dear ElMaestro,

» does anyone have recent examples of spreadsheet functions truly misbehaving with real-life realistic data?

A colleague recently pointed out a strange behaviour in Excel (I have 2013, he has 2016, both give the same result here).

Type =0.5-0.4-0.1: you'll get the expected result, 0.

Now type =(0.5-0.4-0.1): you'll get -2,77556E-17. What ? Ain't that supposed to be the same calculation ?

Trying =0=0.5-0.4-0.1: the answer I got was FALSE. WTF, when the result you get is 0 ? It's not a matter of the number of decimals displayed: I tried to add more to the first result, or to switch to scientific notation, but still got 0.

Regards
Ohlbe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-12-10 18:12
(86 d 05:52 ago)

@ Ohlbe
Posting: # 22126
Views: 1,815
 

 Floating point arithmetic, again

Dear Ohlbe,

» A colleague recently pointed out a strange behaviour in Excel (I have 2013, he has 2016, both give the same result here).

And I have 2000 (don’t laugh).

» Type =0.5-0.4-0.1: you'll get the expected result, 0.

Yep.

» Now type =(0.5-0.4-0.1): you'll get -2,77556E-17. What ? Ain't that supposed to be the same calculation ?

In principle, yes. Unless we can inspect M$’s source code, it is difficult to tell what happens with the brackets.

» Trying =0=0.5-0.4-0.1: the answer I got was FALSE. WTF, if the answer you get is 0 ? It's not a matter of number of decimals displayed: I tried to add more to the first result, or to switch to scientific notation, but still got 0.

OK, write in A1 =0.5-0.4-0.1 and in B1 =A1=0
  1. What do you expect?
  2. What do you get?
Floating point arithmetic, again (see also this post above).
Try in A1 =16-8-4-2-2, in B1 =(16-8-4-2-2), and in C1 =A1=B1. I bet it “works” as expected, since these numbers can be converted to binary digits without error.

What about R?

options("digits" = 16)
a <- 0.5-0.4-0.1
b <- (0.5-0.4-0.1)
a; b
[1] -2.775557561562891e-17
[1] -2.775557561562891e-17

identical(a, b)
[1] TRUE
c <- 16-8-4-2-2
d <- (16-8-4-2-2)
c; d
[1] 0
[1] 0

identical(c, d)
[1] TRUE


Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2020-12-10 18:40
(86 d 05:24 ago)

@ Helmut
Posting: # 22127
Views: 1,811
 

 Floating point arithmetic, again

Hi Ohlbe,

» Type =0.5-0.4-0.1: you'll get the expected result, 0.

» Now type =(0.5-0.4-0.1): you'll get -2,77556E-17. What ? Ain't that supposed to be the same calculation ?

Yes it is (or may be) to you and to me, but that isn't how an electronic brain works.

» Trying =0=0.5-0.4-0.1: the answer I got was FALSE. WTF, if the answer you get is 0 ? It's not a matter of number of decimals displayed: I tried to add more to the first result, or to switch to scientific notation, but still got 0.

In the world of 1's and 0's you cannot represent all decimal numbers that are easily displayed in our common number format using the digits 0 to 9 ("decimal"). Your little equations exemplify it wonderfully; it is not per se an Excel issue.

Here's R:

> 0==(0.5-0.4-0.1)
[1] FALSE

You can totally represent .5 (a half) in binary: It is actually .1 because it is 2 raised to the power of -1.
.11 in binary would be 2 raised to the power of -1 plus 2 raised to the power of -2. And so on.

But things are tricky for a figure like 0.1 (one tenth) - in binary you'd represent it as .00011001100110011001100110011001100110011001100110011... where the 0011 blocks repeat indefinitely.
In computers, data types have a fixed storage size, like 4 bytes or 8 bytes. So somewhere along the repeating 0011 blocks the computer truncates the figure so it fits in the memory. And that is the source of your rounding issue.
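The truncation is easy to see in R by printing more decimals than the default display shows:

```r
# The stored double for 0.1 is the nearest binary fraction,
# not exactly one tenth:
sprintf("%.20f", 0.1)
# [1] "0.10000000000000000555"

# and the accumulated residue of 0.5 - 0.4 - 0.1 is likewise non-zero:
sprintf("%.20f", 0.5 - 0.4 - 0.1)
```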

Computer scientists know never to compare fractional numbers uncritically.
It stinks. The problem is not Excel, but that we have invented computers that use binary, while at the same time we like to think of numbers represented with the usual 10 digits ("decimal"). :-)

Pass or fail!
ElMaestro
Ohlbe
★★★

France,
2020-12-10 19:18
(86 d 04:46 ago)

@ ElMaestro
Posting: # 22129
Views: 1,810
 

 Floating point arithmetic, again

Hi ElMaestro,

» But things are tricky for a figure like 0.1 (one tenth) - in binary you'd represent it as .00011001100110011001100110011001100110011001100110011... where the 0011 blocks repeat indefinitely.
» In computers, data types have a fixed storage size, like 4 bytes or 8 bytes. So somewhere along the repeating 0011 blocks the computer truncates the figure so it fits in the memory. And that is the source of your rounding issue.

Mmmm. But then if the problem comes from the nasty 0.1, how come if I type =(0.8-0.4-0.2-0.1-0.1) I get a nice 0, without any funky E-17 ? And how come if I type =0=(0.8-0.4-0.2-0.1-0.1), or =0=0.8-0.4-0.2-0.1-0.1, I get a nice TRUE ?

Regards
Ohlbe
ElMaestro
★★★

Denmark,
2020-12-10 20:38
(86 d 03:26 ago)

@ Ohlbe
Posting: # 22130
Views: 1,801
 

 Floating point arithmetic, again

Hi Ohlbe,


» Mmmm. But then if the problem comes from the nasty 0.1, how come if I type =(0.8-0.4-0.2-0.1-0.1) I get a nice 0, without any funky E-17 ? And how come if I type =0=(0.8-0.4-0.2-0.1-0.1), or =0=0.8-0.4-0.2-0.1-0.1, I get a nice TRUE ?

Impossible to tell if this is an effect from Excel or from some degree of streamlining within the CPU.
It is fairly trivial (I think) at the CPU level, as well as in Excel, to check whether you are handling objects of the type (x-x) and to substitute a zero where they occur.

Pass or fail!
ElMaestro
Ohlbe
★★★

France,
2020-12-10 20:46
(86 d 03:18 ago)

@ ElMaestro
Posting: # 22131
Views: 1,798
 

 Floating point arithmetic, again

Hi ElMaestro,

» Impossible to tell if this is an effect from Excel or from some degree of streamlining within the CPU.

Same behaviour in R:

0==(0.8-0.4-0.2-0.1-0.1)
[1] TRUE


Regards
Ohlbe
ElMaestro
★★★

Denmark,
2020-12-10 21:05
(86 d 03:00 ago)

@ Ohlbe
Posting: # 22132
Views: 1,797
 

 Floating point arithmetic, again

Hi Ohlbe,

> 0==(0.8-0.4-0.2-2*0.1)
[1] TRUE
> 0==(.1+0.8-0.4-0.2-3*0.1)
[1] FALSE


We (as in us humans) should just switch to binary and we'd never have these discussions. We can still keep the metric system, but to keep things easy we could define a millimeter as 1/1024 of a meter, if we want to impress someone with our ability to do division and multiplication in our heads :-D


In a nutshell, try:
a=.11111111111111111111111111111111111111111111111111111111
b=.111111111111111111111111111111111111111111111111111111111
(a-b)==0



Edit: two posts merged [Ohlbe]

Pass or fail!
ElMaestro
Ohlbe
★★★

France,
2020-12-10 22:35
(86 d 01:29 ago)

@ ElMaestro
Posting: # 22135
Views: 1,784
 

 Depressed

» In a nutshell, try:
» a=.11111111111111111111111111111111111111111111111111111111
» b=.111111111111111111111111111111111111111111111111111111111
» (a-b)==0

ElMaestro, why do you do this to me ? You know I have a natural distrust for many things and I see evil everywhere. How much did my shrink pay you, so that I would continue visiting him for another 10 years ? :crying::crying::crying:


Update: just called my shrink, as I couldn't fall asleep. He reminded me that I had already seen this kind of stuff on this forum before, and had survived. He asked me whether I could think of a single situation where I would require this kind of precision in any calculation, or where this could in some way affect my life, whether personal or professional. I had to admit that I could not. So we agreed he could go back to sleep and would send his bill in the morning.

He's a good shrink. A trifle expensive, but good.

Regards
Ohlbe
ElMaestro
★★★

Denmark,
2020-12-11 06:59
(85 d 17:05 ago)

@ Ohlbe
Posting: # 22139
Views: 1,758
 

 Depressed

Hi Ohlbe,

» He's a good shrink. A trifle expensive, but good.

Call me next time. I will say "Just forget about it, my friend" and it will only cost you, shall we say, 2700 euros per consultation?
Cash payment preferred, but only if the latter has been adequately laundered and sanitized (in that order).

Pass or fail!
ElMaestro
Ohlbe
★★★

France,
2020-12-10 19:13
(86 d 04:51 ago)

@ Helmut
Posting: # 22128
Views: 1,801
 

 Floating point arithmetic, again

Dear Helmut,

» OK, write in A1 =0.5-0.4-0.1 and in B1 =A1=0

»    1. What do you expect?

I expect A1 to read 0. Regarding B1: I would naturally expect it to return TRUE. But on the other hand, typing =0=0.5-0.4-0.1 returns FALSE, and I would not necessarily expect a two-step calculation to yield a result that differs from a one-step calculation. By the way I checked: =0=5-4-1 returns TRUE, so it is not that Excel just compares 0 to 0.5 and ignores the subsequent signs and figures.

»    2. What do you get?

A1 returns 0, as expected. Switching to scientific notation gives 0,00E+00. So it doesn't look like we have some hidden decimals there that don't show because of the default display.

B1 returns TRUE. Indeed the result I would normally expect. But this means that typing the comparison and the calculation in one cell, or the calculation in one cell and the comparison in another, gives different results.

» Try in A1 =16-8-4-2-2, in B1 =(16-8-4-2-2), and in C1 =A1=B1. I bet it “works” as expected, since these numbers can be converted to binary digits without error.

Yes. But it works also with A1 =5-4-1, in B1 =(5-4-1), and in C1 =A1=B1. So it works with 5, which is not a power of 2 ?
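(A quick R check of why 5 works, too: every whole number up to 2^53 is exactly representable as a double, power of two or not, so purely integer arithmetic carries no rounding error at all.)

```r
# Whole numbers are exact in a 64-bit double up to 2^53; only
# fractional parts like 0.1 force a rounded binary expansion.
(5 - 4 - 1) == 0        # TRUE: every term is exact
(2^53 - 1) + 1 == 2^53  # TRUE: still exact at the limit
2^53 + 1 == 2^53        # TRUE: above 2^53 the gaps exceed 1
```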

Regards
Ohlbe
Ohlbe
★★★

France,
2020-12-10 21:11
(86 d 02:53 ago)

@ Ohlbe
Posting: # 22133
Views: 1,800
 

 From bad to worse

Hi fellows,

Sorry guys, but there are some things going on behind the scenes which I really fail to understand.

If I get -2,77556E-17 when I type =(0.5-0.4-0.1), how come I get 0 when I type =(-0.1-0.4+0.5) ? The same figures, just in a different order ?

If I get FALSE when I type =0=0.5-0.4-0.1, how come I get TRUE when I type =0=-0.4-0.1+0.5 ?

And to be fair to Excel, the same happens in R:

0==(0.5-0.4-0.1)
[1] FALSE
0==(-0.4-0.1+0.5)
[1] TRUE


Are they first calculating -0.4-0.1=-0.5, which is nicely binarily coded, then -0.5+0.5=0 ?

Regards
Ohlbe
mittyri
★★  

Russia,
2020-12-10 23:22
(86 d 00:43 ago)

@ Ohlbe
Posting: # 22136
Views: 1,772
 

 From bad to worse

Dear Ohlbe,

» If I get -2,77556E-17 when I type =(0.5-0.4-0.1), how come I get 0 when I type =(-0.1-0.4+0.5) ? The same figures, just in a different order ?

The order matters when you get to floating-point arithmetic ;-)


» If I get FALSE when I type =0=0.5-0.4-0.1, how come I get TRUE when I type =0=-0.4-0.1+0.5 ?

Rightyright. Why did it happen? The root cause is described here
Excel does do some additional operations under the hood.

» And to be fair to Excel, the same happens in R:
»
»

0==(0.5-0.4-0.1)
» [1] FALSE
» 0==(-0.4-0.1+0.5)
» [1] TRUE


» Are they first calculating -0.4-0.1=-0.5, which is nicely binarily coded, then -0.5+0.5=0 ?

Yes, take a look at this MS article
Both Excel and R follow IEEE 754, so it is not surprising that the conditions give the same results. The problem lies only in how values are represented to the user.
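Ohlbe's guess about the intermediate steps can be checked directly in R (a sketch, relying on the left-to-right evaluation of + and -):

```r
# -0.4 - 0.1 rounds to exactly -0.5, so adding 0.5 recovers 0 …
(-0.4 - 0.1) == -0.5     # TRUE: this intermediate is exact
(-0.4 - 0.1 + 0.5) == 0  # TRUE

# … while in the other order the very first step already carries
# an error, before 0.1 is even subtracted:
(0.5 - 0.4) == 0.1       # FALSE
```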

Kind regards,
Mittyri
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-12-10 23:36
(86 d 00:28 ago)

@ Ohlbe
Posting: # 22137
Views: 1,775
 

 All is good

Hi Ohlbe,

» … the same happens in R:

So you installed this goody at last? :-D

»

0==(0.5-0.4-0.1)
» [1] FALSE
» 0==(-0.4-0.1+0.5)
» [1] TRUE


» Are they first calculating -0.4-0.1=-0.5, which is nicely binarily coded, then -0.5+0.5=0 ?

Would have to dive into the lexer in the source of base-R. Likely not clever enough anyway. If you want to explore the matter, type ?Syntax, ?Arithmetic, ?Comparison, and their relatives. See also the FAQ 7.31. However, the `==` operator should not be used to compare two objects for identity. Otherwise we fall into the binary trap:

x1 <- (0.5-0.4-0.1)
x2 <- (-0.4-0.1+0.5)
x1; x2
[1] -2.775558e-17
[1] 0

x1 == x2
[1] FALSE
identical(x1, x2)
[1] FALSE
all.equal(x1, x2)
[1] TRUE

Only the last function compares to the numeric precision of the machine* and this is what we want and the best we can hope for…


  • On my machine with 64bit R …
    sqrt(.Machine$double.eps)
    [1] 1.490116e-08

    … which is orders of magnitude larger than abs(-2.775558e-17) – which is just “noise”. Hence, the test passes.
    x1 <- (0.5-0.4-0.1)
    x2 <- (-0.4-0.1+0.5)
    sqrt(.Machine$double.eps) > abs(x1 - x2)
    [1] TRUE

    That’s a crude example. The source is more tricky. Try getAnywhere(all.equal.numeric) to see the 77 lines of source code for comparing numbers.

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
PharmCat
★    

Russia,
2020-12-18 19:53
(78 d 04:12 ago)

(edited by PharmCat on 2020-12-18 21:02)
@ Helmut
Posting: # 22150
Views: 1,427
 

 Float is float

Hi all,

This is a very trivial story... let's take

a = 0.1

the next float is:

julia> nextfloat(0.1)
0.10000000000000002


so

julia> nextfloat(0.1) - 0.1
1.3877787807814457e-17


float numbers are discrete... you can't use 0.10000000000000001 ...

you can use rational numbers:

julia> 5//10 - 4//10 - 1//10 == -4//10 - 1//10 + 5//10
true


or some packages for working with big decimals

julia> using Decimals
julia> decimal(0.5)-decimal(0.4)-decimal(0.1) == decimal(-0.4)-decimal(0.1)+decimal(0.5)
true


I think in R it can be done... and never use Excel for calc...

edit: or you can do float(big(1//10)) if you just need more precision...

edit2:

» which is just “noise”

you should be very careful with eps(), because it is not an indicator of accuracy

you can use something like:

a = 1e-70
b = 2e-70
b - a


and it will work fine; problems begin when you do something like 2.0 + 1e-70, but you can still use 2.0 * 1e-70 without problems ...

make sure not to sum numbers whose magnitudes differ by more than a factor of 1/eps() ...

for comparison

all.equal(target, current, tolerance = sqrt(.Machine$double.eps))

can be used, but for small numbers you should adjust the tolerance
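PharmCat's warning can be demonstrated in R: when the target itself is tiny, all.equal() with the default tolerance silently falls back to an absolute comparison, so two numbers differing by a factor of two still "pass" (a sketch using the scale argument of all.equal):

```r
# Target 1e-70 is below the default tolerance, so all.equal()
# compares absolute differences: 2e-70 looks "equal" to 1e-70.
isTRUE(all.equal(1e-70, 2e-70))                  # TRUE (!)

# Supplying a scale forces a scaled (relative) comparison,
# which catches the 100 % relative error:
isTRUE(all.equal(1e-70, 2e-70, scale = 1e-70))   # FALSE
```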
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-12-20 22:27
(76 d 01:37 ago)

@ PharmCat
Posting: # 22151
Views: 1,300
 

 Float is float!

Hi PharmCat,

» I think in R it can be done...

At least it consistently gives something other than zero. ;-)

library(Rmpfr)
prec <- 64
x <- mpfr(0.5, prec) - mpfr(0.4, prec) - mpfr(0.1, prec)
x
1 'mpfr' number of precision  64   bits
[1] -2.77555756156289135106e-17

y <- mpfr(-0.4, prec) - mpfr(0.1, prec) + mpfr(0.5, prec)
y
1 'mpfr' number of precision  64   bits
[1] -2.77555756156289135106e-17


Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
mittyri
★★  

Russia,
2020-12-21 12:49
(75 d 11:15 ago)

@ Helmut
Posting: # 22152
Views: 1,266
 

 rational solution in R

Hi Helmut,

» At least consistently giving not zero. ;-)
»

library(Rmpfr)
» prec <- 64
» x <- mpfr(0.5, prec) - mpfr(0.4, prec) - mpfr(0.1, prec)
» x
» 1 'mpfr' number of precision  64   bits
» [1] -2.77555756156289135106e-17
» y <- mpfr(-0.4, prec) - mpfr(0.1, prec) + mpfr(0.5, prec)
» y
» 1 'mpfr' number of precision  64   bits
» [1] -2.77555756156289135106e-17



I think the right solution is
install.packages("remotes")
remotes::install_github("bertcarnell/rational")
library(rational)
(rational(5L, 10L) - rational(4L, 10L) - rational(1L, 10L)) == 0
[1] TRUE


Interestingly, I did not find a good solution on CRAN

Kind regards,
Mittyri
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2021-01-14 11:53
(51 d 12:11 ago)

@ mittyri
Posting: # 22180
Views: 890
 

 related stuff

Hi mittyri,

just discovered this goody.

Dif-tor heh smusma 🖖
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
SDavis
★★  
Homepage
UK,
2021-02-09 11:02
(25 d 13:02 ago)

@ Helmut
Posting: # 22206
Views: 479
 

 related stuff

The UK Government and its repeated desire to use expensive and incompetent consultants: https://www.bbc.co.uk/news/technology-54423988

Simon
Senior Scientific Trainer, Certara™
https://www.youtube.com/watch?v=xX-yCO5Rzag
https://www.certarauniversity.com/dashboard
https://support.certara.com/forums/
ElMaestro
★★★

Denmark,
2021-02-09 18:55
(25 d 05:09 ago)

@ SDavis
Posting: # 22207
Views: 442
 

 related stuff

Hihi,

thanks for that post SD,
I agree this is perhaps not a spreadsheet failure but rather a consultant failure.

Generally, I tell my clients they can have any two out of the golden three: cheap, fast, and good.
In particular, cheap and fast certainly won't be good.

(side note: when they use me as a consultant it won't be good regardless, but that's just an unrelated fact of life :-D)

Pass or fail!
ElMaestro

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz