ElMaestro
★★★

Belgium?,
2019-07-18 13:43

Posting: # 20388
Views: 2,188
 

 Spreadsheet failures, any recent examples? [Software]

Hi all,

does anyone have recent examples of spreadsheet functions truly misbehaving with real-life realistic data?

I am not talking about, for example, the cases of standard deviations or variances that went funky when the application was fed extreme data ranges, or similar waterboarding of software – all of which was debated 10 years ago and led regulators to conclude they were the only ones who could be trusted to make proper use of such pernicious creations.

Realistic.
Data.
Only.
Please.

I will leave it to the reader to quantitatively define what "realistic" and "recent" actually mean and I reserve the right to challenge the definitions, obviously :-D

Of course I ask because I never thought spreadsheets were or are as bad as their reputation. The users, on the other hand, are every bit as bad as their reputation. This certainly includes myself. Most likely I am the worst. But that will be the topic of another thread :-)

I could be wrong, but...
Best regards,
ElMaestro
Ohlbe
★★★

France,
2019-07-18 14:32

@ ElMaestro
Posting: # 20389
Views: 2,071
 

 Spreadsheet failures, any recent examples?

Dear ElMaestro,

» does anyone have recent examples of spreadsheet functions truly misbehaving with real-life realistic data?

Nope. I have examples of users using the wrong functions, or unlocked spreadsheets getting modified at some point, but that's about all. The problem is most often somewhere between the chair and the keyboard: wetware rather than software.

Well, there are some "features" of Excel you need to be aware of when using it for stats, of course. Already mentioned on the forum.

» [...] which led regulators to conclude they were the only ones who could be trusted to make proper use of such pernicious creations.

Maybe their coding skills are not any better than mine*, and they consider that they will always trust whatever they do with a spreadsheet more than anything they might obtain with R ? :-D

* sharing with the forum some experience already discussed by email with ElMaestro a couple of years ago: I installed R on my home computer. With the help of some "R for R-seholes" tutorials, it took me something like 4 hours to manage to open a simple set of data and make a simple plot out of it :PCchaos:. I spent a few more hours trying to have the colour of the plot change based on simple conditions (red if under a threshold value, green above it). I was even waking up at night with ideas on new ways to test. I never succeeded. Never used R again. I would never trust any of my results anyway.:lamer:

Regards
Ohlbe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-18 16:12

@ Ohlbe
Posting: # 20390
Views: 2,072
 

 Spreadsheet addiction

Dear Ohlbe

» » [...] which led regulators to conclude they were the only ones who could be trusted to make proper use of such pernicious creations.
»
» Maybe their coding skills are not any better than mine*, and they consider that they will always trust whatever they do with a spreadsheet more than anything they might obtain with R ? :-D

There is a series of papers about indirect adjusted comparisons in BE published by regulators. All done in Excel.

» * […] I spent a few more hours trying to have the colour of the plot change based on simple conditions (red if under a threshold value, green above it). I was even waking up at night with ideas on new ways to test. I never succeeded.

I believe it. R is a nasty beast like SAS (© Detlew). Waking up in the middle of the night or – worse – not being able to fall asleep at all is a common side-effect. Since you will never use R again, skip this code:

op  <- par(no.readonly = TRUE)    # Save the original graphics parameters
par(pty = "s")                    # I want a square plotting region
x   <- runif(n=50, min=1, max=20) # Sample from the uniform distribution
a   <- 0.5                        # Intercept
b   <- 2                          # Slope
y   <- a + b * x + rnorm(n=length(x), mean=0, sd=2) # Response + random error
th  <- 10                         # Threshold
plot(x, y, type = "n", las = 1)   # Important: type="n", will add points later
grid()                            # Nice to have one
abline(h = th, col = "red")       # Line for threshold
abline(lm(y ~ x), lwd = 2, col = "blue")    # Linear regression
points(x[y < th], y[y < th], pch = 21, cex = 1.5,
       col = "red", bg = "#FFD700AA")       # Below threshold
points(x[y >= th], y[y >= th], pch = 21, cex = 1.5,
       col = "darkgreen", bg = "#32CD32AA") # At or above threshold
par(op)                           # Restore original graphics parameters


Which gives:

[image]

Semitransparent colors in Excel? I don’t think that’s possible. ;-)

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
Ohlbe
★★★

France,
2019-07-18 17:30

@ Helmut
Posting: # 20391
Views: 2,060
 

 Nasty beast

Dear Helmut,

» Since you will never use R again, skip this code

Thx... I made multiple attempts with "if" and conditions, and got multiple different error messages. ElMaestro gave me some hints, each time resulting in an "oh yes of course" reaction and more error messages... I probably misinterpreted the hints.

Regards
Ohlbe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-18 20:26

@ Ohlbe
Posting: # 20396
Views: 2,030
 

 Nasty beast

Dear Ohlbe,

» I made multiple attempts with "if" and conditions, and got multiple different error messages. ElMaestro gave me some hints, resulting each time in a "oh yes of course" reaction and more error messages... I probably misinterpreted the hints.

Using if(), its relatives, and loop constructs (for(), while(), repeat()) is in many cases prone to errors – as you experienced – and can be slooow. If you have a vector of data (say y) there are various ways to access its values. Example with 25 random integers in the range 1…40:

n  <- 25
y  <- round(runif(n = n, min = 1, max = 40))
th <- 10             # threshold
n  <- length(y)      # we know but useful later
head(y, 10)          # the first 10
y[1:10]              # same
tail(y, 10)          # the last 10
y[(n - 9):n]         # same but tricky
blq <- which(y < th) # the ones below the threshold
length(blq)          # how many?
y[blq]               # show them
y[y < th]            # same

In such a simple case which() is overkill, though one can combine many conditions, which [sic] makes the code easier to understand. If you have two vectors (say x, y) like in my last post, you can select values of x depending on which(y == condition).
Let’s check:

n    <- 50
x    <- runif(n = n, min = 1, max = 20)
a    <- 0.5
b    <- 2
y    <- a + b * x + rnorm(n = length(x),  mean = 0, sd = 2)
th   <- 10

fun1 <- function(x, y, th) { # clumsy
  x.th <- numeric()
  y.th <- numeric()
  bql  <- 0L
  for (k in seq_along(x)) {
    if (y[k] < th) {
      bql       <- bql + 1
      x.th[bql] <- x[k]
      y.th[bql] <- y[k]
    }
  }
  z <- data.frame(x.th, y.th)
  return(invisible(z))
}

fun2 <- function(x, y, th) { # better
  blq  <- which(y < th)
  x.th <- x[blq]
  y.th <- y[blq]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}

fun3 <- function(x, y, th) { # a little bit confusing for beginners
  x.th <- x[y < th]
  y.th <- y[y < th]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}

res1 <- fun1(x, y, th)
res2 <- fun2(x, y, th)
res3 <- fun3(x, y, th)
identical(res1, res2); identical(res1, res3) # same?
[1] TRUE
[1] TRUE

Bingo! Which one is easier to read?
What about speed?

library(microbenchmark)
res <- microbenchmark(fun1(x, y, th),
                      fun2(x, y, th),
                      fun3(x, y, th), times = 1000L)
print(res, signif = 4)
Unit: microseconds
           expr   min    lq  mean median    uq  max neval cld
 fun1(x, y, th) 173.9 181.1 196.1  186.3 189.9 1530  1000   b
 fun2(x, y, th) 163.0 170.3 183.6  175.4 179.0 3311  1000  a
 fun3(x, y, th) 163.0 169.0 185.5  174.5 178.4 3310  1000  ab

Practically the same, since the vectors are short. If we play this game with longer vectors, fun2() shows its strengths.

[image]
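If you want to try it yourself, here is a minimal sketch (my addition, not part of the original post) which repeats the benchmark with a longer vector; it assumes fun1(), fun2(), and fun3() from above are still defined, and n = 1e4 is an arbitrary choice.

library(microbenchmark)
n   <- 1e4                                # arbitrary, just longer than before
x   <- runif(n = n, min = 1, max = 20)
y   <- 0.5 + 2 * x + rnorm(n = n, mean = 0, sd = 2)
th  <- 10
res <- microbenchmark(fun1(x, y, th),     # loop + if()
                      fun2(x, y, th),     # which()
                      fun3(x, y, th),     # direct logical indexing
                      times = 100L)
print(res, signif = 4)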


If one has more conditions, any external construct will suck. We have 100 random integers (1…50) and want to get the even ones between 20 and 30 in increasing order.

x <- round(runif(n = 100, min = 1, max = 50))
x
  [1] 37 29  8 34  6 13 47 22 30 44  6 14 33 18 12 37 32
 [18] 10  6 37 27  2 43 40  7  5 47  4 32 17  7 50 39 36
 [35] 38 48 34  5 21 43 34 50 29 20 33  6 45 32 28  8  1
 [52] 26 29 19 42  9 38 31 25  4  1 23 37 31  2 26 29 24
 [69] 40 43 17 16 41 17  5 17 36 16  7  5 36 30  5  8 19
 [86] 40 42 30 33 21 13 25 21 33 16  7 33 36 19 37

One-liner:

sort(x[which(x >= 20 & x <= 30 & x %% 2 == 0)])
[1] 20 22 24 26 26 28 30 30 30

Or even shorter:

sort(x[x >= 20 & x <= 30 & x %% 2 == 0])
[1] 20 22 24 26 26 28 30 30 30

Good luck coding that with a loop and a nested if() { do this } else { do that } elseif { oops }.
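For contrast, a hypothetical loop-and-if() version of the same task (my sketch, not from the original post), reusing the x defined above – many lines instead of one:

even.20.30 <- integer()              # will grow inside the loop
for (k in seq_along(x)) {
  if (x[k] >= 20 && x[k] <= 30) {    # within the range?
    if (x[k] %% 2 == 0) {            # even?
      even.20.30 <- c(even.20.30, x[k])
    }
  }
}
sort(even.20.30)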

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
Ohlbe
★★★

France,
2019-07-19 11:32

@ Helmut
Posting: # 20397
Views: 1,954
 

 Nasty beast

Dear Helmut,

» Using if(), its relatives, and loop-constructs (for(), while(), repeat()) in many cases are prone to errors

» Good luck coding that with a loop and a nested if() { do this } else { do that } elseif { oops }

I think that's really what pissed me off. I thought something like if... else... would be very logical and easy. Maybe because these are real words from a language I can understand. But it looks like R and I are following different kinds of logic.

When you learn a new language and start speaking it with somebody, that person will usually show some goodwill, ignore grammatical mistakes and try and understand what you mean. R made no efforts of any kind, even though what I was trying to achieve was pretty obvious :-D

» If you have a vector of data (say y)

Yeah, ElMaestro started using this kind of vocabulary too when we were exchanging email. That got me even more confused. All I learnt at school about vectors is the geometric side of the concept - what Wikipedia apparently calls Euclidean vector. I had some problems understanding what an arrow between two points had to do with what I was trying to achieve.
Sorry ElMaestro, we're going quite far from your original question... :offtopic!:

Regards
Ohlbe
ElMaestro
★★★

Belgium?,
2019-07-19 12:32

@ Ohlbe
Posting: # 20399
Views: 1,942
 

 Nasty beast

Hi Ohlbe,
» Sorry ElMaestro, we're going quite far from your original question... :offtopic!:

That's OK.
The occasional drift on this forum always pleases me.


As for the logic, it is absolutely !7==!3 what you are saying: programming involves a kind of logic that is entirely bound to ones and zeros, and that can be hard to grasp when the reference is our everyday language.

If it is any comfort, I can confess I usually can't read Hötzi's code right away. I usually spend a lot of time deciphering it, or give up. This is because it is syntactically condensed. I tend to use much longer lines and bigger functions to achieve the same goals. I can't maintain code otherwise, but this is just me.

Vector: Something that carries something, according to Birdsong's Internally Hosted Vocabulary (the platform is walnut-sized).
The object of this carriage may be information, like "two steps in this direction and then five steps in the perpendicular direction". Or "Apple and thereafter Syphilis".
The origin of the word is from the Iliad by Homer, song 13, in which Thetis says: "Vector bears it in triumph upon his own shoulders", possibly expressed as such because he was in love?! Or at least this is my belief since it was said in relation to some amour.

I could be wrong, but...
Best regards,
ElMaestro
Ohlbe
★★★

France,
2019-07-19 13:38

@ ElMaestro
Posting: # 20402
Views: 1,909
 

 Decidedly off topic

Hi ElMaestro,

» The origin of the word is from the Iliad by Homer, song 13, in which Thetis says: "Vector bears it in triumph upon his own shoulders", possibly expressed as such because he was in love?! Or at least this is my belief since it was said in relation to some amour.

:lol2:

But if I understand the text well, this is only applicable to Trojan virus coding...

Regards
Ohlbe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-19 13:00

@ Ohlbe
Posting: # 20401
Views: 1,933
 

 OT: R limbo 101

Dear Ohlbe,

» I thought something like if... else... would be very logical and easy. Maybe because these are real words from a language I can understand.

Perfectly understandable. Of course, these constructs exist in R and we use them all the time. The point is only that direct access of values (with a condition) is simply more efficient.
If you don’t have to deal with large data sets, both are fine. Since I deal a lot with simulations (1 mio BE studies, sometimes with a high number of subjects / periods / sequences, …) I learned the hard way to avoid the ‘simple’ constructs whenever possible (granted, sometimes I knew it could be done but was too stupid).
Think about my previous example. For 1 mio calls the direct access is 15 times faster than the loop with if() inside. It’s not unusual to have nested calls. For two nesting levels it would be ~230 times slower and for three already ~3,500 times. That would test your patience.
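To give an idea of the kind of comparison meant here, a minimal sketch (my own, with made-up data rather than the simulations referred to above); the actual speed-up depends on the machine and the task.

y      <- runif(1e6, min = 0, max = 1)       # one million made-up values
th     <- 0.5
direct <- function(y, th) y[y < th]          # vectorised access
looped <- function(y, th) {                  # element-by-element with if()
  out <- numeric(length(y))                  # preallocated
  j   <- 0L
  for (k in seq_along(y)) {
    if (y[k] < th) {
      j      <- j + 1L
      out[j] <- y[k]
    }
  }
  out[seq_len(j)]
}
identical(direct(y, th), looped(y, th))      # same result
system.time(direct(y, th))                   # fast
system.time(looped(y, th))                   # typically much slower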

» But it looks like R and I are following different kinds of logics.

Not necessarily so. See above.

» When you learn a new language and start speaking it with somebody, that person will usually show some goodwill, ignore grammatical mistakes and try and understand what you mean. R made no efforts of any kind, even though what I was trying to achieve was pretty obvious :-D

I agree. See the subject line. The man-pages of R’s functions were written by statisticians, regularly forgetting non-expert users. Sometimes the response you get is bizarre. Want to know how for() works?

?for
+

+
Fuck, get me out of here! esc
help(for)
Error: unexpected ')' in "help(for)"

What the heck? Shall I really try it without the closing bracket? Strange. OK, OK.
help(for
+

Aha, the closing bracket is missing!
)
Error: unexpected ')' in:
"
)"

You can’t be serious! Google, google, reading the R-Inferno Section 8.2.30. Oh my!
?`for`
help(`for`)

Both work as expected. I don’t know what the logic behind this is. Beyond me.


» » If you have a vector of data (say y)
»
» Yeah, ElMaestro started using this kind of vocabulary too when we were exchanging email. […] All I learnt at school about vectors is the geometric side of the concept - what Wikipedia apparently calls Euclidean vector.

Correct.

» I had some problems understanding what an arrow between two points had to do with what I was trying to achieve.

Absolutely nothing, of course. In R a one-dimensional array is meant. The indexing (i.e., where it starts) differs between languages.
  • 0
    C, C++, BASIC-dialects, Pascal (default; can be declared as 1), PHP, Perl, Java, JavaScript, Lisp, Python, Ruby, Scheme, …
  • 1
    S, R, Julia, FORTRAN, ALGOL, AWK, COBOL, Matlab, Mathematica, Wolfram Language, XQuery, …
In many languages there are three types of data structures: scalars (a single number, character, string, or boolean), one-dimensional arrays (R: vector), and multidimensional arrays (R: data.frame and matrix). In R there are no scalars, only vectors containing a single element.

x1   <- 1
x2   <- 1L
x3   <- TRUE
x4   <- NA
x5   <- "a"
comp <- matrix(data = c(c(x1, x2, x3, x4, x5),
                        c(is.vector(x1), is.vector(x2), is.vector(x3),
                          is.vector(x4), is.vector(x5)),
                        c(length(x1), length(x2), length(x3),
                          length(x4), length(x5)),
                        c(typeof(x1), typeof(x2), typeof(x3),
                          typeof(x4),typeof(x5))),
               nrow = 5, ncol = 4,
               dimnames = list(paste0("x", 1:5),
                               c("value", "vector?", "length", "type")))
print(as.data.frame(comp))

   value vector? length      type
x1     1    TRUE      1    double
x2     1    TRUE      1   integer
x3  TRUE    TRUE      1   logical
x4  <NA>    TRUE      1   logical
x5     a    TRUE      1 character


What makes R so powerful is yet another data type, namely the list. A list can contain any of the other types and a mixture of them (even other lists…).

char.vect <- letters[1:2]
int.vect  <- as.integer(1:3)
mydf      <- data.frame(x = 1:4, y = c(4:6, NA))
mymat     <- matrix(data = c(100:102, 2*100:102), nrow = 3, ncol = 2,
                    dimnames = list(paste0("row", 1:3), c("x", "y")))
mylist    <- list(char.vect = char.vect, int.vect = int.vect,
                  mydf = mydf, mymat = mymat)
print(mylist)

$char.vect
[1] "a" "b"

$int.vect
[1] 1 2 3

$mydf
  x  y
1 1  4
2 2  5
3 3  6
4 4 NA

$mymat
       x   y
row1 100 200
row2 101 202
row3 102 204


Elements can be accessed by their name or index.

mylist$mymat["row2", "y"]
[1] 202
mylist$mymat[2, 2]
[1] 202
mylist[[4]][2, 2]
[1] 202


The double square brackets are needed to extract an element of a list itself; single brackets return a sub-list. Hence, mylist[4][2, 2] does not work:
Error in mylist[4][2, 2] : incorrect number of dimensions
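A quick way to see the difference (my addition, using the mylist object built above):

class(mylist[4])   # "list" – a list of length one that contains the matrix
class(mylist[[4]]) # "matrix" (and "array" in recent R versions) – the matrix itself
mylist[[4]][2, 2]  # hence this works
[1] 202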

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
Shuanghe
★★  

Spain,
2019-07-18 18:59

@ Helmut
Posting: # 20392
Views: 2,060
 

 Spreadsheet addiction

Dear all

» There is a series of papers about indirect adjusted comparisons in BE published by regulators. All done in Excel.

As far as I can tell, Excel and R give the same result. I was interested in the method, so several weeks back I played with R using data from the EPARs (none of the articles gives the sample size of the BE study, so without going to the EPAR for this additional information you cannot reproduce/validate their results at all). The difference in some comparisons is 0.01%, due to the different rounding methods used in Excel and R.

However, as Ohlbe said, there are some errors in the articles. For example, in the latest article (link), Table 2 gives wrong data (I compared it with the original data from the EPAR and it's clearly a mistake in the article), but the result of the comparison using those data is correct, so this might just be a typo. Furthermore, some comparisons were reversed, e.g., Table 3 describes generic 2 vs generic 4 and generic 3 vs generic 4, but in fact they were generic 4 vs 3 and 2, ... Apparently, someone typed the data into the wrong cell in Excel, so the confidence intervals were all reversed. :-D

» Semitransparent colors in Excel? I don’t think that’s possible. ;-)

Haha, this is where you err... :-D Actually you can do that in Excel (not that you should). At least I can in my 2013. It's ugly, but semi-transparent it is. :cool:

All the best,
Shuanghe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-18 20:04

@ Shuanghe
Posting: # 20395
Views: 2,056
 

 Spreadsheet addiction

Hi Shuanghe,

» I was interested in the method, so several weeks back I played with R using data from the EPARs (none of the articles gives the sample size of the BE study, so without going to the EPAR for this additional information you cannot reproduce/validate their results at all) …

Yep. I wrote a package for this stuff. I think they were using Excel because (a) everybody has it, (b) its learning curve is flat, and (c) – most important – Luther Gwaza* gave his “template” to the others.

» … and the difference in some comparisons is 0.01%, due to the different rounding methods used in Excel and R.

If you want to reproduce Excel’s wacky “commercial rounding”:

cround <- function(x, d) ((trunc(abs(x) * 10^d + 0.5)) / 10^d) * sign(x)
x      <- c(79.994, 79.995, 125.004, 125.005)
comp   <- data.frame(x = x, R = round(x, 2), Excel = cround(x, 2))
print(comp, row.names = FALSE)
       x      R  Excel
  79.994  79.99  79.99
  79.995  80.00  80.00
 125.004 125.00 125.00
 125.005 125.00 125.01


» » Semitransparent colors in Excel? I don’t think that’s possible. ;-)
»
» Haha, this is where you err... :-D Actually you can do that in Excel (not that you should). At least I can in my 2013. It's ugly, but semi-transparent it is. :cool:

Well, my version is Excel 2000. BTW, navigate up to my OP and hit ctrl+F5 to clear the browser’s cache / reload. Is it possible in Excel to have the color depend on the value (the lower, the darker)? :smoke:
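In R that mapping is only a few lines. A minimal sketch (my own, with made-up data, not the plot in the OP): map each value of y to a grey ramp so that lower values get darker points.

x   <- runif(50, min = 1, max = 20)
y   <- 0.5 + 2 * x + rnorm(50, mean = 0, sd = 2)
pal <- colorRampPalette(c("grey10", "grey85"))(100)  # dark … light
idx <- cut(y, breaks = 100, labels = FALSE)          # map y to 1…100
plot(x, y, pch = 21, cex = 1.5, col = "black", bg = pal[idx], las = 1)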


  • Gwaza L. Adjusted Indirect Treatment Comparisons of Bioequivalence Studies. PhD Thesis: Utrecht. 2016. Free resource.

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
Shuanghe
★★  

Spain,
2019-07-19 12:41

@ Helmut
Posting: # 20400
Views: 1,932
 

 OT: Spreadsheet addiction

Hi Helmut,

» Yep. I wrote a package for this stuff.

Well, you're always several steps ahead of me. I just wrote some lousy function that serves an educational purpose (for myself). My output is for 4 methods only (Z-distribution, homo- and heteroscedastic, and pragmatic) since I'm still struggling with Chow and Shao/Liu's methods. By the way, I didn't see the package on your GitHub page. Any plan to put it there so we can have a peek? :-D

» If you want to reproduce Excel’s wacky “commercial rounding”:
» ...

I doubt I'll ever use it, but it's nice to know that it can be done. ;-)

» Well, my version is Excel 2000. BTW, navigate up to my OP and hit ctrl+F5 to clear the browser’s cache / reload. Is it possible in Excel to have the color depend on the value (the lower, the darker)? :smoke:

No way. Now you just want to piss off those Excel lovers :lol3:

»
  • Gwaza L. Adjusted Indirect Treatment Comparisons of Bioequivalence Studies. PhD Thesis: Utrecht. 2016. Free resource.

Thanks. I have many articles but I didn't know about the thesis. I'll take a look at it later.

ElMaestro, apology for the off-topic posting.

All the best,
Shuanghe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-19 14:29

@ Shuanghe
Posting: # 20403
Views: 1,888
 

 OT: Spreadsheet addiction

Hi Shuanghe,

» » Yep. I wrote a package for this stuff.
» Well, you're always several steps ahead of me.

Nope. I started on July 11th.

» My output is for 4 methods only (Z-distribution, homo- and hetero-scedastic and pragmatic) …

Mine only for homo- and heteroscedastic variances (the latter is the package’s default). I think that relying on large-sample approximations doesn’t make sense at all. The CI will always be narrower than with the other approaches, but why would one go this way? Approximations of the degrees of freedom for unequal variances were published in 1946/47 (Satterthwaite, Welch), the t-distribution in 1908 (Gosset a.k.a. Student). Going back more than two centuries (Laplace, Poisson, Gauß)? I don’t get it. BTW, I don’t like the pragmatic method.
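For readers who wonder what the heteroscedastic approach roughly looks like, a sketch as I understand it (my own, with invented numbers – not the package mentioned above): combine the two study-level estimates against the common reference, pool the standard errors, and use Welch–Satterthwaite degrees of freedom for the confidence interval.

pe1   <- log(1.04); se1 <- 0.045; df1 <- 22     # study 1: generic 1 vs reference (log scale)
pe2   <- log(0.97); se2 <- 0.060; df2 <- 30     # study 2: generic 2 vs reference (log scale)
delta <- pe1 - pe2                              # indirect comparison: generic 1 vs generic 2
se    <- sqrt(se1^2 + se2^2)                    # pooled standard error
df    <- (se1^2 + se2^2)^2 /
         (se1^4 / df1 + se2^4 / df2)            # Welch–Satterthwaite approximation
ci    <- exp(delta + c(-1, +1) * qt(0.95, df) * se) # 90% CI, back-transformed
round(100 * c(PE = exp(delta), lower = ci[1], upper = ci[2]), 2)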

» … since I'm still struggling with Chow and Shao/Liu's methods.

You need the raw data, right?

» […] I didn't see the package in your Github page. Any plan to put it there so we can have a peek? :-D

Paid job – chained by a CDA. If we publish sumfink, sure. Not my decision.

» » If you want to reproduce Excel’s wacky “commercial rounding”:
» » ...
» I doubt I'll very use it but nice to know that it can be done. ;-)

I would never ever use it. However, if you want to reproduce sumfink which was rounded in Excel you can only experiment till you get the same in R.

» » […] navigate up to my OP and hit ctrl+F5 to clear the browser’s cache / reload. Is it possible in Excel?
»
» No way. Now you just want to piss off those Excel lovers :lol3:

Yessir! R is a language and environment for statistical computing and graphics.

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
nobody
nothing

2019-07-19 15:53

@ Helmut
Posting: # 20404
Views: 1,855
 

 OT: Spreadsheet addiction

....reaaaaalllly boring in Vienna these days, huh? :-D

Kindest regards, nobody
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-19 19:32

@ nobody
Posting: # 20406
Views: 1,823
 

 OT: Spreadsheet addiction

Hi nobody,

» ....reaaaaalllly boring in Vienna these days, huh? :-D

OT about R is a welcome excuse.

   All paid jobs
absorb and degrade the mind.
     Aristotle

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
zizou
★    

Plzeň, Czech Republic,
2019-07-20 00:18

@ Ohlbe
Posting: # 20408
Views: 1,783
 

 Spreadsheet etc. failures

Hello everybody and nobody.

» Nope. I have examples of users using the wrong functions, or unlocked spreadsheets getting modified at some point, but that's about all. The problem is most often somewhere between the chair and the keyboard: wetware rather than software.
»
» Well, there are some "features" of Excel you need to be aware of when using it for stats, of course. Already mentioned on the forum.
Right.

Recently, because of a "floating point error" I failed to use a spreadsheet for comparing differences of two similar values against a certain value (my source data almost all had two decimal places, but some of them were calculated with "full precision", and I wanted to easily check how many of the differences are lower than 0.05; unfortunately there was a case with the error).

See following examples:

      |          |   Default View    | Number Format with
 Cell | Formula  | in General Format | Additional Decimal Places
──────┼──────────┼───────────────────┼───────────────────────────
  A1  | 25.05    |                   |
  A2  | 25       |                   |
  A3  | =A1-A2   |       0.05        | 0.0500000000000007
  A4  | =A3=0.05 |       FALSE       |
  A5  | =A3>0.05 |       TRUE        |
Difference is higher than expected.

      |          |   Default View    | Number Format with
 Cell | Formula  | in General Format | Additional Decimal Places
──────┼──────────┼───────────────────┼───────────────────────────
  A1  | 5.05     |                   |
  A2  | 5        |                   |
  A3  | =A1-A2   |       0.05        | 0.0499999999999998
  A4  | =A3=0.05 |       FALSE       |
  A5  | =A3<0.05 |       TRUE        |
Difference is lower than expected.



Nevertheless it is not a problem of the spreadsheet itself, but only a precision issue. In R:

a=5.05
b=5
a-b
#[1] 0.05
a-b==0.05
#[1] FALSE
a-b<0.05
#[1] TRUE


Everyone knows, only smart machines don't.

Best regards,
zizou
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-20 02:12

@ zizou
Posting: # 20409
Views: 1,763
 

 As designed ☺

Hi zizou,

» Recently, because of a "floating point error" I failed to use a spreadsheet for comparing differences of two similar values against a certain value (my source data almost all had two decimal places, but some of them were calculated with "full precision", and I wanted to easily check how many of the differences are lower than 0.05; unfortunately there was a case with the error).

Haha, I know this game and therefore, always round before the comparison. ;-)
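For the record, what I mean (my sketch, reusing zizou’s example): round both sides to the relevant number of decimals before comparing, or let all.equal() compare with a tolerance.

a <- 25.05 - 25            # stored as 0.0500000000000007…
a == 0.05                  # FALSE – exact comparison of doubles
round(a, 2) == 0.05        # TRUE  – round to the relevant precision first
isTRUE(all.equal(a, 0.05)) # TRUE  – default tolerance ≈ 1.5e-8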

» Everyone knows, only smart machines don't.

Can’t be blamed. That’s a property of floating point arithmetic (see also IEEE 754). Unless you have powers of 2 (dec→bin: 1→1, 2→10, 4→100, 8→1000, …) you will always have precision issues in the last significant digits. Everything beyond the 15th (dec-) significant figure is just noise – even with 64bit software on a 64bit OS.
Try this in Excel:
Cells A1–A9: 0, 1, … , 8
Cells B1–B9: =2^A1, =2^A2, … , =2^A9
Cells C2–C9: =B2-B1, =B3-B1, …
Format to 16 decimal places. What do you get?
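The same experiment translated to R (my addition), mirroring the Excel columns:

A <- 0:8
B <- 2^A                      # powers of two – exactly representable
C <- B[-1] - B[1]             # differences from B1
sprintf("%.16f", C)           # clean integers, no trailing noise
sprintf("%.16f", 5.05 - 5)    # compare with a non-representable difference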

Little bit deeper into R:

a <- 5.05
b <- 5
c <- a - b
identical(c, 0.05)
[1] FALSE
sprintf("%.17f", c)
[1] "0.04999999999999982"
a <- 4
b <- 2
c <- a - b
identical(c, 2)
[1] TRUE
sprintf("%.17f", c)
"2.00000000000000000"
identical(c, 2L)
[1] FALSE
is.integer(c)
[1] FALSE
a <- 4L # or as.integer(4)
b <- 2L # or as.integer(2)
c <- a - b
identical(c, 2)
[1] FALSE
sprintf("%.17f", c)
"2.00000000000000000"
identical(c, 2L)
[1] TRUE


GNU Octave 5.1.1.0 (all calculations internally in 64bit double):

>> format long
>> a = 5.05
a = 5.050000000000000
>> b = 5
b = 5
>> b - a
ans = -4.999999999999982e-02
>> a = 4
a = 4
>> b = 2
b = 2
>> a - b
ans = 2


Not even software for symbolic mathematics does better. Maxima 5.35.12.1 / wxMaxima 14.12.1:

(%i1) [a:5.05, b:5, a-b];
(%o1) [5.05,5,0.04999999999999982]
(%i2) [a:4, b:2, a-b];
(%o2) [4,2,2]


Only for R-nerds: Rmpfr. Doesn’t help. ;-)

library(Rmpfr)
a <- mpfr(5.05, 256)
b <- mpfr(5, 256)
diff(c(b, a))
1 'mpfr' number of precision  256   bits
[1] 0.04999999999999982236431605997495353221893310546875

mpfr(a-b, 32)
1 'mpfr' number of precision  32   bits
[1] 0.05

a <- mpfr(4, 256)
b <- mpfr(2, 256)
diff(c(b, a))
1 'mpfr' number of precision  256   bits
[1] 2


Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
ElMaestro
★★★

Belgium?,
2019-07-20 09:08

@ Helmut
Posting: # 20411
Views: 1,759
 

 As designed ☺

Hi Helmut,

» Haha, I know this game and therefore, always round before the comparison. ;-)

or never bring yourself into a situation where you need to compare floats.

I've taken the worst way out and written functions that check if x is similar to y plus/minus a wee fraction like 1e-8 or something. It is ugly and clumsy, it works, and it feels every time like I am suffering defeat.


"John Brown's got a little Endian" :-):-):-)

I could be wrong, but...
Best regards,
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-07-20 12:53

@ ElMaestro
Posting: # 20412
Views: 1,732
 

 To round or not to round…

Hi ElMaestro,

» » Haha, I know this game and therefore, always round before the comparison. ;-)
»
» or never bring yourself into a situation where you need to compare floats.

Did I tell you to how many significant digits I round? :-D

» I've taken the worst way out and written functions that check if x is similar to y plus/minus a wee fraction like 1e-8 or something.

I don’t know which kind of measurements you are comparing. Sumfink ultra-precise (according to the International Bureau of Weights and Measures: t ±5×10⁻¹⁶, l ±2.1×10⁻¹¹)?

Zizou (I guess) and I were talking about concentrations. The AP (accuracy and precision) criteria according to the GLs are 20% at the LLOQ and 15% above it. Hence, everything reported substantially beyond that is not relevant (shall I call it noise?). What substantially means depends on the field. Some people go crazy with 6σ, physicists are fine with just 3σ.

x       <- pi
p       <- 15
s       <- 8
mult    <- c(3, 6)
sigma   <- x * p / 100
sigma.m <- sigma * mult
lo      <- x - sigma.m
hi      <- x + sigma.m
df      <- data.frame(x = rep(x, 2), prec = p,
                      sigma = sigma, mult = mult,
                      sigma.m = sigma.m,
                      lo = lo,  hi = hi)
df      <- signif(df, s)
names(df)[2] <- "prec. (%)"
print(df, row.names = FALSE)

       x prec. (%)     sigma mult  sigma.m        lo       hi
3.141593        15 0.4712389    3 1.413717 1.7278760 4.555309
3.141593        15 0.4712389    6 2.827433 0.3141593 5.969026

Do you get the idea? The true value might be anywhere between our self-imposed limits. Therefore, I don’t give a shit whether I get 3.141 or 3.141 592 653 589 793 in an electronic file.¹ However, I insist on a CRC checksum² to verify the data transfer. If I deal with bloody Excel, I round to one decimal beyond what is given in the analytical report,³ being aware that this is still far beyond the analytical AP. If the data transfer of analytical results to stats was done electronically in “full numeric precision” (haha), I want to see a validation for it.⁴
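If one wants to verify a transferred file within R itself, a minimal sketch (my own; the file name is made up) – base R ships md5sum(), and the ‘digest’ package offers SHA-256:

tools::md5sum("results.csv")                 # MD5, base R
# install.packages("digest")                 # once, if not yet installed
digest::digest("results.csv", file = TRUE,
               algo = "sha256")              # SHA-256, preferred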

» It is ugly and clumsy, it works, and it feel everytime like I am suffering defeat.

Well, you are the C-man here. What about printf("%.yg\n", x); where y is the desired number of significant digits? With R’s signif():

options("digits" = 15)
x    <- 12345678.999999
y    <-        0.12345678999999
prec <- 8
fmt1 <- "%33.17f"
fmt2 <- paste0("%.", prec, "g")
cat(x, "\n",
    y, "\n",
    sprintf(fmt1, x), "\u2190 fake news\n",
    sprintf(fmt1, y), "\u2190 fake news\n",
    sprintf(fmt1, signif(x, prec)), "\u2190", prec, "significant digits\n",
    sprintf(fmt1, signif(y, prec)), "\u2190", prec, "significant digits\n",
    sprintf(fmt2, x), "\u2190", "directly with", paste0("'", fmt2, "'\n"),
    sprintf(fmt2, y), "\u2190", "directly with", paste0("'", fmt2, "'\n"))

12345678.999999
 0.12345678999999
        12345678.99999899975955486 ← fake news
               0.12345678999999000 ← fake news
        12345679.00000000000000000 ← 8 significant digits
               0.12345679000000000 ← 8 significant digits
12345679 ← directly with '%.8g'
0.12345679 ← directly with '%.8g'


If you are interested in whether rubbish in ≈ rubbish out, ask for a checksum and verify it. That is probably better than diving into the murky waters of (likely irrelevant) rounding.


  1. I would be fine with just 3.1 for π. The RE is –1.32%, again much below what is achievable in bioanalytics.
  2. SHA-256 or higher preferred (collisions reported for SHA-1, i.e., different inputs give the same hash). MD5 is better than nothing.
  3. That’s the only GxP-compliant (dated/signed, released by QUA) document, right? Did you ever see result-tables with 15 significant digits?
  4. Here it gets tricky. These data are different to what is given in the analytical report. Now what? At least I expect a statement about this discrepancy in the protocols (analytical and/or statistical). Regularly I see something like “calculations were performed in full numeric precision”. How could one ever hope to verify that having only the analytical report with rounded results?

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes
ElMaestro
★★★

Belgium?,
2019-07-20 19:59

@ Helmut
Posting: # 20415
Views: 1,696
 

 To round or not to round…

Hi Hötzi,

» I don’t know which kind of measurements you are comparing.

It stinks but there really is no good solution when you have to compare.
(294.1/1000) == 0.2941
[1] FALSE
(2941/10000) == 0.2941
[1] TRUE


One can read ISO standards and manuals about internal float representation (I tried and fell asleep), but at the end of the day, if you need to compare you need to compare and hope for the best. Some numbers just cannot be represented accurately in binary.

If you accept an xyz fraction of error on the comparison then the solution is straightforward, like

Match <- function(x, y, relative.tol)
{ # TRUE if y lies within ±relative.tol of x (note: assumes x > 0)
  if (y < ((1 + relative.tol) * x))
    if (y > ((1 - relative.tol) * x))
      return(TRUE)
  return(FALSE)
}

Match(6, 6.001, 0.01)


Ugly as hell :vomit:, and will not work well if you get into extreme binary representations.

I could be wrong, but...
Best regards,
ElMaestro
mittyri
★★  

Russia,
2019-07-20 22:43

@ ElMaestro
Posting: # 20416
Views: 1,677
 

 floating-point math is always more complex than you think it is

Hi ElMaestro,

» Match(6, 6.001, 0.01)
»
» Ugly as hell :vomit:, and will not work well if you get into extreme binary representations.

take a look at
if (Abs(x - y) <= absTol * Max(1.0f, Abs(x), Abs(y)))
the description is here

One may also consider ULP-based comparison; the pros and cons are described here.
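A possible R translation of that check (my sketch, not from the linked article): combine an absolute floor with a relative tolerance so that both tiny and large magnitudes are handled.

near <- function(x, y, tol = 1e-8) {
  abs(x - y) <= tol * max(1, abs(x), abs(y))  # absolute floor of tol, relative otherwise
}
near(6, 6.001, 0.01)     # TRUE  – within 1 % (relative part dominates)
near(294.1/1000, 0.2941) # TRUE  – the earlier comparison now passes
near(0.05, 0.051)        # FALSE – well outside the default tolerance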

Kind regards,
Mittyri