## Nasty beast [Software]

Dear Ohlbe,

» I made multiple attempts with "if" and conditions, and got multiple different error messages. ElMaestro gave me some hints, resulting each time in a "oh yes of course" reaction and more error messages... I probably misinterpreted the hints.

Using if(), its relatives, and loop constructs (for(), while(), repeat) is in many cases prone to errors (as you experienced) and can be slooow. If you have a vector of data (say y), there are various ways to access its values. Example with 25 random integers in the range 1…40:

```r
n  <- 25
y  <- round(runif(n = n, min = 1, max = 40))
th <- 10                  # threshold
n  <- length(y)           # we know it, but useful later
head(y, 10)               # the first 10
y[1:10]                   # same
tail(y, 10)               # the last 10
y[(n - 9):n]              # same but tricky
blq <- which(y < th)      # indices of the values below the threshold
length(blq)               # how many?
y[blq]                    # show them
y[y < th]                 # same
```
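One caveat worth knowing: the last two lines above give the same result only as long as y contains no missing values. A minimal sketch with a made-up vector (the values are purely illustrative) containing an NA:

```r
# Made-up example vector with one missing value
y  <- c(12, 3, NA, 7, 25)
th <- 10
y[y < th]                # logical index: the NA position returns NA -> 3 NA 7
y[which(y < th)]         # which() drops the NA -> 3 7
y[!is.na(y) & y < th]    # explicit NA handling, same as which() -> 3 7
```

If your data may contain NAs, which() (or an explicit !is.na() condition) keeps NAs from sneaking into the selection.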

In such a simple case which() is overkill, though one can include many conditions, which [sic] makes the code easier to understand. If you have two vectors (say x and y) like in my last post, you can select values of x depending on a condition on y, e.g., x[which(y < th)].
Let’s check:

```r
n    <- 50
x    <- runif(n = n, min = 1, max = 20)
a    <- 0.5
b    <- 2
y    <- a + b * x + rnorm(n = length(x), mean = 0, sd = 2)
th   <- 10
fun1 <- function(x, y, th) { # clumsy
  x.th <- numeric()
  y.th <- numeric()
  bql  <- 0L
  for (k in seq_along(x)) {
    if (y[k] < th) {
      bql       <- bql + 1
      x.th[bql] <- x[k]
      y.th[bql] <- y[k]
    }
  }
  z <- data.frame(x.th, y.th)
  return(invisible(z))
}
fun2 <- function(x, y, th) { # better
  blq  <- which(y < th)
  x.th <- x[blq]
  y.th <- y[blq]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}
fun3 <- function(x, y, th) { # a little bit confusing for beginners
  x.th <- x[y < th]
  y.th <- y[y < th]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}
res1 <- fun1(x, y, th)
res2 <- fun2(x, y, th)
res3 <- fun3(x, y, th)
identical(res1, res2); identical(res1, res3) # same?
[1] TRUE
[1] TRUE
```

Bingo! Which one is easier to read?

```r
library(microbenchmark)
res <- microbenchmark(fun1(x, y, th),
                      fun2(x, y, th),
                      fun3(x, y, th), times = 1000L)
print(res, signif = 4)
Unit: microseconds
           expr   min    lq  mean median    uq  max neval cld
 fun1(x, y, th) 173.9 181.1 196.1  186.3 189.9 1530  1000   b
 fun2(x, y, th) 163.0 170.3 183.6  175.4 179.0 3311  1000  a 
 fun3(x, y, th) 163.0 169.0 185.5  174.5 178.4 3310  1000  ab
```

Practically the same, since the vectors are short. If we play this game with longer vectors, fun2() shows its strengths. If one has more conditions, any construct with loops and if() will suck. We have 100 random integers (1…50) and want to get the even ones between 20 and 30 in increasing order.

```r
x <- round(runif(n = 100, min = 1, max = 50))
x
  [1] 37 29  8 34  6 13 47 22 30 44  6 14 33 18 12 37 32
 [18] 10  6 37 27  2 43 40  7  5 47  4 32 17  7 50 39 36
 [35] 38 48 34  5 21 43 34 50 29 20 33  6 45 32 28  8  1
 [52] 26 29 19 42  9 38 31 25  4  1 23 37 31  2 26 29 24
 [69] 40 43 17 16 41 17  5 17 36 16  7  5 36 30  5  8 19
 [86] 40 42 30 33 21 13 25 21 33 16  7 33 36 19 37
```

One-liner:

```r
sort(x[which(x >= 20 & x <= 30 & x %% 2 == 0)])
[1] 20 22 24 26 26 28 30 30 30
```

Or even shorter:

```r
sort(x[x >= 20 & x <= 30 & x %% 2 == 0])
[1] 20 22 24 26 26 28 30 30 30
```
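Since round() returns whole numbers here, the three conditions can also be expressed as membership in an explicit set of allowed values. A sketch under that assumption (the seed and the name wanted are arbitrary choices for illustration; with non-integer data this shortcut would not be equivalent):

```r
set.seed(123)  # arbitrary seed, only to make the sketch reproducible
x <- round(runif(n = 100, min = 1, max = 50))

# Allowed values: the even numbers 20, 22, ..., 30
wanted <- seq(from = 20, to = 30, by = 2)
sort(x[x %in% wanted])

# Cross-check against the logical one-liner (should be TRUE)
identical(sort(x[x %in% wanted]),
          sort(x[x >= 20 & x <= 30 & x %% 2 == 0]))
```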

Good luck coding that with a loop and a nested if() { do this } else { do that } elseif { oops }.
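To make the comparison concrete, a sketch of what the loop-based equivalent could look like. The function names sel.loop() and sel.vec() and the seed are hypothetical, chosen only for this illustration. The loop needs a growing result vector, an index over all elements, and a compound condition, all of which the one-liner handles implicitly.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- round(runif(n = 100, min = 1, max = 50))

# Hypothetical loop-and-if version of the selection
sel.loop <- function(x, lo = 20, hi = 30) {
  out <- numeric()                       # result vector, grown element by element
  for (k in seq_along(x)) {
    if (x[k] >= lo && x[k] <= hi && x[k] %% 2 == 0) {
      out <- c(out, x[k])                # append the matching value
    }
  }
  sort(out)
}
# Hypothetical wrapper around the vectorized one-liner
sel.vec <- function(x, lo = 20, hi = 30) {
  sort(x[x >= lo & x <= hi & x %% 2 == 0])
}
identical(sel.loop(x), sel.vec(x))       # same result?
```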

Dif-tor heh smusma 🖖
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮