Nasty beast [Software]

posted by Helmut, Vienna, Austria, 2019-07-18 22:26, Posting: # 20396

Dear Ohlbe,

❝ I made multiple attempts with "if" and conditions, and got multiple different error messages. ElMaestro gave me some hints, resulting each time in a "oh yes of course" reaction and more error messages... I probably misinterpreted the hints.


Using if(), its relatives, and loop constructs (for(), while(), repeat) is in many cases error-prone (as you experienced) and can be slooow. If you have a vector of data (say y), there are various ways to access its values. Example with 25 random integers in the range 1…40:

n  <- 25
y  <- round(runif(n = n, min = 1, max = 40))
th <- 10             # threshold
n  <- length(y)      # we know but useful later
head(y, 10)          # the first 10
y[1:10]              # same
tail(y, 10)          # the last 10
y[(n - 9):n]         # same but tricky
blq <- which(y < th) # indices of the values below the threshold
length(blq)          # how many?
y[blq]               # show them
y[y < th]            # same
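One practical difference between the two ways of indexing shows up when the data contain missing values (an assumption here; the random integers above have none). A comparison with NA yields NA, so logical indexing returns an NA element, whereas which() reports only the positions where the condition is TRUE. A minimal sketch with made-up numbers:

```r
# Hypothetical vector with one missing value (not data from this post)
y  <- c(12, NA, 3, 25, 7)
th <- 10
y[y < th]          # logical index: NA < 10 is NA, so this returns NA, 3, 7
y[which(y < th)]   # which() skips the NA, so this returns 3, 7
```

Which behaviour you want depends on whether a missing value should be flagged or silently ignored.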

In such a simple case which() is overkill, though one can combine many conditions in it, which [sic] makes the code easier to understand. If you have two vectors (say x and y), like in my last post, you can select the values of x depending on a condition on y, e.g., x[which(y < th)].
Let’s check:

n    <- 50
x    <- runif(n = n, min = 1, max = 20)
a    <- 0.5
b    <- 2
y    <- a + b * x + rnorm(n = length(x),  mean = 0, sd = 2)
th   <- 10

fun1 <- function(x, y, th) { # clumsy
  x.th <- numeric()
  y.th <- numeric()
  bql  <- 0L
  for (k in seq_along(x)) {
    if (y[k] < th) {
      bql       <- bql + 1
      x.th[bql] <- x[k]
      y.th[bql] <- y[k]
    }
  }
  z <- data.frame(x.th, y.th)
  return(invisible(z))
}

fun2 <- function(x, y, th) { # better
  blq  <- which(y < th)
  x.th <- x[blq]
  y.th <- y[blq]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}

fun3 <- function(x, y, th) { # a little bit confusing for beginners
  x.th <- x[y < th]
  y.th <- y[y < th]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}

res1 <- fun1(x, y, th)
res2 <- fun2(x, y, th)
res3 <- fun3(x, y, th)
identical(res1, res2); identical(res1, res3) # same?
[1] TRUE
[1] TRUE

Bingo! Which one is easier to read?
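Base R has yet another spelling of the same selection: subset() on a data frame. Below is a sketch of a hypothetical fun4() (my naming, not part of the original comparison or the timings), with fun2() and some made-up data repeated so the block runs on its own. subset() keeps the original row numbers, so they are reset to make the result comparable; like which(), subset() drops rows where the condition is NA. Note that its help page recommends it for interactive use and the standard [ indexing for programming.

```r
set.seed(42)                       # arbitrary seed, data are made up
n  <- 50
x  <- runif(n = n, min = 1, max = 20)
y  <- 0.5 + 2 * x + rnorm(n = n, mean = 0, sd = 2)
th <- 10

fun2 <- function(x, y, th) {       # as in the post
  blq  <- which(y < th)
  x.th <- x[blq]
  y.th <- y[blq]
  z    <- data.frame(x.th, y.th)
  return(invisible(z))
}

fun4 <- function(x, y, th) {       # hypothetical: subset() on a data frame
  z <- subset(data.frame(x.th = x, y.th = y), y.th < th)
  rownames(z) <- NULL              # subset() keeps the original row numbers
  return(invisible(z))
}

identical(fun2(x, y, th), fun4(x, y, th)) # same?
```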
What about speed?

library(microbenchmark)
res <- microbenchmark(fun1(x, y, th),
                      fun2(x, y, th),
                      fun3(x, y, th), times = 1000L)
print(res, signif = 4)
Unit: microseconds
           expr   min    lq  mean median    uq  max neval cld
 fun1(x, y, th) 173.9 181.1 196.1  186.3 189.9 1530  1000   b
 fun2(x, y, th) 163.0 170.3 183.6  175.4 179.0 3311  1000  a
 fun3(x, y, th) 163.0 169.0 185.5  174.5 178.4 3310  1000  ab

Practically the same, since the vectors are short. If we play this game with longer vectors, fun2() shows its strengths.

[image]
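To repeat the experiment with longer vectors yourself, here is a rough sketch. It uses base R's system.time() instead of microbenchmark so that no package is needed; fun1() and fun2() are restated in slightly condensed form, and the vector lengths, number of repetitions, and seed are arbitrary assumptions. Absolute timings depend on the machine, so no numbers are shown.

```r
fun1 <- function(x, y, th) {       # loop that grows the result vectors
  x.th <- numeric()
  y.th <- numeric()
  bql  <- 0L
  for (k in seq_along(x)) {
    if (y[k] < th) {
      bql       <- bql + 1L
      x.th[bql] <- x[k]
      y.th[bql] <- y[k]
    }
  }
  invisible(data.frame(x.th, y.th))
}
fun2 <- function(x, y, th) {       # vectorized selection via which()
  blq <- which(y < th)
  invisible(data.frame(x.th = x[blq], y.th = y[blq]))
}

set.seed(123456)                   # arbitrary seed
th      <- 10
sizes   <- c(1e2, 1e3, 1e4)        # assumed vector lengths
reps    <- 20L                     # assumed number of calls per timing
timings <- data.frame(n = sizes, loop = NA_real_, which = NA_real_)
for (i in seq_along(sizes)) {
  x <- runif(n = sizes[i], min = 1, max = 20)
  y <- 0.5 + 2 * x + rnorm(n = sizes[i], mean = 0, sd = 2)
  # elapsed seconds for `reps` calls of each function
  timings$loop[i]  <- system.time(for (r in seq_len(reps)) fun1(x, y, th))[["elapsed"]]
  timings$which[i] <- system.time(for (r in seq_len(reps)) fun2(x, y, th))[["elapsed"]]
}
print(timings)
```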


If there are more conditions, any construct built from loops and if() will suck. Say we have 100 random integers (1…50) and want the even ones between 20 and 30 (inclusive) in increasing order.

x <- round(runif(n = 100, min = 1, max = 50))
x
  [1] 37 29  8 34  6 13 47 22 30 44  6 14 33 18 12 37 32
 [18] 10  6 37 27  2 43 40  7  5 47  4 32 17  7 50 39 36
 [35] 38 48 34  5 21 43 34 50 29 20 33  6 45 32 28  8  1
 [52] 26 29 19 42  9 38 31 25  4  1 23 37 31  2 26 29 24
 [69] 40 43 17 16 41 17  5 17 36 16  7  5 36 30  5  8 19
 [86] 40 42 30 33 21 13 25 21 33 16  7 33 36 19 37

One-liner:

sort(x[which(x >= 20 & x <= 30 & x %% 2 == 0)])
[1] 20 22 24 26 26 28 30 30 30

Or even shorter:

sort(x[x >= 20 & x <= 30 & x %% 2 == 0])
[1] 20 22 24 26 26 28 30 30 30

Good luck coding that with a loop and a nested if() { do this } else { do that } elseif { oops }.
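For contrast, a sketch of what the loop version might look like (pick_loop() and pick_vec() are made-up names, and only the first 17 values printed above are used). Inside the loop each test is on a single value, so the scalar && is used; the one-liner needs the vectorized &. Feeding a whole vector to if() is a classic source of error messages: since R 4.2.0 a condition of length > 1 is an error (earlier versions only warned and used the first element).

```r
pick_loop <- function(x, lo = 20, hi = 30) { # hypothetical loop version
  out <- numeric()
  for (v in x) {
    # one value at a time, hence the scalar operator &&
    if (v >= lo && v <= hi && v %% 2 == 0) out <- c(out, v)
  }
  sort(out)
}
pick_vec <- function(x, lo = 20, hi = 30) {  # the one-liner, wrapped
  sort(x[x >= lo & x <= hi & x %% 2 == 0])
}

# first 17 values of the random integers above
x <- c(37, 29, 8, 34, 6, 13, 47, 22, 30, 44, 6, 14, 33, 18, 12, 37, 32)
pick_loop(x)  # 22 30
pick_vec(x)   # 22 30

# if() on the whole vector: an error in R >= 4.2.0, caught here to show the message
tryCatch(if (x >= 20) "yes", error = function(e) conditionMessage(e))
```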

Dif-tor heh smusma (live long and prosper) 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz