ElMaestro ★★★ Denmark, 2016-12-17 20:40 Posting: # 16850
Hi all,

I am struggling to import data from a file with a structure like this: [image of the file layout not preserved]

Now, I would like R to understand that
1. we are not dealing with factors; the data frame is a series of numeric vectors.
2. whenever we encounter something non-numeric, such as Apple or Banana, we set it to zero.

So I have this solution:
A=read.table("SomeFile.csv", header=T)
A is now effectively a matrix, or we can as.matrix or matrix it directly.

And although it works, it is painfully unreadable and it also throws warnings (coercion). I am looking for a solution that is much more readable and does not throw warnings (and we are not going to discuss suppression of warnings, please; it sends shivers down my spine to do it).

Regarding readability: I am fine with a solution that takes more coding/more lines, so this is not a competition in doing a lot of stuff in a single line. Single line = more elegance to some = less readability to a simpleton like me.

Can anyone help here? Many thanks.

— Pass or fail! ElMaestro
mittyri ★★ Russia, 2016-12-18 02:41 @ ElMaestro Posting: # 16851
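Hi ElMaestro,

I'd like

A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
apply(A, 2, function(x){
as.numeric(replace(x, !grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x), 0))})

But more readable would be

A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
checkandreplace <- function(x){
  isnotnumeric <- !grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x)
  # we need to replace them with 0
  replaced0 <- replace(x, isnotnumeric, 0)
  # and to convert it all to numeric
  return(as.numeric(replaced0))
}
apply(A, 2, checkandreplace)

Enjoy!

— Kind regards, Mittyri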
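A note on the pattern quoted in this thread: it is not anchored, so grepl() reports a match for any string that merely contains a digit (for example "BLQ/2"), and such a value would still reach as.numeric() and trigger the coercion warning. Below is a minimal sketch of a more step-by-step variant, assuming an anchored pattern is acceptable. The helper names, the pattern and the small example data frame are made up for illustration and are not from the original posts.

```r
# Assumption: a "number" is an optionally signed decimal, optionally with an exponent.
# The ^ and $ anchors force the whole string to match, not just a part of it.
numeric_pattern <- "^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?$"

# Hypothetical helper: TRUE where the string looks like a number
is_numeric_string <- function(x) {
  grepl(numeric_pattern, trimws(x))
}

# Hypothetical helper: replace everything else by "0", then convert
to_numeric_or_zero <- function(x) {
  x <- trimws(as.character(x))
  x[!is_numeric_string(x)] <- "0"   # Apple, Banana, NA, "" all become "0"
  as.numeric(x)                     # only number-like strings are left
}

# Made-up data standing in for the imported file
A <- data.frame(
  time = c("0", "1", "2.5", "Apple"),
  conc = c("Banana", "12.3", "1e2", "-4"),
  stringsAsFactors = FALSE
)

# lapply() works column by column and keeps each result a numeric vector
A_num <- as.data.frame(lapply(A, to_numeric_or_zero))
```

Using lapply() on the data frame instead of apply() avoids the detour through a character matrix; as.matrix(A_num) gives a numeric matrix if one is needed.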
ElMaestro ★★★ Denmark, 2016-12-18 12:47 @ mittyri Posting: # 16854
Hi Mittyri,

❝ apply(A, 2, function(x){
❝ as.numeric(replace(x, !grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x), 0))})
❝
❝ But more readable would be
❝
❝ checkandreplace <- function(x){
❝   isnotnumeric <- !grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x)
❝   # we need to replace them with 0
❝   replaced0 <- replace(x, isnotnumeric, 0)
❝   # and to convert it all to numeric
❝   return(as.numeric(replaced0))
❝ }
❝ apply(A, 2, checkandreplace)

My eyes! My eyes! It hurts.

— Pass or fail! ElMaestro
wligtenberg ☆ The Netherlands, 2016-12-19 15:29 @ ElMaestro Posting: # 16856
I am a fan of the data.table package. So my answer would be this:

library(data.table)

You need to do it in two steps, else the class stays character.
zizou ★ Plzeň, Czech Republic, 2016-12-19 19:41 @ ElMaestro Posting: # 16857
Dear ElMaestro.

I think there should be a warning message when you have apples and bananas in your data. x)

The function as.numeric() produces a warning message because "Banana" (etc.) has no numeric value, so the result is NA.

If you expect numeric values plus some text strings that are known beforehand, you can use:
A=read.table("SomeFile.csv", colClasses="numeric", na.strings=c("Banana","Apple"), header=T)
If colClasses is set to numeric and there is a string not listed in na.strings (e.g. "Orange"), read.table() will stop with an error.

If Bananas and Apples should be handled differently, I would read the data as character (no specification of colClasses). Then change e.g. "Banana" to "0" and "Apple" to NA, and convert the type to numeric using as.numeric().
text=c("Apple","Banana")
If "Orange" is also present in the text/data, it will be coerced to NA with a warning message.

For data with numeric and character values mixed in one variable, I would store it as character only (with no statistics or whatever).

The principle of changing all string values (scattered among many numeric values) to zero without warnings seems strange to me. A simple example is a wrong value 1,25 in the data instead of the correct value 1.25. It seems OK to have a warning about changing the value to NA. But of course, when all data are correct and there are many string values which should be zero, the warning is unwanted. To avoid suppressing it with suppressWarnings(as.numeric(text)), it must be done as mittyri wrote, i.e. replace all character values with "0" before calling as.numeric(). (Nevertheless the result is similar: values like "1,25" or "1.25*" will be changed to zero without warnings.)

Best regards, zizou

No one likes warnings even though they can help!
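The two variants described above could look roughly like this. It is only a sketch: the file content is made up and passed through the text= argument instead of a real "SomeFile.csv", and in the second variant colClasses="character" is set explicitly (my assumption) so that the columns cannot end up as factors.

```r
# Made-up file content standing in for "SomeFile.csv" (assumption)
csv_text <- "time conc
0 Banana
1 12.5
2 Apple
3 7.25"

# Variant 1: the known strings are declared as NA while reading
A1 <- read.table(text = csv_text, header = TRUE,
                 colClasses = "numeric",
                 na.strings = c("Banana", "Apple"))

# Variant 2: read everything as character, then map each string explicitly
A2 <- read.table(text = csv_text, header = TRUE,
                 colClasses = "character")
conc <- A2$conc
conc[conc == "Banana"] <- "0"   # Banana is treated as zero
conc[conc == "Apple"]  <- NA    # Apple is treated as missing
A2$conc <- as.numeric(conc)     # only number strings and NA are left to convert
A2$time <- as.numeric(A2$time)
```

In variant 2 an unexpected string such as "Orange" would still be coerced to NA with a warning, which is the safety net described above.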
ElMaestro ★★★ Denmark, 2016-12-19 22:06 @ zizou Posting: # 16859
Dear zizou,

❝ If you expect numeric values with some text known before you can use:

Thanks. Actually, what I have are vectors of times for plasma samples, and vectors with the corresponding concentrations. I can use those to extract a bunch of info, including AUCt and Cmax and much else.

Sometimes unquantified (unquantifiable) concentrations are, for a variety of reasons, marked like "BLQ", "BLQ/2", "?", "???", "NA", "Missing", "M", "Note", "UNQ", and any other variation you can think of in English, German, Swahili, French, Inuit and 56 other languages. I may be exaggerating ever so slightly; what I am trying to say is that it is a wee bit open-ended with those non-numeric strings.

— Pass or fail! ElMaestro
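For such an open-ended set of markers, one option is to define what a number looks like rather than to list the markers. A minimal sketch follows, assuming concentrations are written as plain non-negative decimals. The helper name clean_conc, the example vectors, and the choice to integrate over all time points with the zeroed values included are my assumptions, not something stated in the thread.

```r
# Hypothetical helper: anything that is not a plain number counts as unquantified
clean_conc <- function(x) {
  x   <- trimws(as.character(x))
  ok  <- grepl("^[0-9]+[.]?[0-9]*$", x)   # anchored: the whole string must be a number
  out <- numeric(length(x))                # starts as all zeros
  out[ok] <- as.numeric(x[ok])             # only clean strings reach as.numeric()
  # Return the replaced strings as well, so they can be inspected instead of warned about
  list(values = out, replaced = x[!ok])
}

# Made-up sampling times and raw concentration strings
time <- c(0, 0.5, 1, 2, 4, 8)
raw  <- c("BLQ", "12.1", "25.4", "18", "UNQ", "???")

res  <- clean_conc(raw)
conc <- res$values

cmax <- max(conc)
# Linear trapezoidal rule over all time points in the vector
auct <- sum(diff(time) * (head(conc, -1) + tail(conc, -1)) / 2)
```

Looking at res$replaced (or table(res$replaced)) gives a quick overview of which markers were set to zero, which covers the case of a mistyped value such as "1,25" without relying on coercion warnings.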