
Unicode characters in data column names throw an error in naWhere #15

@drag05


I have the following data:

> head(htc, 2)
      25 µL      50 µL     75 µL    100 µL  Accession
1: 1.265836 0.02575365 0.1428066 0.2107820 A0A024R6I7
2:       NA 0.01566025 0.1481060 0.2069585 A0A075B6K4

> dim(htc)
[1] 269   5

> htc[, colSums(is.na(.SD))]
    25 µL     50 µL     75 µL    100 µL Accession 
      200         0         3         0         0 

and the associated objects naWhere, varp and varn:

> naWhere[1:4, ]
     25 µL 50 µL 75 µL 100 µL Accession
[1,] FALSE FALSE FALSE  FALSE     FALSE
[2,]  TRUE FALSE FALSE  FALSE     FALSE
[3,]  TRUE FALSE FALSE  FALSE     FALSE

> dim(naWhere)
[1] 269   5

> colSums(naWhere)
    25 µL     50 µL     75 µL    100 µL Accession 
      200         0         3         0         0 

> varp <- unique(unlist(vars))
> varp
[1] "50 μL"     "75 μL"     "100 μL"    "Accession" "25 μL"   ## maybe apply gtools::mixedsort ?

> varn
[1] "25 μL" "75 μL"

Calculating the left-out columns throws the following error:

leftOut <- !varp %in% varn & colSums(naWhere[, varp]) > 0

"Error in naWhere[, varp] : subscript out of bounds"
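For reference, "subscript out of bounds" is what R reports when a matrix is indexed by a column name that is not among its colnames. Below is a minimal sketch with a made-up two-column matrix (m is hypothetical). It assumes that the failing names differ from colnames(naWhere) only in which µ-like character they contain: U+00B5 MICRO SIGN versus U+03BC GREEK SMALL LETTER MU, which usually render identically.

```r
# Hypothetical toy matrix standing in for naWhere; values are made up.
# "\u00b5" is MICRO SIGN (U+00B5); "\u03bc" is GREEK SMALL LETTER MU (U+03BC).
m <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2,
            dimnames = list(NULL, c("25 \u00b5L", "Accession")))

# Indexing by a name that exists in colnames(m) works.
print(m[, "Accession"])

# Indexing by a look-alike name that is not in colnames(m) fails;
# capture the error message instead of stopping.
res <- tryCatch(m[, "25 \u03bcL"], error = function(e) conditionMessage(e))
print(res)
```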

Checking varp against colnames(naWhere):

> identical(varp, colnames(naWhere))
[1] FALSE

> intersect(varp, colnames(naWhere))
[1] "Accession"

> varp %in% colnames(naWhere)
[1] FALSE FALSE FALSE  TRUE FALSE

> which(varp %in% colnames(naWhere)) ## only "Accession" matches
[1] 4
> which(colnames(naWhere) %in% varp) ## only "Accession" matches
[1] 5
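The names print identically, but %in% matches only "Accession". One way to check whether they really are the same string is to compare their Unicode code points. Below is a minimal sketch using hypothetical name vectors (col_names, var_names) and a hypothetical helper codepoints(). It assumes the difference is U+00B5 versus U+03BC. In the output above, the column names and varp/varn appear to use different µ-like characters, but that is an assumption based on the rendered text.

```r
# Hypothetical name vectors standing in for colnames(naWhere) and varp.
col_names <- c("25 \u00b5L", "50 \u00b5L", "Accession")  # U+00B5 MICRO SIGN
var_names <- c("50 \u03bcL", "Accession", "25 \u03bcL")  # U+03BC GREEK SMALL LETTER MU

# They look alike when printed, but only the pure-ASCII name matches.
print(var_names %in% col_names)

# Hypothetical helper: list each string's code points as "U+XXXX".
codepoints <- function(x) {
  vapply(x, function(s) {
    paste(sprintf("U+%04X", utf8ToInt(enc2utf8(s))), collapse = " ")
  }, character(1))
}

print(codepoints(col_names[1]))
print(codepoints(var_names[3]))
```

If the two code-point listings differ only in U+00B5 versus U+03BC, then %in% and [ treat the names as different strings, however they render.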

Comparing varp against varn still seems to work:

> !varp %in% varn
[1]  TRUE FALSE  TRUE  TRUE FALSE

The error seems to be caused by the Unicode characters in the column names, although these characters cause no problem when comparing varp against varn, as the last line above shows. However, using either seq_along or base::enc2native seems to remove the error:

> leftOut <- !varp %in% varn & colSums(naWhere[, seq_along(varp)]) > 0

> leftOut
    25 µL     50 µL     75 µL    100 µL Accession 
     TRUE     FALSE      TRUE     FALSE     FALSE 

> varp = enc2native(varp)
> leftOut <- !varp %in% varn & colSums(naWhere[, varp]) > 0
> leftOut
    50 µL     75 µL    100 µL Accession     25 µL 
    FALSE      TRUE     FALSE     FALSE      TRUE 
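Both workarounds may be worth double-checking.

naWhere[, seq_along(varp)] selects columns by position, so the columns come back in colnames(naWhere) order. !varp %in% varn, by contrast, follows varp's order, and & combines the two elementwise. Here varp starts with "50 μL" while naWhere starts with "25 µL", so each leftOut value may be built from two different columns while carrying the colSums() name.

In the enc2native() output, "75 µL" and "25 µL" are marked TRUE even though both are listed in varn. That suggests varp no longer matches varn after the conversion.

Below is a minimal sketch of one alternative. It assumes the only difference is U+03BC versus U+00B5, replaces one character with the other using gsub(fixed = TRUE), and then indexes by name. The toy data and the helper to_micro() are hypothetical.

```r
# Toy stand-ins (hypothetical) for the objects in the report.
naWhere <- matrix(c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE), nrow = 2,
                  dimnames = list(NULL, c("25 \u00b5L", "50 \u00b5L", "Accession")))
varp <- c("50 \u03bcL", "Accession", "25 \u03bcL")  # different order, Greek mu
varn <- "25 \u03bcL"

# Hypothetical helper: assumes U+03BC vs U+00B5 is the only difference.
to_micro <- function(x) gsub("\u03bc", "\u00b5", x, fixed = TRUE)

# Positional indexing: columns come back in colnames(naWhere) order,
# so they are not aligned with varp.
positional <- !varp %in% varn & colSums(naWhere[, seq_along(varp)]) > 0

# Name-based indexing after normalising the mu: element i of colSums()
# now belongs to varp[i].
varp_fixed <- to_micro(varp)
by_name <- !varp %in% varn & colSums(naWhere[, varp_fixed, drop = FALSE]) > 0

print(positional)
print(by_name)
```

In this toy example, positional marks "25 µL" as left out even though "25 μL" is listed in varn. by_name keeps every value attached to its own column.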

Please advise, thank you!
