Often, after importing a DataFrame, you’ll want to replace zero values with missing (NA) values in its columns. In this tutorial we will look at several practical examples of how to do this.
Create example R DataFrame
We will start by defining some fictitious data that includes several cells with zero values. Feel free to use it to follow along with this tutorial.
# define vectors / columns
var1 <- c(12, 11, 11, 22, 33)
var2 <- c(321, 0, 212, 0, 222)
var3 <- floor(var2 / var1)
perf_df <- data.frame(var1 = var1, var2 = var2, var3 = var3)
# display the DataFrame
print(perf_df)
Here’s the data:
| | var1 | var2 | var3 |
|---|---|---|---|
| 1 | 12 | 321 | 26 |
| 2 | 11 | 0 | 0 |
| 3 | 11 | 212 | 19 |
| 4 | 22 | 0 | 0 |
| 5 | 33 | 222 | 6 |
Replace 0 values with NA in an R DataFrame
There are several ways to accomplish this.
Using base R: We build a logical (boolean) matrix by comparing every cell with zero. We then use it to index the DataFrame and assign NA (the missing value) to each matching cell.
perf_df[perf_df == 0] <- NA
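To make the masking step concrete, the following minimal sketch rebuilds the example data, prints the logical matrix returned by the comparison, and then uses it for the assignment. The names zero_mask and na_counts are illustrative and not part of the original example.

```r
# Rebuild the tutorial's example data so the snippet runs on its own
var1 <- c(12, 11, 11, 22, 33)
var2 <- c(321, 0, 212, 0, 222)
perf_df <- data.frame(var1 = var1, var2 = var2, var3 = floor(var2 / var1))

# Comparing a data frame with a scalar yields a logical matrix
# with one TRUE/FALSE entry per cell
zero_mask <- perf_df == 0
print(zero_mask)

# Cells where the mask is TRUE are overwritten with NA
perf_df[zero_mask] <- NA

# Count the NA values per column to confirm the replacement
na_counts <- colSums(is.na(perf_df))
print(na_counts)
```

Storing the mask in its own variable is optional, but it makes it easy to inspect which cells will change before overwriting them.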
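With dplyr: We use the na_if function, which replaces values equal to a given value (in this case zero) with NA. In dplyr 1.1.0 and later, na_if no longer accepts a whole DataFrame together with a scalar, so we apply it to every column using mutate and across:
library("dplyr")
perf_df <- perf_df %>% mutate(across(everything(), ~ na_if(.x, 0)))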
Important Note: Make sure dplyr is installed, for example by running install.packages("dplyr") once, before loading it in your R script.
After running either of these commands we’ll get the following DataFrame:
| | var1 | var2 | var3 |
|---|---|---|---|
| 1 | 12 | 321 | 26 |
| 2 | 11 | NA | NA |
| 3 | 11 | 212 | 19 |
| 4 | 22 | NA | NA |
| 5 | 33 | 222 | 6 |
Convert zero values to NA in a single column
We can use a similar process to replace zero values with NA in one specific column.
With R Base:
perf_df$var2[perf_df$var2 == 0] <- NA
With dplyr na_if:
library("dplyr")
# apply na_if to the column vector and write it back, keeping the other columns
perf_df$var2 <- na_if(perf_df$var2, 0)
Using the dplyr mutate function:
library("dplyr")
# assign the result back so the change is kept
perf_df <- perf_df %>% mutate(var2 = na_if(var2, 0))
All will return the following column:
| | var2 |
|---|---|
| 1 | 321 |
| 2 | NA |
| 3 | 212 |
| 4 | NA |
| 5 | 222 |
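As a quick sanity check, the sketch below applies the three single-column approaches to a fresh copy of the example data and compares the results. It assumes dplyr is installed; the variable names base_result, vector_result and mutate_result are hypothetical helpers introduced only for this comparison.

```r
library(dplyr)

# Fresh copy of the relevant example columns
perf_df <- data.frame(var1 = c(12, 11, 11, 22, 33),
                      var2 = c(321, 0, 212, 0, 222))

# Base R: logical indexing on a copy of the column
base_result <- perf_df$var2
base_result[base_result == 0] <- NA

# dplyr: na_if() applied directly to the column vector
vector_result <- na_if(perf_df$var2, 0)

# dplyr: na_if() inside mutate(), then extract the column with pull()
mutate_result <- perf_df %>%
  mutate(var2 = na_if(var2, 0)) %>%
  pull(var2)

# Each comparison reports whether two approaches agree
print(identical(base_result, vector_result))
print(identical(base_result, mutate_result))
```

None of the three approaches modifies perf_df in this sketch, so the original zeros stay available for comparison.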
Replace zeros in multiple columns with dplyr
If we would like to replace 0 values in multiple columns, we can write a slightly more complex mutate call:
library("dplyr")
perf_df <- perf_df %>% mutate(var2 = na_if(var2, 0),
                              var3 = na_if(var3, 0))
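When the list of columns grows, repeating na_if for each one gets tedious. One possible alternative, assuming dplyr 1.0.0 or later, is to combine mutate with across. The vector cols_to_clean and the data frame cleaned_df are hypothetical names used for illustration.

```r
library(dplyr)

# Rebuild the tutorial's example data so the snippet runs on its own
var1 <- c(12, 11, 11, 22, 33)
var2 <- c(321, 0, 212, 0, 222)
perf_df <- data.frame(var1 = var1, var2 = var2, var3 = floor(var2 / var1))

# Hypothetical list of the columns in which zeros should become NA
cols_to_clean <- c("var2", "var3")

# across() applies the same function to every selected column;
# all_of() selects columns by the names stored in a character vector
cleaned_df <- perf_df %>%
  mutate(across(all_of(cols_to_clean), ~ na_if(.x, 0)))

print(cleaned_df)
```

Keeping the column names in a separate vector means the selection can be changed in one place without touching the mutate call.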