How to change zero values to NA in R DataFrames with dplyr?

After importing a DataFrame, you will often want to replace zero values with missing (NA) values in its columns. In this tutorial we will look at several practical examples of how to do this.

Create example R DataFrame

We will start by defining some fictitious data that includes several cells with zero values. Feel free to use it to follow along with this tutorial.

# define vectors / columns
var1 <- c(12, 11, 11, 22, 33)
var2 <- c(321, 0, 212, 0, 222)
var3 <- floor(var2 / var1)

perf_df <- data.frame(var1 = var1, var2 = var2, var3 = var3)

# display the dataframe
print(perf_df)

Here’s the data:

  var1 var2 var3
1   12  321   26
2   11    0    0
3   11  212   19
4   22    0    0
5   33  222    6

Replace 0 values with NA in R DataFrame

There are several ways to accomplish this.

Using base R: We build a logical matrix by comparing every cell to zero. We then use it to index the DataFrame and assign NA (missing) to each matching cell.

perf_df[perf_df == 0] <- NA
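
To see what this one-liner does, it can help to inspect the logical matrix before using it as an index. The following is a minimal sketch; demo_df and zero_mask are hypothetical names, and the example data is rebuilt so the snippet runs on its own:

```r
# demo_df is a hypothetical copy of the example data (not part of the original script)
var1 <- c(12, 11, 11, 22, 33)
var2 <- c(321, 0, 212, 0, 222)
demo_df <- data.frame(var1 = var1, var2 = var2, var3 = floor(var2 / var1))

# comparing a data frame to a single value yields a logical matrix of the same shape
zero_mask <- demo_df == 0
print(zero_mask)

# indexing with that matrix assigns NA only to the cells where the mask is TRUE
demo_df[zero_mask] <- NA

# count the NA values per column to confirm the replacement
print(colSums(is.na(demo_df)))
```

Keeping the mask in its own variable is optional, but it makes it easy to check how many cells will be affected before overwriting them.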

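With dplyr: We use the na_if function, which replaces values equal to a given value (in this case zero) with NA. Note that in dplyr 1.1.0 and later, na_if no longer accepts a whole DataFrame together with a single value, so we apply it to every column using mutate and across:

library("dplyr")
perf_df <- perf_df %>% mutate(across(everything(), ~ na_if(.x, 0)))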

Important Note: Make sure the dplyr package is installed (for example with install.packages("dplyr")) before loading it in your R script.

After running either of these commands we’ll get the following DataFrame:

  var1 var2 var3
1   12  321   26
2   11   NA   NA
3   11  212   19
4   22   NA   NA
5   33  222    6
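
If a DataFrame also contains non-numeric columns, one option is to restrict the replacement to the numeric ones. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for across and where); mixed_df and its label and score columns are hypothetical and not part of the example data above:

```r
library(dplyr)

# mixed_df is a hypothetical data frame with one character and one numeric column
mixed_df <- data.frame(
  label = c("a", "b", "c"),
  score = c(10, 0, 30)
)

# apply na_if only to numeric columns; the character column is left untouched
cleaned_df <- mixed_df %>%
  mutate(across(where(is.numeric), ~ na_if(.x, 0)))

print(cleaned_df)
```

Selecting columns by type avoids comparing text columns against the number 0, which na_if may reject in recent dplyr versions.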

Convert zero values to empty in a column with dplyr

We can use a similar process to replace zero values with NA in one specific column.

With base R:

perf_df$var2[perf_df$var2 == 0] <- NA

With dplyr na_if:

library("dplyr")
perf_df$var2 <- na_if(perf_df$var2, 0)

Using the dplyr mutate function:

library("dplyr")
perf_df <- perf_df %>% mutate(var2 = na_if(var2, 0))

All three approaches produce the following var2 column:

  var2
1  321
2   NA
3  212
4   NA
5  222

Replace zeros in multiple columns with dplyr

If we would like to replace 0 values in multiple columns, we can pass several na_if calls to a single mutate call:

library("dplyr")
perf_df %>% mutate(var2 = na_if(var2, 0),
                   var3 = na_if(var3, 0))
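
When the list of columns grows, writing one na_if call per column becomes repetitive. As an alternative sketch, assuming dplyr 1.0.0 or later, across can apply the same replacement to a chosen set of columns. Here demo_df and result_df are hypothetical names, and the example data is rebuilt so the snippet runs on its own:

```r
library(dplyr)

# demo_df is a hypothetical copy of the example data (not part of the original script)
var1 <- c(12, 11, 11, 22, 33)
var2 <- c(321, 0, 212, 0, 222)
demo_df <- data.frame(var1 = var1, var2 = var2, var3 = floor(var2 / var1))

# replace zeros with NA in var2 and var3 only; var1 is not modified
result_df <- demo_df %>%
  mutate(across(c(var2, var3), ~ na_if(.x, 0)))

print(result_df)
```

The column selection inside across accepts the usual tidyselect helpers, so a call such as starts_with("var") could be used instead of listing each column by name.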