In this short tutorial we’ll look at how to use base R and the dplyr library to remove the last column, or the last n columns, of your DataFrame.
Let’s start by building a simple R DataFrame from multiple vectors:
#R
city <- c("NYC", "SFO", "LON", "PAR")
direct <- c(322, 327, 336, 324)
online <- c(191, 126, 196, 140)
partners <- c(319, 362, 325, 348)
revenue <- data.frame(city = city, direct = direct, online = online, partners = partners)
#get the DataFrame column names
print(names(revenue))
This will return:
[1] "city" "direct" "online" "partners"
Remove DataFrame last column with Base R
We can drop the last column with any of these snippets. Each one leaves revenue untouched and stores the result in new_rev.
Using the number of columns (the parentheses matter, because : binds more tightly than - in R):
new_rev <- revenue[1:(ncol(revenue) - 1)]
Using the DataFrame length (in R, length() of a DataFrame is its number of columns, whereas len() of a pandas DataFrame is its number of rows):
new_rev <- revenue[1:(length(revenue) - 1)]
Using the column name, as shown below:
new_rev <- revenue[!names(revenue) %in% "partners"]
If we now print the column names of new_rev we’ll get only three columns:
print(names(new_rev)) # will return [1] "city" "direct" "online"
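Operator precedence is easy to get wrong when building the index range, so here is a minimal sketch of what R actually evaluates. The helper names n_cols, idx_unparenthesized and idx_parenthesized are made up for illustration. It also shows a negative index, which is a compact alternative for dropping a column by position.

```r
# Rebuild the sample frame from the tutorial
revenue <- data.frame(
  city = c("NYC", "SFO", "LON", "PAR"),
  direct = c(322, 327, 336, 324),
  online = c(191, 126, 196, 140),
  partners = c(319, 362, 325, 348)
)

n_cols <- ncol(revenue)

# `:` is evaluated before `-`, so this is (1:n_cols) - 1, i.e. 0, 1, 2, 3.
# Indexing with it still works, but only because R silently ignores index 0.
idx_unparenthesized <- 1:n_cols - 1

# With parentheses the range is the intended 1, 2, 3
idx_parenthesized <- 1:(n_cols - 1)

# A negative index drops the column at that position instead of selecting it
new_rev <- revenue[-n_cols]
print(names(new_rev))
```

The negative index avoids building a range at all, which also sidesteps the case of a one-column frame, where 1:(n_cols - 1) would count down from 1 to 0.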
Drop last columns with dplyr
Subset all columns but the last by column name
We can use the dplyr select function to drop the last column by name. We then keep the result in a new DataFrame named new_rev.
library(dplyr)
new_rev <- revenue %>% select(-partners)
Or alternatively without using the pipe (%>%) notation:
library(dplyr)
new_rev <- select(revenue, -partners)
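Dropping by bare name fails with an error if the column is no longer present, for example when the same script step runs twice. A possible workaround, sketched here on the assumption that dplyr 1.0.0 or later is installed, is the any_of() selection helper, which ignores names that do not exist.

```r
library(dplyr)

# Rebuild the sample frame from the tutorial
revenue <- data.frame(
  city = c("NYC", "SFO", "LON", "PAR"),
  direct = c(322, 327, 336, 324),
  online = c(191, 126, 196, 140),
  partners = c(319, 362, 325, 348)
)

# Drop "partners" if it exists
new_rev <- revenue %>% select(-any_of("partners"))

# Repeating the step is harmless: there is nothing left to drop
new_rev <- new_rev %>% select(-any_of("partners"))

print(names(new_rev))
```

Whether silently ignoring a missing column is desirable depends on the script; the plain select(-partners) form is stricter and surfaces typos in column names.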
Remove the rightmost column by index
In a similar fashion we can use the column index:
library(dplyr)
new_rev <- select(revenue, -length(revenue))
Let’s check the columns in our new_rev DataFrame:
print(names(new_rev))
# this will return
[1] "city" "direct" "online"
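As an alternative to computing the index by hand, dplyr also exposes the last_col() selection helper, which refers to the rightmost column directly. The following is a small sketch of that variant, not a replacement for the approach above.

```r
library(dplyr)

# Rebuild the sample frame from the tutorial
revenue <- data.frame(
  city = c("NYC", "SFO", "LON", "PAR"),
  direct = c(322, 327, 336, 324),
  online = c(191, 126, 196, 140),
  partners = c(319, 362, 325, 348)
)

# last_col() resolves to the rightmost column; the minus sign excludes it
new_rev <- revenue %>% select(-last_col())

print(names(new_rev))
```

Because last_col() is resolved against whatever frame reaches select, it keeps working in the middle of a longer pipeline where the frame has no name of its own to pass to length().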
Select the first column in R
A related question is how to select the first column of your DataFrame.
#return the first column using R-Base
first_col <- revenue [1]
You can accomplish the same using tidyverse and dplyr – we’ll use the select function:
library (dplyr)
first_col <- select (revenue, 1)
#or using the pipe notation
first_col <- revenue %>% select(1)
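Note that single-bracket indexing and select both return a one-column DataFrame rather than a plain vector. If the values themselves are needed, double brackets or dplyr's pull function can be used instead. The sketch below contrasts the three; the variable names are illustrative only.

```r
library(dplyr)

# Rebuild the sample frame from the tutorial
revenue <- data.frame(
  city = c("NYC", "SFO", "LON", "PAR"),
  direct = c(322, 327, 336, 324),
  online = c(191, 126, 196, 140),
  partners = c(319, 362, 325, 348)
)

# Single brackets keep the data.frame structure (4 rows, 1 column)
first_col_df <- revenue[1]

# Double brackets extract the column's values as a vector
first_col_vec <- revenue[[1]]

# pull() is the dplyr counterpart of [[ ]] and accepts a position or a name
first_col_pull <- pull(revenue, 1)

print(class(first_col_df))
print(first_col_vec)
```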
Delete last n columns from your DataFrame
In this last example, we’ll take an arbitrary value n representing the number of columns we would like to drop. We then build the sequence of the last n column positions and pass its negation to the select function:
n <- 2
new_rev <- revenue %>% select(-seq(length(revenue) - n + 1, length(revenue)))
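The same idea can be written in base R by keeping the first ncol - n columns instead of removing the last n. The function drop_last_n below is a hypothetical helper written for this sketch, not part of base R or dplyr. It uses seq_len(), which returns an empty index when asked for zero columns, whereas seq(length(revenue) - n + 1, length(revenue)) counts downward when n is 0.

```r
# Hypothetical helper: return df without its last n columns
drop_last_n <- function(df, n) {
  # Guard against negative n or asking for more columns than exist
  stopifnot(is.data.frame(df), n >= 0, n <= ncol(df))
  # Keep positions 1 .. (ncol - n); seq_len(0) yields an empty selection
  df[seq_len(ncol(df) - n)]
}

# Rebuild the sample frame from the tutorial
revenue <- data.frame(
  city = c("NYC", "SFO", "LON", "PAR"),
  direct = c(322, 327, 336, 324),
  online = c(191, 126, 196, 140),
  partners = c(319, 362, 325, 348)
)

new_rev <- drop_last_n(revenue, 2)
print(names(new_rev))

# n = 0 returns the frame with all of its columns
unchanged <- drop_last_n(revenue, 0)
```

With n equal to the number of columns the helper returns a DataFrame that has the original rows but no columns, which may or may not be what a caller expects.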