How to group DataFrame columns in R by month?

To group R DataFrame data by month proceed as following:

  1. Define your R DataFrame.
  2. Group by your data according to one or multiple columns.
  3. For each group, summarize our metric according to the relevant function (mean, max, sum, median, first, last, count)

Here’s the R code to use:

import (tidyverse)
my_df %>%
  group_by (month= floor_date(category_col , 'month')) %>%
  summarise(your_kpi = sum(metric_col))

Grouping R data by month – Practical Example

Create DataFrame

We’ll start by importing the tidyverse library , or alternatively open lubridate and dplyr. We then define a couple of vectors and then create a simple R DataFrame.

library (lubridate)
library (dplyr)

# define two vectors
stamps <- ymd ('2023-12-28', '2023-12-29', '2024-01-01', '2024-01-02',
            '2024-01-03')
hires <- c (4, 8, 5, 4, 5)

# initialize data table
campaign <- data.frame (stamps = stamps, hires=hires)

Aggregate R data by month

To group data by month, use the following code:

campaign %>%
  group_by (month= floor_date(stamps, 'month')) %>%
  summarise(num_hires = sum(hires))

We use the lubridate floor_data function to aggregate the data, we then use the dplyr summarise function to define an aggregating measure, in our case a sum of all hires by month.

Your RStudio console will display the summarized data as a tibble:

# A tibble: 2 × 2
  
       month      num_hires
        
1 2023-12-01        12
2 2024-01-01        14

Count R data by month

In the same way, we can aggregate occurrences of an observation and display them in a monthly frequency. Note that in order to count distinct observations we use the dplyr n() function:

campaign %>%
  group_by (month= floor_date(stamps, 'month')) %>%
  summarise(count_hires = n())

This will return the following tibble:

month      count_hires
  <date>           <int>
1 2023-12-01           2
2 2024-01-01           3

Plot R data by month

What if after aggregating our data, i would like to display the summary in a bar chat? To do that, we will use the ggplot2 library and render the sum of hires by month in a column plot:

campaign %>%
  group_by (month= floor_date(stamps, 'month')) %>%
  summarise(num_hires = sum(hires))%>%
  ggplot(aes (x=month, y=num_hires))  + 
  geom_col() + 
  labs(title="Hires by Month", x = "Month", y="Hires") +
  theme_minimal()

Here’s our bar plot:

Group data by year

In the same fashion, we can aggregate and display data by year. Assuming we are still using our campaign DataFrame, we can use the following snippet:

campaign %>%
  group_by (year= floor_date(stamps, 'year')) %>%
  summarise(hires_per_year = sum(hires))