To group R DataFrame data by month proceed as following:
- Define your R DataFrame.
- Group by your data according to one or multiple columns.
- For each group, summarize our metric according to the relevant function (mean, max, sum, median, first, last, count)
Here’s the R code to use:
import (tidyverse) my_df %>% group_by (month= floor_date(category_col , 'month')) %>% summarise(your_kpi = sum(metric_col))
Grouping R data by month – Practical Example
We’ll start by importing the tidyverse library , or alternatively open lubridate and dplyr. We then define a couple of vectors and then create a simple R DataFrame.
library (lubridate) library (dplyr) # define two vectors stamps <- ymd ('2023-12-28', '2023-12-29', '2024-01-01', '2024-01-02', '2024-01-03') hires <- c (4, 8, 5, 4, 5) # initialize data table campaign <- data.frame (stamps = stamps, hires=hires)
Aggregate R data by month
To group data by month, use the following code:
campaign %>% group_by (month= floor_date(stamps, 'month')) %>% summarise(num_hires = sum(hires))
We use the lubridate floor_data function to aggregate the data, we then use the dplyr summarise function to define an aggregating measure, in our case a sum of all hires by month.
Your RStudio console will display the summarized data as a tibble:
# A tibble: 2 × 2 month num_hires 1 2023-12-01 12 2 2024-01-01 14
Count R data by month
In the same way, we can aggregate occurrences of an observation and display them in a monthly frequency. Note that in order to count distinct observations we use the dplyr n() function:
campaign %>% group_by (month= floor_date(stamps, 'month')) %>% summarise(count_hires = n())
This will return the following tibble:
month count_hires <date> <int> 1 2023-12-01 2 2 2024-01-01 3
Plot R data by month
What if after aggregating our data, i would like to display the summary in a bar chat? To do that, we will use the ggplot2 library and render the sum of hires by month in a column plot:
campaign %>% group_by (month= floor_date(stamps, 'month')) %>% summarise(num_hires = sum(hires))%>% ggplot(aes (x=month, y=num_hires)) + geom_col() + labs(title="Hires by Month", x = "Month", y="Hires") + theme_minimal()
Here’s our bar plot:
Group data by year
In the same fashion, we can aggregate and display data by year. Assuming we are still using our campaign DataFrame, we can use the following snippet:
campaign %>% group_by (year= floor_date(stamps, 'year')) %>% summarise(hires_per_year = sum(hires))