Here’s a note from a reader:
“I have a large dataset in Power BI with thousands of rows, but I only need to keep specific rows based on certain criteria. For example, I want to keep only the bottom performers or rows with the first occurrence of each customer. How can I filter my data to keep just these specific rows?”
Understanding Row Filtering in Power BI
Filtering data to keep only specific rows is a fundamental data transformation task in your business intelligence work. Whether you’re analyzing sales performance, employee metrics, or customer behavior, you may often need to focus on particular segments of your data. This tutorial shows effective row filtering which helps reduce data noise, improves report performance, and enables more targeted analysis.
In this tutorial, we’ll use Power BI Desktop to demonstrate various techniques for keeping specific rows based on different criteria. These methods apply to both Power Query (M language) and DAX calculations, giving you flexibility in your approach depending on your specific knowledge or functional needs.
Filtering by value in Power BI and Power Query
For this tutorial, we’ll work with an HR dataset containing employee information. Our data model includes an “Employees” table with columns such as “EmployeeID”, “Name”, “Department”, “HireDate”, “Performance”, “Salary”, and “Region”. We’ll demonstrate how to keep specific rows based on various criteria like performance ratings, first occurrences, and sorting conditions.
Before starting, ensure you have Power BI Desktop installed and a basic understanding of the Power Query Editor. We’ll provide solutions using both M (Power Query) and DAX approaches where applicable.
Step-by-Step Implementation
Keeping Rows with Specific Values
Let’s start with a basic scenario: keeping only rows where employees have a “High” performance rating.
- Open your Power BI file and go to Power Query Editor by clicking “Transform data” on the Home tab.
- Select your “Employees” table and then use the filter dropdown on the “Performance” column.
- Select only the value “High” from the list and click OK.
- Alternatively, you can use M code by adding a custom step:
= Table.SelectRows(#"Previous Step", each [Performance] = "High")
Using DAX, you can create a calculated table:
HighPerformers =
FILTER(Employees, Employees[Performance] = "High")
Keeping Bottom N Rows
To keep only the bottom performers based on salary:
- In Power Query Editor, select the “Employees” table.
- Go to Home > Sort > Sort Ascending by the “Salary” column.
- Then go to Home > Keep Rows > Keep Bottom Rows and enter the number of rows you want to keep (e.g., 10).
- The M code for this operation would be:
= Table.FirstN(Table.Sort(#"Previous Step", {{"Salary", Order.Ascending}}), 10)
In DAX, you can create a calculated table:
BottomSalaries =
TOPN(10, Employees, Employees[Salary], ASC)
Keeping Rows with First Occurrence
A common requirement is to keep only the first occurrence of each value in a column. For example, keeping only the first employee from each department:
- In Power Query Editor, select the “Employees” table.
- Go to Home > Remove Rows > Remove Duplicates.
- Select only the “Department” column and click OK. This keeps the first occurrence of each department.
- The M code for this operation:
= Table.Distinct(#"Previous Step", {"Department"})
Using DAX, you can create a calculated table:
FirstInDepartment =
SUMMARIZE(
Employees,
Employees[Department],
"EmployeeID", MIN(Employees[EmployeeID]),
"Name", CALCULATE(MIN(Employees[Name])),
"HireDate", CALCULATE(MIN(Employees[HireDate])),
"Performance", CALCULATE(MIN(Employees[Performance])),
"Salary", CALCULATE(MIN(Employees[Salary])),
"Region", CALCULATE(MIN(Employees[Region]))
)
Keeping Rows Based on Multiple Conditions
To keep rows that meet multiple criteria, such as employees in the “Sales” department with “Medium” or “High” performance:
- In Power Query Editor, select the “Employees” table.
- Click on Add Column > Custom Column and create a column with this formula:
if [Department] = "Sales" and ([Performance] = "Medium" or [Performance] = "High") then true else false
- Name it “KeepRow” and click OK.
- Filter the “KeepRow” column to show only “true” values.
- Remove the “KeepRow” column when you’re done.
- The M code for this operation:
= Table.SelectRows(#"Previous Step", each ([Department] = "Sales" and ([Performance] = "Medium" or [Performance] = "High")))
In DAX, create a calculated table:
SalesPerformers =
FILTER(
Employees,
Employees[Department] = "Sales" &&
(Employees[Performance] = "Medium" || Employees[Performance] = "High")
)
FIltering All Rows Where a Column Contains Specific Text
To keep rows where the “Region” column contains “East”:
- In Power Query Editor, select the “Employees” table.
- Click on the filter dropdown for the “Region” column.
- Select “Text Filters” > “Contains” and enter “East”.
- The M code for this operation:
= Table.SelectRows(#"Previous Step", each Text.Contains([Region], "East"))
In DAX, create a calculated table:
EastRegionEmployees =
FILTER(Employees, SEARCH("East", Employees[Region], 1, 0) > 0)
Common Challenges and Solutions
Filter Not Working as Expected
If your filter isn’t working correctly, check for extra spaces or case sensitivity issues. In Power Query, you can use the Text.Trim and Text.Lower functions to standardize your text before filtering:
= Table.TransformColumns(#"Previous Step", {{"Region", each Text.Trim(Text.Lower(_)), type text}})
Then apply your filter on the cleaned text.
Losing Related Data When Filtering
When you filter rows in Power Query, you might lose relationships with other tables. To maintain relationships, consider creating a calculated table in DAX instead of filtering in Power Query, or ensure you properly recreate relationships after filtering.
Problem: Performance Issues with Large Datasets
Filtering large datasets can cause performance issues. Try these approaches:
- Apply filters as early as possible in your query steps
- Use query folding-compatible operations when possible
- Consider filtering data at the source level before importing
- For very large datasets, use DirectQuery mode with appropriate filters
Keeping Rows Based on Dates
When filtering by dates, be extra careful with date formats. Make sure to use proper date comparisons as the one shown below:
= Table.SelectRows(#"Previous Step", each [HireDate] >= #date(2020, 1, 1) and [HireDate] <= #date(2020, 12, 31))
In DAX:
Employees2020 =
FILTER(
Employees,
Employees[HireDate] >= DATE(2020, 1, 1) &&
Employees[HireDate] <= DATE(2020, 12, 31)
)
Keeping Rows with First Occurrence While Preserving Sort Order
To keep the first occurrence while maintaining a specific sort order. In this case to show the highest salary employee from each department.
= Table.Group(
Table.Sort(#"Previous Step", {{"Salary", Order.Descending}}),
{"Department"},
{{"AllData", each _, type table}}
)
= Table.ExpandTableColumn(#"Previous Step", "AllData", Table.ColumnNames(#"Sorted Rows"))