How to extract numbers from text strings in Power BI?

Use case

I have a column in my database in which i have an employee IDs that is mixed with text (e.g., ‘EMP12345’ or ‘EMPLOYEE12345’ for example). The text length is not fixed. How can I extract just the numeric part in Power BI for further analysis and visualization?

Introduction

Extracting numeric data from text strings is a common challenge in data preparation. Power BI offers robust tools to handle this task efficiently, both in Power Query and using DAX in Power BI desktop. This post explores how to solve this problem using an test dataset as an example.

Step-by-step Implementation

1. Power Query Solution

a. Open Power Query Editor
b. Select the EmployeeID column
c. Add a custom column with the following M code:

= Text.Select([EmployeeID], {"0".."9"})

d. Rename the new column to “NumericEmployeeID”
e. Change the data type of “NumericEmployeeID” to Whole Number

Note: we always prefer to manipulate data as close as possible to the data source (SQL / Excel / Data Lakes etc’), hence using Power Query for data cleansing is advised.

2. DAX Solution

If the option of using Power Query is not viable for your data conversion, you can also manipulate your data with DAX:

a. Create a new calculated column in Power BI Desktop
b. Use the following DAX formula:

NumericEmployeeID = 
CALCULATE(
    MAX(
        VALUE(
            SUBSTITUTE(
                SUBSTITUTE(EmployeeData[EmployeeID], "EMP", ""),
                "emp", ""
            )
        )
    )
)

Remember

  • Text.Select in M: Filters characters based on a condition
  • SUBSTITUTE in DAX: Replaces specified text with another
  • VALUE in DAX: Converts text to a number

What can go wrong?

  1. Inconsistent formats: If some IDs don’t follow the “EMP12345” pattern, modify the M code to handle variations:
   = Text.Select([EmployeeID], {"0".."9"})
  1. Performance issues with large datasets: Consider extracting numbers in Power Query instead of DAX for better performance.
  2. Blank results: Verify that all EmployeeID values contain numbers. Add error handling in DAX using IFERROR if needed.