How to extract and transform Excel data into Power BI?

As a data analyst, you need to import employee data from Excel into Power BI for reporting. In this tutorial we will dive into the basics of Power Query to extract and transform this data effectively, setting the stage for your analysis and visualization tasks.

Get data from Excel to Power BI

  1. Start by opening your Power BI Desktop and click “Get Data” in the Home ribbon.
  2. Select “Excel” from the list of data sources.
  3. Browse to and select your Excel file, then click “Open.”
  4. In the Navigator window, choose the sheet or table containing your data.
  5. Click “Transform Data” to open the Power Query Editor to review your data.

[Previous content remains the same]

Acquire Data from Excel Multiple Tabs

When working with Excel data, it’s common to have information spread across multiple Excel tabs. Here’s how to handle this in Power BI:

  1. In the Power Query Editor, click “New Source” > “Excel”.
  2. Select your Excel file and click “Open”.
  3. In the Navigator, you’ll see all available sheets and tables. Select multiple items by holding Ctrl and clicking each desired tab.
  4. Click “Transform Data” to load them into Power Query.

Each tab now appears as a separate query in the Queries pane. Now we need another quick step to combine them:

  1. Select one query as your base and click “Append Queries” in the Home tab.
  2. Choose “Append queries as new” to create a combined dataset.
  3. Select the queries to append and click “OK”.

If the tabs have different structures:

  1. Use “Transform Data” on each query to align column names and data types before appending.
  2. Add a “Source” column to each query to track data origin:
  • Go to “Add Column” > “Custom Column”
  • Name it “Source” and enter the tab name as the formula (e.g., “Employee List”)

This approach creates a unified dataset from multiple tabs, maintaining data traceability. Remember to handle any duplicates or conflicts that may arise from combining data sources

Troubleshooting

Read this section carefully as it contains key information to troubleshoot your data import.

  • If date columns import as text, use the Date.FromText() function in a custom column.
  • For merged Excel cells, use “Use First Row as Headers” and then “Promote Headers” to create proper column names.
  • If formulas in Excel aren’t importing correctly, consider copying and pasting as values in Excel before importing.
  • Large Excel files may slow down Power BI. Use column selection in the initial import to limit data or consider incremental refresh for production reports.

Transform your Excel data with Power Query

Now that your Excel file data is stored in your Power BI model, you can proceed with basic data cleansing and manipulation tasks.

  1. Remove Duplicates:
    • Select all columns and go to Home > Remove Rows > Remove Duplicates.
    • This ensures each employee appears only once in your dataset.
  2. Review and Rename Columns:
    • In the Power Query Editor, right-click each column header to rename if necessary.
    • Ensure names are clear and consistent, e.g., “Employee ID” to “EmployeeID”.
  3. Change Data Types:
    • Click the icon next to each column name to set the correct data type.
    • Set EmployeeID to Whole Number, Salary to Currency, and HireDate to Date.
  4. Remove Unnecessary Rows:
    • If your data has header rows, select “Use First Row as Headers” in the Home tab.
    • To remove empty rows, go to Home > Remove Rows > Remove Blank Rows.
  5. Handle Null Values:
    • For each column, go to Transform > Replace Values.
    • Replace null values with appropriate defaults (e.g., 0 for numeric fields, “Unknown” for text).

Once satisfied with your transformations, click “Close & Apply” in the Home tab to load the data into Power BI.

Key Concepts

  • Power Query transformations don’t affect the source Excel file.
  • The order of Applied Steps matters; arrange them logically for performance.
  • Use “View Native Query” to see the SQL-like representation of your transformations.
  • Leverage “Reference” instead of “Duplicate” when creating related queries to save memory.