Fix: Power Query Column Duplication Bug After Refresh
Hey guys! Ever run into a weird bug where your Power Query duplicates columns after a refresh? Yeah, super annoying, right? Especially when you're dealing with connection-only queries. Let's dive into what was causing this issue and how we squashed it!
Bug Description
So, here's the deal. Imagine you've got an LLM (or a user) rocking a Power Query in connection-only mode. You do the following:
- Refresh the query using
pq-refresh. - Try to load it to a table using
pq-loadtoorSetLoadToTableAsync.
Boom! What happens? A whole mess of problems:
- ❌ Columns are duplicated. You get multiple identical columns where you should only have one.
- ❌ Data loads as a plain range instead of the proper QueryTable.
- ❌ Multiple QueryTables might be created when you should only have one.
This can really mess up your data and your workflow. Let's dig into how to reproduce it.
Reproduction Steps
To recreate this lovely bug, just follow these steps:
- Create a connection-only Power Query with a single column.
- Call
RefreshAsync()on that query. - Use
SetLoadToTableAsync()to load the data to your worksheet. - Observe the horror: duplicated columns and a plain range instead of the expected QueryTable.
Now that we know how to make it happen, let's talk about why it was happening in the first place.
Root Cause
The culprit lies within the code, specifically in SetLoadToTableAsync and potentially in UpdateAsync. Let's break it down.
SetLoadToTableAsync (PowerQueryCommands.LoadConfig.cs)
Original Code (STEP 3 cleanup):
// Delete ALL existing QueryTables on sheet (not query-specific)
dynamic? queryTables = targetSheet.QueryTables;
for (int i = queryTables.Count; i >= 1; i--)
{
dynamic? qt = queryTables.Item(i);
qt?.Delete();
ComUtilities.Release(ref qt);
}
// THEN called usedRange.Clear() - THIS WAS THE BUG!
usedRange.Clear(); // ❌ Harmful when no data exists yet
The Problem:
- The code was calling
usedRange.Clear()before creating the QueryTable. This is like wiping the canvas clean before you even start painting! - For connection-only queries (where there's no existing data yet), this created a "dirty" state in Excel.
- When the QueryTable was finally created, Excel didn't initialize it properly. It was all messed up!
- The result? Data loaded as a regular range, and those pesky columns got duplicated.
UpdateAsync (PowerQueryCommands.Lifecycle.cs)
Current Code Pattern:
// Uses RemoveQueryTablesFromSheet (query-specific deletion)
PowerQueryHelpers.RemoveQueryTablesFromSheet(targetSheet, queryName);
// ALSO calls usedRange.Clear()
usedRange.Clear(); // ⚠️ May have same issue
Status: The usedRange.Clear() call here is still under investigation. Removing it causes test failures, which means columns start accumulating. More on this later!
So, now that we understand the root cause, let's talk about how we fixed it.
Solution Implemented
SetLoadToTableAsync Fix
Changed to:
// Delete ONLY QueryTables for this specific query (consistent with UpdateAsync)
PowerQueryHelpers.RemoveQueryTablesFromSheet(targetSheet, queryName);
// NO usedRange.Clear() - let Excel handle data cleanup
// QueryTable deletion is sufficient
Benefits:
- ✅ Now, it only removes QueryTables for the specific query, not all of them.
- ✅ We're letting Excel handle the data cleanup, which it's much better at.
- ✅ No more interference with QueryTable creation.
- ✅ It's consistent with the
UpdateAsyncpattern.
Basically, we stopped messing with things we shouldn't have been messing with! This leads to a more stable and predictable result.
Test Coverage Enhanced
We also beefed up the tests to make sure this bug stays squashed. Here's what we did:
Bug_RefreshConnectionOnlyThenSetLoadToTable_CreatesRangeInsteadOfTable
We added assertions to verify:
- ✅ A QueryTable exists (not just a regular range).
- ✅ There's exactly 1 QueryTable (no duplicates allowed!).
- ✅ No more duplicate columns.
Update_QueryColumnStructure_UpdatesWorksheetColumns
We added assertions after each UpdateAsync to verify:
- ✅ The QueryTable still exists (hasn't been converted to a range).
- ✅ There's exactly 1 QueryTable (still no duplicates!).
- ✅ The column count is correct (no sneaky accumulation).
Basically, we're making sure that the QueryTable stays a QueryTable, and that we don't end up with a pile of extra columns.
Files Changed
Here's a list of the files that got a makeover:
src/ExcelMcp.Core/Commands/PowerQuery/PowerQueryCommands.LoadConfig.cs-SetLoadToTableAsyncrefactored.tests/ExcelMcp.Core.Tests/Integration/Commands/PowerQuery/PowerQueryLoadToTableTests.cs- Enhanced test assertions.tests/ExcelMcp.Core.Tests/Integration/Commands/PowerQuery/PowerQueryCommandsTests.cs- Added QueryTable checks.
Testing
Good news, everyone! All tests are passing:
- ✅
Bug_RefreshConnectionOnlyThenSetLoadToTable_CreatesRangeInsteadOfTable- 21s - ✅
Update_QueryColumnStructure_UpdatesWorksheetColumns- 22s
Related Questions
Now, a couple of lingering questions to ponder:
-
Should
UpdateAsyncalso removeusedRange.Clear()?- Currently,
usedRange.Clear()is still hanging out inUpdateAsync. - Evidence suggests that removing it causes test failures (columns accumulate – yikes!).
- The hunch is that it shouldn't be necessary, but...
- ...it needs more digging.
- Currently,
-
Why the different behavior between
SetLoadToTableAsyncandUpdateAsync?SetLoadToTableAsynccontext: Connection-only, so there's no existing data.UpdateAsynccontext: There's already a QueryTable with data.- Does
QueryTable.Delete()behave differently depending on the context? That's the question.
Impact
This fix has some pretty sweet impacts:
- LLM workflows: Connection-only → Refresh → LoadToTable now actually works correctly. Huzzah!
- User workflows: No more duplicated columns or plain ranges messing up your spreadsheets.
- Consistency:
SetLoadToTableAsyncnow uses the same pattern asUpdateAsync(usingRemoveQueryTablesFromSheet).
So, there you have it! A deep dive into a tricky bug and how we squashed it. Hopefully, this helps you avoid similar headaches in the future! Happy querying!