Bug: SetDecimalSeparator() Makes DECIMAL Values None
Introduction
Hey guys, let's dive into a critical bug we've uncovered in mssql-python
concerning the setDecimalSeparator()
function. If you're using this function, especially in scenarios involving bulk data fetching, you need to be aware of this issue. The main keyword here is setDecimalSeparator()
bug, which causes DECIMAL
values to unexpectedly become None
. This article will break down the bug, how to reproduce it, the expected behavior, and a proposed solution. So, buckle up and let's get started!
Describe the Bug
The crux of the matter is that when you use setDecimalSeparator()
to switch the decimal separator from the default .
(period) to another character (like ,
for European locales), DECIMAL
values fetched via bulk fetch operations such as fetchall()
or fetchmany()
can turn into None
. Yes, you heard that right β your data is silently disappearing! This issue primarily affects the bulk/columnar fetch path in the C++ layer. Interestingly, single-row fetches via fetchone()
work just fine because they use a different code path. The intention of setDecimalSeparator()
was solely to influence display formatting (in Row.__str__()
), but the bulk fetch implementation is mistakenly applying it during the parsing of numeric values from the database. This is a classic case of a function doing more than it should, leading to unexpected consequences. The critical finding is that this bug specifically targets the bulk/columnar fetch path, making it crucial to understand how your data retrieval methods can impact your results. Itβs like having a hidden gremlin in your code, silently wreaking havoc on your numerical data when you least expect it.
No exceptions or errors are raised, making this a particularly insidious issue. Instead of getting the Decimal('1234.56')
you expect, you get None
. Imagine the debugging nightmare! The primary impact of this bug is on data integrity. When DECIMAL
values are silently converted to None
, it leads to inaccurate data representation, potentially skewing calculations and reports. This can have far-reaching consequences, especially in financial applications or any scenario where precise numerical data is essential. The lack of an error message compounds the problem, making it difficult to detect the issue early on. This silent data corruption can lead to incorrect business decisions, flawed analyses, and ultimately, a loss of trust in the system's data accuracy.
How to Reproduce the Bug
The key here is that the bug manifests only with bulk fetch operations (fetchall()
, fetchmany()
). Single-row fetches via fetchone()
remain unaffected. To demonstrate this, consider the following code snippet:
from mssql_python import connect, setDecimalSeparator, getDecimalSeparator
import os
from decimal import Decimal
conn_str = os.getenv("DB_CONNECTION_STRING")
conn = connect(conn_str)
cursor = conn.cursor()
# Create temp table with multiple rows
cursor.execute("DROP TABLE IF EXISTS #test_decimal")
cursor.execute("""
CREATE TABLE #test_decimal (
id INT,
price DECIMAL(10, 2)
)
""")
# Insert test data
for i in range(10):
cursor.execute(f"INSERT INTO #test_decimal VALUES ({i}, {1234.56 + i})")
# Test 1: Bulk fetch with default separator works
print("Test 1: Bulk fetch with default separator")
setDecimalSeparator('.')
cursor.execute("SELECT id, price FROM #test_decimal ORDER BY id")
rows = cursor.fetchall()
print(f"Fetched {len(rows)} rows")
print(f"First row: {rows[0][1]}") # Output: Decimal('1234.56') β
print(f"Type: {type(rows[0][1])}")
# Test 2: Bulk fetch with comma separator - BUG!
print("\nTest 2: Bulk fetch with comma separator")
setDecimalSeparator(',')
cursor.execute("SELECT id, price FROM #test_decimal ORDER BY id")
rows = cursor.fetchall()
print(f"Fetched {len(rows)} rows")
print(f"First row: {rows[0][1]}") # Output: None β BUG!
print(f"Type: {type(rows[0][1])}")
print(f"Expected: Decimal('1234.56')")
print(f"Got: {rows[0][1]}")
# Test 3: fetchone() works correctly (different code path)
print("\nTest 3: fetchone() with comma separator (works!)")
setDecimalSeparator(',')
cursor.execute("SELECT CAST(9876.54 AS DECIMAL(10, 2)) AS price")
row = cursor.fetchone()
print(f"Result: {row[0]}") # Output: Decimal('9876.54') β
print(f"Type: {type(row[0])}")
# Cleanup
cursor.execute("DROP TABLE #test_decimal")
cursor.close()
conn.close()
When you run this, you'll notice that Test 2, which uses setDecimalSeparator(',')
and fetchall()
, results in None
values. This clearly demonstrates the bug. The above code provides a complete, self-contained example that anyone can run to reproduce the issue. This is crucial for bug reporting and verification. By providing a clear and concise example, you make it easier for others to understand the problem and for developers to implement a fix. The code is structured to highlight the contrast between the correct behavior with the default separator and the buggy behavior with a custom separator. This direct comparison makes the issue more apparent and underscores the importance of the bug.
Expected Output:
Test 1: Bulk fetch with default separator
Fetched 10 rows
First row: 1234.56
Type: <class 'decimal.Decimal'>
Test 2: Bulk fetch with comma separator
Fetched 10 rows
First row: None
Type: <class 'NoneType'>
Expected: Decimal('1234.56')
Got: None
Test 3: fetchone() with comma separator (works!)
Result: 9876.54
Type: <class 'decimal.Decimal'>
Expected Behavior
Ideally, DECIMAL
values should always be correctly parsed from the database, irrespective of the decimal separator setting. The database returns strings like "1234.56"
(with a period), and parsing should consistently use .
as the decimal separator. The setDecimalSeparator()
setting should exclusively affect the display format when converting Row
objects to strings via __str__()
. This means that the actual value should remain a valid Decimal
object, while its string representation can use the custom separator. To be more specific, the correct behavior ensures that the internal representation of numerical data remains accurate, regardless of how itβs displayed. This separation of concerns is a fundamental principle in software design, and this bug violates that principle.
For instance:
setDecimalSeparator(',')
row = cursor.fetchone()
print(row[0]) # Should print: Decimal('1234.56') - actual value
print(str(row)) # Should print: (1234,56) - formatted display
This illustrates the separation between the actual value and its display format. The print(row[0])
should always output the correct Decimal
value, while print(str(row))
should respect the custom decimal separator for display purposes. This consistency is crucial for maintaining data integrity and providing a predictable user experience.
Root Cause Analysis
The root cause lies within mssql_python/pybind/ddbc_bindings.cpp
, specifically at lines 3243-3253 in the bulk/columnar fetch path. This path is utilized by fetchall()
and fetchmany()
for optimized batch fetching. The issue is that the code incorrectly applies the custom separator before parsing the numeric value, leading to parsing errors. The debugger can be your best friend here, allowing you to step through the code and observe the values at each stage. The key is to trace the execution path for both the buggy bulk fetch and the working single-row fetch to pinpoint the exact location where the divergence occurs. This hands-on approach often provides deeper insights than simply reading the code.
Buggy Code (lines 3235-3260):
case SQL_DECIMAL:
case SQL_NUMERIC: {
try {
std::string numStr(reinterpret_cast<const char*>(
&buffers.charBuffers[col - 1][i * MAX_DIGITS_IN_NUMERIC]),
buffers.indicators[col - 1][i]);
// BUG: Applies custom separator BEFORE parsing
std::string separator = GetDecimalSeparator();
if (separator != ".") {
// Replaces '.' with custom separator (e.g., ',')
size_t pos = numStr.find('.');
if (pos != std::string::npos) {
numStr.replace(pos, 1, separator); // "1234.56" β "1234,56"
}
}
// BUG: Passes invalid string to Python's Decimal()
// Python's Decimal("1234,56") throws exception β caught β returns None
row.append(py::module_::import("decimal").attr("Decimal")(numStr));
} catch (const py::error_already_set& e) {
LOG("Error converting to decimal: {}", e.what());
row.append(py::none()); // Silently returns None on parse error
}
break;
}
The code snippet above highlights the problematic section where the custom separator is applied before parsing. This leads to the Decimal()
constructor receiving an incorrectly formatted string, resulting in a parsing error and the subsequent return of None
. The crucial insight here is that the code is modifying the string representation of the number before passing it to the Decimal()
constructor. This is where the core issue lies, as the Decimal()
constructor expects a string with a period as the decimal separator. By replacing the period with a comma (or another custom separator), the string becomes invalid for parsing, leading to the None
result.
The fetchone()
function works because the non-bulk fetch path (lines 2616-2670) bypasses this flawed logic. It passes the string directly to Decimal()
without any modification. This discrepancy in behavior between the two fetch methods is what makes the bug particularly challenging to diagnose. Understanding the different code paths for fetchone()
and fetchall()
is essential for grasping why the bug manifests only in specific scenarios.
Files Affected:
mssql_python/pybind/ddbc_bindings.cpp
lines 3243-3253 (bulk fetch - BUGGY)mssql_python/pybind/ddbc_bindings.cpp
lines 2616-2670 (single fetch - CORRECT)mssql_python/row.py
lines 152-169 (display formatting - CORRECT)
Impact and Workarounds
This bug is critical because it can cause silent data loss. Users who set the decimal separator for European locales (,
) risk receiving None
values instead of accurate DECIMAL
data when using fetchall()
or fetchmany()
. This can lead to calculation errors and application failures. The silent nature of the data corruption is particularly concerning, as it can go unnoticed for extended periods, leading to potentially significant consequences.
The immediate workaround is to stick with the default .
separator (avoid calling setDecimalSeparator()
) or exclusively use fetchone()
(though this isn't practical for large datasets). These workarounds are, at best, temporary solutions. They restrict the functionality of the library and can impact performance. A proper fix is essential to ensure the library functions as expected across different locales and data volumes.
Proposed Solution
The solution is to align the bulk fetch path with the non-bulk fetch path by parsing the DECIMAL value using the default .
separator and ensuring setDecimalSeparator()
only affects display formatting. We need to correct the code in ddbc_bindings.cpp
to prevent it from modifying the string representation before parsing. The parsing should always assume a period as the decimal separator, as this is the format returned by the database.
Changes Required in ddbc_bindings.cpp
Location: Lines 3235-3260 (bulk fetch path)
Current buggy code:
case SQL_DECIMAL:
case SQL_NUMERIC: {
try {
std::string numStr(reinterpret_cast<const char*>(
&buffers.charBuffers[col - 1][i * MAX_DIGITS_IN_NUMERIC]),
buffers.indicators[col - 1][i]);
// BUG: Don't modify string before parsing!
std::string separator = GetDecimalSeparator();
if (separator != ".") {
size_t pos = numStr.find('.');
if (pos != std::string::npos) {
numStr.replace(pos, 1, separator);
}
}
row.append(py::module_::import("decimal").attr("Decimal")(numStr));
} catch (const py::error_already_set& e) {
LOG("Error converting to decimal: {}", e.what());
row.append(py::none());
}
break;
}
Fixed code (remove separator replacement):
case SQL_DECIMAL:
case SQL_NUMERIC: {
try {
std::string numStr(reinterpret_cast<const char*>(
&buffers.charBuffers[col - 1][i * MAX_DIGITS_IN_NUMERIC]),
buffers.indicators[col - 1][i]);
// FIX: Always parse with '.' separator (database format)
// Custom separator only affects display in Row.__str__()
row.append(py::module_::import("decimal").attr("Decimal")(numStr));
} catch (const py::error_already_set& e) {
LOG("Error converting to decimal: {}", e.what());
row.append(py::none());
}
break;
}
By removing the separator replacement logic, we ensure that the Decimal()
constructor always receives a string formatted with a period as the decimal separator. This fixes the parsing error and ensures that the correct Decimal
value is returned.
Implementation Steps
- Remove lines 3243-3253 in
ddbc_bindings.cpp
(the separator replacement logic). - Keep the display formatting in
row.py
(lines 152-169) β it's already correct. - Add test cases for bulk fetch with custom decimal separators.
- Verify that both
fetchone()
andfetchall()
work consistently.
These steps outline a clear and concise plan for fixing the bug. By removing the problematic code, preserving the correct display formatting, and adding comprehensive test cases, we can ensure that the fix is effective and doesn't introduce any regressions.
Expected Behavior After Fix
setDecimalSeparator(',')
cursor.execute("SELECT CAST(1234.56 AS DECIMAL(10,2))")
# Data retrieval: Always returns proper Decimal object
rows = cursor.fetchall()
print(rows[0][0]) # Decimal('1234.56') β
print(type(rows[0][0])) # <class 'decimal.Decimal'> β
# Display formatting: Uses custom separator
print(str(rows[0])) # (1234,56) β
- display only
After applying the fix, the code should behave as expected, with Decimal
values being correctly parsed and the custom separator only affecting the display format. This ensures data integrity and provides a consistent user experience across different locales.
Testing Checklist
- [ ]
fetchone()
with default separator (already works) - [ ]
fetchone()
with custom separator (already works) - [ ]
fetchall()
with default separator (already works) - [ ]
fetchall()
with custom separator (currently broken, will fix) - [ ]
fetchmany()
with custom separator (currently broken, will fix) - [ ]
Row.__str__()
display formatting with custom separator (already works) - [ ] Cross-platform testing (Windows, Linux, macOS)
This testing checklist ensures that the fix is thoroughly validated across different scenarios and platforms. By covering all the critical use cases, we can have confidence that the bug is resolved and the library is functioning correctly.
Conclusion
So, there you have it, guys! A deep dive into a tricky bug in mssql-python
's setDecimalSeparator()
function. This bug, which causes DECIMAL
values to become None
during bulk fetch operations when a non-default decimal separator is set, can lead to silent data loss. We've explored how to reproduce it, the expected behavior, the root cause in the C++ layer, and a proposed solution. By removing the incorrect separator replacement logic in the bulk fetch path, we can ensure that DECIMAL
values are parsed correctly, regardless of the decimal separator setting. Remember, the key takeaway is that setDecimalSeparator()
should only affect display formatting, not the underlying data parsing. Itβs crucial to address issues like this to maintain data integrity and ensure the reliability of your applications. Stay tuned for updates and be sure to apply the fix when it's released. Happy coding, and watch out for those sneaky bugs!