Config Management System Architecture: A Deep Dive
Hey guys! Let's dive deep into the architecture discussion around building a robust configuration management system. This article summarizes our conversation regarding a system designed as an alternative to Azure App Configuration, focusing on key features like eliminating redeployment requirements, enabling schema validation, and centralizing configuration management.
Core Problem Statement: Why a New System?
At the heart of our discussion lies the need for a better way to manage application configurations. Currently, there are a few pain points we're aiming to address:
- Eliminating Redeployment Requirements: The main goal is to make configuration changes without the hassle of redeploying applications. This is a big one, guys, as redeployments can be time-consuming and disruptive.
- Enabling Schema Validation: Octopus Deploy, while powerful, lacks built-in schema validation. We need a system that can ensure our configurations adhere to a predefined structure, preventing errors and inconsistencies.
- Centralizing Configuration Management: We want a single source of truth for all application configurations, simplifying management and improving overall consistency.
To tackle these challenges, we've designed a system with a specific data model and configuration resolution logic, which we'll explore in detail.
Data Model Architecture: The Building Blocks
To achieve our goals, we've structured the system around several key entities, each playing a crucial role in managing application configurations. Let's break down the data model architecture:
Applications: The Top-Level Entities
Applications are the cornerstone of our system. Think of them as the representation of your deployed software – for example, the DICOM Router. Each application is linked to a JSON schema, which acts as a contract defining the valid configuration keys for that application. This schema is super important because it allows us to enforce consistency and prevent applications from being misconfigured. Moreover, applications can share common schema components, like database connection strings or message queue configurations. This promotes reusability and reduces redundancy across your applications.
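Here's a minimal sketch of how an application schema could reuse shared components. The component names, keys, and the `build_app_schema` helper are all hypothetical. For brevity it copies shared properties into each application's schema. A real implementation might instead use JSON Schema's `$defs`/`$ref` to reference them.

```python
import json

# Hypothetical shared schema components that several applications can reuse.
SHARED_COMPONENTS = {
    "database": {
        "DatabaseConnectionString": {"type": "string"},
    },
    "messageQueue": {
        "QueueHost": {"type": "string"},
        "QueuePort": {"type": "integer"},
    },
}


def build_app_schema(app_name, own_properties, shared):
    """Compose an application's JSON schema from its own keys plus shared components."""
    properties = dict(own_properties)
    for component in shared:
        # A KeyError here means the application references an unknown component.
        properties.update(SHARED_COMPONENTS[component])
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": app_name,
        "type": "object",
        "properties": properties,
        # Reject any key the schema does not define.
        "additionalProperties": False,
    }


dicom_router_schema = build_app_schema(
    "DICOM Router",
    {"MaxRetries": {"type": "integer", "minimum": 0}},
    ["database", "messageQueue"],
)
print(json.dumps(sorted(dicom_router_schema["properties"])))
# ["DatabaseConnectionString", "MaxRetries", "QueueHost", "QueuePort"]
```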
Variables (Key-Value Pairs): The Configuration Data
The core of our configuration data is stored in variables, which are essentially key-value pairs. The keys themselves are defined by the application schemas we just talked about. The values, on the other hand, can take two forms: they can be direct configuration values (like a number of retries) or references to external secret managers such as Azure Key Vault or AWS Secrets Manager. This indirection is a security best practice because it allows us to store sensitive information outside of our configuration system. Now, here's a crucial point: Variables aren't directly tied to applications. Instead, they're associated with tags, which brings us to the next piece of the puzzle.
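One way to model the two kinds of values is sketched below: a direct value or a pointer to a secret. The class names and provider identifiers are illustrative assumptions, not a fixed format. Note that a variable carries tags but no application reference, matching the description above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class PlainValue:
    """A configuration value stored directly in the system (e.g. a retry count)."""
    value: object


@dataclass(frozen=True)
class SecretRef:
    """A pointer to a secret held elsewhere; the secret itself is never stored here."""
    provider: str  # hypothetical identifiers, e.g. "azure-key-vault", "aws-secrets-manager"
    locator: str   # vault/secret name or secret ARN, depending on the provider


@dataclass(frozen=True)
class Variable:
    key: str                            # must be a key defined by some application schema
    value: Union[PlainValue, SecretRef]
    tags: FrozenSet[str] = frozenset()  # empty set = global variable; no application link


retries = Variable("MaxRetries", PlainValue(3))
db_conn = Variable(
    "DatabaseConnectionString",
    SecretRef("azure-key-vault", "config-vault/dicom-db-europe"),
    frozenset({"Europe"}),
)
```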
Tags: Categorizing Configurations
Tags are labels that categorize configurations. Imagine tags like "Europe", "LatinAmerica", "Production", or "Test". They add a layer of organization and flexibility to our system. The magic of tags lies in how they're used within tag groups. We organize tags into mutually exclusive tag groups. This means that a configuration can only have one tag from each tag group. For example, if we have a "Region" tag group, a configuration can be tagged with either "Europe" or "LatinAmerica", but not both. This restriction helps prevent conflicts and ensures clarity in our configuration management. Tags are a game-changer for managing configurations across different environments and regions.
Tag Groups: Organizing Tags
Tag Groups are collections of mutually exclusive tags, like we just discussed. They serve two critical purposes:
- Mutual Exclusivity Enforcement: This is the core reason for tag groups. By ensuring that only one tag from a group can be applied to a configuration, we prevent situations where, for example, an API key might have conflicting region tags (like being tagged as both "Europe" and "LatinAmerica").
- Organizational Structure: Tag groups also provide a logical way to group related tags. For instance, we might have a tag group called "Region" that includes tags for various geographical regions, or an "Environment" tag group with tags like "Production", "Test", and "Staging".
Examples of tag groups:
- Region: Europe, LatinAmerica, Oceania, etc.
- Environment: Production, Test, Staging, etc.
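Mutual exclusivity boils down to a simple rule: among the tags applied to something, no two may belong to the same group. Here's a minimal sketch of that check. The `TAG_GROUPS` catalogue and `validate_tags` are hypothetical names. The sketch assumes every tag belongs to exactly one group.

```python
# Hypothetical tag-group catalogue; every tag belongs to exactly one group.
TAG_GROUPS = {
    "Region": {"Europe", "LatinAmerica", "Oceania"},
    "Environment": {"Production", "Test", "Staging"},
}

# Reverse index: tag -> owning group.
GROUP_OF = {tag: group for group, tags in TAG_GROUPS.items() for tag in tags}


def validate_tags(tags):
    """Return {group: tag} for a tag set, or raise ValueError if it breaks the rules."""
    chosen = {}
    for tag in sorted(tags):
        group = GROUP_OF.get(tag)
        if group is None:
            raise ValueError(f"unknown tag: {tag!r}")
        if group in chosen:
            raise ValueError(
                f"tags {chosen[group]!r} and {tag!r} are both in group {group!r}"
            )
        chosen[group] = tag
    return chosen


print(validate_tags({"Europe", "Production"}))  # one tag from each of two groups: OK
try:
    validate_tags({"Europe", "LatinAmerica"})
except ValueError as err:
    print(err)  # tags 'Europe' and 'LatinAmerica' are both in group 'Region'
```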
API Keys: Authentication and Authorization
To ensure that only authorized applications can access their configurations, we use API Keys. Each deployed application instance gets a unique API key generated within our system. These keys are then pasted into Octopus Deploy variables for deployment. API keys can have multiple tags attached, allowing us to tailor configurations to specific application instances. However, because of the mutual exclusivity of tag groups, an API key can only have one tag per tag group. This might sound complex, but it provides a very powerful and flexible way to manage access to configurations. Think of API keys as the gatekeepers of your configuration data.
Crucially, API keys don't need to have tags from every group – you can selectively assign tags based on the needs of the application instance. Finally, each API key is associated with exactly one application.
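A sketch of issuing keys under these rules is below. `ApiKey` and `issue_api_key` are hypothetical names. The token comes from Python's `secrets` module. Storing tags as a `{group: tag}` mapping makes "one tag per group" structurally impossible to violate, and leaving a group out is simply a missing entry.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ApiKey:
    application: str  # exactly one application per key
    token: str        # the value pasted into Octopus Deploy variables
    tags: Dict[str, str] = field(default_factory=dict)  # group -> tag; groups may be omitted


def issue_api_key(application, tags_by_group=None):
    """Create a key for one deployed instance of `application`."""
    return ApiKey(
        application=application,
        token=secrets.token_urlsafe(32),  # 32 random bytes, URL-safe base64 (43 chars)
        tags=dict(tags_by_group or {}),
    )


eu_prod = issue_api_key("DICOM Router", {"Region": "Europe", "Environment": "Production"})
latam = issue_api_key("DICOM Router", {"Region": "LatinAmerica"})  # no Environment tag
```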
Configuration Resolution Logic: How It All Comes Together
Now that we've defined the data model, let's talk about how the system actually determines which configuration values to use for a given application instance. This is where our configuration resolution logic comes into play.
Predetermined Tag Precedence Order: No More Guesswork
Instead of relying on manual label ordering, like in Azure App Configuration, our system uses a fixed, predetermined hierarchy for tag precedence. This is a key advantage because it eliminates the potential for human error and ensures consistent configuration resolution across all applications.
The order is as follows:
- Global variables (base layer, no tags): These are the default settings that apply to all applications.
- More specific tag overrides in predetermined order (e.g., Region → Environment → Feature flags): Tags allow us to override the global settings with more specific configurations. For example, we might have region-specific configurations that override the global defaults, followed by environment-specific configurations that further refine the settings.
This predetermined order creates an automatic fallback chain. If a key has no value at a more specific layer, the system falls back to the next layer down, ending at the global value. For example, if no region-specific value exists, the global value is used. This makes our system more resilient and easier to manage. It also means you can spend less time debugging configuration issues and more time building awesome features.
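Here's a minimal sketch of that resolution order. Start from the global layer, then apply each tag group's overrides in the fixed order. Any key without an override keeps the value from the layer below. It assumes, for simplicity, that each variable carries at most one tag and that tag names are unique across groups. The group name `FeatureFlags` and all sample values are hypothetical.

```python
# Fixed, predetermined precedence: later groups override earlier ones.
PRECEDENCE = ["Region", "Environment", "FeatureFlags"]


def resolve(variables, key_tags):
    """Resolve the effective configuration for an API key.

    variables: list of (config_key, tag_or_None, value); None marks a global variable.
    key_tags:  {group: tag} assigned to the API key.
    """
    by_tag = {}
    for config_key, tag, value in variables:
        by_tag.setdefault(tag, {})[config_key] = value

    effective = dict(by_tag.get(None, {}))         # 1. global base layer
    for group in PRECEDENCE:                        # 2. overrides in fixed order
        tag = key_tags.get(group)
        if tag is not None:
            effective.update(by_tag.get(tag, {}))   # keys without overrides fall back
    return effective


variables = [
    ("LogLevel", None, "Information"),
    ("DatabaseHost", None, "db.global.example"),
    ("DatabaseHost", "Europe", "db.eu.example"),
    ("LogLevel", "Test", "Debug"),
]
print(resolve(variables, {"Region": "Europe", "Environment": "Production"}))
# {'LogLevel': 'Information', 'DatabaseHost': 'db.eu.example'}
print(resolve(variables, {"Region": "Oceania", "Environment": "Test"}))
# {'LogLevel': 'Debug', 'DatabaseHost': 'db.global.example'}
```

Because the order is fixed in code, nobody has to arrange labels by hand, and the same tags always produce the same result.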
Orthogonal Tag Groups: Flexibility and Power
Our tag groups are designed to be orthogonal: each group varies independently of the others, so tags from different groups can be combined freely. This orthogonality gives us a lot of flexibility in how we manage our configurations. For example:
- Region tags (Europe, LatinAmerica) can control geography-specific configurations, such as database connection strings or API endpoints.
- Environment tags (Production, Test) can control cross-regional behaviors like logging levels or feature flags.
API keys can have tags from multiple groups that work together. This allows us to create highly targeted configurations for specific application instances. For instance, an API key might be tagged with both "Europe" (from the Region group) and "Production" (from the Environment group), resulting in a configuration that is specific to the European production environment.
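To see how two orthogonal groups combine for a "Europe" + "Production" key, the layers can be stacked with Python's `collections.ChainMap`. A lookup returns the value from the first mapping that contains the key, so the most specific layer goes first. All layer contents below are hypothetical sample data.

```python
from collections import ChainMap

# Hypothetical layers for an API key tagged Europe (Region) + Production (Environment).
global_layer = {"DatabaseHost": "db.global.example", "LogLevel": "Information", "MaxRetries": 3}
region_layer = {"DatabaseHost": "db.eu.example"}  # geography-specific
environment_layer = {"LogLevel": "Warning"}       # cross-regional behaviour

# ChainMap searches its maps left to right, so the most specific layer goes first.
effective = ChainMap(environment_layer, region_layer, global_layer)

print(effective["DatabaseHost"])  # db.eu.example (from the Region layer)
print(effective["LogLevel"])      # Warning (from the Environment layer)
print(effective["MaxRetries"])    # 3 (falls back to the global layer)
```

Here the Region and Environment overrides touch different keys, so they compose without conflict. If both groups set the same key, the predetermined precedence decides which one wins.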
Schema Management: Ensuring Configuration Integrity
Schema management is a crucial aspect of our configuration system. It helps us ensure that our configurations are valid and consistent, preventing errors and improving overall system reliability.
Application Schemas: Defining the Configuration Contract
As we discussed earlier, each application has a JSON schema that defines the valid configuration keys. These schemas are the backbone of our validation process. They act as a contract between the application and the configuration system, specifying what settings are allowed and what data types they should have. Schemas also support shared components, allowing us to reuse common schema definitions (such as database connection strings or message queue settings) across multiple applications. This promotes consistency and reduces the effort required to manage configurations.
Schema Validation: Preventing Invalid Configurations
The system performs schema validation at two key points:
- When configurations are created or updated: This ensures that only valid configurations are stored in the system.
- At query time: When an application requests its configuration, the system filters the variables based on the application's schema. This ensures that the application only receives configurations that it supports, preventing unexpected errors.
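Here's a minimal sketch of the two checkpoints. It validates writes against the selected application's schema, as in the application-first editor. At query time it drops any key the requesting application doesn't declare. It handles only a few JSON Schema `type` keywords and plain values. A fuller implementation would likely use a complete JSON Schema validator such as the `jsonschema` package. The schema and function names are hypothetical.

```python
# Hypothetical schema for one application (normally loaded from the schema store).
SCHEMA = {
    "properties": {
        "MaxRetries": {"type": "integer"},
        "DatabaseHost": {"type": "string"},
        "EnableAudit": {"type": "boolean"},
    }
}


def _matches_type(value, json_type):
    """Check a subset of JSON Schema types; anything unrecognised is rejected."""
    if json_type == "boolean":
        return isinstance(value, bool)
    if json_type == "integer":  # bool is an int subclass in Python, so exclude it
        return isinstance(value, int) and not isinstance(value, bool)
    if json_type == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    return False


def validate_on_write(schema, key, value):
    """Checkpoint 1: reject a variable whose key or value the schema does not allow."""
    prop = schema["properties"].get(key)
    if prop is None:
        raise ValueError(f"{key!r} is not defined by the schema")
    if not _matches_type(value, prop["type"]):
        raise ValueError(f"{key!r} must be of type {prop['type']}")


def filter_for_app(schema, resolved):
    """Checkpoint 2 (query time): keep only keys the application declares."""
    allowed = schema["properties"]
    return {k: v for k, v in resolved.items() if k in allowed}


validate_on_write(SCHEMA, "MaxRetries", 5)  # accepted
resolved = {"MaxRetries": 5, "DatabaseHost": "db.eu.example", "LegacyTimeout": 30}
print(filter_for_app(SCHEMA, resolved))     # LegacyTimeout is dropped
```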
Configuration Editor Features: Making Configuration Management Easier
To make configuration management even easier, we've designed a configuration editor with several helpful features:
- Application-first workflow: Users select an application before editing its configuration. This makes it clear which configuration is being modified and reduces the risk of errors.
- Schema-driven autocomplete: The editor suggests valid keys based on the selected application's schema. This helps users discover available configurations and prevents typos.
- Validation: The editor prevents the creation of invalid keys, ensuring that only valid configurations are saved.
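Schema-driven autocomplete and key validation can be sketched with plain prefix matching plus the standard library's `difflib.get_close_matches` for "did you mean" hints. The key list and function names are hypothetical. A real editor would read the keys from the selected application's schema.

```python
import difflib

# Hypothetical keys taken from the selected application's schema.
SCHEMA_KEYS = ("DatabaseConnectionString", "MaxRetries", "QueueHost", "QueuePort")


def autocomplete(prefix, keys=SCHEMA_KEYS):
    """Suggest schema keys that start with what the user has typed (case-insensitive)."""
    p = prefix.lower()
    return sorted(k for k in keys if k.lower().startswith(p))


def check_key(key, keys=SCHEMA_KEYS):
    """Reject keys outside the schema, offering the closest valid spellings."""
    if key in keys:
        return key
    hints = difflib.get_close_matches(key, keys, n=3)
    raise ValueError(f"unknown key {key!r}; did you mean {hints}?")


print(autocomplete("queue"))  # ['QueueHost', 'QueuePort']
try:
    check_key("MaxRetires")   # typo
except ValueError as err:
    print(err)                # suggests 'MaxRetries'
```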
Integration Points: Connecting to the Ecosystem
A configuration management system doesn't exist in isolation. It needs to integrate with other parts of the development and deployment ecosystem. Here are the key integration points for our system.
Octopus Deploy Integration: Dynamic Configuration Updates
We generate API keys in our system and add them to Octopus Deploy variables. During deployment, Octopus injects the appropriate API key into each application instance. This integration is what enables dynamic configuration updates without redeployment. When a configuration change is made in our system, the application instance can retrieve the updated configuration using its API key, without requiring a new deployment.
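On the application side, picking up changes without a redeploy just means re-fetching with the injected key. The sketch below assumes Octopus exposes the key to the process as an environment variable named `CONFIG_API_KEY`, which is a hypothetical name. It takes the transport as an injected `fetch` function, since the service's actual API isn't specified here. How often `refresh()` runs (a timer, or on demand) is left open.

```python
import os
from typing import Callable, Dict, Optional

# Hypothetical name of the environment variable Octopus Deploy would populate.
API_KEY_ENV_VAR = "CONFIG_API_KEY"


class ConfigClient:
    """Re-fetches configuration at runtime, so changes apply without a redeploy."""

    def __init__(self, fetch: Callable[[str], Dict[str, object]], api_key: Optional[str] = None):
        self._fetch = fetch  # e.g. an HTTP call to the config service (not shown)
        self._api_key = api_key or os.environ[API_KEY_ENV_VAR]
        self.current: Dict[str, object] = {}

    def refresh(self) -> bool:
        """Fetch the latest config; return True if anything changed."""
        latest = self._fetch(self._api_key)
        changed = latest != self.current
        self.current = latest
        return changed


# Usage with an in-memory stand-in for the config service:
store = {"LogLevel": "Information"}
client = ConfigClient(lambda key: dict(store), api_key="demo-key")
client.refresh()             # True: first load
store["LogLevel"] = "Debug"  # a change made in the config system
client.refresh()             # True: picked up without redeploying
```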
External Secret Managers: Securely Managing Secrets
Our system stores references or pointers to secrets, not the actual secret values. This is a crucial security feature. We support popular secret managers like Azure Key Vault and AWS Secrets Manager. Applications resolve secret references at runtime, retrieving the secret values directly from the secret manager. This keeps sensitive data out of our configuration system entirely, reducing the risk of a security breach.
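Runtime secret resolution can be sketched as a registry of resolvers keyed by provider. The `<provider>://<locator>` reference format and the `akv` provider prefix are hypothetical. Strings with unregistered schemes, such as ordinary `https://` URLs, pass through untouched. In a real deployment each resolver would wrap the provider's SDK, such as the Azure Key Vault or AWS Secrets Manager client libraries. Here an in-memory dict stands in for the vault.

```python
from typing import Callable, Dict

# A resolver takes a provider-specific locator and returns the secret value.
Resolver = Callable[[str], str]


def resolve_secrets(config: Dict[str, object], resolvers: Dict[str, Resolver]) -> Dict[str, object]:
    """Replace secret references with values fetched from the owning secret manager."""
    out = {}
    for key, value in config.items():
        if isinstance(value, str) and "://" in value:
            provider, locator = value.split("://", 1)
            if provider in resolvers:  # only registered providers count as references
                out[key] = resolvers[provider](locator)
                continue
        out[key] = value  # plain value, or a URL that is not a secret reference
    return out


fake_vault = {"config-vault/db-password": "s3cr3t"}  # stand-in for a real vault
config = {
    "DatabasePassword": "akv://config-vault/db-password",
    "Endpoint": "https://api.example",
    "MaxRetries": 3,
}
resolved = resolve_secrets(config, {"akv": fake_vault.__getitem__})
```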
Key Advantages Over Current System: A Recap
Let's recap the key advantages of our new configuration management system over the current approach:
- No redeployment needed for configuration changes: This is a huge time-saver and reduces the risk of downtime.
- Schema validation built-in (vs. Octopus being "blind" to schemas): This prevents configuration errors and improves system reliability.
- Centralized management of all configurations: This simplifies management and improves consistency.
- Predetermined precedence eliminates ordering errors: This ensures consistent configuration resolution.
- Flexible tagging supports multi-dimensional organization: This allows us to create highly targeted configurations.
- Autocomplete and validation in the editor: This makes configuration management easier and less error-prone.
- Security through external secret references rather than storing sensitive values: This protects sensitive data and reduces the risk of a security breach.
Geographic Deployment Support: Scaling Globally
Our system is designed for worldwide deployment with region-specific configurations. We can support different environments in various regions, such as:
- Latin America environment
- Europe (Benelux) environment
- Australia/Oceania environment
Each region can have different database connections, credentials, and settings. This geographic deployment support is essential for applications that need to scale globally.
Conclusion: A Robust and Flexible Solution
This design provides a robust, flexible, and secure alternative to Azure App Configuration while addressing the specific pain points we identified in our current Octopus Deploy-based workflow. By implementing this configuration management system, we can significantly improve our development and deployment processes, reduce the risk of errors, and enhance the overall reliability of our applications. What do you guys think about these improvements? Let me know your thoughts!