Lambda Pipeline: Strategy, Portfolio, and Execution
Hey folks! 👋 Let's dive into a cool project: splitting up a monolithic pipeline into a set of lean, mean Lambda functions, all orchestrated by AWS Step Functions. We're talking about refactoring our current system, specifically the `strategy_v2`, `portfolio_v2`, and `execution_v2` modules. The goal? A more flexible, scalable, and resilient system, with real gains in fault isolation, traceability, and the ability to scale each component independently. We'll keep things efficient by leveraging the existing `Shared` codebase for common logic, state handling, and infrastructure interfaces. It's all about making things better, you know?
The Current Landscape and Why We're Changing Things Up
Currently, our architecture runs as a monolithic pipeline. Picture this: it's an in-memory event bus with synchronous Python execution. That's how our `strategy_v2`, `portfolio_v2`, and `execution_v2` modules work together in a linear daily pipeline. First, we generate some signals (`SignalGenerated`), then create a rebalance plan (`RebalancePlan`), and finally, execute some trades (`TradesExecuted`). Right now, all these modules lean heavily on the `Shared` codebase for everything – common data structures, logging, config management, external API integrations (think brokers and price feeds), and internal event semantics.
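To make the starting point concrete, here's a rough sketch of how these stages chain together in-process today. The call names mirror the module entry points mentioned later in this post, but the orchestration wrapper and exact signatures are illustrative assumptions, not the real code.

```python
# Rough sketch of today's synchronous, in-memory pipeline.
# The module calls mirror the ones named in this post; the wrapper
# function and signatures are illustrative assumptions.

import strategy_v2
import portfolio_v2
import execution_v2


def run_daily_pipeline() -> None:
    # Stage 1: generate signals (the in-memory SignalGenerated event).
    signals = strategy_v2.run()

    # Stage 2: turn signals into a RebalancePlan.
    plan = portfolio_v2.compare_and_plan(signals)

    # Stage 3: execute the plan (produces TradesExecuted).
    execution_v2.execute(plan)

    # If any stage raises, everything downstream stops: no per-step retry,
    # no isolated failure state, and tracing spans one big process.


if __name__ == "__main__":
    run_daily_pipeline()
```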
But here's the kicker: this monolith, while functional, has its limitations. It's less fault-tolerant. Imagine one part of the process failing. The whole thing can grind to a halt. It's also harder to trace and debug issues across the entire flow. We want better fault isolation, more granular state tracking, and easier troubleshooting, and that's where Step Functions come in. They give us per-step state tracking, retry, and failover capabilities. This means we can pinpoint exactly where things go wrong, and recover more gracefully. This is why we're moving towards this new approach.
The Benefits of Going Lambda
The main idea is to decouple the core pipeline stages into individual Lambda functions, each with a single, well-defined responsibility. This makes the system easier to understand, maintain, and test. We'll replace the implicit in-memory event chaining with explicit Step Functions sequencing, which gives us a clearer view of the workflow and lets us add or modify steps as needed. Each Lambda function will have access to `Shared` functionality, either via a layer, a dependency package, or a container base, so we don't have to rewrite or duplicate any existing code. And because Step Functions provides per-step state tracking, retry, and failover, we get better fault isolation and traceability: we can monitor how each step is performing and quickly pinpoint issues when something goes wrong. It also keeps things simpler for the team, since everyone can see exactly how the pipeline functions.
Our Game Plan: Building the Lambda-Powered Pipeline
So, here's how we're thinking of setting this up. We're going to create three separate Lambda functions:
- `lambda_strategy_v2`: Wraps the existing `strategy_v2.run()` logic. This function is responsible for generating signals, the starting point of our pipeline.
- `lambda_portfolio_v2`: Wraps `portfolio_v2.compare_and_plan()`. This function focuses on creating a rebalance plan from the signals generated by the first function.
- `lambda_execution_v2`: Wraps `execution_v2.execute()`. This function takes the rebalance plan and executes the trades.
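To make that concrete, here's a minimal sketch of what one of these wrappers might look like, assuming `strategy_v2.run()` returns JSON-serializable signal data. The handler name, event shape, and serialization details are illustrative assumptions, not the final design.

```python
# lambda_strategy_v2/handler.py
# Minimal sketch of a thin Lambda wrapper around the existing module logic.
# Assumes strategy_v2.run() returns JSON-serializable signal data; adjust
# serialization if the real objects are richer domain models.

import logging

import strategy_v2  # packaged with this function; common code comes from Shared

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def handler(event: dict, context) -> dict:
    """Generate signals and hand them to the next Step Functions state."""
    logger.info("strategy stage started")

    signals = strategy_v2.run()

    # Whatever we return here becomes the input of the portfolio stage,
    # so it has to stay within the Step Functions payload size limit.
    return {"signals": signals}
```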
We'll package `Shared` as a common dependency. This can be done in a few ways. We can create a Lambda layer, which is basically a zip file containing the shared code that each Lambda function can access. We could use a vendored package, which means including the shared code directly in each Lambda's deployment package. Or, we could use a container base, which is suitable if we're containerizing our Lambda functions. The most crucial part is making sure each Lambda has access to the shared logic.
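If we go the layer route, publishing it is pretty simple. The sketch below assumes we've already zipped `Shared` under a top-level `python/` directory (the layout Lambda's Python runtime puts on `sys.path`); the layer name and runtime are placeholders.

```python
# publish_shared_layer.py
# Sketch: publish the Shared codebase as a Lambda layer.
# Assumes shared_layer.zip contains the package under a top-level python/
# directory (e.g. python/shared/...). Names below are placeholders.

import boto3

lambda_client = boto3.client("lambda")

with open("shared_layer.zip", "rb") as f:
    response = lambda_client.publish_layer_version(
        LayerName="shared-codebase",
        Description="Common logic, state handling, and infra interfaces",
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.12"],
    )

# Attach response["LayerVersionArn"] to each of the three functions
# via their Layers configuration.
print(response["LayerVersionArn"])
```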
Step Functions: The Orchestrator
After we build our Lambda functions, we're going to define a Step Functions state machine. It will look like this:
`strategy_v2` → `portfolio_v2` → `execution_v2`
Each stage will be a step in the state machine, and Step Functions will handle the sequencing, data passing, and error handling. We'll add error handling and failure stops at each stage, so the system is more robust. We'll pass data (like signals and rebalance plans) between Lambdas via the Step Functions context. This ensures that the data is handled in a consistent and reliable way.
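Here's a hedged sketch of what that state machine could look like, expressed as an Amazon States Language definition created via boto3. The account, region, function ARNs, IAM role, and retry numbers are all placeholders, not final values.

```python
# create_pipeline_state_machine.py
# Sketch of the strategy -> portfolio -> execution workflow as a Step
# Functions state machine. ARNs, role, and retry settings are placeholders.

import json

import boto3

ACCOUNT = "123456789012"
REGION = "us-east-1"


def lambda_arn(name: str) -> str:
    return f"arn:aws:lambda:{REGION}:{ACCOUNT}:function:{name}"


# Simple retry policy applied to each stage; a failure after retries
# stops the execution, which is the "failure stop" behaviour we want.
RETRY = [
    {
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 10,
        "MaxAttempts": 2,
        "BackoffRate": 2.0,
    }
]

definition = {
    "Comment": "Daily trading pipeline: strategy -> portfolio -> execution",
    "StartAt": "Strategy",
    "States": {
        "Strategy": {
            "Type": "Task",
            "Resource": lambda_arn("lambda_strategy_v2"),
            "Retry": RETRY,
            "Next": "Portfolio",
        },
        "Portfolio": {
            "Type": "Task",
            "Resource": lambda_arn("lambda_portfolio_v2"),
            "Retry": RETRY,
            "Next": "Execution",
        },
        "Execution": {
            "Type": "Task",
            "Resource": lambda_arn("lambda_execution_v2"),
            "Retry": RETRY,
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="daily-trading-pipeline",
    definition=json.dumps(definition),
    roleArn=f"arn:aws:iam::{ACCOUNT}:role/StepFunctionsPipelineRole",
)
```

Because each Task state's output becomes the next state's input, whatever the strategy wrapper returns (the `{"signals": ...}` dict in the earlier sketch) arrives as the `event` of the portfolio wrapper, with no extra glue code.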
Digging Deeper: Investigating the Details
Alright, let's get into the nitty-gritty. We've got a few key areas that need further investigation to get this project off the ground:
- Packaging the Shared Code: Figuring out how to deploy the shared code to all Lambdas is important. Should we use a layer or a vendored package? Each method has its pros and cons. Layers are great for code reuse and smaller deployment packages, but can add complexity. Vendored packages are easier to set up, but they might lead to larger deployment packages and more duplication.
- State Payload Size Limits: Step Functions caps state input/output payloads at 256 KB, so we need to confirm that rebalance plans and trade confirmations fit comfortably under that when passed between steps. If we risk hitting the limit, we'll need an alternative way to handle the data, such as storing it in a database or S3 and passing references (see the sketch after this list).
- Logging, Metrics, and Trace Correlation: We need to ensure we can easily trace requests across all the steps, from start to finish. This is why we need to correlate our logs, metrics, and traces across the steps. CloudWatch and X-Ray are likely going to be our main tools here. We want to be able to see the full picture of what's happening in each function and identify issues quickly.
- Cold Start and Execution Latency: We'll measure cold start times and execution latencies of the new architecture and compare them to the current monolith to see if there's any impact. Cold starts can be a concern with Lambda functions, so we'll need to minimize them as much as possible.
- Cost Modeling: We'll need to do some cost modeling to understand the costs associated with our new architecture. This includes state transitions, Lambda invocations, and retries. This will help us ensure we're making a cost-effective decision.
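As a fallback for oversized payloads, here's a rough sketch of the claim-check pattern mentioned above: write the heavy object (say, a full rebalance plan) to S3 and pass only a small reference through the state machine. The bucket name and key scheme are placeholders.

```python
# payload_offload.py
# Sketch of the "claim check" pattern for payloads that would exceed the
# 256 KB Step Functions state size limit. Bucket and key layout are placeholders.

import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "trading-pipeline-artifacts"  # placeholder


def put_payload(payload: dict, prefix: str) -> dict:
    """Store a large payload in S3; return a small reference to pass between states."""
    key = f"{prefix}/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload).encode("utf-8"))
    return {"s3_bucket": BUCKET, "s3_key": key}


def get_payload(reference: dict) -> dict:
    """Resolve a reference produced by put_payload back into the full payload."""
    obj = s3.get_object(Bucket=reference["s3_bucket"], Key=reference["s3_key"])
    return json.loads(obj["Body"].read())
```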
The Fine Print: Acceptance Criteria
To make sure we're on the right track, we've set up some clear acceptance criteria:
- Working Proof-of-Concept: We'll need a working proof-of-concept of the Step Functions workflow that can invoke all three stages. This is our basic test that everything is connected and functioning.
- Shared Code Functionality: The shared code needs to load and work in each Lambda function. This verifies that we can reuse the existing code base.
- End-to-End Dry-Run Execution: We'll do at least one dry-run execution with mocked external I/O. This helps us ensure the system works with simulated external interactions (see the test sketch after this list).
- Write-up Summary: We need a summary write-up covering the trade-offs compared to the current monolith, the packaging strategy for `Shared`, and the observability and testability gains/losses.
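For the dry-run criterion, a test along these lines could exercise one stage with external I/O mocked out. The module path, the patched target, and the sample signal are hypothetical stand-ins for whatever `strategy_v2` and `Shared` actually expose.

```python
# test_dry_run.py
# Hedged sketch of a dry-run test for the strategy stage with mocked external
# I/O. Module paths and patched targets are hypothetical placeholders.

from unittest.mock import patch

from lambda_strategy_v2 import handler as strategy_handler  # hypothetical module path


@patch("strategy_v2.run")
def test_strategy_stage_dry_run(mock_run):
    # Simulate module output instead of hitting brokers or price feeds.
    mock_run.return_value = [{"symbol": "SPY", "action": "BUY", "weight": 0.6}]

    result = strategy_handler.handler({"dry_run": True}, context=None)

    assert "signals" in result
    assert result["signals"][0]["symbol"] == "SPY"
```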
Conclusion: Moving Forward
This is a solid plan to improve our current pipeline. It's about taking a monolithic pipeline and breaking it down into smaller, more manageable pieces. By using Lambda functions and Step Functions, we're aiming for a more resilient, scalable, and observable system. We are going to make it happen! 🚀