Agents 72/73: Per-PR Concurrency Locks Explained

by SLV Team
Hey there, fellow coders! Ever had multiple workers try to tackle the same issue at the same time? It's like two chefs in one kitchen reaching for the same ingredients: chaos. That is the problem Agents 72/73: Per-PR Concurrency Locks addresses. The goal is to implement per-PR concurrency locks so that two workers never operate on the same issue in parallel. Let's dive into why this matters and how we're making it happen.

The Why: Why Do We Need Per-PR Concurrency Locks?

So, why the need for these locks, you ask? Well, the Codex belt worker operates with a global concurrency group. With that setup, two runs that target the same issue can overlap and step on each other's toes, which risks data corruption, unexpected behavior, and a general mess. Our mission is to ensure one worker at a time per PR/Issue while still allowing parallelism across different PRs. Think of it like this: each pull request (PR) or issue gets its own lock, so work on one issue is serialized while work on different issues proceeds concurrently. This should make the system more stable and reliable.

Agents 72 and 73 are the focus of this change, and changing the concurrency group is the primary goal.

Scope: Agents 72 and Agents 73

This project focuses on Agents 72 (Worker) and Agents 73 (Conveyor). These agents are crucial to the workflow, and the changes modify only their concurrency behavior. We're not touching the keepalive cadence, the dispatcher logic, or task sizing, and no other components or functionality are affected.

Keep in mind that we're keeping the cancel-in-progress: false setting. This means that if a worker is already running for an issue, a new run for the same issue queues up behind it instead of canceling it.
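
As a rough sketch of what that setting looks like in a workflow file, the fragment below pairs an issue-keyed group with cancel-in-progress: false. The group expression is the one proposed in the task list for Agents 72; placing the block at workflow level (rather than on a job) is an assumption.

```yaml
# Sketch only: placement at workflow level is an assumption.
concurrency:
  # One group per issue. When no issue is supplied, the run ID is used,
  # which gives that run a group of its own.
  group: codex-belt-${{ github.event.client_payload.issue || github.event.inputs.issue || github.run_id }}
  # Never cancel the run that already holds the group; a newer run waits as pending.
  cancel-in-progress: false
```

One GitHub Actions detail worth keeping in mind: a concurrency group holds at most one running and one pending run. If a third dispatch for the same issue arrives while another is still pending, the older pending run is canceled and the newer one takes its place.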

Tasks: What Needs to Be Done?

So, what does this actually involve? Here's the breakdown of the tasks:

  • Agents 72: Change concurrency.group to incorporate the Issue number, for example codex-belt-${{ github.event.client_payload.issue || github.event.inputs.issue || github.run_id }}. Keep cancel-in-progress: false so that a new run for an issue that already has a worker running queues up instead of canceling the existing one. The file to update is .github/workflows/agents-72-codex-belt-worker.yml.
  • Agents 73: Mirror the same per-PR group scheme so that the locks behave consistently across both agents and race conditions between them are avoided. The file to update is .github/workflows/agents-73-codex-belt-conveyor.yml.
  • Summary Row: Add a summary row that prints the computed concurrency group and the targeted Issue/branch for auditing. This gives us a quick way to see which issue a run is processing and which group it was placed in, which is super helpful for debugging.
  • Cheap Guard: Add a cheap guard that bails if the computed group is empty (misconfigured dispatch). This acts as a safety net that catches configuration issues before they cause problems.
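
The last two tasks could be sketched as a pair of early steps in the worker job. This is an illustration under stated assumptions, not the actual workflow: the job name worker, the runner, the step names, and the ISSUE and GROUP environment variables are all hypothetical. The workflow contexts don't appear to expose the resolved concurrency group, so the sketch recomputes it with the same expression. It also checks the issue value rather than the group string, because the github.run_id fallback means the group string itself always resolves to something; reading "empty group" as "no issue supplied" is our interpretation.

```yaml
# Sketch only: job name, runner, step names, and env variable names are hypothetical.
jobs:
  worker:
    runs-on: ubuntu-latest
    env:
      # Empty when neither the dispatch payload nor the manual input carries an issue.
      ISSUE: ${{ github.event.client_payload.issue || github.event.inputs.issue }}
      # Same expression as concurrency.group, recomputed for display.
      GROUP: codex-belt-${{ github.event.client_payload.issue || github.event.inputs.issue || github.run_id }}
    steps:
      - name: Guard against misconfigured dispatch
        run: |
          # Fail fast before any real work happens.
          if [ -z "${ISSUE}" ]; then
            echo "::error::No issue number found in the dispatch payload or inputs"
            exit 1
          fi

      - name: Record concurrency group in the run summary
        run: |
          # Append a small Markdown table to the job summary for auditing.
          {
            echo "| Concurrency group | Issue |"
            echo "| --- | --- |"
            echo "| ${GROUP} | ${ISSUE} |"
          } >> "$GITHUB_STEP_SUMMARY"
```

The summary row in the task list also mentions the targeted branch. It is left out here because where that value comes from depends on the dispatch payload.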

Together, these tasks implement per-PR concurrency locks whose purpose is to prevent parallel work on the same issue.

Acceptance Criteria: How Do We Know It's Working?

How do we know if we've succeeded? Here are the acceptance criteria:

  • Issue-Specific Runs: Two worker dispatches for the same issue should never run at the same time. The second one should queue until the first one completes. This is the core functionality. No matter how many times you trigger a worker for the same issue, only one will run at a time.
  • Parallel Processing: Workers for different issues should be able to run concurrently. This is important to ensure that the system remains efficient and can handle multiple tasks simultaneously.
  • Run Summaries: Run summaries should show the per-PR group string and the targeted Issue number, so that groups and issues are easy to audit and track.

If these criteria are met, we'll know we've successfully implemented per-PR concurrency locks: workers for different Issues can run concurrently, while no two workers for the same Issue ever run at the same time.

Implementation Notes: Key Details

Let's get into the nitty-gritty of the implementation:

  • File Changes: The primary files we'll be modifying are .github/workflows/agents-72-codex-belt-worker.yml and .github/workflows/agents-73-codex-belt-conveyor.yml. These files contain the workflow definitions for Agents 72 and 73, respectively. All the changes will be localized within these two files.
  • No Behavior Change: No changes to label gates; only concurrency scoping. The existing label gate functionality remains untouched, and the only thing that changes is that concurrency is scoped at the PR/Issue level.
  • Permissions and Tokens: Preserve current permissions and tokens. We want to ensure the implementation doesn't affect existing permissions or tokens. This is to maintain the same level of security and access control as before.

In essence, we're making targeted changes to these workflow files to implement per-PR concurrency locks. These changes will not affect existing functionalities and will not involve any modifications to permissions or tokens.

Branch and Pull Request Details

The branch for this work is codex/issue--per-pr-worker-locks. The PR title prefix will be [Agents] Per‑PR worker/conveyor locks, which makes the PR easy to identify and associate with this task. Only the two workflow files will be touched: .github/workflows/agents-72-codex-belt-worker.yml and .github/workflows/agents-73-codex-belt-conveyor.yml.

That's the gist of it, folks! This is a simple but super important change to improve the stability and reliability of our workflow. By implementing per-PR concurrency locks, we're ensuring that our workers operate smoothly and efficiently, preventing those pesky conflicts and data corruption issues. Happy coding!