fix(webapp): retry on version collision when initializing a deployment#3610
fix(webapp): retry on version collision when initializing a deployment#3610d-cs wants to merge 1 commit into
Conversation
Concurrent `POST /api/v1/deployments` requests for the same environment both read the same latest deployment, compute the same next version, and race on the `WorkerDeployment(environmentId, version)` unique constraint. One succeeds, the other crashes with P2002. Extract version assignment + create into `createDeploymentWithNextVersion`, which retries on P2002 (max 5 attempts) with small randomised jitter so N concurrent racers don't loop in lockstep. Logs a `warn` per collision so the retry rate is observable. When retries are exhausted, throws a dedicated `DeploymentVersionCollisionError` (with the original P2002 as `cause` and `environmentId`/`attempts`/`lastAttemptedVersion` for debugging) so contention exhaustion is distinguishable in Sentry from any other unique-constraint violation. The image ref computation stays inside the builder callback so it's recomputed against the freshly-assigned version on every attempt; the persisted `imageReference` always matches the persisted `version`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
WalkthroughThis PR fixes a race condition in worker deployment version assignment by introducing Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
apps/webapp/test/createDeploymentWithNextVersion.test.ts (1)
1-133: ⚡ Quick winMove this test beside the source helper to match repo test layout conventions.
The coverage looks good, but this file should be colocated with
createDeploymentWithNextVersion.server.ts(e.g.,createDeploymentWithNextVersion.test.ts) instead of living underapps/webapp/test.As per coding guidelines: "Place test files next to source files using the pattern
MyService.ts->MyService.test.ts".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/webapp/test/createDeploymentWithNextVersion.test.ts` around lines 1 - 133, The test file should be moved to be colocated with the source helper createDeploymentWithNextVersion.server.ts: create a new file named createDeploymentWithNextVersion.test.ts alongside createDeploymentWithNextVersion.server.ts, copy the existing tests (which reference createDeploymentWithNextVersion and DeploymentVersionCollisionError) into that file, update the import paths so the test imports from "~/v3/services/initializeDeployment/createDeploymentWithNextVersion.server" relative to the new location (or use the existing module alias), and delete the old file under apps/webapp/test; do not change test logic or assertions.apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.ts (1)
16-19: ⚡ Quick winValidate retry options at runtime before using them.
options.maxRetries/options.jitterMsare trusted directly on Line 52-Line 53. Invalid values can silently break retry behavior; parse once with Zod and use the parsed result.Proposed patch
import { isUniqueConstraintError, type Prisma, type PrismaClientOrTransaction, type WorkerDeployment, } from "@trigger.dev/database"; import { setTimeout as sleep } from "node:timers/promises"; +import { z } from "zod"; import { logger } from "~/services/logger.server"; import { calculateNextBuildVersion } from "../../utils/calculateNextBuildVersion"; @@ export type CreateDeploymentWithNextVersionOptions = { maxRetries?: number; jitterMs?: { min: number; max: number }; }; + +const CreateDeploymentWithNextVersionOptionsSchema = z.object({ + maxRetries: z.number().int().min(0).optional(), + jitterMs: z + .object({ + min: z.number().int().min(0), + max: z.number().int().min(0), + }) + .refine((v) => v.max >= v.min, { + message: "jitterMs.max must be greater than or equal to jitterMs.min", + }) + .optional(), +}); @@ export async function createDeploymentWithNextVersion( prisma: PrismaClientOrTransaction, environmentId: string, buildData: (nextVersion: string) => CreateDeploymentData | Promise<CreateDeploymentData>, options: CreateDeploymentWithNextVersionOptions = {} ): Promise<WorkerDeployment> { - const maxRetries = options.maxRetries ?? DEFAULT_MAX_RETRIES; - const jitterMs = options.jitterMs ?? DEFAULT_JITTER_MS; + const parsedOptions = CreateDeploymentWithNextVersionOptionsSchema.parse(options); + const maxRetries = parsedOptions.maxRetries ?? DEFAULT_MAX_RETRIES; + const jitterMs = parsedOptions.jitterMs ?? DEFAULT_JITTER_MS;As per coding guidelines: "Use zod for validation in packages/core and apps/webapp".
Also applies to: 52-53
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.ts` around lines 16 - 19, Create a zod schema for CreateDeploymentWithNextVersionOptions (e.g., maxRetries: z.number().int().min(0).optional().default(3); jitterMs: z.object({ min: z.number().min(0), max: z.number().min(0) }).optional()) and parse the incoming options at the start of createDeploymentWithNextVersion (or wherever options are consumed); replace direct uses of options.maxRetries and options.jitterMs with the validated/parsed values from the zod result to ensure correct types/defaults and prevent invalid retry/jitter behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.ts`:
- Around line 16-19: Create a zod schema for
CreateDeploymentWithNextVersionOptions (e.g., maxRetries:
z.number().int().min(0).optional().default(3); jitterMs: z.object({ min:
z.number().min(0), max: z.number().min(0) }).optional()) and parse the incoming
options at the start of createDeploymentWithNextVersion (or wherever options are
consumed); replace direct uses of options.maxRetries and options.jitterMs with
the validated/parsed values from the zod result to ensure correct types/defaults
and prevent invalid retry/jitter behavior.
In `@apps/webapp/test/createDeploymentWithNextVersion.test.ts`:
- Around line 1-133: The test file should be moved to be colocated with the
source helper createDeploymentWithNextVersion.server.ts: create a new file named
createDeploymentWithNextVersion.test.ts alongside
createDeploymentWithNextVersion.server.ts, copy the existing tests (which
reference createDeploymentWithNextVersion and DeploymentVersionCollisionError)
into that file, update the import paths so the test imports from
"~/v3/services/initializeDeployment/createDeploymentWithNextVersion.server"
relative to the new location (or use the existing module alias), and delete the
old file under apps/webapp/test; do not change test logic or assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f6a959b5-e870-4b67-b147-dea0df3dd9bc
📒 Files selected for processing (4)
.server-changes/fix-worker-deployment-version-race.mdapps/webapp/app/v3/services/initializeDeployment.server.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/test/createDeploymentWithNextVersion.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / e2e-webapp / 🧪 E2E Tests: Webapp
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: typecheck / typecheck
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (14)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
**/*.{test,spec}.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use vitest for all tests in the Trigger.dev repository
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
apps/webapp/**/*.{ts,tsx}: Access environment variables through theenvexport ofenv.server.tsinstead of directly accessingprocess.env
Use subpath exports from@trigger.dev/corepackage instead of importing from the root@trigger.dev/corepathUse named constants for sentinel/placeholder values (e.g.
const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
apps/webapp/**/*.test.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Do not import
env.server.tsdirectly or indirectly into test files; instead pass environment-dependent values through options/parameters to make code testableFor testable code, never import
env.server.tsin test files. Pass configuration as options instead (e.g.,realtimeClient.server.tstakes config as constructor arg,realtimeClientGlobal.server.tscreates singleton with env config)
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.ts
{apps,internal-packages}/**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
Use
pnpm run typecheckto verify changes in apps and internal packages (apps/*,internal-packages/*) instead ofbuild, which proves almost nothing about correctness
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
**/*.test.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.test.{ts,tsx,js}: Use vitest exclusively for testing and never mock anything - use testcontainers instead
Place test files next to source files using the patternMyService.ts->MyService.test.ts
**/*.test.{ts,tsx,js}: Use vitest for unit testing and run tests withpnpm run test
Test files should live beside the files under test with descriptivedescribeanditblocks
Tests should avoid mocks or stubs and use helpers from@internal/testcontainerswhen Redis or Postgres are needed
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.ts
**/*.test.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use testcontainers with
redisTest,postgresTest, orcontainerTestfrom@internal/testcontainersfor testing with Redis/PostgreSQL dependencies
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.ts
{package.json,**/*.{ts,tsx,js}}
📄 CodeRabbit inference engine (CLAUDE.md)
Pin Zod to version 3.25.76 exactly across the entire monorepo - never use a different version or version range
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js}: Import from@trigger.dev/coreusing subpaths only, never the root export
Always import tasks from@trigger.dev/sdk, never from@trigger.dev/sdk/v3or deprecatedclient.defineJob
Add crumbs to code using//@Crumbscomments or `// `#region` `@crumbsblocks for debug tracing during development
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
**/*.{ts,tsx,js,jsx,json,md,css,scss}
📄 CodeRabbit inference engine (AGENTS.md)
Code formatting is enforced using Prettier. Run
pnpm run formatbefore committing
Files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
apps/webapp/**/*.server.ts
📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)
apps/webapp/**/*.server.ts: Never userequest.signalfor detecting client disconnects. UsegetRequestAbortSignal()fromapp/services/httpAsyncStorage.server.tsinstead, which is wired directly to Expressres.on('close')and fires reliably
Access environment variables viaenvexport fromapp/env.server.ts. Never useprocess.envdirectly
Always usefindFirstinstead offindUniquein Prisma queries.findUniquehas an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance).findFirstis never batched and avoids this entire class of issues
Files:
apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
🧠 Learnings (7)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
📚 Learning: 2026-05-07T12:25:18.271Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3531
File: apps/webapp/test/sentryTraceContext.server.test.ts:9-47
Timestamp: 2026-05-07T12:25:18.271Z
Learning: In the triggerdotdev/trigger.dev webapp test suite, it is acceptable to leave `createInMemoryTracing()` calls that register a global `NodeTracerProvider` without `afterEach`/`afterAll` teardown. Do not flag this as a test-ordering risk when the code follows the established pattern used across webapp tests (e.g., replication service/benchmark/backfiller tests). This is considered safe because `trace.getActiveSpan()` when called outside a `context.with(...)` block reads `AsyncLocalStorage.getStore()` (undefined when no `run()` scope exists), so it falls back to `ROOT_CONTEXT` with no attached span—regardless of which provider is registered.
Applied to files:
apps/webapp/test/createDeploymentWithNextVersion.test.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.
Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.
Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.
Applied to files:
apps/webapp/test/createDeploymentWithNextVersion.test.tsapps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
📚 Learning: 2026-03-10T17:56:20.938Z
Learnt from: samejr
Repo: triggerdotdev/trigger.dev PR: 3201
File: apps/webapp/app/v3/services/setSeatsAddOn.server.ts:25-29
Timestamp: 2026-03-10T17:56:20.938Z
Learning: Do not implement local userId-to-organizationId authorization checks inside org-scoped service classes (e.g., SetSeatsAddOnService, SetBranchesAddOnService) in the web app. Rely on route-layer authentication (requireUserId(request)) and org membership enforcement via the _app.orgs.$organizationSlug layout route. Any userId/organizationId that reaches these services from org-scoped routes has already been validated. Apply this pattern across all org-scoped services to avoid redundant auth checks and maintain consistency.
Applied to files:
apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
📚 Learning: 2026-03-29T19:16:28.864Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3291
File: apps/webapp/app/v3/featureFlags.ts:53-65
Timestamp: 2026-03-29T19:16:28.864Z
Learning: When reviewing TypeScript code that uses Zod v3, treat `z.coerce.*()` schemas as their direct Zod type (e.g., `z.coerce.boolean()` returns a `ZodBoolean` with `_def.typeName === "ZodBoolean"`) rather than a `ZodEffects`. Only `.preprocess()`, `.refine()`/`.superRefine()`, and `.transform()` are expected to wrap schemas in `ZodEffects`. Therefore, in reviewers’ logic like `getFlagControlType`, do not flag/unblock failures that require unwrapping `ZodEffects` when the input schema is a `z.coerce.*` schema.
Applied to files:
apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.
Applied to files:
apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.tsapps/webapp/app/v3/services/initializeDeployment.server.ts
🔇 Additional comments (3)
.server-changes/fix-worker-deployment-version-race.md (2)
1-4: LGTM!
6-6: LGTM!apps/webapp/app/v3/services/initializeDeployment.server.ts (1)
18-18: LGTM!Also applies to: 191-261, 311-311
Concurrent
POST /api/v1/deploymentsrequests for the same environment race on theWorkerDeployment(environmentId, version)unique constraint. Both requests read the same latest deployment viafindFirst, compute the same next version viacalculateNextBuildVersion, and both attemptprisma.workerDeployment.create()— one wins, the other crashes with PrismaP2002. The bug is a classic TOCTOU between the version read and the version write; it's been latent since the version-assignment logic was first added but only fires when two deploys land within milliseconds of each other (CI matrices, retried CLI calls, webhook-triggered redeploys).Approach
Extracts the version assignment + create into a small helper
createDeploymentWithNextVersion(apps/webapp/app/v3/services/initializeDeployment/createDeploymentWithNextVersion.server.ts). The helper retries onP2002 (environmentId, version)up to 5 times with randomised 5–50ms jitter so N concurrent racers don't loop in lockstep. Each attempt re-reads the latest version, recomputes viacalculateNextBuildVersion, and re-runs the caller'sbuildDatacallback so version-dependent fields (image ref tag, friendlyId) are always consistent with the version actually persisted. Alogger.warnfires per collision so the retry rate is observable in production logs.When retries are exhausted, the helper throws a dedicated
DeploymentVersionCollisionErrorcarryingenvironmentId,attempts, andlastAttemptedVersion, with the originalPrismaClientKnownRequestErrorattached ascause. Sentry walks thecausechain natively, so contention exhaustion shows up as a distinguishable wrapper exception linked to the underlyingP2002rather than a generic unique-constraint violation that looks identical to every other duplicate-key bug.The behavioural change is limited to "catch P2002 and retry instead of crashing." The image ref computation stays inside the builder callback (same call site as before the refactor), so ECR / non-ECR behaviour, S2 stream creation order, and all downstream side effects are unchanged.
Non-goals
calculateNextBuildVersioncall increateBackgroundWorker.server.ts, which likely has the same race shape againstBackgroundWorker's unique constraint — flagged as a follow-up.Test plan
pnpm run typecheck --filter webapppasses (no new errors in the modified files).apps/webapp/test/createDeploymentWithNextVersion.test.tsviacontainerTest:Set(versions).size === concurrency). The naive read-then-create version of the helper fails this test with the exact sameP2002seen in production; the retry version passes.P2002errors raised from thebuildDatacallback propagate immediately without retry, builder invoked exactly once.maxRetries: 0, concurrent racers surface the wrappedDeploymentVersionCollisionError(not a rawP2002);environmentId,attempts,lastAttemptedVersionare populated anderror.cause.code === "P2002".apps/webapp/test/getDeploymentImageRef.test.tsstill green (the file was untouched in the final diff).Follow-ups (not in this PR)
createBackgroundWorker.server.tslikely has the same TOCTOU shape against its background-worker version unique constraint — should use the same helper.error.causechain renders as a linked exception in the Sentry UI when the wrapped error fires (requires a sandboxed triggering of the exhaustion path).