Set Ingestion Backend
The pipeline that turns raw set data into a live, priced, searchable catalogue entry — reliably, every release.
Pokémon TCG sets release roughly every three months. Each new set brings hundreds of cards, each with multiple variants, and every one of them must be ingested into the platform (structured, normalised, mapped to its Cardmarket and TCGPlayer product IDs, and priced) before it can appear in search or be added to collections.
Done manually, this is a significant operational task. Done with an ad-hoc script per release, it accumulates inconsistency. The set ingestion backend makes it a defined, repeatable pipeline: structured input in, verified platform records out, with the right checks at each stage to catch problems before they reach users.
The pipeline also handles updates — corrections to card data, variant mapping fixes, price re-pulls — not just initial ingestion. It needs to be safe to re-run on a set that's already live, applying updates without disrupting what's already there.
The pipeline is structured as a sequence of stages, each with a clear input contract and a defined output. A failure at any stage stops the pipeline there — no partial ingestion that leaves the database in an inconsistent state.
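A minimal sketch of the stop-on-failure behaviour is below. It assumes a relational store (SQLite here, purely so the example runs; the write-up does not name the database) and wraps every stage in one transaction, so a stage that raises rolls back everything earlier stages wrote. Stage, StageError, run_pipeline, the two example stages and the cards table schema are all hypothetical illustrations, not the project's actual code.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable


class StageError(Exception):
    """Raised when a stage's input contract is violated or its work fails."""


@dataclass
class Stage:
    name: str
    run: Callable[[sqlite3.Connection, Any], Any]  # takes the previous stage's output


def run_pipeline(conn: sqlite3.Connection, stages: list[Stage], payload: Any) -> Any:
    # `with conn` commits if the block completes and rolls back if an
    # exception escapes, so a failing stage leaves no partial writes.
    with conn:
        for stage in stages:
            try:
                payload = stage.run(conn, payload)
            except StageError as exc:
                # Re-raise with the stage name so the log says where it stopped.
                raise StageError(f"{stage.name}: {exc}") from exc
    return payload


def validate(conn: sqlite3.Connection, cards: list[dict]) -> list[dict]:
    # Hypothetical input contract: every card has a set code, number and name.
    for card in cards:
        missing = [k for k in ("set_code", "number", "name") if not card.get(k)]
        if missing:
            raise StageError(f"card {card!r} missing {missing}")
    return cards


def create_cards(conn: sqlite3.Connection, cards: list[dict]) -> list[dict]:
    # Hypothetical output: one row per card in a `cards` table.
    conn.executemany(
        "INSERT INTO cards (set_code, number, name) VALUES (?, ?, ?)",
        [(c["set_code"], c["number"], c["name"]) for c in cards],
    )
    return cards
```

Because each stage only sees the previous stage's return value, it can be unit-tested on its own with a hand-built payload, which matches the "independently testable" property described for the real pipeline.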
We separated the pipeline into two modes: automated ingestion for straightforward cases (where the source data is clean and the variant mapping is unambiguous) and supervised ingestion for cases that need a maintainer to review and confirm before proceeding. The admin dashboard's ingestion review queue is the interface for the supervised path.
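The routing rule between the two modes can be sketched as a single decision function. This assumes the earlier stages produce a list of validation errors and, for each variant, the list of candidate marketplace product IDs that matched it; the names (Mode, VariantMatch, SetAnalysis, choose_mode) are hypothetical. "Unambiguous" is read here as exactly one candidate, so both zero and several candidates send the set to review.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    AUTOMATED = "automated"
    SUPERVISED = "supervised"


@dataclass
class VariantMatch:
    canonical_id: str
    candidate_product_ids: list[int]  # marketplace products that matched this variant


@dataclass
class SetAnalysis:
    validation_errors: list[str] = field(default_factory=list)
    variant_matches: list[VariantMatch] = field(default_factory=list)


def choose_mode(analysis: SetAnalysis) -> tuple[Mode, list[str]]:
    """Return the ingestion mode plus the reasons a reviewer would need to see."""
    reasons = list(analysis.validation_errors)
    for m in analysis.variant_matches:
        # Exactly one candidate is unambiguous; zero or several needs a human.
        if len(m.candidate_product_ids) != 1:
            reasons.append(f"{m.canonical_id}: {len(m.candidate_product_ids)} candidate products")
    return (Mode.SUPERVISED if reasons else Mode.AUTOMATED), reasons
```

Returning the reasons alongside the mode means the same analysis that picks the path also supplies what the review queue would display.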
The pipeline covers the full journey from raw set data to live platform record. Each stage is independently testable and has structured logging so failures are diagnosable.
Variant resolution at scale
The hardest part of ingesting a new set is correctly assigning Cardmarket and TCGPlayer product IDs to each card variant. The extraction tooling produces JSON keyed by canonical card ID, and those IDs must exactly match the ones the pipeline generates when it creates card records. Any mismatch means a variant ends up with no price source. Keeping canonical ID generation consistent between the ingestion pipeline and the extraction tools required careful alignment: they are separate codebases that must apply the same normalisation rules.
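One way to keep two codebases in agreement is to have both import a single normalisation function rather than each reimplementing the rules. The sketch below assumes a set-number-variant ID format with accent stripping, lowercasing and leading-zero removal; that format and the names canonical_card_id and unpriced_variants are hypothetical, chosen only to show the idea. The second function is the kind of check that catches a mismatch before it becomes a variant with no price source.

```python
import re
import unicodedata


def _slug(text: str) -> str:
    # Strip accents ("Poké" -> "Poke"), lowercase, and collapse any run of
    # non-alphanumeric characters into a single hyphen.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def canonical_card_id(set_code: str, number: str, variant: str) -> str:
    num = number.strip()
    # Purely numeric collector numbers lose leading zeros ("007" -> "7");
    # alphanumeric ones such as "TG05" are only lowercased.
    if num.isdecimal():
        num = str(int(num))
    return f"{_slug(set_code)}-{_slug(num)}-{_slug(variant)}"


def unpriced_variants(created_ids: set[str], extracted: dict[str, dict]) -> set[str]:
    # Variants the pipeline created that have no entry in the extraction JSON,
    # i.e. the ones that would end up with no price source.
    return created_ids - set(extracted)
```

For example, canonical_card_id("SV1", "007", "Reverse Holo") and canonical_card_id(" sv1", "7 ", "reverse  holo") both produce "sv1-7-reverse-holo", so small formatting differences between sources stop mattering as long as both sides call the same function.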
Idempotency with mutable data
Re-running ingestion on a set that's already live needs to apply corrections without creating duplicates or overwriting fields that were manually corrected in the admin dashboard. This means the pipeline needs to distinguish between 'this field came from ingestion and should be updated' and 'this field was manually corrected and should be preserved'. Implementing that distinction — essentially a per-field provenance system — was more complex than it sounds when you're dealing with hundreds of fields across hundreds of cards.
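A minimal sketch of per-field provenance, assuming each stored field carries its value plus a source tag of either "ingestion" or "manual". FieldValue, merge_ingested and apply_manual_edit are hypothetical names; the real system's storage model is not described. The merge never overwrites a manual field and only reports fields whose value actually changed, which is what makes a second run with the same input a no-op.

```python
from dataclasses import dataclass
from typing import Any, Literal

Source = Literal["ingestion", "manual"]


@dataclass
class FieldValue:
    value: Any
    source: Source


def merge_ingested(
    existing: dict[str, FieldValue], incoming: dict[str, Any]
) -> tuple[dict[str, FieldValue], list[str]]:
    """Apply freshly ingested values without touching manual corrections."""
    merged = dict(existing)  # new dict; entries are replaced, never mutated
    changed: list[str] = []
    for name, value in incoming.items():
        current = merged.get(name)
        if current is not None and current.source == "manual":
            continue  # a manual correction always wins over ingestion
        if current is None or current.value != value:
            merged[name] = FieldValue(value, "ingestion")
            changed.append(name)
    return merged, changed


def apply_manual_edit(record: dict[str, FieldValue], name: str, value: Any) -> dict[str, FieldValue]:
    # Hypothetical hook for an admin-dashboard save: tag the field as manual.
    updated = dict(record)
    updated[name] = FieldValue(value, "manual")
    return updated
```

In this sketch a manual override is permanent; clearing one so ingestion can manage the field again would need a separate admin action.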
The supervised mode interface
When the pipeline pauses for review, the reviewer needs to see exactly what the pipeline found and what it's asking them to decide. The first version of the review queue showed the raw validation output — a list of issues in pipeline log format. That was accurate but required significant domain knowledge to interpret. The admin dashboard's ingestion review view replaced this with a structured display of exactly the cards and variants that need attention, with the specific issue clearly labelled and the available actions obvious.
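The shift from raw log output to a structured view can be sketched as a grouping step: machine-readable issues are collected per card and variant, and each is given a human-readable label and a list of actions. The issue codes, labels and actions in ISSUE_CATALOGUE, and the names Issue and build_review_items, are hypothetical; they stand in for whatever the validation stage actually emits.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Issue:
    card_id: str
    variant: Optional[str]  # None for card-level issues
    code: str               # machine-readable code from the validation stage


# Hypothetical catalogue turning issue codes into a label and the actions offered.
ISSUE_CATALOGUE: dict[str, tuple[str, list[str]]] = {
    "AMBIGUOUS_PRODUCT_ID": ("Several marketplace products match this variant", ["choose product", "skip variant"]),
    "MISSING_PRODUCT_ID": ("No marketplace product found for this variant", ["enter product ID", "skip variant"]),
    "MISSING_FIELD": ("A required card field is empty", ["edit card"]),
}


def build_review_items(issues: list[Issue]) -> dict[tuple[str, Optional[str]], list[dict]]:
    """Group raw issues by (card, variant) so each row shows only what needs a decision."""
    grouped: dict[tuple[str, Optional[str]], list[dict]] = defaultdict(list)
    for issue in issues:
        # Unknown codes still surface, labelled with the raw code.
        label, actions = ISSUE_CATALOGUE.get(issue.code, (issue.code, ["escalate"]))
        grouped[(issue.card_id, issue.variant)].append(
            {"code": issue.code, "label": label, "actions": list(actions)}
        )
    return dict(grouped)
```

Keeping the label and action text in one catalogue means adding a new validation check only requires a new entry, not a change to the review view.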
Downstream coordination
A successful ingestion needs to trigger several downstream jobs: price pulls, search index updates and cache invalidation. None of these should run if ingestion failed or is paused for review. Getting the trigger logic right took care. Fire-and-forget is wrong, because nothing records whether a job ran or succeeded; synchronous waiting is also wrong, because some of these jobs take time. The answer was a simple job queue with status tracking rather than chained pipeline steps.
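A minimal sketch of that trigger logic follows. An in-memory deque stands in for whatever persistent queue the real system uses (not specified here), and the outcome strings, job names and the names Job, JobQueue and on_ingestion_finished are hypothetical. Ingestion only enqueues and returns; a separate worker runs jobs and records their status, so nothing blocks and nothing is silently lost.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Job:
    name: str
    run: Callable[[], None]
    status: Status = Status.PENDING
    error: Optional[str] = None


class JobQueue:
    """In-memory, single-process stand-in for a persistent job table."""

    def __init__(self) -> None:
        self._pending: deque[Job] = deque()
        self.jobs: list[Job] = []  # every job ever enqueued, for status queries

    def enqueue(self, name: str, run: Callable[[], None]) -> Job:
        job = Job(name, run)
        self._pending.append(job)
        self.jobs.append(job)
        return job

    def run_next(self) -> Optional[Job]:
        # Called by a worker loop, not by the ingestion pipeline itself.
        if not self._pending:
            return None
        job = self._pending.popleft()
        job.status = Status.RUNNING
        try:
            job.run()
            job.status = Status.SUCCEEDED
        except Exception as exc:
            job.status = Status.FAILED
            job.error = str(exc)
        return job


def on_ingestion_finished(
    q: JobQueue, outcome: str, downstream: dict[str, Callable[[], None]]
) -> list[Job]:
    # Failed or paused ingestions trigger nothing downstream.
    if outcome != "succeeded":
        return []
    # Enqueue and return immediately; callers can poll job.status later.
    return [q.enqueue(name, fn) for name, fn in downstream.items()]
```

Usage: the pipeline calls on_ingestion_finished(q, "succeeded", {"price_pull": ..., "search_index": ..., "cache_invalidation": ...}) and moves on, while a worker loop calls q.run_next() until it returns None. A failed price pull then shows up as a FAILED job with its error message instead of disappearing.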
This is internal tooling — not publicly accessible. It's included here because the problems it solves are representative of the kind of engineering that makes a data-heavy platform reliable at scale.