Vision Lab
The data pipeline behind a card recognition model — collecting, curating, and labelling training images at scale.
Card recognition — identifying a Pokémon card from a photo — is a genuinely hard computer vision problem. Cards share visual structure (same border style, same layout), differ in subtle ways (artwork, set symbol, card number, holographic pattern), and come in hundreds of variant printings that a model needs to distinguish reliably.
Training a model that performs well requires a large, clean, well-labelled dataset. Assembling that dataset — collecting images, verifying they map to the correct card and variant, normalising quality, and structuring labels in a format the training pipeline can consume — is its own substantial engineering problem.
Vision Lab is the tooling that makes that dataset assembly tractable. It's not the model itself — it's the infrastructure for building and maintaining the training data that the model learns from.
The core challenge is that training data quality matters more than quantity. A dataset of 100,000 poorly labelled or inconsistently cropped images produces a worse model than 20,000 carefully curated ones. Vision Lab is designed around that constraint — making it practical to collect images at volume while applying enough structure and verification to keep quality high.
We treat the dataset as a product: versioned, auditable, with clear provenance for every image. When the model performs badly on a specific card type, we can trace back to the training data for those cards, identify the issue, and fix it — rather than treating the dataset as an opaque blob that occasionally gets added to.
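To make that traceability concrete, here is a minimal sketch of what a per-image record with provenance might look like. The field names and IDs are illustrative, not the real schema.

```python
from dataclasses import dataclass

# Hypothetical per-image record; field names are illustrative.
@dataclass(frozen=True)
class ImageRecord:
    image_id: str
    card_id: str          # stable card ID, the primary key for labels
    source: str           # collection batch the image came from
    labelled_by: str      # who applied or verified the label
    dataset_version: str  # dataset release this record first appeared in

def trace(records, card_id):
    """Every training image for one card, with full provenance attached."""
    return [r for r in records if r.card_id == card_id]

records = [
    ImageRecord("img-001", "swsh3-25", "batch-2024-03", "alice", "v12"),
    ImageRecord("img-002", "swsh3-25", "batch-2024-05", "bob", "v13"),
    ImageRecord("img-003", "base1-4", "batch-2024-03", "alice", "v12"),
]

print([r.image_id for r in trace(records, "swsh3-25")])  # → ['img-001', 'img-002']
```

With records shaped like this, "the model is bad at card X" becomes a query rather than an investigation.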
Vision Lab has two main surfaces: an image collection and labelling interface, and a dataset management layer that produces the structured exports the training pipeline consumes.
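As a rough sketch of the second surface, an export row might be a line of JSONL that the training pipeline reads directly. This shape is an assumption for illustration, not the actual export format.

```python
import json

# Hypothetical export row; field names are illustrative, not the real schema.
def export_row(image_path, card_id, variant, split):
    return json.dumps({
        "image": image_path,
        "card_id": card_id,  # stable card ID, not a human-readable name
        "variant": variant,
        "split": split,      # train / val / test
    }, sort_keys=True)

print(export_row("images/img-001.jpg", "swsh3-25", "base", "train"))
```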
Variant-level label precision
Pokémon cards have many variants: base, reverse holofoil, full art, alternate art, Poké Ball pattern, Master Ball pattern. A model that can identify 'Charizard' but not 'Charizard Poké Ball pattern reverse holofoil' is only partially useful. Getting labels to variant granularity required that the labelling interface understand the card data model well enough to present the right options — which means it's tightly coupled to the TCGDex dataset structure.
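One way to picture variant-granular labels is as a composite key over (card, variant), where each pair is a distinct class for the model. The enum below just lists the variants named above; the real data model follows TCGDex and is richer than this.

```python
from dataclasses import dataclass
from enum import Enum

class Variant(Enum):
    BASE = "base"
    REVERSE_HOLO = "reverse_holo"
    FULL_ART = "full_art"
    ALT_ART = "alt_art"
    POKE_BALL = "poke_ball"
    MASTER_BALL = "master_ball"

@dataclass(frozen=True)
class Label:
    card_id: str  # stable card ID
    variant: Variant

    def key(self) -> str:
        # Each (card, variant) pair is its own class for the model.
        return f"{self.card_id}:{self.variant.value}"

print(Label("base1-4", Variant.REVERSE_HOLO).key())  # → base1-4:reverse_holo
```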
Defining 'good enough' image quality
Not every image in a training set needs to be perfect — some variation in lighting, angle, and quality is actually useful for making the model robust. But there are thresholds: too much glare makes holographic patterns unreadable, too much blur makes text illegible, too tight a crop removes identifying features. Defining these thresholds in a way that reviewers could apply consistently, and encoding them as structured metadata rather than a binary pass/fail, took iteration.
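Structured scores mean the pass/fail view is derived, not stored, so thresholds can change without re-reviewing images. A minimal sketch, with made-up threshold values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualityScores:
    glare: float  # 0 (none) .. 1 (holo pattern unreadable)
    blur: float   # 0 (sharp) .. 1 (text illegible)
    crop: float   # fraction of the card cut off by the crop

# Hypothetical thresholds; the real values took iteration to settle.
THRESHOLDS = QualityScores(glare=0.6, blur=0.5, crop=0.15)

def usable(q: QualityScores, t: QualityScores = THRESHOLDS) -> bool:
    """Derive a pass/fail view from stored scores when a cut-off is needed."""
    return q.glare <= t.glare and q.blur <= t.blur and q.crop <= t.crop

print(usable(QualityScores(glare=0.2, blur=0.1, crop=0.05)))  # → True
print(usable(QualityScores(glare=0.8, blur=0.1, crop=0.05)))  # → False
```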
Coverage gaps and class imbalance
Common cards from recent sets are easy to collect images for. Older cards, regional variants, and lower-print-run promos are hard to find in sufficient quantity. A model trained on an imbalanced dataset will perform well on common cards and poorly on rare ones — which is the opposite of what's useful. The coverage dashboard exists specifically to surface these gaps so we can prioritise targeted collection rather than just adding more images of cards that are already well represented.
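The core computation behind a coverage dashboard is simple: count images per card against the full catalogue and surface the worst-covered cards first. A sketch, with a made-up per-card target:

```python
from collections import Counter

def coverage_gaps(labels, catalogue, target=50):
    """Cards with fewer than `target` images, worst-covered first.

    `labels` is one card ID per training image; `catalogue` is every card
    ID the model should cover, so cards with zero images still show up.
    """
    counts = Counter(labels)
    return sorted(
        ((card, counts.get(card, 0)) for card in catalogue
         if counts.get(card, 0) < target),
        key=lambda pair: pair[1],
    )

labels = ["swsh3-25"] * 120 + ["base1-4"] * 3
catalogue = ["swsh3-25", "base1-4", "promo-7"]
print(coverage_gaps(labels, catalogue, target=50))
# → [('promo-7', 0), ('base1-4', 3)]
```

Iterating over the catalogue rather than the labels is what makes zero-coverage cards visible — the failure mode of counting only what you have is never noticing what you lack.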
Keeping the dataset consistent as cards change
TCGDex occasionally corrects card data — a card number that was wrong, a variant relationship that was mismodelled. When that happens, training images labelled against the old data need to be updated. Building the dataset with card IDs as the primary key (rather than human-readable names or numbers) means these corrections can be propagated systematically rather than requiring manual re-labelling.
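Because labels reference IDs rather than names, a correction reduces to remapping IDs across the dataset. A minimal sketch, assuming corrections arrive as an old-ID to new-ID mapping (the IDs here are made up):

```python
def apply_corrections(records, corrections):
    """Remap card IDs across all image labels when upstream data is corrected.

    `records` is a list of (image_id, card_id) pairs; `corrections` maps
    old card IDs to corrected ones. Unaffected records pass through.
    """
    return [(img, corrections.get(card, card)) for img, card in records]

records = [("img-001", "swsh3-25"), ("img-002", "swsh3-26")]
corrections = {"swsh3-26": "swsh3-26a"}  # upstream fixed a mismodelled variant
print(apply_corrections(records, corrections))
# → [('img-001', 'swsh3-25'), ('img-002', 'swsh3-26a')]
```

Had labels stored names or card numbers instead, the same fix would mean re-deriving every affected label by hand.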
Vision Lab is internal tooling. The card recognition model it trains is an ongoing project — the dataset and tooling are the current focus.