Data annotation

Annotation for high-risk AI, built in the EEA.

Image, video, audio, text, LiDAR 3D, and RLHF preference labelling by identity-verified European contributors. 100% human QA with consensus scoring. EU AI Act Article 10 documentation shipped with every project.

  • EEA-only
  • GDPR Art. 7 + 9
  • EU AI Act Art. 10(2)(f)
  • 30-day erasure

Norwegian legal entity. Named project lead replies within one EU business day.

Annotation_pipeline YPAI-DA-2026.05
  • Image
    bbox, semantic + instance segmentation, keypoint, polygon (Live)
  • Video
    frame-by-frame, object tracking, scene segmentation, pose (Live)
  • Audio
    transcription, diarization, phoneme, emotion, event (Live)
  • Text
    NER, sentiment, intent, entity linking, parallel corpus (Live)
  • LiDAR 3D
    point-cloud, semantic segmentation, sensor-fusion (Live)
  • RLHF
    preference, ranking, rewriting, red-team, instruction (Live)
JURISDICTION: EEA
CONSENT: Per-contributor
QA: 100% human
LANGUAGES: 150+

Technique coverage

Six modalities, one annotation contract.

Multi-vendor data programs accumulate per-modality DPAs, per-team QA standards, and per-tool provenance schemas. YPAI runs all six modalities under one master DPA, one consensus QA standard, and one audit-artefact bundle.

The matrix below maps technique to QA depth to output format per modality. Vertical anchors deep-link to the industry surface where the modality is most commonly procured.

  • Image
    Techniques
    bbox, semantic segmentation, instance segmentation, keypoint, polygon
    QA depth
    3-of-3 consensus + expert escalation
    Output format
    COCO JSON, Pascal VOC XML, custom
  • Video
    Techniques
    frame-by-frame labelling, object tracking, scene segmentation, 3D motion / pose
    QA depth
    Frame-level consensus + temporal review
    Output format
    MOTChallenge, custom XML, KITTI format
  • Audio
    Techniques
    phoneme segmentation, speaker diarization + attribution, emotion labelling, audio event detection
    QA depth
    Listener consensus + native-speaker review
    Output format
    TextGrid, RTTM, custom JSON
  • Text
    Techniques
    NER, sentiment (aspect-based), intent classification, entity linking, parallel corpus alignment
    QA depth
    Multi-rater + linguist review
    Output format
    CoNLL, JSONL, custom
  • LiDAR 3D
    Techniques
    point-cloud annotation, LiDAR semantic segmentation, sensor-fusion (LiDAR + camera)
    QA depth
    Frame consensus + sensor-fusion QA
    Output format
    KITTI bin, nuScenes JSON, custom
  • RLHF preference
    Techniques
    response rating, preference comparison, response rewriting, red-teaming, fact-checking, instruction tuning
    QA depth
    Multi-rater Krippendorff alpha + lead review
    Output format
    OpenAI/Anthropic preference JSON, custom
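
For orientation, here is a minimal sketch of what one bounding-box record looks like in the COCO JSON format listed for image output. COCO stores each box as [x, y, width, height] in pixels from the top-left corner. The function name, file name, category, and coordinates are made up for illustration; this is not YPAI's export code, and a production export would carry more fields.

```python
import json


def make_coco_bbox_dataset(file_name, width, height, boxes, category_name):
    """Build a minimal COCO-style detection dict for a single image.

    boxes: list of (x, y, w, h) tuples in pixels, top-left origin.
    All identifiers and values are illustrative assumptions.
    """
    images = [{"id": 1, "file_name": file_name, "width": width, "height": height}]
    categories = [{"id": 1, "name": category_name}]
    annotations = []
    for ann_id, (x, y, w, h) in enumerate(boxes, start=1):
        annotations.append({
            "id": ann_id,
            "image_id": 1,
            "category_id": 1,
            "bbox": [x, y, w, h],   # COCO convention: x, y, width, height
            "area": w * h,
            "iscrowd": 0,
        })
    return {"images": images, "annotations": annotations, "categories": categories}


dataset = make_coco_bbox_dataset(
    "frame_0001.jpg", 1920, 1080, [(100, 200, 50, 80)], "pedestrian"
)
coco_json = json.dumps(dataset, indent=2)
```

Serialising with json.dumps yields a file that standard COCO tooling can read, which is usually the quickest way to sanity-check a delivery before training.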

QA methodology

Five-stage QA, every modality.

Marketed quality is a metric. YPAI quality is a pipeline. Every asset passes through identity-verified annotation, multi-annotator consensus, inter-annotator agreement scoring, expert escalation on disagreement, and final lead review before delivery.

The stages below are the same across all six modalities. The metric that emerges per project is reportable and audit-defensible.

  1. Identity-verified annotation

    An annotator on the network, with documented identity and EEA residency, picks up the task in self-hosted CVAT, Label Studio, or proprietary tooling.

    Annotator credential ID logged per asset

  2. Multi-annotator pass

    Each asset is independently labelled by 2 to 3 annotators with no visibility into peer output.

    Per-annotator label record

  3. Consensus scoring

    Labels are compared across annotators. Assets whose agreement meets the project threshold pass; the rest route to stage 4.

    Inter-annotator agreement score (Cohen kappa or Krippendorff alpha)

  4. Expert escalation

    Assets in disagreement escalate to a domain specialist (clinical, financial, legal, or modality lead) for adjudication.

    Escalation rate plus adjudication outcome per asset

  5. Lead review and delivery sign-off

    A named project lead samples the delivered batch and signs off; the audit-artefact bundle is then generated.

    Sampling rate plus sign-off log
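
The hand-off from stage 3 to stage 4 can be sketched as a per-asset routing rule. This is a minimal illustration under stated assumptions: the threshold parameter min_agreement, the function names, and the sample labels are hypothetical, and the real threshold is a project-level setting rather than anything shown here.

```python
from collections import Counter


def route_asset(labels, min_agreement=1.0):
    """Return ("pass", label) or ("escalate", None) for one asset.

    labels: independent labels from the 2 to 3 annotators of stage 2.
    min_agreement: hypothetical threshold, the share of annotators
    that must agree on the most common label.
    """
    if len(labels) < 2:
        raise ValueError("need at least two independent labels")
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count / len(labels) >= min_agreement:
        return ("pass", top_label)
    return ("escalate", None)


def escalation_rate(batch, min_agreement=1.0):
    """Share of assets in a batch that route to expert escalation."""
    if not batch:
        raise ValueError("empty batch")
    escalated = sum(
        1 for labels in batch.values()
        if route_asset(labels, min_agreement)[0] == "escalate"
    )
    return escalated / len(batch)


# Illustrative batch: asset ID -> labels from independent annotators.
batch = {
    "asset-001": ["car", "car", "car"],
    "asset-002": ["car", "truck", "car"],
    "asset-003": ["bus", "bus"],
    "asset-004": ["truck", "truck", "truck"],
}
```

With the default of full agreement, asset-002 escalates and the batch escalation rate is 0.25; relaxing min_agreement to 0.6 lets a 2-of-3 majority pass. The stricter setting costs more specialist time but leaves fewer majority-vote decisions unreviewed.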

Regulatory alignment

Article 10 bias mitigation, mapped to artefacts.

Article 10 of the EU AI Act sets data-governance obligations for high-risk AI providers, effective 2026-08-02. YPAI does not certify your AI system. YPAI ships the evidence pack your conformity assessor and DPO need to argue the system is compliant.

Below: the specific 10(2) and 10(3) requirements mapped to the deliverable that addresses them.

  • Art. 10(2)(f)

    Examination in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights, or lead to discrimination.

    Bias examination report per project

    Per-batch demographic and dialect distribution metadata, sampling-methodology disclosure, and identified-bias-vector log. Generated from contributor-network metadata; shipped as bias_assessment.pdf in the evidence pack.

  • Art. 10(2)(g)

    Appropriate measures to detect, prevent, and mitigate possible biases identified.

    Mitigation actions log

    Per identified bias vector, the resampling, rebalancing, or escalation action taken, with annotator-cohort change documented. Linked to the consent records so mitigation is auditable from cohort to consent.

  • Art. 10(3)

    Training, validation, and testing data sets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete.

    Representativeness attestation

    Per-language, per-demographic, per-vertical coverage matrix with explicit gaps named. We do not claim coverage we cannot evidence. Gaps are raised for scope discussion at T+3 days rather than left hidden in the data.

  • Art. 10(2)(b, c)

    Data collection processes; data preparation, including annotation, labelling, cleaning, updating, enrichment, and aggregation.

    Annotation provenance log

    Per-asset record of which annotators touched the asset, at which stages, with which credentials. Generated from the five-stage QA pipeline (see QA methodology) and shipped as data_provenance.pdf.
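
One way to picture the distribution check behind a bias examination or coverage matrix is a comparison of observed cohort shares against target shares, with named gaps wherever the difference exceeds a tolerance. The sketch below assumes a simple list of group codes per contribution; the function name, the tolerance, the language codes, and the target shares are all illustrative and not drawn from any YPAI report.

```python
from collections import Counter


def distribution_gaps(observed_groups, target_shares, tolerance=0.05):
    """Return {group: observed_share - target_share} for groups whose
    deviation from the target exceeds the tolerance.

    observed_groups: one group code per contribution (hypothetical schema).
    target_shares: desired share per group, expected to sum to 1.0.
    """
    counts = Counter(observed_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    gaps = {}
    for group, target in target_shares.items():
        share = counts.get(group, 0) / total
        if abs(share - target) > tolerance:
            gaps[group] = round(share - target, 4)
    return gaps


# Illustrative cohort: 70 Bokmal, 10 Nynorsk, 20 Northern Sami contributions.
observed = ["nb"] * 70 + ["nn"] * 10 + ["se"] * 20
targets = {"nb": 0.5, "nn": 0.2, "se": 0.2, "fkv": 0.1}
gaps = distribution_gaps(observed, targets)
```

Positive values mark over-represented groups and negative values mark shortfalls, including groups with no contributions at all. Each flagged entry is the kind of item that would need a documented mitigation or an explicitly named gap.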

Master DPA template and full audit-artefact specifications

Workforce + tooling

Identity-verified contributors, on the tools you use.

Not a marketplace crowd. A vetted European network with individual-level provenance, documented residency, and credential records for regulated-domain work. Self-hosted CVAT and Label Studio for tool flexibility, plus proprietary tooling for modalities where open-source tools fall short.

  • 40,000+
    Contributor network
    Identity-verified, EEA-resident
  • 150+
    Languages covered
    All Nordic languages production-confirmed
  • 50+
    Countries with active contributors
    All EEA-resident; non-EEA only via DPA addendum
  • 100%
    Human QA coverage
    No silent auto-QA; every asset reviewed

Annotation tooling

  • CVAT (self-hosted)
  • Label Studio (self-hosted)
  • Proprietary annotation tooling
  • Customer-tool bridge

We adapt to your tooling stack when it makes engineering sense. We do not require you to migrate to ours.

Domain specialists

Clinical, legal, financial, multilingual credentials

Where the modality, the regulatory context, or the asset risk-class demands credentialed reviewers, YPAI brings them in. Below: the credential categories available at scoping. Specific reviewer credentials are confirmed in the project SOW.

  • Clinical
    Radiology, cardiology, pathology: licensed practitioners; HIPAA-aligned workflow available
  • Legal
    EU regulation, case-law citation, contract review: bar-admitted counsel for adjudication tasks
  • Financial
    MiFID II, KYC/AML, accounting taxonomies: finance domain reviewers
  • Multilingual + dialect
    Native-speaker reviewers for all Nordic, Slavic, and minority European languages; dialect-level granularity at scoping

Credentialed-reviewer hourly cost is a line-item, not a hidden uplift. Quoted at scoping.

Deliverable evidence pack

Every annotation project ships an audit-ready bundle.

On delivery you receive the annotated dataset PLUS a structured evidence bundle your DPO, legal counsel, conformity assessor, and procurement team can review without follow-up. Master DPA included by default, not on request.

bundle.tree YPAI-DA / per project
ypai-annotation-bundle/
|-- README.md
|-- dataset/
|   |-- annotations/                  # per-modality output files
|   |-- manifest.csv                  # per-asset provenance index
|   `-- checksums.sha256
|-- qa/
|   |-- inter_annotator_agreement.csv # IAA score per batch (Cohen/Krippendorff)
|   |-- consensus_log.csv             # stage-3 disagreement + adjudication trace
|   |-- expert_escalation_log.csv     # stage-4 escalations + outcomes
|   `-- sampling_methodology.pdf
|-- compliance/
|   |-- data_provenance.pdf           # Article 10(2)(b, c): annotation pipeline log
|   |-- bias_assessment.pdf           # Article 10(2)(f): bias examination report
|   |-- representativeness.pdf        # Article 10(3): coverage matrix + gaps
|   |-- consent_audit.csv             # GDPR Article 7 per-contributor consent records
|   `-- residency_attestation.pdf     # EEA processing + sub-processor disclosure
|-- credentials/                      # only present if regulated-domain reviewers used
|   `-- reviewer_credentials.pdf
`-- contract/
    |-- master_dpa.pdf                # Article 28 pre-cleared
    `-- sccs_if_applicable.pdf        # only if customer-directed extra-EEA transfer
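
On receipt, the dataset/ folder can be checked against checksums.sha256 before anything else is opened. The sketch below assumes the file uses the common sha256sum layout (hex digest, whitespace, relative path); that layout and the function names are assumptions, not a documented YPAI specification.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(dataset_dir, checksum_name="checksums.sha256"):
    """Return the relative paths that are missing or whose hash differs.

    Assumes sha256sum-style lines: '<hex digest>  <relative path>'.
    """
    dataset_dir = Path(dataset_dir)
    mismatches = []
    lines = (dataset_dir / checksum_name).read_text(encoding="utf-8").splitlines()
    for line in lines:
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # sha256sum marks binary mode with '*'
        target = dataset_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

An empty return value means every listed file is present and unmodified; any returned path is worth raising with the project lead before the data enters a training pipeline.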
  • inter_annotator_agreement.csv

    Cohen kappa or Krippendorff alpha per batch. The metric your conformity assessor will reference, surfaced before they ask.

  • bias_assessment.pdf

    Article 10(2)(f) deliverable: demographic, dialect, and vertical-coverage distribution, identified bias vectors, and mitigation actions taken.

  • consent_audit.csv

    Per-contributor GDPR Article 7 consent record. Not platform-ToS aggregate consent: specific-purpose, per-project, withdrawable.

  • data_provenance.pdf

    Per-asset annotator-touch log: which credential touched which asset at which QA stage. Article 10(2)(b, c) ready.

  • residency_attestation.pdf

    EEA processing attestation. Sub-processor list named. No CLOUD Act exposure by default.

  • master_dpa.pdf

    Article 28 GDPR DPA pre-cleared. Customer-specific addendums (residency, sub-processor) accepted; the baseline DPA ships by default, not on request.
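
For readers who want to reproduce the agreement figure in inter_annotator_agreement.csv, Cohen's kappa for two annotators is observed agreement corrected for the agreement expected by chance. The sketch below covers the two-rater, nominal-label case only; Krippendorff's alpha, which handles more raters and missing labels, is not shown. The function name and sample labels are illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same assets.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the chance agreement from each annotator's label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Illustrative sentiment labels from two annotators on four assets.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 assets (0.75) while chance agreement is 0.5, giving a kappa of 0.5. Raw percentage agreement alone would overstate quality whenever one label dominates the batch.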


Next step

Scope an annotation project.

Tell us the modality, the regulatory context, and the volume. We map a delivery plan with the QA pipeline, evidence pack, and master DPA included by default.

EU AI Act Article 10 applies from 2026-08-02. Cumulative GDPR fines have passed EUR 7.1B. Getting annotation provenance wrong is procurement-blocking. Getting it right is a conversation.


Master DPA included with every YPAI engagement, not on request. Norwegian legal entity. Named project lead replies within one EU business day.

Data annotation intake

Scope an annotation project.

Bring modality, regulatory context, volume estimate, and any tooling-stack constraints. A named EU-resident project lead replies within one EU business day with a feasibility read and Article 10 risk classification.

  • GDPR Article 7 consent records on every asset
  • EEA-only operations, Norwegian Aksjeselskap
  • Identity-verified contributors only: no marketplace crowd
  • 30-day erasure SLA on contributor withdrawal

GDPR Article 7 · GDPR Article 9 · EU AI Act Article 10(2)(f)

What happens next

From submit to scoped pilot in seven days.

This section covers three situations: you have submitted and want to know the timing, you are about to submit and have a procurement objection, or you are not ready to submit and want a route deeper into the work.

After you submit

  1. Project lead reads your brief

    T+1 day

    A named EU-resident project lead replies within one EU business day with feasibility, scope clarifications, and a first read on Article 10 risk classification.

  2. Sample evidence pack returned

    T+3 days

    Anonymised sample evidence pack (data_provenance.pdf, bias_assessment.pdf, consent_audit.csv, residency_attestation.pdf). Scoping call agenda agreed.

  3. Free pilot delivered

    T+5 to 7 days

    Free pilot covers annotation AND a QA artefact: 2 languages or 1 modality with 500 assets through the full QA pipeline, with an inter-annotator agreement report. Production engagement scopes from there by modality, volume, and regulatory context.

  4. Master DPA signed, production scope locked

    T+14 days

    Article 28 clauses pre-cleared, EEA-resident processing committed in contract. Sub-processor list named, withdrawal SLA confirmed. Production annotation starts flowing.

Procurement FAQ

  • Is there a minimum project size?

    No hard floor. A typical paid engagement starts around 5k to 50k assets per modality, or 1k to 10k RLHF preference comparisons. The free pilot is fixed at 500 assets through the full QA pipeline with IAA reporting.

  • Can I see a sample evidence pack before signing?

    Yes. Anonymised sample evidence pack (data_provenance, bias_assessment, consent_audit, residency_attestation, inter_annotator_agreement) is sent on procurement request during the T+3-day scoping window.

  • Does the DPA require negotiation?

    No. Article 28 clauses are pre-cleared and included with every contract. Customer-specific addendums on residency, sub-processor scope, or sector-specific terms (DORA, MiFID II, HIPAA-aligned) are accepted, but the standard DPA ships by default.

  • What about US-based sub-processors or US cloud platforms?

    None by default. EEA-only operations on self-hosted CVAT and Label Studio with a named sub-processor list confirmed at scoping. Any US-domiciled sub-processor requires explicit customer sign-off via DPA addendum.

  • Do you support our annotation tool?

    Self-hosted CVAT, self-hosted Label Studio, and proprietary YPAI tooling are the defaults. Customer-tool bridge (API or export-format adaptation) is in scope when it makes engineering sense. We do not require you to migrate to our tooling.

  • What is the withdrawal and erasure SLA?

    30-day erasure SLA on any contributor consent withdrawal under GDPR Article 7(3). Annotated assets touched by the withdrawn contributor are flagged and re-routed; the audit log is retained for compliance traceability.
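
The withdrawal flow described in the last answer can be sketched in a few lines: compute the erasure deadline from the withdrawal date, then list the assets that need re-routing. The provenance tuple layout, identifiers, and function names are hypothetical; the only figure taken from this page is the 30-day SLA, counted here in calendar days as an assumption.

```python
from datetime import date, timedelta

ERASURE_SLA_DAYS = 30  # SLA stated on this page; calendar days assumed


def erasure_deadline(withdrawal_date):
    """Latest date by which the contributor's data must be erased."""
    return withdrawal_date + timedelta(days=ERASURE_SLA_DAYS)


def assets_to_reroute(provenance, contributor_id):
    """Return sorted asset IDs touched by the withdrawn contributor.

    provenance: iterable of (asset_id, contributor_id, stage) tuples,
    a hypothetical flattening of the per-asset provenance log.
    """
    return sorted({
        asset_id
        for asset_id, contributor, _stage in provenance
        if contributor == contributor_id
    })


# Illustrative provenance records.
provenance = [
    ("asset-001", "contrib-17", "annotation"),
    ("asset-001", "contrib-42", "annotation"),
    ("asset-002", "contrib-42", "annotation"),
    ("asset-003", "contrib-17", "escalation"),
]
deadline = erasure_deadline(date(2026, 8, 2))
flagged = assets_to_reroute(provenance, "contrib-17")
```

A withdrawal logged on 2026-08-02 gives a deadline of 2026-09-01, and the flagged list shows which assets need a replacement annotation pass. The audit log entries themselves stay in place, as the answer above notes.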