Data annotation

Annotation for high-risk AI, built in the EEA.

Image, video, audio, text, LiDAR 3D, and RLHF preference labelling by identity-verified European contributors. 100% human QA with consensus scoring. EU AI Act Article 10 documentation shipped with every project.

EEA-only
GDPR Art. 7 + 9
EU AI Act Art. 10(2)(f)
30-day erasure

Norwegian legal entity. Named project lead replies within one EU business day.

Annotation pipeline (YPAI-DA-2026.05)

Image (Live): bbox, semantic + instance segmentation, keypoint, polygon
Video (Live): frame-by-frame, object tracking, scene segmentation, pose
Audio (Live): transcription, diarization, phoneme, emotion, event
Text (Live): NER, sentiment, intent, entity linking, parallel corpus
LiDAR 3D (Live): point-cloud, semantic segmentation, sensor-fusion
RLHF (Live): preference, ranking, rewriting, red-team, instruction

JURISDICTION: EEA
CONSENT: Per-contributor
QA: 100% human
LANGUAGES: 150+

Technique coverage

Six modalities, one annotation contract.

Multi-vendor data programs accumulate per-modality DPAs, per-team QA standards, and per-tool provenance schemas. YPAI runs all six modalities under one master DPA, one consensus QA standard, and one audit-artefact bundle.

The matrix below maps technique to QA depth to output format per modality. Vertical anchors deep-link to the industry surface where the modality is most commonly procured.

Modality	Techniques offered	QA depth	Output format	Vertical anchors
Image	bbox, semantic segmentation, instance segmentation, keypoint, polygon	3-of-3 consensus + expert escalation	COCO JSON, Pascal VOC XML, custom	Automotive, Healthcare
Video	frame-by-frame labelling, object tracking, scene segmentation, 3D motion / pose	Frame-level consensus + temporal review	MOT challenge, custom XML, KITTI-format	Automotive
Audio	phoneme segmentation, speaker diarization + attribution, emotion labelling, audio event detection	Listener consensus + native-speaker review	TextGrid, RTTM, custom JSON	Automotive, Audio collection
Text	NER, sentiment (aspect-based), intent classification, entity linking, parallel corpus alignment	Multi-rater + linguist review	CoNLL, JSONL, custom	Finserv, Healthcare
LiDAR 3D	point-cloud annotation, LiDAR semantic segmentation, sensor-fusion (LiDAR + camera)	Frame consensus + sensor-fusion QA	KITTI bin, nuScenes JSON, custom	Automotive
RLHF preference	response rating, preference comparison, response rewriting, red-teaming, fact-checking, instruction tuning	Multi-rater Krippendorff alpha + lead review	OpenAI/Anthropic preference JSON, custom	LLM evaluation
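
As a point of reference for the Image row, a bounding-box delivery in COCO JSON is typically shaped like the sketch below. It follows the publicly documented COCO layout, where `bbox` is `[x, y, width, height]` in pixels; it is not a YPAI schema. The file name, category, coordinates, and the helper names `coco_bbox_record` and `bbox_inside_image` are invented for illustration.

```python
import json

def coco_bbox_record(ann_id, image_id, category_id, x, y, w, h):
    """Build one COCO-style annotation; bbox is [x, y, width, height]."""
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],
        "area": w * h,
        "iscrowd": 0,
    }

# Hypothetical example values, not real project data.
dataset = {
    "images": [{"id": 1, "file_name": "frame_0001.jpg", "width": 1920, "height": 1080}],
    "categories": [{"id": 1, "name": "pedestrian"}],
    "annotations": [coco_bbox_record(1, 1, 1, 100.0, 200.0, 50.0, 120.0)],
}

def bbox_inside_image(dataset):
    """Return True if every bbox lies within the bounds of its image."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in dataset["images"]}
    for ann in dataset["annotations"]:
        x, y, w, h = ann["bbox"]
        width, height = sizes[ann["image_id"]]
        if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > width or y + h > height:
            return False
    return True

print(json.dumps(dataset["annotations"][0]))
```

A bounds check of this kind is a cheap sanity test to run on any delivered COCO file before it enters a training pipeline.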


QA methodology

Five-stage QA, every modality.

Marketed quality is a metric. YPAI quality is a pipeline. Every asset passes through identity-verified annotation, multi-annotator consensus, inter-annotator agreement scoring, expert escalation on disagreement, and final lead review before delivery.

The stages below are the same across all six modalities. The metric that emerges per project is reportable and audit-defensible.

01. Identity-verified annotation

An annotator on the network with documented identity and EEA residency picks up the task in self-hosted CVAT, Label Studio, or proprietary tooling.

Recorded: annotator credential ID, logged per asset

02. Multi-annotator pass

Each asset is independently labelled by 2 to 3 annotators with no visibility into peer output.

Recorded: per-annotator label record

03. Consensus scoring

Labels are compared. Agreement above threshold passes; disagreement routes to stage 4.

Recorded: inter-annotator agreement score (Cohen kappa or Krippendorff alpha)

04. Expert escalation

Disputed assets escalate to a domain specialist (clinical, financial, legal, or modality-lead) for adjudication.

Recorded: escalation rate plus adjudication outcome per asset

05. Lead review and delivery sign-off

The named project lead samples the delivered batch and signs off, and the audit-artefact bundle is generated.

Recorded: sampling rate plus sign-off log
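
The hand-off between the multi-annotator pass, consensus scoring, and expert escalation can be sketched as a small routing function. This is an illustrative sketch, not YPAI's implementation: the names `route_asset` and `escalation_rate`, the default threshold of 1.0 (unanimity, in the spirit of the "3-of-3" wording used for images), and the label strings are all assumptions.

```python
from collections import Counter

def route_asset(labels, threshold=1.0):
    """Compare the independent labels collected for one asset.

    Returns ("pass", label) when the share of annotators choosing the most
    common label meets the threshold, else ("escalate", None) for stage 4.
    """
    if not labels:
        raise ValueError("at least one label is required")
    top_label, top_count = Counter(labels).most_common(1)[0]
    agreement = top_count / len(labels)
    if agreement >= threshold:
        return ("pass", top_label)
    return ("escalate", None)

def escalation_rate(batch, threshold=1.0):
    """Fraction of assets in a batch routed to expert escalation."""
    outcomes = [route_asset(labels, threshold)[0] for labels in batch.values()]
    return outcomes.count("escalate") / len(outcomes)

# Hypothetical batch: asset ID -> labels from three independent annotators.
batch = {
    "asset-001": ["car", "car", "car"],
    "asset-002": ["car", "truck", "car"],
}
for asset_id, labels in batch.items():
    print(asset_id, route_asset(labels))
```

Lowering `threshold` trades escalation volume against the risk of passing a contested label, which is why the threshold belongs in the project scope rather than in code defaults.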

Regulatory alignment

Article 10 bias mitigation, mapped to artefacts.

Article 10 of the EU AI Act sets data-governance obligations for high-risk AI providers, effective 2026-08-02. YPAI does not certify your AI system. YPAI ships the evidence pack your conformity assessor and DPO need to argue the system is compliant.

Below: the specific 10(2) and 10(3) requirements mapped to the deliverable that addresses them.

Art. 10(2)(f)

Examination in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights, or lead to discrimination.

Bias examination report per project

Per-batch demographic and dialect distribution metadata, sampling-methodology disclosure, and identified-bias-vector log. Generated from contributor-network metadata; shipped as bias_assessment.pdf in the evidence pack.

Art. 10(2)(g)

Appropriate measures to detect, prevent, and mitigate possible biases identified.

Mitigation actions log

Per identified bias vector, the resampling, rebalancing, or escalation action taken, with annotator-cohort change documented. Linked to the consent records so mitigation is auditable from cohort to consent.

Art. 10(3)

Training, validation, and testing data sets shall be relevant, representative, and to the best extent possible, free of errors and complete.

Representativeness attestation

Per-language, per-demographic, per-vertical coverage matrix with explicit gaps named. We do not claim coverage we cannot evidence. Gaps are raised in the scoping discussion at T+3 days rather than hidden in the data.

Art. 10(2)(b, c)

Data collection processes; data preparation, including annotation, labelling, cleaning, updating, enrichment, and aggregation.

Annotation provenance log

Per-asset record of which annotators touched the asset, at which stages, with which credentials. Generated from the QA pipeline (section 3) and shipped as data_provenance.pdf.
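
A coverage matrix with explicit gaps, as described under the representativeness attestation, could be derived along the following lines. This is a minimal sketch under stated assumptions: the record keys `language` and `group`, the function name `coverage_gaps`, and the sample values are hypothetical and do not reflect the actual metadata schema.

```python
from collections import Counter

def coverage_gaps(records, required_cells, minimum=1):
    """List the (language, group) cells holding fewer than `minimum` records."""
    counts = Counter((r["language"], r["group"]) for r in records)
    return [cell for cell in required_cells if counts[cell] < minimum]

# Hypothetical contributor metadata and target cells.
records = [
    {"language": "nb", "group": "18-29"},
    {"language": "nb", "group": "18-29"},
    {"language": "nb", "group": "30-49"},
]
required = [("nb", "18-29"), ("nb", "30-49"), ("se", "18-29")]

print(coverage_gaps(records, required))
```

Naming the empty cells explicitly, rather than reporting only the populated ones, is what makes the gap visible at scoping time.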

Master DPA template and full audit-artefact specifications

Workforce + tooling

Identity-verified contributors, on the tools you use.

Not a marketplace crowd. A vetted European network with individual-level provenance, documented residency, and credential records for regulated-domain work. Self-hosted CVAT and Label Studio for tool flexibility, plus proprietary tooling for modalities where open-source tools fall short.

Contributor network: 40,000+
Languages covered: 150+
Countries with active contributors: 50+
Human QA coverage: 100%

Annotation tooling

CVAT (self-hosted)
Label Studio (self-hosted)
Proprietary annotation tooling
Customer-tool bridge

We adapt to your tooling stack when it makes engineering sense. We do not require you to migrate to ours.

Domain specialists

Clinical, legal, financial, multilingual credentials

Where the modality, the regulatory context, or the asset risk-class demands credentialed reviewers, YPAI brings them in. Below: the credential categories available at scoping. Specific reviewer credentials are confirmed in the project SOW.

Clinical: radiology, cardiology, and pathology, reviewed by licensed practitioners; HIPAA-aligned workflow available
Legal: EU regulation, case-law citation, and contract review, with bar-admitted counsel for adjudication tasks
Financial: MiFID II, KYC/AML, and accounting taxonomies, reviewed by finance-domain reviewers
Multilingual + dialect: native-speaker reviewers for all Nordic, Slavic, and minority European languages; dialect-level granularity confirmed at scoping

Credentialed-reviewer hourly cost is a line-item, not a hidden uplift. Quoted at scoping.

Deliverable evidence pack

Every annotation project ships an audit-ready bundle.

On delivery you receive the annotated dataset plus a structured evidence bundle that your DPO, legal counsel, conformity assessor, and procurement team can review without follow-up. Master DPA included by default, not on request.

bundle.tree YPAI-DA / per project

ypai-annotation-bundle/
|-- README.md
|-- dataset/
|   |-- annotations/                  # per-modality output files
|   |-- manifest.csv                  # per-asset provenance index
|   `-- checksums.sha256
|-- qa/
|   |-- inter_annotator_agreement.csv # IAA score per batch (Cohen/Krippendorff)
|   |-- consensus_log.csv             # stage-3 disagreement + adjudication trace
|   |-- expert_escalation_log.csv     # stage-4 escalations + outcomes
|   `-- sampling_methodology.pdf
|-- compliance/
|   |-- data_provenance.pdf           # Article 10(2)(b, c): annotation pipeline log
|   |-- bias_assessment.pdf           # Article 10(2)(f): bias examination report
|   |-- representativeness.pdf        # Article 10(3): coverage matrix + gaps
|   |-- consent_audit.csv             # GDPR Article 7 per-contributor consent records
|   `-- residency_attestation.pdf     # EEA processing + sub-processor disclosure
|-- credentials/                      # only present if regulated-domain reviewers used
|   `-- reviewer_credentials.pdf
`-- contract/
    |-- master_dpa.pdf                # Article 28 pre-cleared
    `-- sccs_if_applicable.pdf        # only if customer-directed extra-EEA transfer
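
On receipt, the integrity of the dataset/ directory can be checked against checksums.sha256. The sketch below assumes the file uses the common `sha256sum` line format (`<hex digest>  <relative path>`) with paths relative to the directory holding the checksum file; the bundle does not state this, so treat it as an assumption. The function names are illustrative.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksums(checksum_file):
    """Return the relative paths whose file is missing or whose digest differs.

    Assumes sha256sum-style lines: "<hex digest>  <relative path>".
    """
    checksum_file = Path(checksum_file)
    base = checksum_file.parent
    failures = []
    for line in checksum_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with "*"
        target = base / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(name)
    return failures
```

Calling `verify_checksums("ypai-annotation-bundle/dataset/checksums.sha256")` would return an empty list when every listed file is present and unmodified.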

inter_annotator_agreement.csv

Cohen kappa or Krippendorff alpha per batch. The metric your conformity assessor will reference, surfaced before they ask.

bias_assessment.pdf

Article 10(2)(f) deliverable: demographic, dialect, and vertical-coverage distribution, identified bias vectors, and mitigation actions taken.

consent_audit.csv

Per-contributor GDPR Article 7 consent record. Not platform-ToS aggregate consent: specific-purpose, per-project, withdrawable.

data_provenance.pdf

Per-asset annotator-touch log: which credential touched which asset at which QA stage. Article 10(2)(b, c) ready.

residency_attestation.pdf

EEA processing attestation. Sub-processor list named. No CLOUD Act exposure by default.

master_dpa.pdf

Article 28 GDPR DPA pre-cleared. Customer-specific addendums (residency, sub-processor) accepted; the baseline DPA ships by default, not on request.
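
For readers who want to reproduce a per-batch agreement score, Cohen's kappa for two annotators can be computed as in the sketch below. It uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance. The function name and sample labels are illustrative, and the sketch does not cover Krippendorff alpha, which is the usual choice when there are more than two raters or missing labels.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (nominal labels)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical sentiment labels from two annotators on four items.
rater_a = ["pos", "pos", "neg", "neg"]
rater_b = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(rater_a, rater_b))
```

In this example the raters agree on three of four items (p_o = 0.75) while chance agreement is 0.5, so raw agreement overstates reliability; kappa corrects for that.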

Master DPA template and audit-artefact specifications

Next step

Scope an annotation project.

Tell us the modality, the regulatory context, and the volume. We map a delivery plan with the QA pipeline, evidence pack, and master DPA included by default.

EU AI Act Article 10 applies from 2026-08-02. Cumulative GDPR fines have passed EUR 7.1B. Getting annotation provenance wrong is procurement-blocking. Getting it right is a conversation.

Master DPA included with every YPAI engagement, not on request. Norwegian legal entity. Named project lead replies within one EU business day.

Data annotation intake

Scope an annotation project.

Bring modality, regulatory context, volume estimate, and any tooling-stack constraints. A named EU-resident project lead replies within one EU business day with a feasibility read and Article 10 risk classification.

GDPR Article 7 consent records on every asset
EEA-only operations, Norwegian Aksjeselskap
Identity-verified contributors only: no marketplace crowd
30-day erasure SLA on contributor withdrawal

What happens next

From submit to scoped pilot in seven days.

This section serves three situations: you have submitted and want to know the timing; you are about to submit and have a procurement objection; or you are not ready to submit and want a route deeper into the work.

After you submit

T+1 day: Project lead reads your brief

A named EU-resident project lead replies within one EU business day with feasibility, scope clarifications, and a first read on Article 10 risk classification.

T+3 days: Sample evidence pack returned

Anonymised sample evidence pack (data_provenance.pdf, bias_assessment.pdf, consent_audit.csv, residency_attestation.pdf). Scoping call agenda agreed.

T+5 to 7 days: Free pilot delivered

The free pilot covers annotation and a QA artefact: 2 languages or 1 modality, with 500 assets run through the full QA pipeline and an inter-annotator agreement report. Production engagement is scoped from there by modality, volume, and regulatory context.

T+14 days: Master DPA signed, production scope locked

Article 28 clauses pre-cleared, EEA-resident processing committed in contract. Sub-processor list named, withdrawal SLA confirmed. Production annotation begins.

Procurement FAQ

Is there a minimum project size?

No hard floor. A typical paid engagement starts around 5k to 50k assets per modality, or 1k to 10k RLHF preference comparisons. The free pilot is fixed at 500 assets through the full QA pipeline with IAA reporting.

Can I see a sample evidence pack before signing?

Yes. Anonymised sample evidence pack (data_provenance, bias_assessment, consent_audit, residency_attestation, inter_annotator_agreement) is sent on procurement request during the T+3-day scoping window.

Does the DPA require negotiation?

No. Article 28 clauses are pre-cleared and included with every contract. Customer-specific addendums on residency, sub-processor scope, or sector-specific terms (DORA, MiFID II, HIPAA-aligned) are accepted, but the standard DPA ships by default.

What about US-based sub-processors or US cloud platforms?

None by default. EEA-only operations on self-hosted CVAT and Label Studio with a named sub-processor list confirmed at scoping. Any US-domiciled sub-processor requires explicit customer sign-off via DPA addendum.

Do you support our annotation tool?

Self-hosted CVAT, self-hosted Label Studio, and proprietary YPAI tooling are the defaults. Customer-tool bridge (API or export-format adaptation) is in scope when it makes engineering sense. We do not require you to migrate to our tooling.

What is the withdrawal and erasure SLA?

30-day erasure SLA on any contributor withdrawal under GDPR Article 7. Annotated assets touched by the withdrawn contributor are flagged and re-routed; the audit log is retained for compliance traceability.
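
The withdrawal flow described in the last answer can be sketched as a deadline calculation plus a lookup of affected assets. This is an illustration only: the row keys `asset_id` and `annotator_ids`, the contributor IDs, and the function names are hypothetical, and the sketch counts 30 calendar days from the withdrawal date, which is an assumption about how the SLA is measured.

```python
from datetime import date, timedelta

ERASURE_SLA_DAYS = 30  # the 30-day SLA; calendar days is an assumption

def erasure_deadline(withdrawal_date):
    """Latest date by which the contributor's data should be erased."""
    return withdrawal_date + timedelta(days=ERASURE_SLA_DAYS)

def assets_to_reroute(manifest_rows, contributor_id):
    """Sorted asset IDs touched by the withdrawn contributor.

    Each row is a dict with hypothetical keys "asset_id" and "annotator_ids".
    """
    return sorted(
        {row["asset_id"] for row in manifest_rows if contributor_id in row["annotator_ids"]}
    )

# Hypothetical manifest rows.
manifest = [
    {"asset_id": "asset-002", "annotator_ids": ["c-17", "c-42"]},
    {"asset_id": "asset-001", "annotator_ids": ["c-42", "c-08"]},
    {"asset_id": "asset-003", "annotator_ids": ["c-08", "c-17"]},
]

print(erasure_deadline(date(2026, 1, 15)))
print(assets_to_reroute(manifest, "c-42"))
```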

Or explore deeper

Data Collection

Data Validation

Ethical Framework

Vertical Solutions