Data annotation
Annotation for high-risk AI, built in the EEA.
Image, video, audio, text, LiDAR 3D, and RLHF preference labelling by identity-verified European contributors. 100% human QA with consensus scoring. EU AI Act Article 10 documentation shipped with every project.
- EEA-only
- GDPR Art. 7 + 9
- EU AI Act Art. 10(2)(f)
- 30-day erasure
Norwegian legal entity. Named project lead replies within one EU business day.
- Image: bbox, semantic + instance segmentation, keypoint, polygon (Live)
- Video: frame-by-frame, object tracking, scene segmentation, pose (Live)
- Audio: transcription, diarization, phoneme, emotion, event (Live)
- Text: NER, sentiment, intent, entity linking, parallel corpus (Live)
- LiDAR 3D: point-cloud, semantic segmentation, sensor-fusion (Live)
- RLHF: preference, ranking, rewriting, red-team, instruction (Live)
- JURISDICTION: EEA
- CONSENT: Per-contributor
- QA: 100% human
- LANGUAGES: 150+
Technique coverage
Six modalities, one annotation contract.
Multi-vendor data programs accumulate per-modality DPAs, per-team QA standards, and per-tool provenance schemas. YPAI runs all six modalities under one master DPA, one consensus QA standard, and one audit-artefact bundle.
The matrix below maps each modality's techniques to QA depth and output format. Vertical anchors deep-link to the industry surface where the modality is most commonly procured.
| Modality | Techniques offered | QA depth | Output format | Vertical anchors |
|---|---|---|---|---|
| Image | bbox, semantic segmentation, instance segmentation, keypoint, polygon | 3-of-3 consensus + expert escalation | COCO JSON, Pascal VOC XML, custom | |
| Video | frame-by-frame labelling, object tracking, scene segmentation, 3D motion / pose | Frame-level consensus + temporal review | MOTChallenge, custom XML, KITTI-format | |
| Audio | phoneme segmentation, speaker diarization + attribution, emotion labelling, audio event detection | Listener consensus + native-speaker review | TextGrid, RTTM, custom JSON | |
| Text | NER, sentiment (aspect-based), intent classification, entity linking, parallel corpus alignment | Multi-rater + linguist review | CoNLL, JSONL, custom | |
| LiDAR 3D | point-cloud annotation, LiDAR semantic segmentation, sensor-fusion (LiDAR + camera) | Frame consensus + sensor-fusion QA | KITTI bin, nuScenes JSON, custom | |
| RLHF preference | response rating, preference comparison, response rewriting, red-teaming, fact-checking, instruction tuning | Multi-rater Krippendorff alpha + lead review | OpenAI/Anthropic preference JSON, custom | |
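To make the image output formats in the matrix concrete, here is a minimal sketch of how a bounding box expressed as Pascal VOC corner coordinates maps onto a COCO-style annotation record. The helper names, file name, and category are hypothetical and for illustration only; this is not YPAI's export code, and it ignores any pixel-indexing offset a particular toolchain may apply.

```python
import json


def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert VOC corners (xmin, ymin, xmax, ymax) to COCO [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def make_coco_annotation(ann_id, image_id, category_id, voc_box):
    """Build one COCO-style object-detection annotation (hypothetical helper)."""
    x, y, w, h = voc_to_coco_bbox(*voc_box)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],
        "area": w * h,  # box area; segmentation masks would use mask area instead
        "iscrowd": 0,
    }


# Illustrative dataset skeleton: one image, one category, one box.
dataset = {
    "images": [{"id": 1, "file_name": "frame_0001.jpg", "width": 1920, "height": 1080}],
    "categories": [{"id": 1, "name": "pedestrian"}],
    "annotations": [make_coco_annotation(1, 1, 1, (100, 200, 180, 420))],
}

print(json.dumps(dataset["annotations"][0]))
```

The key difference between the two formats is that VOC stores two corners while COCO stores one corner plus width and height, so a converter has to be explicit about which convention each side uses.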
QA methodology
Five-stage QA, every modality.
Marketed quality is a metric. YPAI quality is a pipeline. Every asset passes through identity-verified annotation, multi-annotator consensus, inter-annotator agreement scoring, expert escalation on disagreement, and final lead review before delivery.
The stages below are the same across all six modalities. The metric that emerges per project is reportable and audit-defensible.
- 01
Identity-verified annotation
An annotator with documented identity and EEA residency picks up the task in self-hosted CVAT, Label Studio, or proprietary tooling.
Annotator credential ID logged per asset
- 02
Multi-annotator pass
Each asset is independently labelled by 2 to 3 annotators with no visibility into peer output.
Per-annotator label record
- 03
Consensus scoring
Labels are compared. Agreement above threshold passes; disagreement routes to stage 4.
Inter-annotator agreement score (Cohen kappa or Krippendorff alpha)
- 04
Expert escalation
Disagreed assets escalate to a domain specialist (clinical, financial, legal, or modality-lead) for adjudication.
Escalation rate plus adjudication outcome per asset
- 05
Lead review and delivery sign-off
The named project lead samples the delivered batch and signs off, after which the audit-artefact bundle is generated.
Sampling rate plus sign-off log
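Stages 3 and 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not YPAI's pipeline: it assumes exactly two annotators with categorical labels, uses Cohen's kappa as the batch-level agreement score, and treats any per-asset label mismatch as a disagreement. `KAPPA_THRESHOLD`, `cohen_kappa`, and `route_assets` are hypothetical names, and the threshold value is a placeholder rather than a documented setting.

```python
from collections import Counter

KAPPA_THRESHOLD = 0.8  # hypothetical batch-level threshold, agreed per project


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same assets."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of assets where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def route_assets(asset_ids, labels_a, labels_b):
    """Split assets into those that pass and those escalated to stage 4."""
    passed, escalated = [], []
    for asset_id, a, b in zip(asset_ids, labels_a, labels_b):
        (passed if a == b else escalated).append(asset_id)
    return passed, escalated


ids = ["a1", "a2", "a3", "a4"]
ann_a = ["car", "car", "bike", "car"]
ann_b = ["car", "bike", "bike", "car"]

kappa = cohen_kappa(ann_a, ann_b)
passed, escalated = route_assets(ids, ann_a, ann_b)
batch_ok = kappa >= KAPPA_THRESHOLD
```

With three or more annotators, or with missing labels, Krippendorff's alpha is the usual choice instead of Cohen's kappa, which is defined for exactly two raters.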
Regulatory alignment
Article 10 bias mitigation, mapped to artefacts.
Article 10 of the EU AI Act sets data-governance obligations for high-risk AI providers, effective 2026-08-02. YPAI does not certify your AI system. YPAI ships the evidence pack your conformity assessor and DPO need to argue the system is compliant.
Below: the specific 10(2) and 10(3) requirements mapped to the deliverable that addresses them.
- Art. 10(2)(f)
Examination in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights, or lead to discrimination.
Bias examination report per project
Per-batch demographic and dialect distribution metadata, sampling-methodology disclosure, and identified-bias-vector log. Generated from contributor-network metadata; shipped as bias_assessment.pdf in the evidence pack.
- Art. 10(2)(g)
Appropriate measures to detect, prevent, and mitigate possible biases identified.
Mitigation actions log
Per identified bias vector, the resampling, rebalancing, or escalation action taken, with annotator-cohort change documented. Linked to the consent records so mitigation is auditable from cohort to consent.
- Art. 10(3)
Training, validation, and testing data sets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose.
Representativeness attestation
Per-language, per-demographic, per-vertical coverage matrix with explicit gaps named. We do not claim coverage we cannot evidence. Gaps are raised for scope discussion at T+3 days, not hidden in the data.
- Art. 10(2)(b, c)
Data collection processes; data preparation, including annotation, labelling, cleaning, updating, enrichment, and aggregation.
Annotation provenance log
Per-asset record of which annotators touched the asset, at which stages, with which credentials. Generated from the QA pipeline (section 3) and shipped as data_provenance.pdf.
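As an illustration of what "coverage matrix with explicit gaps named" can mean in practice, the sketch below compares the observed share of each language in a batch against a target share and lists every cell that falls short. The record layout, the `coverage_gaps` helper, and the target shares are assumptions made for this example; the actual representativeness attestation may be computed differently.

```python
from collections import Counter


def coverage_gaps(records, key, targets):
    """Return {value: (observed_share, target_share)} for cells below target."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    gaps = {}
    for value, target in targets.items():
        share = counts.get(value, 0) / total if total else 0.0
        if share < target:
            gaps[value] = (round(share, 3), target)
    return gaps


# Hypothetical per-asset metadata and target distribution.
records = [
    {"asset": "a1", "language": "nb"},
    {"asset": "a2", "language": "nb"},
    {"asset": "a3", "language": "sv"},
    {"asset": "a4", "language": "da"},
]
targets = {"nb": 0.4, "sv": 0.3, "da": 0.2, "fi": 0.1}

gaps = coverage_gaps(records, "language", targets)
print(gaps)
```

The same function can be run per demographic or per vertical by changing `key`, which yields one gap list per dimension of the matrix.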
Workforce + tooling
Identity-verified contributors, on the tools you use.
Not a marketplace crowd. A vetted European network with individual-level provenance, documented residency, and credential records for regulated-domain work. Self-hosted CVAT and Label Studio for tool flexibility, plus proprietary tooling for modalities where open-source tools fall short.
- 40,000+ Contributor network: Identity-verified, EEA-resident
- 150+ Languages covered: All Nordic languages production-confirmed
- 50+ Countries with active contributors: All EEA-resident; non-EEA only via DPA addendum
- 100% Human QA coverage: No silent auto-QA; every asset reviewed
Annotation tooling
- CVAT (self-hosted)
- Label Studio (self-hosted)
- Proprietary annotation tooling
- Customer-tool bridge
We adapt to your tooling stack when it makes engineering sense. We do not require you to migrate to ours.
Domain specialists
Clinical, legal, financial, multilingual credentials
Where the modality, the regulatory context, or the asset risk-class demands credentialed reviewers, YPAI brings them in. Below: the credential categories available at scoping. Specific reviewer credentials are confirmed in the project SOW.
- Clinical
- Radiology, cardiology, pathology: licensed practitioners; HIPAA-aligned workflow available
- Legal
- EU regulation, case-law citation, contract review: bar-admitted counsel for adjudication tasks
- Financial
- MiFID II, KYC/AML, accounting taxonomies: finance domain reviewers
- Multilingual + dialect
- Native-speaker reviewers for all Nordic, Slavic, and minority European languages; dialect-level granularity at scoping
Credentialed-reviewer hourly cost is a line-item, not a hidden uplift. Quoted at scoping.
Deliverable evidence pack
Every annotation project ships an audit-ready bundle.
On delivery you receive the annotated dataset plus a structured evidence bundle that your DPO, legal counsel, conformity assessor, and procurement team can review without follow-up. Master DPA included by default, not on request.
ypai-annotation-bundle/
|-- README.md
|-- dataset/
|   |-- annotations/                    # per-modality output files
|   |-- manifest.csv                    # per-asset provenance index
|   `-- checksums.sha256
|-- qa/
|   |-- inter_annotator_agreement.csv   # IAA score per batch (Cohen/Krippendorff)
|   |-- consensus_log.csv               # stage-3 disagreement + adjudication trace
|   |-- expert_escalation_log.csv       # stage-4 escalations + outcomes
|   `-- sampling_methodology.pdf
|-- compliance/
|   |-- data_provenance.pdf             # Article 10(2)(b, c): annotation pipeline log
|   |-- bias_assessment.pdf             # Article 10(2)(f): bias examination report
|   |-- representativeness.pdf          # Article 10(3): coverage matrix + gaps
|   |-- consent_audit.csv               # GDPR Article 7 per-contributor consent records
|   `-- residency_attestation.pdf       # EEA processing + sub-processor disclosure
|-- credentials/                        # only present if regulated-domain reviewers used
|   `-- reviewer_credentials.pdf
`-- contract/
    |-- master_dpa.pdf                  # Article 28 pre-cleared
    `-- sccs_if_applicable.pdf          # only if customer-directed extra-EEA transfer
- inter_annotator_agreement.csv: Cohen kappa or Krippendorff alpha per batch. The metric your conformity assessor will reference, surfaced before they ask.
- bias_assessment.pdf: Article 10(2)(f) deliverable: demographic, dialect, and vertical-coverage distribution, identified bias vectors, and mitigation actions taken.
- consent_audit.csv: Per-contributor GDPR Article 7 consent record. Not platform-ToS aggregate consent: specific-purpose, per-project, withdrawable.
- data_provenance.pdf: Per-asset annotator-touch log: which credential touched which asset at which QA stage. Article 10(2)(b, c) ready.
- residency_attestation.pdf: EEA processing attestation. Sub-processor list named. No CLOUD Act exposure by default.
- master_dpa.pdf: Article 28 GDPR DPA pre-cleared. Customer-specific addendums (residency, sub-processor) accepted; the baseline DPA ships by default, not on request.
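On receipt, a bundle like this can be integrity-checked against its `checksums.sha256` file. The sketch below assumes that file uses the common `sha256sum` layout (one `<hex digest>  <relative path>` entry per line, paths relative to the checksum file's own directory); the bundle documentation does not state the layout, so treat both the format and the helper names as assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Return the listed paths that are missing or whose digest differs."""
    base = Path(checksum_file).parent
    mismatches = []
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode entries with '*'
        path = base / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            mismatches.append(name)
    return mismatches


# Example call (path is illustrative):
# bad = verify_checksums("ypai-annotation-bundle/dataset/checksums.sha256")
# An empty list means every listed file is present and unmodified.
```

Running the check before the files enter a training pipeline gives a simple, reproducible record that the delivered dataset is the one the evidence pack describes.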
Next step
Scope an annotation project.
Tell us the modality, the regulatory context, and the volume. We map a delivery plan with the QA pipeline, evidence pack, and master DPA included by default.
EU AI Act Article 10 applies from 2026-08-02. Cumulative GDPR fines have passed EUR 7.1B. Getting annotation provenance wrong is procurement-blocking. Getting it right is a conversation.
Master DPA included with every YPAI engagement, not on request. Norwegian legal entity. Named project lead replies within one EU business day.
Data annotation intake
Scope an annotation project.
Bring modality, regulatory context, volume estimate, and any tooling-stack constraints. A named EU-resident project lead replies within one EU business day with a feasibility read and Article 10 risk classification.
- GDPR Article 7 consent records on every asset
- EEA-only operations, Norwegian Aksjeselskap
- Identity-verified contributors only: no marketplace crowd
- 30-day erasure SLA on contributor withdrawal
GDPR Article 7 · GDPR Article 9 · EU AI Act Article 10(2)(f)
What happens next
From submit to scoped pilot in seven days.
This section serves three situations: you have submitted and want to know the timing; you are about to submit and have a procurement objection; or you are not ready to submit and want a route deeper into the work.
After you submit
Procurement FAQ
Or explore deeper
- Data Collection
Upstream pillar: where the data comes from before annotation. Multi-modal under one master DPA.
- Data Validation
Downstream pillar: how datasets are validated for Article 10 conformity before training.
- Ethical Framework
Lateral pillar: the consent and ethics framework underneath every YPAI engagement.
- Vertical Solutions
Industry verticals: automotive, healthcare, finserv, defence, public sector.