Trace Every Voice in Your Dataset Back to a Named Individual
Full provenance documentation from speaker to dataset. Identity verification, consent records, collection methodology, demographic metadata, and EU AI Act data cards. Every dataset is audit-ready.
EU AI Act compliant. Norwegian entity. European servers.
Speaker
ID-verified, named individual
Consent
Timestamped, revocable, auditable
Recording
Session metadata + environment
QA Review
Human-verified quality gate
Dataset
EU AI Act data card attached
Six Links. Every One Documented and Auditable.
Data provenance is the complete chain of custody from the moment a contributor is identified to the moment your dataset is delivered - and beyond. Each link is independently verifiable.
Contributor Identity
Government ID verification. Real name, real person, real location. No anonymous crowd workers.
Informed Consent
Individual consent with explicit purpose, duration, and right-to-erasure. Not a blanket platform TOS.
Recording Session
Environment metadata, device specs, acoustic conditions, timestamp, session duration. All captured automatically.
QA Review
Human quality review with pass/fail criteria. Rejected recordings documented with reason codes.
Dataset Delivery
EU AI Act data card, speaker metadata, collection methodology report, and QA metrics bundled with every delivery.
Ongoing Right-to-Erasure
Contributors can request data removal at any time. Erasure propagates through the chain and is confirmed in writing.
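For engineering teams, the six links above can be pictured as linked records. The following is a minimal sketch, not YPAI's actual schema: the class names, field names, and the `erase_contributor` helper are hypothetical, chosen only to illustrate how an erasure request might propagate from the consent record through every dataset and yield a written confirmation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Consent:
    # Hypothetical consent record: explicit purpose, timestamps, revocable.
    purpose: str
    granted_at: datetime
    revoked_at: Optional[datetime] = None


@dataclass
class Recording:
    # Hypothetical recording record tied to one verified contributor.
    recording_id: str
    contributor_id: str
    qa_passed: bool
    rejection_code: Optional[str] = None
    erased_at: Optional[datetime] = None


@dataclass
class Dataset:
    dataset_id: str
    recordings: list = field(default_factory=list)

    def active_recordings(self) -> list:
        # Only QA-approved, non-erased recordings remain deliverable.
        return [r for r in self.recordings if r.qa_passed and r.erased_at is None]


def erase_contributor(contributor_id: str, consent: Consent,
                      datasets: list, now: Optional[datetime] = None) -> dict:
    """Revoke consent, mark the contributor's recordings as erased in every
    dataset, and return a confirmation mapping dataset_id -> erased ids."""
    now = now or datetime.now(timezone.utc)
    consent.revoked_at = now
    confirmation = {}
    for ds in datasets:
        erased = []
        for rec in ds.recordings:
            if rec.contributor_id == contributor_id and rec.erased_at is None:
                rec.erased_at = now
                erased.append(rec.recording_id)
        if erased:
            confirmation[ds.dataset_id] = erased
    return confirmation
```

The returned mapping stands in for the written confirmation: it lists, per dataset, exactly which recordings were removed.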
EU AI Act Provenance Requirements
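Since August 2, 2025, the EU AI Act has required providers of general-purpose AI models to publish a summary of their training content, and the data governance and technical documentation obligations for high-risk AI systems phase in after that date. Here is what the regulation mandates - and what YPAI delivers by default.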
Mandatory Data Provenance Template
High-risk AI systems require technical documentation of training datasets, including data types, sources, collection methodology, and provenance. This is not optional - it is a legal requirement. Fines under the Act reach up to 7% of global annual turnover for prohibited practices and up to 3% for most other violations.
Technical Documentation
Detailed description of data processing and preparation, including annotation, labelling, cleaning, enrichment, and aggregation. Every YPAI dataset ships with this.
Post-Market Monitoring
Providers must update documentation every 6 months. YPAI data cards include version tracking and update schedules built in.
Up to 7%
of global annual turnover, the maximum fine under the EU AI Act
What Ships With Every YPAI Dataset
No add-ons, no premium tiers. Every dataset delivery includes the full documentation package your compliance team needs.
EU AI Act Data Cards
Structured provenance documentation following the mandatory template. Data types, sources, collection methods, bias assessments, and quality metrics - formatted for regulatory submission.
Speaker Metadata
Age range, gender, accent classification, dialect, recording environment, device specifications. Demographic representation metrics across your full dataset.
Consent Documentation
Individual consent records with timestamps, explicit purpose declarations, duration terms, and documented right-to-erasure pathways. Not platform TOS - individual auditable consent chains.
Collection Methodology Report
Detailed documentation of recording protocols, quality criteria, environment specifications, annotation guidelines, and inter-annotator agreement metrics.
QA Metrics Report
Pass/fail rates, rejection reason codes, signal-to-noise ratios, transcription accuracy scores, and human review coverage percentages. Full transparency on data quality.
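To make the QA metrics concrete, here is a small sketch of how pass rates, rejection reason counts, review coverage, and a signal-to-noise ratio could be computed. It is illustrative only: the function names, the input shape, and reason codes such as `"NOISE"` are assumptions, not the format of an actual YPAI report.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from mean signal and noise power."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def qa_summary(reviews: Sequence[Tuple[bool, Optional[str]]],
               total_recordings: int) -> dict:
    """Summarize human QA reviews.

    reviews: one (passed, rejection_code) pair per reviewed recording.
    total_recordings: all recordings collected, reviewed or not.
    """
    if not reviews or total_recordings <= 0:
        raise ValueError("need at least one review and one recording")
    passed = sum(1 for ok, _ in reviews if ok)
    # Count reason codes for rejected recordings only.
    reasons = Counter(code for ok, code in reviews if not ok and code)
    return {
        "pass_rate": passed / len(reviews),
        "rejection_reasons": dict(reasons),
        "review_coverage": len(reviews) / total_recordings,
    }
```

For example, four reviews (two passes, one `"NOISE"` rejection, one `"CLIPPING"` rejection) out of eight collected recordings give a pass rate of 0.5 and a review coverage of 0.5.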
Crowdsourced vs Vetted Provenance
The difference between platform-mediated consent from anonymous workers and individually auditable consent chains from named contributors.
Every Dataset Audit-Ready. No Retrofitting.
Request a sample data card to see what full provenance documentation looks like, or get a custom quote for your next training data project.
Norwegian entity · European jurisdiction · Zero CLOUD Act exposure