Automotive voice + audio data · Oslo · EEA residency

Voice your cars actually need: 150+ languages, real cabin noise, EU AI Act ready.

Built from a 150+ language corpus, with 50+ languages productionised for in-cabin use. Native-speaker recording in real cabin noise, with wake-word and DMS audio included. ADAS perception data is available on the same engagement.

150+ / 50+

language corpus, productionised for in-cabin

Native

cabin-noise capture: engine, HVAC, road, passenger overlap. Not lab-clean.

EEA

processing by default. Article 10 evidence at delivery. DPA included.

Scope a project · See language coverage + cabin-noise spec

DPA included · EEA-only processing · Project lead replies within one business day

Language corpus

150+

50+ already productionised for in-cabin voice. Native-speaker recording, consent-tracked.

Cabin capture

Native

Engine, HVAC, road, passenger overlap. Captured in real cabins, not booth-recorded and re-mixed.

Jurisdiction

EEA

Processing by default. EU AI Act Article 10 evidence at delivery. DPA included with every engagement.

Perception stack

Fused

LiDAR + radar + camera, time-aligned. ASIL-aware taxonomy per ISO 26262.

Vehicle programs & OEM engagements

Evaluation engagements, data-collection programs, and active annotation work across OEM and Tier-1 suppliers.

Where in-cabin voice breaks

Voice assistants that pass the demo and fail the highway.

Cabin noise, accent drift, and Article 10 evidence gaps cost more programs than perception ever does. Here are four places where the data, not the model, is the bottleneck.

01 ~70 dB SPL at 110 km/h

In-cabin voice fails

The wake word your buyer hears in the showroom is not the one the driver yells at 110 km/h.

Highway cabins reach ~70 dB SPL at 110 km/h: engine drone, HVAC, road, passenger overlap. Lab-recorded corpora flatter the model in eval and ship false rejects to the customer on day one. The fix is recording in the actual cabin, not cleaning up booth audio after the fact.

02 Nordic + EU + code-switch

Coverage gaps

The corpus that worked for English plus five EU languages breaks on the sixth market.

Nordic dialects, Eastern European accents, and code-switching cabin conversations (driver in one language, passenger replying in another) are the long tail US-domiciled vendors do not cover natively. YPAI captures these in-region with native speakers, not synthetic accent transfer.

03 2026-08-02 gate

EU AI Act Article 10

Data lineage the auditor can replay, for both pillars.

In-cabin voice systems and ADAS perception both classify as high-risk AI. Article 10 audits come down to three themes: data lineage, representativeness, and dataset-level bias examination. Vendors that produce evidence only on request cost you the audit window. YPAI ships the evidence pack at delivery, not on subpoena.

04 6-8 weeks

Perception edge-case retrain

One unhandled occlusion. Eight weeks of retrain plus ASPICE re-validation.

A cyclist behind a parked van, or a sensor calibration that leaves the LiDAR cuboid and the 2D box one frame apart. The fix is not a bigger dataset; it is ASIL-aware curation of the long tail before it ships, with cross-sensor synchronisation that holds at 40 TB per hour of capture.
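The one-frame disagreement described above can be pictured as a timestamp-matching problem. What follows is a minimal sketch, not YPAI's pipeline: it assumes each sensor reports timestamps in seconds, that the camera timestamps are sorted, and that a LiDAR sweep counts as synchronised when a camera frame lies within a chosen tolerance. The function name `match_nearest` and the tolerance value are hypothetical.

```python
from bisect import bisect_left


def match_nearest(lidar_ts, camera_ts, tolerance_s):
    """Pair each LiDAR timestamp with the nearest camera timestamp.

    camera_ts must be sorted ascending. Returns (pairs, unmatched):
    pairs holds (lidar_t, camera_t) tuples within tolerance_s, and
    unmatched holds LiDAR timestamps with no camera frame close enough.
    """
    pairs, unmatched = [], []
    for t in lidar_ts:
        i = bisect_left(camera_ts, t)
        # Candidates: the camera frame just before t and the one at or after t.
        candidates = camera_ts[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda c: abs(c - t), default=None)
        if best is not None and abs(best - t) <= tolerance_s:
            pairs.append((t, best))
        else:
            unmatched.append(t)
    return pairs, unmatched


# Assumed example: 10 Hz LiDAR, roughly 30 fps camera with one dropped frame.
lidar = [0.0, 0.1, 0.2]
camera = [0.0, 0.0333, 0.0667, 0.1, 0.1333, 0.1667, 0.25]
pairs, unmatched = match_nearest(lidar, camera, tolerance_s=1 / 60)
```

With a tolerance of half a camera frame, the sweep at 0.2 s has no partner and would be flagged for review rather than silently paired with a neighbouring frame.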

Engagement pipeline

One pipeline. Voice ingest to Article 10 evidence.

Five stages, each shown with the actual artifact YPAI delivers at that point in the engagement.

01 · Recording

What ships from this stage

Native-speaker capture in real cabin noise. Files delivered with every engagement:

audio_raw_48khz.wav
speakers.json (demographic metadata)
recording_environment.json (mic, room, ambient)
consent_log.csv (per-speaker, GDPR Article 7)

02 · Cabin-noise capture

Scenario coverage matrix

Each engagement specifies a coverage matrix. Example axes:

Idle
60 km/h
110 km/h
Wipers on
HVAC max
Window down
Passenger overlap
Music ducked

Sample count and gap analysis per cell delivered with the corpus.
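A per-cell gap analysis of this kind could be computed roughly as follows. This is a sketch under stated assumptions: the example axes are split into a speed axis and a condition axis, each sample carries `speed` and `condition` keys, and `min_per_cell` is a target agreed per engagement. None of these names come from YPAI's actual deliverable format.

```python
from collections import Counter


def coverage_gaps(samples, speeds, conditions, min_per_cell):
    """Count samples per (speed, condition) cell and list cells below target.

    Returns (matrix, gaps): matrix maps every cell to its sample count,
    gaps maps each under-filled cell to the number of samples still missing.
    """
    counts = Counter((s["speed"], s["condition"]) for s in samples)
    matrix = {(sp, c): counts.get((sp, c), 0) for sp in speeds for c in conditions}
    gaps = {cell: min_per_cell - n for cell, n in matrix.items() if n < min_per_cell}
    return matrix, gaps


# Hypothetical two-by-two slice of the matrix.
samples = [{"speed": "idle", "condition": "HVAC max"}] * 3 + [
    {"speed": "110 km/h", "condition": "wipers on"}
]
matrix, gaps = coverage_gaps(
    samples,
    speeds=["idle", "110 km/h"],
    conditions=["HVAC max", "wipers on"],
    min_per_cell=2,
)
```

Cells that never appear in the data still show up in `matrix` with a count of zero, which is the point of enumerating the axes up front rather than counting only what was recorded.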

03 · Annotation

Labelled JSON snippet (schema fields)

Wake-word ground truth, intent slot, entity extraction, DMS state tag.

schema_version
utterance_id
audio_path
wake_word_present (boolean + timestamp)
intent (slot taxonomy)
entities[] (per-utterance extraction)
dms_state (drowsy / distracted / nominal)
speaker_demographics
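To make the schema fields concrete, here is a sketch of one record as a Python dataclass with a small consistency check. The field names follow the list above; the types, the separate `wake_word_timestamp_s` field (seconds from file start), and the validation rules are assumptions for illustration, not the delivered schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three DMS states named in the schema list.
DMS_STATES = {"drowsy", "distracted", "nominal"}


@dataclass
class Utterance:
    schema_version: str
    utterance_id: str
    audio_path: str
    wake_word_present: bool
    wake_word_timestamp_s: Optional[float]  # assumed unit: seconds from file start
    intent: str
    dms_state: str
    entities: list = field(default_factory=list)
    speaker_demographics: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of problems; an empty list means the record is consistent."""
        problems = []
        if self.dms_state not in DMS_STATES:
            problems.append(f"unknown dms_state: {self.dms_state}")
        if self.wake_word_present and self.wake_word_timestamp_s is None:
            problems.append("wake word present but no timestamp")
        if not self.wake_word_present and self.wake_word_timestamp_s is not None:
            problems.append("timestamp given but wake word absent")
        return problems


# Hypothetical records: one consistent, one with two problems.
good = Utterance("1.0", "utt-0001", "audio/utt-0001.wav", True, 0.42,
                 "navigate_home", "nominal")
bad = Utterance("1.0", "utt-0002", "audio/utt-0002.wav", True, None,
                "set_temperature", "asleep")
```

Calling `good.validate()` returns an empty list, while `bad.validate()` reports the unknown DMS state and the missing timestamp.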

04 · 100% human QA

QA escalation log structure

Sample disagreement-resolution log fields:

batch_id
sample_count
inter_annotator_agreement
disagreement_count
reviewer_credentials
resolution_log_path

Audit trail per batch, available on procurement request.
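The source does not say which statistic sits behind `inter_annotator_agreement`. As one common choice, the sketch below computes Cohen's kappa for two annotators and fills three of the log fields listed above. The function names and the two-annotator setup are assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def batch_summary(batch_id, labels_a, labels_b):
    """Build a partial QA log entry using field names from the list above."""
    return {
        "batch_id": batch_id,
        "sample_count": len(labels_a),
        "inter_annotator_agreement": cohen_kappa(labels_a, labels_b),
        "disagreement_count": sum(a != b for a, b in zip(labels_a, labels_b)),
    }


# Hypothetical DMS-state labels from two annotators.
a = ["nominal", "nominal", "drowsy", "drowsy"]
b = ["nominal", "drowsy", "drowsy", "drowsy"]
summary = batch_summary("batch-001", a, b)
```

Here the annotators agree on three of four samples (75% raw agreement), but kappa comes out at 0.5 once chance agreement is discounted, which is why the two numbers are worth logging separately.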

05 · Article 10 evidence pack

Evidence pack contents

Delivered with every engagement (not on subpoena). Article 10 high-risk AI documentation.

data_provenance.pdf
bias_assessment.pdf
consent_audit.csv
risk_classification.md
residency_attestation.pdf
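On the receiving side, a procurement or data team might check a delivered pack against this list before sign-off. The sketch below assumes the pack arrives as a flat directory containing exactly these file names; the directory layout and the function name `missing_artifacts` are hypothetical.

```python
from pathlib import Path

# File names taken from the evidence pack list above.
EXPECTED_ARTIFACTS = (
    "data_provenance.pdf",
    "bias_assessment.pdf",
    "consent_audit.csv",
    "risk_classification.md",
    "residency_attestation.pdf",
)


def missing_artifacts(pack_dir, expected=EXPECTED_ARTIFACTS):
    """Return the expected file names that are absent or empty in pack_dir."""
    root = Path(pack_dir)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty return value means every listed artifact is present and non-empty; it says nothing about whether the contents are adequate, which remains a review task.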

Request full sample

ENGAGEMENT SHAPES

Three voice-data shapes a procurement reviewer can scope today.

In-cabin multilingual voice corpus build

Native-speaker recording across 50+ target-market languages in real cabin noise, captured by a vetted contributor network under GDPR-tracked consent. Wake-word ground truth and intent labels delivered with the audio. Output is a reusable corpus for in-cabin assistant fine-tuning, not a one-shot dataset.

50+ languages · GDPR consent

Wake-word + intent dataset for a branded in-cabin assistant

Per-OEM branded wake word with false-accept and false-reject sets, an intent taxonomy aligned to the buyer's NLU schema, and command classification across target-market languages. Ground truth versioned alongside your on-device model so retrain cycles do not lose lineage.

Per-OEM wake word · NLU-aligned
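The false-accept and false-reject sets mentioned above are what make these two error rates measurable. A minimal sketch, assuming each trial is recorded as a pair (was the wake word spoken, did the detector fire); the function name and the per-trial definition are illustrative, and false accepts are often reported per hour of non-wake audio instead.

```python
def wake_word_error_rates(trials):
    """Compute per-trial false-reject and false-accept rates.

    trials: iterable of (wake_word_spoken, detector_fired) boolean pairs.
    A rate is None when its denominator is zero.
    """
    positives = negatives = false_rejects = false_accepts = 0
    for spoken, fired in trials:
        if spoken:
            positives += 1
            if not fired:
                false_rejects += 1  # wake word spoken, detector stayed silent
        else:
            negatives += 1
            if fired:
                false_accepts += 1  # detector fired on non-wake audio
    return {
        "false_reject_rate": false_rejects / positives if positives else None,
        "false_accept_rate": false_accepts / negatives if negatives else None,
    }


# Hypothetical trials: four wake-word utterances, two confusable non-wake clips.
trials = [
    (True, True), (True, False), (True, True), (True, True),
    (False, False), (False, True),
]
rates = wake_word_error_rates(trials)
```

Running the same function separately per cabin condition (for example idle versus 110 km/h) would show where false rejects concentrate.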

DMS audio cues paired with gaze ground truth

Drowsiness, distraction, and emergency audio classes captured and labeled with severity tiers, paired with gaze and eyelid ground truth where the DMS spec requires both modalities. Aligned to Euro NCAP 2026 in-cabin protocols and to ISO 21448 SOTIF taxonomies for edge-case coverage.

Euro NCAP 2026 · multi-modal

START WITH A FREE PILOT

Tell us the modalities, volumes, and target market. We reply with a scoped pilot.

The free pilot covers both recording and annotation: two target languages, five hours of native-speaker in-cabin voice per language in real cabin noise, and one thousand utterances per language with transcripts, wake-word labels, and intent labels. Delivered in five to seven business days. From there, the production engagement is scoped by language coverage, scale, and regulatory context.

2 languages, 5h recording + 1000 utterances per language

Native speakers, real cabin noise, 5-7 days delivery

DPA included from day one

Article 28 clauses pre-cleared, EEA-resident processing

EU AI Act Article 10 evidence pack at delivery

Data lineage, consent framework, bias mitigation documented

What happens next

From submission to scoped pilot in seven days

Here is the timeline after you submit the brief above. The free pilot at T+5-7 days delivers real in-cabin voice recordings against your target languages and noise profile, not a deck.

T+1 day

Project lead reads your brief

A named EU-resident project lead replies within one business day with feasibility, language coverage, and a first read on Article 10 risk classification.

T+3 days

Scope, cabin-noise spec, sample evidence pack

Indicative scope returned with target languages, cabin-noise capture profile (engine, HVAC, road, passenger overlap), wake-word spec, and Article 10 evidence pack manifest.

T+5-7 days

Free pilot delivered

The free pilot covers both recording and annotation: 2 languages, 5h of native-speaker recording per language in real cabin noise, and 1000 utterances per language with transcripts, wake-word labels, and intent labels. The production engagement is scoped from there by language coverage, scale, and regulatory context.

T+14 days

Master DPA signed, production scope locked

Article 28 clauses pre-cleared, EEA-resident processing committed in contract. Sub-processor list named, withdrawal SLA confirmed. Production recording begins.

Norwegian Aksjeselskap. EEA-resident operations. GDPR Article 7 consent on every contributor. EU AI Act Article 10 evidence pack at delivery.

Compliance and data governance

The compliance posture procurement reviews require.

Voice and perception training data for high-risk AI systems both fall under the data-governance requirements of EU AI Act Article 10. Both need documented lineage, consent provenance, and EEA-resident processing.


Control category · Regulation or framework · What YPAI ships · Evidence artifact

Control Data lineage and traceability

Regulation EU AI Act Article 10

What YPAI ships Per-sample provenance trail across collection, annotation, QA, delivery.

Evidence data_provenance.pdf

Control Representativeness assessment

Regulation EU AI Act Article 10

What YPAI ships Demographic and geographic coverage analysis vs deployment population.

Evidence bias_assessment.pdf

Control Dataset-level bias examination

Regulation EU AI Act Article 10

What YPAI ships Audit-ready bias examination across age, accent, dialect, behavioural segments.

Evidence bias_assessment.pdf

Control Consent framework

Regulation GDPR Article 7

What YPAI ships Per-speaker informed consent, withdrawal rights, 30-day erasure SLA on speaker request.

Evidence consent_audit.csv

Control Special-category data handling

Regulation GDPR Article 9

What YPAI ships Voice biometric and demographic attributes handled under explicit consent and minimisation.

Evidence consent_audit.csv

Control DPA (Article 28 contract)

Regulation GDPR Article 28

What YPAI ships DPA included with every contract. Pre-cleared clauses, no extra negotiation cycle.

Evidence dpa.pdf

Control Functional safety taxonomy

Regulation ISO 26262

What YPAI ships Annotation specs designed to support functional-safety ASIL classifications and traceability.

Evidence asil_taxonomy.pdf

Control Unknown-unsafe scenario coverage

Regulation ISO 21448 SOTIF

What YPAI ships Edge-case taxonomy and curation aligned to SOTIF unknown-unsafe categories.

Evidence sotif_taxonomy.pdf

Control Cybersecurity-relevant data handling

Regulation ISO/SAE 21434

What YPAI ships Cybersecurity-relevant data handling protocols across voice and sensor pipelines.

Evidence cybersec_handling.pdf

Control Process discipline

Regulation ASPICE

What YPAI ships QA workflows slot into Automotive SPICE engineering processes. Deliverables pass Tier-1 process audits.

Evidence qa_process_log.pdf

Control EEA residency

Regulation GDPR plus Norwegian AS

What YPAI ships EEA infrastructure by default. Named sub-processor list at engagement scoping. No transfers outside EEA without explicit DPA addendum.

Evidence residency_attestation.pdf

Control Legal entity

Regulation Norwegian company

What YPAI ships Norwegian company, EEA jurisdiction by incorporation, not by sub-processor claim. Outside US CLOUD Act reach.

Evidence entity_attestation.pdf

The 30-day erasure SLA applies to both speaker recordings and perception assets. The sub-processor list and evidence pack manifest are provided at engagement scoping; the evidence artifacts themselves ship at delivery.

Our automotive surface

Voice-led, perception-ready. One vendor, one contract.

The full automotive data surface in one engagement: in-cabin voice corpora, wake-word and DMS audio, perception annotation, real-world capture. EEA jurisdiction across all of it.

Voice & User Experience

In-cabin voice corpora across 150+ languages (50+ productionised), wake-word and intent labeling, DMS audio cues, and TTS for branded voice personalities. Native-speaker capture in real cabin noise.

Data Infrastructure & Services

Cross-modal data collection and annotation pipelines that underpin both voice and perception programs: real-world capture, video and image annotation, Article 10 provenance throughout.

Autonomous Vehicle Technologies

LiDAR, sensor fusion, and semantic segmentation when the voice engagement extends into the perception stack. ASIL-aware taxonomies, SOTIF edge-case coverage.

Fleet & Operations

Telematics annotation and dashcam video work for fleet AI and predictive maintenance programs.
