Automotive voice + audio data · Oslo · EEA residency

Voice your cars actually need: 150+ languages, real cabin noise, EU AI Act ready.

From a corpus of 150+ languages, 50+ are productionised for in-cabin use. Native-speaker recordings in real cabin noise, with wake-word and DMS audio included. ADAS perception data is available on the same engagement.

150+ / 50+
language corpus, productionised for in-cabin
Native
cabin-noise capture: engine, HVAC, road, passenger overlap. Not lab-clean.
EEA
processing by default. Article 10 evidence at delivery. DPA included.

DPA included · EEA-only processing · Project lead replies within one business day

Language corpus
150+
50+ already productionised for in-cabin voice. Native-speaker recording, consent-tracked.
Cabin capture
Native
Engine, HVAC, road, passenger overlap. Captured in real cabins, not booth-recorded and re-mixed.
Jurisdiction
EEA
Processing by default. EU AI Act Article 10 evidence at delivery. DPA included with every engagement.
Perception stack
Fused
LiDAR + radar + camera, time-aligned. ASIL-aware taxonomy per ISO 26262.
Vehicle programs & OEM engagements

Evaluation engagements, data-collection programs, and active annotation work across OEM and Tier-1 suppliers.

MG · Honda · Kia · BYD · XPeng · Omoda · NIO · Hyundai · Changan
Where in-cabin voice breaks

Voice assistants that pass the demo and fail the highway.

Cabin noise, accent drift, and Article 10 evidence gaps cost more programs than perception ever does. Four places the data, not the model, is the bottleneck.

01 ~70 dB SPL at 110 km/h

In-cabin voice fails

The wake word your buyer hears in the showroom is not the one the driver yells at 110 km/h.

Highway cabins reach ~70 dB SPL at 110 km/h: engine drone, HVAC, road, passenger overlap. Lab-recorded corpora flatter the model in eval and ship false rejects to the customer on day one. The fix is recording in the actual cabin, not cleaning up booth audio after the fact.

02 Nordic + EU + code-switch

Coverage gaps

The corpus that worked for English plus five EU languages breaks on the sixth market.

Nordic dialects, Eastern European accents, and code-switching cabin conversations (driver in one language, passenger replying in another) are the long tail that US-domiciled vendors do not cover natively. YPAI captures these in-region with native speakers, not synthetic accent transfer.

03 2026-08-02 gate

EU AI Act Article 10

Data lineage the auditor can replay, for both pillars.

In-cabin voice systems and ADAS perception both classify as high-risk AI. Article 10 audits come down to three themes: data lineage, representativeness, and dataset-level bias examination. Vendors that produce evidence only on request cost you the audit window. YPAI ships the evidence pack at delivery, not on subpoena.

04 6-8 weeks

Perception edge-case retrain

One unhandled occlusion. Eight weeks of retrain plus ASPICE re-validation.

A cyclist behind a parked van, a sensor calibration that disagrees by one frame between LiDAR cuboid and 2D box. The fix is not a bigger dataset; it is ASIL-aware curation of the long tail before it ships, with cross-sensor synchronisation that holds at 40 TB per hour of capture.
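
To make the one-frame disagreement concrete, here is a minimal sketch of nearest-timestamp matching between LiDAR sweeps and camera frames. It is an illustration only, not YPAI's pipeline: the function name match_frames, the shared-clock assumption, and the 5 ms tolerance are all hypothetical.

```python
from bisect import bisect_left


def match_frames(lidar_ts, camera_ts, tolerance_s=0.005):
    """Pair each LiDAR timestamp with the nearest camera timestamp.

    Both lists hold seconds on a shared clock and must be sorted ascending
    (an assumption of this sketch). Returns (lidar_index, camera_index)
    pairs. LiDAR sweeps with no camera frame within tolerance_s stay
    unpaired, so they can be flagged instead of silently mislabelled.
    """
    pairs = []
    for i, t in enumerate(lidar_ts):
        pos = bisect_left(camera_ts, t)
        # The nearest frame is either just before or just after the insertion point.
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(camera_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(camera_ts[j] - t))
        if abs(camera_ts[best] - t) <= tolerance_s:
            pairs.append((i, best))
    return pairs
```

A sweep that finds no camera frame inside the tolerance is exactly the case where a LiDAR cuboid and a 2D box would end up one frame apart, so surfacing it before annotation is cheaper than finding it in review.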

Engagement pipeline

One pipeline. Voice ingest to Article 10 evidence.

Five stages. Each stage has a concrete artifact that YPAI delivers at that point in the engagement.

01 · Recording

What ships from this stage

Native-speaker capture in real cabin noise. Files delivered with every engagement:

  • audio_raw_48khz.wav
  • speakers.json (demographic metadata)
  • recording_environment.json (mic, room, ambient)
  • consent_log.csv (per-speaker, GDPR Article 7)
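
A receiving team could check a stage-01 delivery against this list with a few lines of Python. This is a minimal sketch: the file names come from the list above, while the flat directory layout and the helper name missing_deliverables are assumptions made for illustration.

```python
from pathlib import Path

# File names are taken from the stage-01 list above.
# A single flat delivery directory is an assumption of this sketch.
EXPECTED_FILES = (
    "audio_raw_48khz.wav",
    "speakers.json",
    "recording_environment.json",
    "consent_log.csv",
)


def missing_deliverables(delivery_dir):
    """Return the expected file names that are absent from delivery_dir."""
    root = Path(delivery_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every listed file is present. The check says nothing about file contents or schema, which this page does not specify.
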
ENGAGEMENT SHAPES

Three voice-data shapes a procurement reviewer can scope today.

In-cabin multilingual voice corpus build

Native-speaker recording across 50+ target-market languages in real cabin noise, captured by a vetted contributor network under GDPR-tracked consent. Wake-word ground truth and intent labels delivered with the audio. Output is a reusable corpus for in-cabin assistant fine-tuning, not a one-shot dataset.

50+ languages · GDPR consent

Wake-word + intent dataset for a branded in-cabin assistant

Per-OEM branded wake word with false-accept and false-reject sets, an intent taxonomy aligned to the buyer's NLU schema, and command classification across target-market languages. Ground truth versioned alongside your on-device model so retrain cycles do not lose lineage.

Per-OEM wake word · NLU-aligned
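
As an illustration of how false-accept and false-reject sets are typically scored, the sketch below computes both rates from labelled trials. The input shape (one pair of booleans per clip) and the function name wake_word_rates are assumptions, not a YPAI format.

```python
def wake_word_rates(trials):
    """Compute false-reject and false-accept rates for a wake-word detector.

    trials: iterable of (is_wake_word, detected) boolean pairs, one per clip.
    A rate is None when its set is empty, to avoid dividing by zero.
    """
    positives = negatives = false_rejects = false_accepts = 0
    for is_wake_word, detected in trials:
        if is_wake_word:
            positives += 1
            false_rejects += not detected  # wake word spoken, detector stayed silent
        else:
            negatives += 1
            false_accepts += detected  # no wake word, detector fired anyway
    return {
        "false_reject_rate": false_rejects / positives if positives else None,
        "false_accept_rate": false_accepts / negatives if negatives else None,
    }
```

The false-accept rate here is counted per clip. Production targets are often stated per hour of audio instead, so the negative set would need duration metadata to report it that way.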

DMS audio cues paired with gaze ground truth

Drowsiness, distraction, and emergency audio classes captured and labeled with severity tiers, paired with gaze and eyelid ground truth where the DMS spec requires both modalities. Aligned to Euro NCAP 2026 in-cabin protocols and to ISO 21448 SOTIF taxonomies for edge-case coverage.

Euro NCAP 2026 · multi-modal
START WITH A FREE PILOT

Tell us the modalities, volumes, and target market. We reply with a scoped pilot.

The free pilot covers both recording and annotation: two target languages, five hours of native-speaker in-cabin voice per language in real cabin noise, and one thousand utterances per language with transcripts, wake-word labels, and intent labels. Delivered in five to seven business days. From there, the production engagement scopes by language coverage, scale, and regulatory context.

2 languages, 5h recording + 1000 utterances per language

Native speakers, real cabin noise, 5-7 days delivery

DPA included from day one

Article 28 clauses pre-cleared, EEA-resident processing

EU AI Act Article 10 evidence pack at delivery

Data lineage, consent framework, bias mitigation documented

What happens next

From submit to scoped pilot in seven days

After you submit the brief above, here is the timeline. The free pilot at T+5-7 days delivers real in-cabin voice recording against your target language and noise profile, not a deck.

  1. T+1 day

    Project lead reads your brief

    A named EU-resident project lead replies within one business day with feasibility, language coverage, and a first read on Article 10 risk classification.

  2. T+3 days

    Scope, cabin-noise spec, sample evidence pack

    Indicative scope returned with target languages, cabin-noise capture profile (engine, HVAC, road, passenger overlap), wake-word spec, and Article 10 evidence pack manifest.

  3. T+5-7 days

    Free pilot delivered

    Free pilot covers both recording and annotation: 2 languages, 5h native-speaker recording per language in real cabin noise, and 1000 utterances per language with transcripts, wake-word labels, and intent labels. Production engagement scopes from there by language coverage, scale, and regulatory context.

  4. T+14 days

    Master DPA signed, production scope locked

    Article 28 clauses pre-cleared, EEA-resident processing committed in contract. Sub-processor list named, withdrawal SLA confirmed. Production recording begins.

Norwegian Aksjeselskap (AS, a private limited company). EEA-resident operations. GDPR Article 7 consent on every contributor. EU AI Act Article 10 evidence pack at delivery.

Compliance and data governance

The compliance posture procurement reviews require.

Voice and perception data used to train high-risk AI systems both fall under the data governance requirements of EU AI Act Article 10. Both need documented lineage, consent provenance, and EEA-resident processing.

Control category | Regulation or framework | What YPAI ships | Evidence artifact
Data lineage and traceability | EU AI Act Article 10 | Per-sample provenance trail across collection, annotation, QA, delivery. | data_provenance.pdf
Representativeness assessment | EU AI Act Article 10 | Demographic and geographic coverage analysis vs deployment population. | bias_assessment.pdf
Dataset-level bias examination | EU AI Act Article 10 | Audit-ready bias examination across age, accent, dialect, behavioural segments. | bias_assessment.pdf
Consent framework | GDPR Article 7 | Per-speaker informed consent, withdrawal rights, 30-day erasure SLA on speaker request. | consent_audit.csv
Special-category data handling | GDPR Article 9 | Voice biometric and demographic attributes handled under explicit consent and minimisation. | consent_audit.csv
DPA (Article 28 contract) | GDPR Article 28 | DPA included with every contract. Pre-cleared clauses, no extra negotiation cycle. | dpa.pdf
Functional safety taxonomy | ISO 26262 | Annotation specs designed to support functional-safety ASIL classifications and traceability. | asil_taxonomy.pdf
Unknown-unsafe scenario coverage | ISO 21448 SOTIF | Edge-case taxonomy and curation aligned to SOTIF unknown-unsafe categories. | sotif_taxonomy.pdf
Cybersecurity-relevant data handling | ISO/SAE 21434 | Cybersecurity-relevant data handling protocols across voice and sensor pipelines. | cybersec_handling.pdf
Process discipline | ASPICE | QA workflows slot into automotive SPICE engineering processes. Deliverables pass Tier-1 process audits. | qa_process_log.pdf
EEA residency | GDPR plus Norwegian AS | EEA infrastructure by default. Named sub-processor list at engagement scoping. No transfers outside EEA without explicit DPA addendum. | residency_attestation.pdf
Legal entity | Norwegian company | Norwegian company, EEA jurisdiction by incorporation, not by sub-processor claim. Outside US CLOUD Act reach. | entity_attestation.pdf

30-day erasure SLA applies to both speaker recordings and perception assets. Sub-processor list and evidence artifacts delivered at engagement scoping.
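
A buyer who wants to monitor the 30-day erasure SLA on their own side could track requests with a check like the one below. It is a sketch under stated assumptions: the function name erasure_overdue is hypothetical, dates are calendar dates, and day 30 itself is treated as still within the SLA.

```python
from datetime import date, timedelta

# 30 days, from the erasure SLA stated above.
ERASURE_SLA = timedelta(days=30)


def erasure_overdue(requested_on, erased_on=None, today=None):
    """Return True if an erasure request has breached the 30-day SLA.

    For a request that is still open (erased_on is None), the elapsed
    time is measured up to today. Day 30 counts as on time, which is
    an assumption of this sketch.
    """
    today = today or date.today()
    completed = erased_on or today
    return completed - requested_on > ERASURE_SLA
```

For example, a request made on 1 January and completed on 31 January is on time under this rule, while completion on 1 February is one day late.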

Our automotive surface

Voice-led, perception-ready. One vendor, one contract.

The full automotive data surface in one engagement: in-cabin voice corpora, wake-word and DMS audio, perception annotation, real-world capture. EEA jurisdiction across all of it.