Automotive voice + audio data · Oslo · EEA residency
The voice data your cars actually need: 150+ languages, real cabin noise, EU AI Act ready.
A 150+ language corpus, with 50+ languages productionised for in-cabin use. Native-speaker
recording in real cabin noise, wake-word and DMS audio included. ADAS perception data on
the same engagement.
150+ / 50+
language corpus, productionised for in-cabin
Native
cabin-noise capture: engine, HVAC, road, passenger overlap. Not lab-clean.
EEA
processing by default. Article 10 evidence at delivery. DPA included.
DPA included · EEA-only processing · Project lead replies within one
business day
Language corpus
150+
50+ already productionised for in-cabin voice. Native-speaker recording,
consent-tracked.
Cabin capture
Native
Engine, HVAC, road, passenger overlap. Captured in real cabins, not booth-recorded and
re-mixed.
Jurisdiction
EEA
Processing by default. EU AI Act Article 10 evidence at delivery. DPA included with
every engagement.
Perception stack
Fused
LiDAR + radar + camera, time-aligned. ASIL-aware taxonomy per ISO 26262.
Vehicle programs & OEM engagements
Evaluation engagements, data-collection programs, and active annotation work across
OEM and Tier-1 suppliers.
Where in-cabin voice breaks
Voice assistants that pass the demo and fail the highway.
Cabin noise, accent drift, and Article 10 evidence gaps cost more programs than
perception ever does. Four places the data, not the model, is the bottleneck.
01 · ~70 dB SPL at 110 km/h
In-cabin voice fails
The wake word your buyer hears in the showroom is not the one the driver yells at 110
km/h.
Highway cabins reach ~70 dB SPL at 110 km/h: engine drone, HVAC, road, passenger
overlap. Lab-recorded corpora flatter the model in eval and ship false rejects to the
customer on day one. The fix is recording in the actual cabin, not cleaning up booth
audio after the fact.
02 · Nordic + EU + code-switch
Coverage gaps
The corpus that worked for English plus five EU languages breaks on the sixth market.
Nordic dialects, Eastern European accents, and code-switching cabin conversations
(driver in one language, passenger replying in another) are the long tail US-domiciled
vendors do not cover natively. YPAI captures these in-region with native speakers, not
synthetic accent transfer.
03 · 2026-08-02 gate
EU AI Act Article 10
Data lineage the auditor can replay, for both pillars.
In-cabin voice systems and ADAS perception both classify as high-risk AI. Article 10
audits come down to three themes: data lineage, representativeness, and dataset-level
bias examination. Vendors that produce evidence only on request cost you the audit
window. YPAI ships the evidence pack at delivery, not on subpoena.
04 · 6-8 weeks
Perception edge-case retrain
One unhandled occlusion. Six to eight weeks of retraining plus ASPICE re-validation.
A cyclist behind a parked van, a sensor calibration that disagrees by one frame
between LiDAR cuboid and 2D box. The fix is not a bigger dataset; it is ASIL-aware
curation of the long tail before it ships, with cross-sensor synchronisation that
holds at 40 TB per hour of capture.
Engagement pipeline
One pipeline. Voice ingest to Article 10 evidence.
Five stages, each with the actual artifact YPAI delivers at that point in the
engagement.
01 · Recording
What ships from this stage
Native-speaker capture in real cabin noise. Files delivered with every engagement:
audio_raw_48khz.wav
speakers.json (demographic metadata)
recording_environment.json (mic, room, ambient)
consent_log.csv (per-speaker, GDPR Article 7)
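
A receiving team could sanity-check a delivery folder against the file list above. The following is a minimal sketch, not YPAI tooling: the function name `check_delivery` is hypothetical, the 48 kHz check is inferred only from the file name, and it assumes the audio is plain PCM WAV that Python's standard `wave` module can read.

```python
import wave
from pathlib import Path

# File names as listed above; the checks themselves are illustrative assumptions.
EXPECTED_FILES = [
    "audio_raw_48khz.wav",
    "speakers.json",
    "recording_environment.json",
    "consent_log.csv",
]


def check_delivery(folder):
    """Return a list of human-readable problems found in a delivery folder."""
    root = Path(folder)
    problems = [
        f"missing: {name}"
        for name in EXPECTED_FILES
        if not (root / name).is_file()
    ]
    wav_path = root / "audio_raw_48khz.wav"
    if wav_path.is_file():
        # Assumption: the file name implies a 48 kHz sample rate.
        with wave.open(str(wav_path), "rb") as wav:
            rate = wav.getframerate()
        if rate != 48000:
            problems.append(f"unexpected sample rate: {rate} Hz")
    return problems
```

An empty list means every listed file is present and the audio header reports 48 kHz. It says nothing about the content of the metadata or consent files.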
02 · Cabin-noise capture
Scenario coverage matrix
Each engagement specifies a coverage matrix. Example axes:
Idle
60 km/h
110 km/h
Wipers on
HVAC max
Window down
Passenger overlap
Music ducked
Sample count and gap analysis per cell delivered with the corpus.
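
A per-cell gap analysis of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the delivered report: it assumes the matrix crosses the three speed values with the five cabin conditions listed above, and the function name `coverage_gaps` and the per-cell target are hypothetical.

```python
from collections import Counter
from itertools import product

# Axis values taken from the example list above; crossing speed with
# condition is an assumption about how the matrix is laid out.
SPEEDS = ["Idle", "60 km/h", "110 km/h"]
CONDITIONS = [
    "Wipers on",
    "HVAC max",
    "Window down",
    "Passenger overlap",
    "Music ducked",
]


def coverage_gaps(samples, target_per_cell):
    """Map each under-filled (speed, condition) cell to its sample shortfall.

    samples is an iterable of (speed, condition) tuples, one per recording.
    """
    counts = Counter(samples)
    return {
        cell: target_per_cell - counts[cell]
        for cell in product(SPEEDS, CONDITIONS)
        if counts[cell] < target_per_cell
    }
```

Cells that meet the target are omitted from the result, so an empty dictionary means the matrix is fully covered at that target.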
Three voice-data shapes a procurement reviewer can scope today.
In-cabin multilingual voice corpus build
Native-speaker recording across 50+ target-market languages in real cabin noise, captured by a vetted contributor network under GDPR-tracked consent. Wake-word ground truth and intent labels delivered with the audio. Output is a reusable corpus for in-cabin assistant fine-tuning, not a one-shot dataset.
50+ languages · GDPR consent
Wake-word + intent dataset for a branded in-cabin assistant
Per-OEM branded wake word with false-accept and false-reject sets, an intent taxonomy aligned to the buyer's NLU schema, and command classification across target-market languages. Ground truth versioned alongside your on-device model so retrain cycles do not lose lineage.
Per-OEM wake word · NLU-aligned
DMS audio cues paired with gaze ground truth
Drowsiness, distraction, and emergency audio classes captured and labeled with severity tiers, paired with gaze and eyelid ground truth where the DMS spec requires both modalities. Aligned to Euro NCAP 2026 in-cabin protocols and to ISO 21448 SOTIF taxonomies for edge-case coverage.
Euro NCAP 2026 · multi-modal
START WITH A FREE PILOT
Tell us the modalities, volumes, and target market. We reply with a scoped pilot.
The free pilot covers recording AND annotation: two target languages, five hours of
native-speaker in-cabin voice per language in real cabin noise, one thousand utterances
per language with transcript and wake-word and intent labels. Delivered in five to seven
business days. From there, the production engagement scopes by language coverage, scale,
and regulatory context.
2 languages, 5h recording + 1000 utterances per language
Native speakers, real cabin noise, 5-7 days delivery
Data lineage, consent framework, bias mitigation documented
What happens next
From submit to scoped pilot in seven days
After you submit the brief above, here is the timeline. The free pilot at T+5-7 days
delivers real in-cabin voice recording against your target language and noise profile, not
a deck.
T+1 day
Project lead reads your brief
A named EU-resident project lead replies within one business day with feasibility,
language coverage, and a first read on Article 10 risk classification.
T+3 days
Scope, cabin-noise spec, sample evidence pack
Indicative scope returned with target languages, cabin-noise capture profile (engine,
HVAC, road, passenger overlap), wake-word spec, and Article 10 evidence pack manifest.
T+5-7 days
Free pilot delivered
Free pilot covers recording AND annotation: 2 languages, 5h native-speaker recording per
language in real cabin noise, 1000 utterances per language with transcript and wake-word
and intent labels. Production engagement scopes from there by language coverage, scale,
and regulatory context.
T+14 days
Master DPA signed, production scope locked
Article 28 clauses pre-cleared, EEA-resident processing committed in contract.
Sub-processor list named, withdrawal SLA confirmed. Production recording begins.
Norwegian Aksjeselskap. EEA-resident operations. GDPR Article 7 consent on every
contributor. EU AI Act Article 10 evidence pack at delivery.
Compliance and data governance
The compliance posture procurement reviews require.
Voice and perception data used to train high-risk AI systems both fall under the
data-governance requirements of EU AI Act Article 10. Both need documented lineage,
consent provenance, and EEA-resident processing.
Control category | Regulation or framework | What YPAI ships | Evidence artifact
Data lineage and traceability | EU AI Act Article 10 | Per-sample provenance trail across collection, annotation, QA, delivery. | data_provenance.pdf
Representativeness assessment | EU AI Act Article 10 | Demographic and geographic coverage analysis vs deployment population. | bias_assessment.pdf
Dataset-level bias examination | EU AI Act Article 10 | Audit-ready bias examination across age, accent, dialect, behavioural segments. | bias_assessment.pdf
Consent framework | GDPR Article 7 | Per-speaker informed consent, withdrawal rights, 30-day erasure SLA on speaker request. | consent_audit.csv
Special-category data handling | GDPR Article 9 | Voice biometric and demographic attributes handled under explicit consent and minimisation. | consent_audit.csv
DPA (Article 28 contract) | GDPR Article 28 | DPA included with every contract. Pre-cleared clauses, no extra negotiation cycle. | dpa.pdf
Functional safety taxonomy | ISO 26262 | Annotation specs designed to support functional-safety ASIL classifications and traceability. | asil_taxonomy.pdf
Unknown-unsafe scenario coverage | ISO 21448 SOTIF | Edge-case taxonomy and curation aligned to SOTIF unknown-unsafe categories. | sotif_taxonomy.pdf
Cybersecurity-relevant data handling | ISO/SAE 21434 | Cybersecurity-relevant data handling protocols across voice and sensor pipelines. | cybersec_handling.pdf
Process discipline | ASPICE | QA workflows slot into automotive SPICE engineering processes. Deliverables pass Tier-1 process audits. | qa_process_log.pdf
EEA residency | GDPR plus Norwegian AS | EEA infrastructure by default. Named sub-processor list at engagement scoping. No transfers outside EEA without explicit DPA addendum. | residency_attestation.pdf
Legal entity | Norwegian company | Norwegian company, EEA jurisdiction by incorporation, not by sub-processor claim. Outside US CLOUD Act reach. | entity_attestation.pdf
30-day erasure SLA applies to both speaker recordings and perception assets. Sub-processor
list and evidence artifacts delivered at engagement scoping.
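
For teams tracking erasure requests on their own side, the 30-day window reduces to simple date arithmetic. The sketch below is illustrative only: it assumes the SLA counts calendar days from the date of the request, which the text above does not specify, and the names `erasure_deadline` and `is_overdue` are hypothetical.

```python
from datetime import date, timedelta

# 30 days comes from the SLA stated above; counting calendar days
# from the request date is an assumption.
ERASURE_SLA_DAYS = 30


def erasure_deadline(request_date):
    """Return the last day on which erasure is still within the SLA."""
    return request_date + timedelta(days=ERASURE_SLA_DAYS)


def is_overdue(request_date, today):
    """True once today is past the erasure deadline."""
    return today > erasure_deadline(request_date)
```

Under these assumptions a request received on 2026-01-15 would have to be completed by 2026-02-14.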
Our automotive surface
Voice-led, perception-ready. One vendor, one contract.
The full automotive data surface in one engagement: in-cabin voice corpora, wake-word and
DMS audio, perception annotation, real-world capture. EEA jurisdiction across all of it.
Voice & User Experience
In-cabin voice corpora across 150+ languages (50+ productionised), wake-word and intent labeling, DMS audio cues, and TTS for branded voice personalities. Native-speaker capture in real cabin noise.
Cross-modal data collection and annotation pipelines that underpin both voice and perception programs: real-world capture, video and image annotation, Article 10 provenance throughout.
LiDAR, sensor fusion, and semantic segmentation when the voice engagement extends into the perception stack. ASIL-aware taxonomies, SOTIF edge-case coverage.