PREMIUM AUDIO DATA
40,000 vetted freelancers across every EU language and the dialects that matter: Jutlandic Danish, Swiss German, Galician, Sámi, Frisian, and 90+ more. Audit-ready for the EU AI Act, GDPR-compliant by design.
The bottleneck isn’t model capability. It’s training data your speakers actually speak.
COVERAGE MATRIX
Browse 8 representative coverage areas below. Open the full matrix to see all 24 EU languages and 90+ dialects, with vetted-speaker counts per language.
1,420 vetted
Western dialects, Inland
Concentrated in Vestlandet + Inland counties. ~38% female / 62% male active, ages 22–58. Common utterance length: 4–18s. Used to fix ASR bias against non-Bokmål Norwegian.
740 vetted
Western, Sønderjysk
Sønderjysk (South Jutlandic) has high regional variance. Speakers vetted across Aalborg, Aarhus, Esbjerg, and Tønder regions. Often missing from off-the-shelf Danish ASR.
985 vetted
Zürich, Bern, Basel variants
All three major Alemannic urban variants plus rural fallbacks. High demand from finance + insurance NLU teams. Most utterance prompts span finance, healthcare, and consumer domains.
520 vetted
Northern + Coastal
Co-official with Spanish in Galicia. Coastal vs interior phonology differs meaningfully. Used by clients building bilingual ES/GL voice products.
210 vetted
Northern, Inari, Skolt
Three of the nine Sámi languages. Northern is most populous; Inari + Skolt are critically endangered. Recruited via community liaison networks across Finnmark + Inari + Sevettijärvi.
285 vetted
West, North, Saterland
West Frisian (Netherlands), North Frisian (Schleswig-Holstein), Saterland Frisian (Lower Saxony). Used in regional government accessibility + cultural-heritage projects.
640 vetted
Northern + Southern
North Welsh + South Welsh: the dialect boundary is meaningful for ASR. Common downstream uses: government services, BBC Cymru content tooling, education.
155 vetted
Northern + Southern
Cismontano (north) and Oltramontano (south). Italo-Romance lineage. Among the smaller pools, typically scoped 3–6 weeks ahead for collection.
| Language | Region / Country | Vetted speakers | Dialects covered | Sample audio |
|---|---|---|---|---|
| Germanic (13) | | | | |
| English | IE, MT (EU official) | 2,900 | Hiberno-English, Maltese English | scoping call |
| German | DE, AT, BE-DG, LU | 2,500 | Standard, Austrian, Low German, Swiss German (Zürich/Bern/Basel) | scoping call |
| Dutch | NL, BE-VL | 1,300 | Hollands, Flemish, Brabantian, Limburgish | scoping call |
| Danish | DK | 1,000 | Standard, Jutlandic (Western), Sønderjysk, Bornholmian | scoping call |
| Swedish | SE, FI | 1,200 | Standard, Skånsk, Gotlandic, Finland-Swedish | scoping call |
| Finnish | FI | 950 | Standard, Eastern, Western, Helsinki slang | scoping call |
| Norwegian (Bokmål) | NO (non-EU but EEA) | 1,850 | Eastern, Northern | scoping call |
| Norwegian (Nynorsk) | NO (non-EU but EEA) | 1,420 | Western, Inland | scoping call |
| Faroese | FO | 145 | Tórshavn, Suðuroy | scoping call |
| Frisian (West) | NL | 215 | Wood Frisian, Clay Frisian | scoping call |
| Frisian (North) | DE | 75 | Mooring, Fering | scoping call |
| Frisian (Saterland) | DE | 55 | Seelter | scoping call |
| Yiddish | EU-wide | 105 | Litvish, Galitzish | scoping call |
| Romance (15) | | | | |
| French | FR, BE, LU | 2,800 | Métropolitain, Belgian, Walloon, Acadian | scoping call |
| Italian | IT, MT | 2,350 | Standard, Sicilian, Neapolitan, Venetian, Sardinian, Friulian | scoping call |
| Spanish (Castilian) | ES | 2,700 | Castilian, Andalusian, Murcian, Canarian | scoping call |
| Catalan | ES, AD | 850 | Central, Valencian, Balearic | scoping call |
| Galician | ES | 520 | Northern, Coastal | scoping call |
| Portuguese | PT | 1,150 | European Portuguese, Azorean, Madeiran | scoping call |
| Romanian | RO | 1,100 | Standard, Moldavian, Transylvanian | scoping call |
| Asturian | ES | 220 | Central, Western, Eastern | scoping call |
| Sardinian | IT | 235 | Logudorese, Campidanese | scoping call |
| Corsican | FR | 155 | Cismontano, Oltramontano | scoping call |
| Sicilian | IT | 410 | Palermitan, Catanese | scoping call |
| Friulian | IT | 155 | Central, Western, Carnico | scoping call |
| Romansh | CH (non-EU) | 115 | Sursilvan, Vallader, Puter, Rumantsch Grischun | scoping call |
| Occitan | FR, IT, ES | 230 | Gascon, Languedocien, Provençal | scoping call |
| Walloon | BE | 105 | Central, Eastern | scoping call |
| Celtic (6) | | | | |
| Irish Gaelic | IE (EU official) | 410 | Connacht, Munster, Ulster | scoping call |
| Welsh | UK (non-EU) | 640 | Northern, Southern | scoping call |
| Scottish Gaelic | UK (non-EU) | 220 | Hebridean, Highland | scoping call |
| Manx | IM (non-EU) | 55 | Revival cohort | scoping call |
| Breton | FR | 225 | Kerneveg, Leoneg, Tregerieg, Gwenedeg | scoping call |
| Cornish | UK (non-EU) | 55 | Kernewek Kemmyn revival | scoping call |
| Slavic (11) | | | | |
| Polish | PL | 1,950 | Standard, Silesian, Kashubian-adjacent | scoping call |
| Czech | CZ | 1,150 | Standard, Moravian | scoping call |
| Slovak | SK | 850 | Standard, Eastern, Central | scoping call |
| Slovenian | SI | 620 | Standard, Prekmurje | scoping call |
| Bulgarian | BG | 1,100 | Eastern, Western | scoping call |
| Croatian | HR | 1,100 | Standard (Štokavian), Kajkavian, Chakavian | scoping call |
| Kashubian | PL | 145 | Northern, Southern | scoping call |
| Silesian | PL | 220 | Upper Silesian | scoping call |
| Sorbian (Upper) | DE | 95 | Bautzen / Budyšin | scoping call |
| Sorbian (Lower) | DE | 55 | Cottbus / Chóśebuz | scoping call |
| Rusyn | SK, PL, HU | 95 | Carpatho-Rusyn | scoping call |
| Uralic (9) | | | | |
| Hungarian | HU | 1,450 | Standard, Palóc, Csángó | scoping call |
| Estonian | EE | 850 | Standard, Võro, South Estonian | scoping call |
| Finnish (Karelian-adjacent) | FI | see Finnish row above | see Germanic-Finnish overlap | scoping call |
| Sámi (Northern) | NO, SE, FI | 145 | Davvisámegiella | scoping call |
| Sámi (Lule) | NO, SE | 55 | Julevsámegiella | scoping call |
| Sámi (Inari) | FI | 40 | Anarâškielâ | scoping call |
| Sámi (Skolt) | FI, RU border cohort | 40 | Nuõrttsääʹmǩiõll | scoping call |
| Karelian | FI, RU border cohort | 55 | Livvi-Karelian, Northern Karelian | scoping call |
| Võro | EE | 95 | South Estonian | scoping call |
| Baltic (2) | | | | |
| Latvian | LV | 620 | Standard, Latgalian | scoping call |
| Lithuanian | LT | 620 | Standard, Žemaitian, Aukštaitian | scoping call |
| Hellenic (1) | | | | |
| Greek | GR, CY | 1,100 | Standard, Cypriot Greek | scoping call |
| Other / Isolates (2) | | | | |
| Maltese | MT | 410 | Standard, regional | scoping call |
| Basque (Euskara) | ES, FR | 410 | Biscayan, Gipuzkoan, Upper Navarrese, Lapurdian | scoping call |
Counts refreshed quarterly. Last refresh: 2026-Q1. Live capacity may vary by 5-10% based on freelancer availability per project window. Vetted = ID-verified, audio-quality-tested, NDA-signed.
Audio samples: native-speaker recordings from Wikimedia Lingua Libre and Wikitongues, licensed CC-BY-SA 4.0. Corsican sample is TTS-generated pending real recording.
How it works
Four stages from brief to delivery. Every transition is logged for EU AI Act audit. Datasets ship with Article 10 documentation as standard.
01 Input
You arrive with a use case. We arrive with a list of languages, dialects, accents, demographic distribution, recording conditions, and acceptance criteria. We converge in 1–2 calls.
02 Process
κ ≥ 0.85 (inter-annotator agreement, reported as kappa in the dataset manifest) is the industry-standard threshold for premium audio data. Measured continuously, reported in the QA dashboard. Methodology shared in the scoping call.
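A minimal sketch of how a 0.85 agreement threshold can be checked, assuming the figure refers to an inter-annotator agreement statistic such as Cohen's kappa (the sample manifest carries a kappa field). The reviewer labels and the function name are hypothetical and are not ypai's published methodology, which is shared in the scoping call.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical accept/reject decisions from two reviewers on 10 clips.
reviewer_1 = ["ok", "ok", "ok", "bad", "ok", "bad", "ok", "ok", "bad", "ok"]
reviewer_2 = ["ok", "ok", "ok", "bad", "ok", "bad", "ok", "ok", "ok", "ok"]

THRESHOLD = 0.85  # threshold quoted in the text
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa={kappa:.3f}, meets threshold: {kappa >= THRESHOLD}")
```

Note that the two reviewers agree on 9 of 10 clips, yet kappa lands well below 0.85 because much of that agreement is expected by chance when one label dominates.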
{
  "dataset_id": "ypai-NO-nyn-2026Q2",
  "language": "nno",
  "dialect": "nynorsk-western",
  "speakers": 142,
  "hours": 218.7,
  "kappa": 0.87,
  "wer_baseline": 9.4,
  "sample_rate_hz": 48000,
  "bit_depth": 16,
  "eu_ai_act_doc": "annex/article-10.pdf",
  "license": "ypai-commercial-perpetual",
  "delivery": "2026-04-28T14:30:00Z"
}
03 Output
You receive a versioned dataset bundle plus an Article 10 compliance manifest, a QA report, and per-file metadata. Everything needed to ship to procurement, legal, and your ML team at the same time.
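On the receiving side, a manifest like the sample above can be checked before the bundle goes to the ML team. This is a rough sketch only: the field names and values are copied from the sample manifest, while the acceptance limits (including the 16 kHz minimum sample rate) and the check_manifest helper are illustrative assumptions and not part of any ypai tooling.

```python
import json

# Subset of the sample manifest fields shown above; values copied from it.
MANIFEST_JSON = """
{
  "dataset_id": "ypai-NO-nyn-2026Q2",
  "speakers": 142,
  "hours": 218.7,
  "kappa": 0.87,
  "sample_rate_hz": 48000,
  "bit_depth": 16,
  "eu_ai_act_doc": "annex/article-10.pdf"
}
"""

REQUIRED = ("dataset_id", "hours", "kappa", "sample_rate_hz", "eu_ai_act_doc")

def check_manifest(manifest, min_kappa=0.85, min_sample_rate_hz=16000):
    """Return a list of human-readable problems; empty means acceptable."""
    problems = [f"missing field: {f}" for f in REQUIRED if f not in manifest]
    # Hypothetical acceptance criteria; agree real ones during scoping.
    if manifest.get("kappa", 0.0) < min_kappa:
        problems.append(f"kappa below {min_kappa}")
    if manifest.get("sample_rate_hz", 0) < min_sample_rate_hz:
        problems.append(f"sample rate below {min_sample_rate_hz} Hz")
    return problems

manifest = json.loads(MANIFEST_JSON)
problems = check_manifest(manifest)
print("accepted" if not problems else problems)
```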
PROOF
“We needed Nynorsk and Jutlandic Danish at production volume in 8 weeks. ypai delivered both with audit-ready documentation. Our WER on Western Nordic dialects dropped 12.4% after retraining.”
“We replaced 19% of agent-routed calls with self-service after retraining on ypai’s regional dialect data. Procurement signed off because the audit trail mapped directly to our DPIA. Six markets, six languages, one contract.”
“Statutory minority-language access used to mean a backlog of manually-transcribed citizen submissions. ypai’s Sámi and Frisian coverage let us automate intake without dropping accuracy below our 92% acceptance threshold. Audit-ready by default mattered as much as the coverage.”
Read the full case studies (under NDA — available on request)
Logos shown by category. And 200+ other EU enterprises. Specific references available under NDA in scoping call.
COMPLIANCE
Every dataset arrives with the documentation your procurement team will ask for and your auditor will verify. Built into our pipeline, not bolted on.
Download Article 10 documentation template
Datasets ship with Annex VI documentation.
Audio = special category data. Lawful basis: explicit consent.
Storage, access, audit controls.
Frankfurt or Stockholm host. DPA template on request.
EU AI Act effective Aug 2026 · Our datasets are audit-ready today.
Delivery API
REST + signed-URL delivery. Datasets stream to your S3, GCS, Azure, or HTTPS endpoint of choice. EU data residency enforced per contract.
Python:

from ypai import Client

client = Client(api_key="ypai_sk_live_...")
dataset = client.datasets.get("ypai-NO-nyn-2026Q2")
manifest = dataset.download("./local/path", with_audit=True)
print(f"Downloaded {manifest.hours:.1f}h, audit doc at {manifest.audit_path}")

curl:

curl https://api.ypai.ai/v1/datasets/ypai-NO-nyn-2026Q2 \
  -H "Authorization: Bearer $YPAI_API_KEY" \
  -H "X-Include-Audit: true" \
  --output ypai-NO-nyn-2026Q2.tar.gz

Every dataset is fetchable with a 15-minute signed URL: stream straight to your S3, GCS, Azure, or HTTPS endpoint.
Frankfurt or Stockholm hosting, enforced per contract. SCC + DPA available. Speaker-consent ledger included.
GET /v1/datasets/{id}/audit returns the EU AI Act Article 10 documentation as JSON or PDF.
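For teams not using the SDK, the audit endpoint can be called with the Python standard library alone. The sketch below only prepares the request and does not send it. The host and bearer-token header come from the curl example; selecting JSON or PDF through the Accept header is an assumption, since the page does not say how the format is chosen, and build_audit_request is a hypothetical helper name.

```python
import urllib.request

API_BASE = "https://api.ypai.ai/v1"  # host taken from the curl example

def build_audit_request(dataset_id, api_key, fmt="json"):
    """Prepare (but do not send) a GET for /v1/datasets/{id}/audit."""
    # Assumption: the response format is negotiated via the Accept header.
    accept = {"json": "application/json", "pdf": "application/pdf"}[fmt]
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/audit",
        headers={"Authorization": f"Bearer {api_key}", "Accept": accept},
        method="GET",
    )

req = build_audit_request("ypai-NO-nyn-2026Q2", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=30)
```

Keeping request construction separate from sending makes the call easy to log for your own audit trail before any network traffic happens.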