Key Takeaways
- Vendor scorecards replace capability claims with measurable evidence: vendors score only on what they can document, not what they assert
- Compliance readiness (25 points) and language coverage (25 points) are the highest-weight categories because gaps in either produce outright deployment failure rather than degraded performance
- Data quality metrics must be corpus-specific, not presented as general vendor capability benchmarks
- Disqualification criteria exist separately from scoring: any vendor who fails a disqualification criterion is removed from evaluation regardless of scorecard total
- The scorecard process surfaces accountability: vendors who cannot produce corpus-specific documentation in response to scorecard questions will not be able to produce it for regulators
Procurement teams evaluating speech data vendors typically compare vendors on a combination of price, audio hour volume, and language coverage claims. This approach produces selection decisions based on vendor assertions rather than vendor evidence. A structured scorecard replaces assertion-based evaluation with documentation-based scoring: vendors score only on what they can demonstrate, not on what they claim.
The framework below assigns 100 points across five categories, weighted to reflect the deployment consequences of gaps in each category for EU enterprise AI systems requiring EU AI Act Article 10 compliance.
Category 1: Data Quality (20 points)
Data quality is scored on verifiable metrics from the specific corpus being delivered, not on the vendor’s general quality processes.
Inter-annotator agreement score (8 points). Request IAA scores for transcription on the specific corpus being evaluated. IAA measures annotation consistency: the degree to which independent annotators agree on the correct transcription of each utterance. A vendor who cannot provide this score for the delivered corpus has not measured annotation quality at the level that production training data requires. Scoring: 8 points for IAA of 0.90 or above; 6 points for IAA of 0.85 to 0.89; 3 points for IAA of 0.80 to 0.84; 0 points for IAA below 0.80 or not available.
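To make the IAA question concrete when reviewing a vendor's sample, a minimal sketch of pairwise agreement is shown below. It uses utterance-level exact-match agreement after light normalization, which is only one simple proxy; vendors may report chance-corrected measures such as Cohen's kappa or Krippendorff's alpha, or WER between annotators, so confirm which measure the reported score uses. The annotator names and transcriptions here are illustrative.

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """Mean utterance-level exact-match agreement across all annotator pairs.

    annotations: dict mapping annotator name -> list of transcriptions,
    aligned by utterance index. A simple IAA proxy, not a chance-corrected
    measure.
    """
    def norm(text):
        # Normalize case and whitespace so trivial differences don't count
        return " ".join(text.lower().split())

    scores = []
    for a, b in combinations(annotations.values(), 2):
        matches = sum(norm(x) == norm(y) for x, y in zip(a, b))
        scores.append(matches / len(a))
    return sum(scores) / len(scores)

# Illustrative three-annotator sample
iaa = pairwise_agreement({
    "ann1": ["hello world", "good morning", "thank you"],
    "ann2": ["Hello world", "good morning", "thank  you"],
    "ann3": ["hello world", "goodmorning", "thank you"],
})
```

On this toy sample the agreement comes out near 0.78, which would fall in the 0-point band of the scorecard; a real evaluation would run over the full 100-utterance sample batch.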
QA process and rejection rate (6 points). Request the vendor’s acceptance criteria for audio inclusion and their rejection rate for the corpus component being delivered. A rejection rate of 5 to 15% indicates the vendor is applying meaningful quality gates. A rejection rate below 2% suggests quality gates are minimal or not applied consistently. Scoring: 6 points for documented numerical criteria with 5 to 15% rejection rate; 4 points for documented criteria with rejection rate outside the target range; 2 points for documented criteria without rejection rate data; 0 points for no documented quality acceptance criteria.
Sample audio review (6 points). Request 100 representative utterances from the corpus with their transcriptions. Review for transcription accuracy, audio quality, and demographic representativeness. Procurement teams that skip sample review cannot verify vendor claims about corpus characteristics. Scoring: 6 points for no transcription errors in sample with audio quality consistent with stated conditions and demographic balance evident; 4 points for minor transcription errors or quality inconsistencies; 2 points for significant transcription errors or quality concerns; 0 points for vendor refusing to provide a sample.
Category 2: Compliance Readiness (25 points)
Compliance scoring uses the EU AI Act Article 10 documentation requirements as the reference standard.
Individual consent records (10 points). Can the vendor produce a sample individual consent record for a contributor in the corpus? The consent record should identify the contributor, the date of consent, the scope of use consented to, and the right-to-erasure procedure. Aggregate terms-of-service acceptance is not an individual consent record. Scoring: 10 points for a sample consent record provided with all required elements; 7 points for consent records that exist but are missing some required elements; 3 points for aggregated consent rather than individual records; 0 points for no consent records or vendor refusal to disclose the consent framework.
GDPR legal basis documentation (8 points). Can the vendor articulate the specific GDPR legal basis for each component of the corpus? For voice data processed as biometric data, Article 9 applies, so a lawful basis requires both an Article 6 ground and an Article 9(2) exception (typically explicit consent under Article 9(2)(a)). Scoring: 8 points for a specific Article basis stated per corpus component with documentation; 5 points for a general GDPR compliance statement without component-level detail; 2 points for the vendor referencing GDPR compliance without specifying applicable articles; 0 points for the vendor being unable to articulate the GDPR legal basis.
Bias examination report (7 points). Has the vendor conducted a formal bias examination specific to this corpus? The examination should document specific demographic groups evaluated, fairness metrics applied, findings, and any mitigation steps. A general bias policy statement is not a bias examination report. Scoring: 7 points for a corpus-specific report with demographic groups, metrics, findings, and mitigations documented; 5 points for a corpus-specific analysis without complete documentation; 2 points for a general bias policy without corpus-specific analysis; 0 points for no bias examination or vendor refusal to provide one.
Category 3: Language and Demographic Coverage (25 points)
Coverage scoring evaluates match between the vendor’s actual corpus characteristics and the deployment’s target population.
Target language coverage depth (10 points). Does the vendor have documented collection experience in each of your target languages, with dialect coverage beyond standard broadcast speech? Scoring: 10 points for all target languages documented with dialect coverage data; 7 points for all target languages present but dialect coverage incomplete; 4 points for some target languages well-covered and others minimal; 0 points for one or more target languages absent from the vendor’s documented collection experience.
Demographic composition match (8 points). Can the vendor provide demographic breakdowns for the corpus by age group, gender, and regional origin? Does the composition match your deployment target population? Scoring: 8 points for a complete demographic breakdown with composition aligned to the deployment target; 6 points for a demographic breakdown available but alignment requires supplemental collection; 3 points for partial demographic data available; 0 points for no demographic breakdown available.
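One way to turn "does the composition match" into a reviewable number is a distance between the vendor's reported shares and the deployment target. The sketch below uses total variation distance; the metric choice, group labels, and percentages are all assumptions for illustration, not part of the scorecard itself.

```python
def composition_gap(corpus_pct, target_pct):
    """Total variation distance between corpus and target demographic shares.

    Both dicts map group label -> share (each summing to 1.0).
    0.0 = perfect match; 1.0 = completely disjoint populations.
    """
    groups = set(corpus_pct) | set(target_pct)
    return 0.5 * sum(abs(corpus_pct.get(g, 0.0) - target_pct.get(g, 0.0))
                     for g in groups)

# Illustrative age-group breakdown: vendor corpus vs. deployment target
gap = composition_gap(
    {"18-29": 0.30, "30-49": 0.45, "50+": 0.25},
    {"18-29": 0.25, "30-49": 0.40, "50+": 0.35},
)
```

A gap of 0.10 here means 10% of the corpus would need to shift between groups to match the target; what threshold separates "aligned" from "requires supplemental collection" is a judgment the buyer makes per deployment.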
Acoustic condition coverage (7 points). Does the corpus include audio collected under the acoustic conditions relevant to the deployment (telephony, in-vehicle, far-field, mobile)? Scoring: 7 points for acoustic conditions documented and aligned with deployment conditions; 5 points for relevant conditions present but not fully specified; 2 points for primarily clean studio audio without deployment-relevant acoustic conditions; 0 points for acoustic conditions not documented.
Category 4: Documentation Standards (20 points)
Documentation is evaluated against EU AI Act Article 10’s requirement for collection methodology documentation, preprocessing documentation, and data lineage.
Collection methodology document (8 points). Request a collection methodology document specific to the corpus being evaluated. The document should describe contributor recruitment criteria, recording protocols, quality acceptance criteria, and annotation procedures. A general vendor methodology document that applies across all corpora is not corpus-specific documentation. Scoring: 8 points for a corpus-specific methodology document with all required sections; 5 points for a corpus-specific document missing some sections; 2 points for a general vendor methodology document only; 0 points for no written methodology documentation.
Data lineage and preprocessing logs (7 points). Can the vendor provide documentation of all preprocessing and transformation steps applied to the corpus? This includes normalization, segmentation, noise reduction, and annotation post-processing. Scoring: 7 points for a complete preprocessing log with steps and parameters documented; 5 points for partial preprocessing documentation; 2 points for preprocessing acknowledged but not documented; 0 points for no preprocessing documentation.
Right-to-erasure SLA (5 points). Does the vendor have a documented SLA for responding to GDPR Article 17 right-to-erasure requests from contributors? For a corpus with identified contributors, this SLA governs how quickly the vendor can identify and remove a contributor’s recordings. Scoring: 5 points for a documented SLA with specific response time and process; 3 points for an erasure process acknowledged but not documented with SLA; 0 points for no documented erasure process.
Category 5: Delivery and SLA (10 points)
Format specification compliance (5 points). Can the vendor deliver the corpus in the format specified in the RFP without requiring post-delivery format conversion? Conversion after delivery is an integration cost the contract does not cover. Scoring: 5 points for format specification accepted and documented in contract; 3 points for format partially compliant with minor conversion required; 0 points for format conversion required.
Delivery timeline commitment (5 points). Has the vendor provided a milestone-based delivery schedule with specific dates, intermediate deliveries, and acceptance criteria for each milestone? Scoring: 5 points for a detailed milestone schedule with intermediate deliveries and acceptance gates; 3 points for an overall delivery date without milestones; 0 points for a timeline not specified or contingent on unspecified factors.
Disqualification criteria
The following criteria result in removal from evaluation regardless of scorecard total:
- Vendor cannot produce individual consent records for contributors (aggregate terms-of-service acceptance does not qualify)
- Vendor’s legal entity is incorporated outside the EEA and has no documented transfer mechanism for contributor data to the buyer’s jurisdiction
- Vendor cannot provide a sample audio batch with transcriptions for pre-contract review
- Vendor’s parent company or controlling entity is subject to US CLOUD Act or equivalent foreign government data access law and the vendor cannot document how this risk is mitigated
- Total bias examination section score is 0 (no examination exists for the corpus)
A vendor who triggers any disqualification criterion is not evaluable under this framework regardless of performance in other categories.
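The two-stage logic above — check disqualification criteria first, score categories only for vendors that pass — can be sketched as follows. Category names, criterion keys, and point values are illustrative, not prescribed by the framework.

```python
def evaluate_vendor(category_scores, disqualifiers):
    """Apply the scorecard's two-stage logic.

    disqualifiers: dict of criterion name -> bool (True = triggered).
    Disqualification is checked before scoring, so a disqualified
    vendor never receives a total.
    category_scores: dict of category name -> points awarded.
    """
    triggered = [name for name, hit in disqualifiers.items() if hit]
    if triggered:
        return {"status": "disqualified", "criteria": triggered, "total": None}
    return {"status": "scored", "total": sum(category_scores.values())}

# Illustrative vendor that passes all disqualification checks
result = evaluate_vendor(
    {"data_quality": 16, "compliance": 22, "coverage": 18,
     "documentation": 15, "delivery": 8},
    {"no_individual_consent": False, "no_transfer_mechanism": False,
     "no_sample_batch": False, "cloud_act_unmitigated": False,
     "no_bias_examination": False},
)
```

Keeping the gate outside the scoring sum mirrors the framework's intent: a vendor cannot buy back a consent or sovereignty failure with strong scores elsewhere.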
For the pre-contract questions this scorecard is derived from, see our speech data vendor due diligence guide. For the RFP structure that precedes scorecard evaluation, see our speech data vendor RFP requirements guide.
Related Resources
- Speech data vendor due diligence: 12 questions - Pre-contract questions mapped to the same four risk categories
- EU AI Act Article 10: What Speech Data Vendors Must Prove to Enterprise Buyers - Documentation requirements that define the compliance readiness category
- Custom speech corpus TCO vs off-the-shelf datasets - TCO analysis that precedes vendor selection
- EU speech data sovereignty: why GDPR is not enough - Why the disqualification criteria include sovereignty checks
- Speech data overview
- EU AI Act compliant training data
- Data processing agreement overview