Key Takeaways
- Due diligence happens before the RFP, not after contract signature
- Vendors who cannot answer sovereignty, consent, and bias questions now cannot answer them when regulators ask
- EU AI Act Article 10 compliance requires vendor documentation to exist at the point of data acquisition, not retroactively
- Twelve questions map to four risk categories: compliance, data quality, sovereignty, and delivery
- A vendor's response time and specificity in answering these questions reveal more than any certification document
Procurement teams evaluating speech data vendors typically issue an RFP and assess responses based on capability claims. Word count, language coverage, and sample audio quality dominate the evaluation. Compliance documentation, sovereignty status, and bias examination evidence rarely appear in the shortlisting criteria until procurement discovers, post-contract, that the vendor cannot support regulatory requirements.
Due diligence is the structured inquiry that closes this gap. The twelve questions below map to four risk categories: data compliance, data quality, data sovereignty, and delivery. A vendor who can answer them in specific, documented terms before contract signature is a different category of supplier from one who cannot.
Why due diligence comes before the RFP
An RFP tests a vendor’s capability. Due diligence tests a vendor’s accountability.
A vendor may have collected high-quality audio with excellent transcription accuracy while maintaining inadequate consent records, no demographic tracking, and no bias analysis documentation. That vendor passes an RFP evaluation and fails an EU AI Act Article 10 audit.
The twelve questions below surface accountability gaps before you are contractually committed. Vendors who answer these questions quickly and specifically have invested in compliance infrastructure. Vendors who need weeks to respond or who provide generic policy language rather than corpus-specific answers have not.
Category 1: Data compliance
Question 1: Can you provide individual consent records for all contributors in the corpus you will deliver to us?
Article 10 compliance requires that your vendor can demonstrate individual consent, not aggregate terms-of-service acceptance. The answer should include a description of the consent framework, sample forms, and the procedure for handling contributor deletion requests under GDPR Article 17. A vendor who cannot produce consent records cannot satisfy this requirement retroactively after your AI system is in production.
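One way to test the answer to Question 1 is to cross-check the contributor IDs in the delivered corpus against the consent records the vendor supplies. The following is a minimal sketch, not a prescribed method: the `ConsentRecord` fields, the ID format, and the function name are hypothetical, and a real vendor export will have its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    # All field names are illustrative assumptions, not a vendor schema.
    contributor_id: str  # pseudonymous contributor ID
    form_version: str    # which consent form the contributor signed
    signed_on: str       # ISO date string


def contributors_without_consent(corpus_contributor_ids, consent_records):
    """Return the sorted contributor IDs that have no individual consent record."""
    consented = {record.contributor_id for record in consent_records}
    return sorted(set(corpus_contributor_ids) - consented)


corpus_ids = ["spk-001", "spk-002", "spk-003"]
records = [
    ConsentRecord("spk-001", "v2", "2024-03-01"),
    ConsentRecord("spk-003", "v2", "2024-03-04"),
]
missing = contributors_without_consent(corpus_ids, records)
print(missing)  # ['spk-002']
```

Any ID in the result is a contributor whose recordings are in the corpus without a matching individual consent record.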
Question 2: What legal basis applies to each component of the corpus?
For speech data involving EU residents, the lawful basis is typically GDPR Article 6(1)(a) (consent) or Article 6(1)(b) (performance of a contract). Where voice recordings are processed to uniquely identify a person, they are biometric data under GDPR Article 9, and an additional Article 9(2) condition such as explicit consent is required. Vendors who cannot articulate the GDPR basis for collection by corpus component are not tracking provenance at the level EU AI Act Article 10 requires.
Question 3: What is the geographic origin of contributors, and are any located outside the EEA?
If contributors are outside the EEA, the data transfer mechanism must be documented. Standard contractual clauses, adequacy decisions, or other GDPR Chapter V mechanisms apply. The vendor must be able to trace every contribution to a documented lawful transfer basis if that contribution originated outside the EEA.
Category 2: Data quality
Question 4: What is the demographic composition of this corpus by age group, gender, and regional origin?
EU AI Act Article 10 requires training data to be sufficiently representative in view of the system's intended purpose, which in practice means representative of the target user population. If the vendor cannot produce demographic breakdowns by these categories for the specific corpus they are delivering to you, they cannot demonstrate representativeness. A general description of how they design corpora is not a substitute for actual corpus-level data.
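If the vendor supplies a per-contributor manifest, the breakdown can be checked directly. The sketch below assumes a hypothetical manifest with `age_group`, `gender`, and `region` fields, and compares actual shares against target shares that you would define for your own user population. The 5-point tolerance is an illustrative default, not a regulatory figure.

```python
from collections import Counter


def demographic_shares(manifest, field):
    """Return each value's share of contributors for one demographic field."""
    counts = Counter(row[field] for row in manifest)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


def representation_gaps(actual, target, tolerance=0.05):
    """Return groups whose actual share falls more than `tolerance` below target."""
    return {
        group: round(target_share - actual.get(group, 0.0), 3)
        for group, target_share in target.items()
        if target_share - actual.get(group, 0.0) > tolerance
    }


# Hypothetical four-contributor manifest, for illustration only.
manifest = [
    {"contributor_id": "spk-001", "age_group": "18-29", "gender": "f", "region": "north"},
    {"contributor_id": "spk-002", "age_group": "18-29", "gender": "m", "region": "north"},
    {"contributor_id": "spk-003", "age_group": "30-49", "gender": "f", "region": "south"},
    {"contributor_id": "spk-004", "age_group": "18-29", "gender": "m", "region": "north"},
]
age_shares = demographic_shares(manifest, "age_group")
target_age = {"18-29": 0.4, "30-49": 0.4, "50+": 0.2}  # assumed target population
gaps = representation_gaps(age_shares, target_age)
print(gaps)  # {'30-49': 0.15, '50+': 0.2}
```

A gap report like this is only a starting point for the conversation with the vendor; it does not by itself establish or rule out representativeness.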
Question 5: Can you provide a written collection methodology document specific to this corpus?
The methodology document should describe recording conditions, contributor briefing protocols, quality acceptance criteria, and inter-annotator agreement scores for any annotation applied. Generic process descriptions that apply across all corpora do not satisfy the corpus-specific documentation Article 10 requires for high-risk AI system training data.
Question 6: What quality thresholds does a recording pass before inclusion in the delivered corpus?
Signal-to-noise ratio requirements, speaker diarization accuracy, transcription error rate ceilings, and recording environment controls are all relevant. A vendor should be able to describe these thresholds numerically and confirm that the delivered corpus was verified against them.
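Numeric thresholds are easy to verify once the vendor states them. The sketch below assumes per-recording metadata with hypothetical `snr_db` and `wer` fields and uses placeholder threshold values; the actual figures should come from the vendor's own acceptance criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityThresholds:
    min_snr_db: float  # minimum signal-to-noise ratio in dB
    max_wer: float     # maximum transcription word error rate, 0..1


def failed_checks(recording, thresholds):
    """Return the names of the thresholds that one recording fails."""
    failures = []
    if recording["snr_db"] < thresholds.min_snr_db:
        failures.append("snr_db")
    if recording["wer"] > thresholds.max_wer:
        failures.append("wer")
    return failures


# Illustrative values only; not taken from any vendor or standard.
thresholds = QualityThresholds(min_snr_db=20.0, max_wer=0.05)
recordings = [
    {"id": "rec-1", "snr_db": 28.5, "wer": 0.03},
    {"id": "rec-2", "snr_db": 14.0, "wer": 0.04},
    {"id": "rec-3", "snr_db": 22.0, "wer": 0.09},
]
rejected = {}
for rec in recordings:
    failures = failed_checks(rec, thresholds)
    if failures:
        rejected[rec["id"]] = failures
print(rejected)  # {'rec-2': ['snr_db'], 'rec-3': ['wer']}
```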
Question 7: Have you conducted a formal bias examination on this corpus?
Bias examination under Article 10(2)(f) is not a general methodology statement. It is a documented analysis of specific demographic groups, applying specific fairness metrics, with documented results and any mitigation steps taken. The examination must be specific to the corpus you are receiving, not to the vendor’s general bias analysis practices. Ask for the report.
Question 8: What is the inter-annotator agreement score for transcription on this corpus?
IAA scores quantify annotation consistency. A vendor who cannot provide this number for the corpus they are delivering cannot demonstrate that transcription quality is controlled. Ask which agreement metric the score refers to and how it was computed, because a bare number is not comparable across vendors. IAA below 0.85 for forced-alignment transcription indicates quality issues that will affect model training.
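The article does not prescribe an agreement metric, so the following is only one possible illustration: a word-level agreement score between two annotators' transcripts of the same recording, computed as one minus the word edit distance divided by the longer transcript's length. The function names are hypothetical, and a vendor may report a different metric whose values are not directly comparable to this one.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two word lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def transcript_agreement(transcript_a, transcript_b):
    """Return 1 - (word edit distance / longer transcript length), in 0..1."""
    words_a = transcript_a.lower().split()
    words_b = transcript_b.lower().split()
    longest = max(len(words_a), len(words_b))
    if longest == 0:
        return 1.0  # two empty transcripts agree trivially
    return 1.0 - word_edit_distance(words_a, words_b) / longest


score = transcript_agreement(
    "please book a table for two",
    "please book the table for two",
)
print(round(score, 3))  # 0.833
```

Averaging a score like this over a sample of doubly transcribed recordings gives a corpus-level figure you can ask the vendor to reproduce.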
Category 3: Data sovereignty
Question 9: Is the vendor’s legal entity incorporated in the EEA, with no parent company or operational controller subject to foreign government data access laws?
This is the data sovereignty question. GDPR compliance does not equal sovereignty. A US-headquartered vendor with EU data centers is subject to the US CLOUD Act, under which US authorities can compel a provider to produce data in its possession, custody, or control regardless of where that data is stored. Your data processing agreement cannot override a US federal court order. Ask for the legal name and country of incorporation of the entity that will control your data, and for the identity of any parent company.
Question 10: Does the vendor use any US-headquartered cloud infrastructure sub-processors?
Even if the vendor’s legal entity is EEA-incorporated, using US cloud providers as sub-processors may create indirect CLOUD Act exposure. Ask for the complete list of sub-processors and their countries of incorporation.
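Once the vendor provides the sub-processor list, screening it is mechanical. The sketch below assumes each entry carries hypothetical `incorporated_in` and `parent_incorporated_in` country codes; the company names are invented. A flag means "ask further questions", not a legal conclusion about CLOUD Act exposure.

```python
def flag_subprocessors(subprocessors, flagged_countries=frozenset({"US"})):
    """Return (name, reason) pairs for sub-processors tied to a flagged jurisdiction."""
    flagged = []
    for sp in subprocessors:
        parent_country = sp.get("parent_incorporated_in")
        if sp["incorporated_in"] in flagged_countries:
            flagged.append((sp["name"], "incorporated in " + sp["incorporated_in"]))
        elif parent_country in flagged_countries:
            flagged.append((sp["name"], "parent incorporated in " + parent_country))
    return flagged


# Invented entries, for illustration only.
subprocessors = [
    {"name": "ExampleCloud EU GmbH", "incorporated_in": "DE", "parent_incorporated_in": "US"},
    {"name": "Nordic Storage AB", "incorporated_in": "SE", "parent_incorporated_in": None},
    {"name": "Sample Transcribe Inc.", "incorporated_in": "US", "parent_incorporated_in": None},
]
flagged = flag_subprocessors(subprocessors)
for name, reason in flagged:
    print(name, "-", reason)
```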
Question 11: Has the vendor or any parent entity ever received a foreign government compulsion order for customer data?
This question tests transparency. If the answer is yes, ask how the vendor responded and whether customers were notified. If the vendor refuses to answer or claims no legal basis for disclosure, evaluate that response as risk information.
Category 4: Delivery and SLA
Question 12: What are the vendor’s documented procedures for right-to-erasure requests under GDPR Article 17?
For a speech corpus with identified contributors, a right-to-erasure request from any contributor requires the vendor to identify and delete that contributor’s recordings from the delivered corpus and from any processing infrastructure. The vendor must have a documented SLA for responding to these requests. If your model was trained or fine-tuned on the corpus and a contributor requests erasure, you need to understand whether retraining is required and what your vendor’s role is in supporting that process.
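On the buyer side, an erasure request translates into a list of recordings to delete and a deadline to track. The sketch below assumes a hypothetical manifest keyed by `contributor_id` and `recording_id`, and an `sla_days` value that would come from your contract with the vendor. It plans the deletion only; removing data from storage, backups, and trained models is outside its scope.

```python
from datetime import date, timedelta


def plan_erasure(manifest, contributor_id, received_on, sla_days):
    """Split a manifest into recordings to delete and rows to keep, with a due date."""
    to_delete = [
        row["recording_id"] for row in manifest
        if row["contributor_id"] == contributor_id
    ]
    remaining = [row for row in manifest if row["contributor_id"] != contributor_id]
    return {
        "contributor_id": contributor_id,
        "recordings_to_delete": to_delete,
        "due_by": received_on + timedelta(days=sla_days),
        "remaining": remaining,
    }


# Hypothetical manifest and SLA, for illustration only.
manifest = [
    {"recording_id": "rec-1", "contributor_id": "spk-001"},
    {"recording_id": "rec-2", "contributor_id": "spk-002"},
    {"recording_id": "rec-3", "contributor_id": "spk-001"},
]
plan = plan_erasure(manifest, "spk-001", received_on=date(2025, 1, 10), sla_days=14)
print(plan["recordings_to_delete"], plan["due_by"])  # ['rec-1', 'rec-3'] 2025-01-24
```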
Using the responses
Document vendor responses to each question in writing before selecting a supplier. The documentation serves two purposes: it creates a record you can reference during contract negotiation, and it constitutes the beginning of your procurement audit trail for Article 10 compliance.
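A written record of responses can be as simple as one structured entry per question. The sketch below is one possible shape for that record, with hypothetical field names; it groups the questions that still lack corpus-specific documentation by the four risk categories used above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorResponse:
    # Field names are illustrative assumptions.
    question: int         # 1..12, as numbered in this article
    category: str         # compliance, quality, sovereignty, or delivery
    documented: bool      # True if corpus-specific documentation was supplied
    days_to_respond: int  # calendar days between request and answer


def open_items_by_category(responses):
    """Return question numbers lacking documentation, grouped by category."""
    gaps = defaultdict(list)
    for response in responses:
        if not response.documented:
            gaps[response.category].append(response.question)
    return dict(gaps)


# Invented responses for a single vendor, for illustration only.
responses = [
    VendorResponse(1, "compliance", True, 2),
    VendorResponse(4, "quality", False, 9),
    VendorResponse(7, "quality", False, 21),
    VendorResponse(9, "sovereignty", True, 1),
    VendorResponse(12, "delivery", False, 5),
]
open_items = open_items_by_category(responses)
print(open_items)  # {'quality': [4, 7], 'delivery': [12]}
```

Keeping `days_to_respond` alongside each entry preserves the response-time signal as well as the documentation itself.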
A vendor who answers all twelve questions specifically and quickly, providing documentation rather than assertions, is demonstrating that compliance infrastructure is embedded in their operations. That is the standard that enterprise AI procurement increasingly requires.
For more on what Article 10 requires from vendors specifically, see our EU AI Act Article 10 speech data vendor requirements guide. For the formal RFP process, see our speech data vendor RFP requirements guide.
Related Resources
- EU AI Act Article 10: What Speech Data Vendors Must Prove to Enterprise Buyers - Documentation requirements for Article 10 compliance at the vendor level
- Speech data vendor RFP requirements - Formal RFP structure and technical specifications for enterprise procurement
- Data residency vs sovereignty for EU speech data - Why GDPR compliance does not equal data sovereignty and what EEA-native means
- GDPR-compliant speech data collection in Europe - Lawful basis, consent documentation, and GDPR vendor checklist for voice data
- EU AI Act compliant training data
- Speech data consent framework
- Data processing agreement overview