We are thrilled to announce the release of the first publicly available Dutch Synthetic Speech Dataset. This dataset is crucial for researchers, developers, and innovators keen on developing speech synthesis for the Dutch language.
Our open-source dataset features authentic and synthetic Dutch and English speech. This is significant for the Dutch language, which has limited resources for research and development in language processing and speech synthesis. The dataset is designed to fuel innovation in both areas and to open up new possibilities for human-computer interaction.
Our dataset allows generative AI models to be trained more efficiently on Dutch speech. This makes it possible to create highly realistic synthetic Dutch voices that mimic human intonation and emotion, unlocking new potential in synthetic speech. One of the most exciting applications of the dataset is training deepfake voice generators. For video content creators, it opens up new opportunities to combine these synthetic voices with lifelike visuals and produce highly realistic videos in Dutch. Whether it’s for educational content, entertainment, or historical recreation, the ability to pair synthetic voices with visuals is a game-changer for Dutch video content creation.
However, as synthetic voices become more realistic, detecting them becomes more important. Our Dutch Synthetic Speech Dataset therefore serves both as a tool for creating synthetic voices and as an essential resource for developing systems that can differentiate between human and synthetic speech. This capability is essential for maintaining the authenticity and integrity of audio content, especially in an era where deepfake technology makes it easier to manipulate audio and mimic someone’s voice, which can lead to misinformation, fraud, and identity theft.
Because it includes both authentic and synthetic Dutch and English speech, our dataset is an invaluable tool for training AI systems to detect the subtle differences that distinguish synthetic voices from real ones. By analysing speech attributes such as pitch, tone, and speech patterns, AI systems can learn to identify the inconsistencies often present in synthetic speech. The dataset can also be useful alongside AI face recognition systems: when combined with video, synthetic voices can be used to create incredibly realistic deepfake videos. AI systems trained on our dataset can become adept at detecting synthetic voices, which is a critical component in verifying the authenticity of video content.
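To make the idea of analysing speech attributes a little more concrete, here is a minimal sketch of how pitch statistics might be extracted from an audio signal before being passed to a detector. It is only an illustration: the function names (`estimate_pitch_hz`, `pitch_features`), the frame sizes, and the 60 to 400 Hz search range are assumptions for this example, not part of the dataset or of any particular detection system. It uses plain autocorrelation and a generated test tone rather than files from the dataset.

```python
import math
import statistics


def estimate_pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    Returns 0.0 when the frame is empty, silent, or too short to search.
    """
    if not frame:
        return 0.0
    mean = sum(frame) / len(frame)
    centered = [x - mean for x in frame]
    if not any(centered):
        return 0.0
    # Only search lags that correspond to plausible speaking pitches.
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), len(centered) - 1)
    if lag_max < lag_min:
        return 0.0
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(a * b for a, b in zip(centered, centered[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def pitch_features(signal, sample_rate, frame_len=1024, hop=512):
    """Summarise per-frame pitch as a mean and a standard deviation."""
    pitches = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        pitch = estimate_pitch_hz(signal[start:start + frame_len], sample_rate)
        if pitch > 0.0:
            pitches.append(pitch)
    if not pitches:
        return {"mean_pitch_hz": 0.0, "pitch_std_hz": 0.0}
    return {
        "mean_pitch_hz": statistics.fmean(pitches),
        "pitch_std_hz": statistics.pstdev(pitches),
    }


if __name__ == "__main__":
    # Hypothetical input: half a second of a pure 200 Hz tone at 16 kHz.
    sr = 16000
    tone = [math.sin(2 * math.pi * 200.0 * n / sr) for n in range(sr // 2)]
    print(pitch_features(tone, sr))
```

In practice, features like these would be computed for both the authentic and the synthetic recordings and fed to a classifier alongside other attributes. Pitch alone is unlikely to be enough to separate the two.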
You can access the dataset here: https://drive.google.com/drive/folders/1XquNiF94wd7xfcXI5TnBGn5o6MHHobYh?usp=sharing
We’d like to thank SIDN fonds for their invaluable support in making this possible.