Google develops ‘Translatotron’, an AI-based speech-to-speech translation system
Google has announced “Translatotron”, its first direct speech-to-speech translation system, which can convert speech from one language to another while maintaining the speaker’s voice and tempo.
“Translatotron” is based on a sequence-to-sequence network that takes source spectrograms (visual representations of how a signal’s frequency content changes over time) as input and generates spectrograms of the translated content in the target language, Ye Jia and Ron Weiss, software engineers at Google Artificial Intelligence (AI), wrote in a blog post on Wednesday.
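For readers unfamiliar with spectrograms, the idea can be illustrated with a minimal Python sketch. This is not Google's code and says nothing about how Translatotron computes its features. The function name `magnitude_spectrogram`, the frame length, the hop size and the 1,000 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return a (num_frames, frame_len // 2 + 1) array of magnitudes.

    Illustrative only: frame_len and hop are arbitrary choices,
    not values taken from the Translatotron announcement.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # One FFT per frame: rows are time steps, columns are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))


# Hypothetical input: one second of a pure 1,000 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = magnitude_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())   # strongest frequency bin
peak_hz = peak_bin * sample_rate / 256       # convert bin index to Hz
print(spec.shape, peak_hz)
```

Each row of `spec` describes which frequencies are present during one short slice of audio. A sequence-to-sequence model of the kind described would read such rows for the source speech and emit rows for the translated speech.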
The model makes use of two other separately trained components: a neural vocoder that converts the output spectrograms to time-domain waveforms, and a speaker encoder that can be used to maintain the character of the source speaker’s voice in the synthesised translated speech.