For a couple of upcoming projects, I’ve been trying to find a way of making a Raspberry Pi take a piece of text as input and vocalise it through a pair of connected speakers (so-called speech synthesis). The eLinux wiki page on the subject lists a number of methods; however, I found that the suggested packages produced rather robotic-sounding results, and I was after something more natural and pleasant, rather than something to scare the bejeezus out of me every time it speaks. The most natural-sounding offering is a hidden, unofficial API provided through the Google Translate service, which produces very nice-sounding audio and is accurate most of the time. Unfortunately, it’s limited to 100 characters per request, which becomes a problem when you want to read out large swathes of text.
I found a few scripts (including this one from Dan Fountain) that offer an interface to this API; however, most of them simply split the input at the 100-character mark (or at the last space before it). This can break a sentence mid-clause even where the existing punctuation would have given a natural place to split. To get something slightly more natural sounding, I set about bodging together some Python, and came up with the following:
Please note: this script no longer works! In July 2015 Google changed their TTS engine so that translate_tts requests are redirected to a CAPTCHA page. An updated version of the script is available in my SVN repository, and now on GitHub as well.
```python
#!/usr/bin/env python3
# Created by Matt Dyson (mattdyson.org)
# Some inspiration taken from http://danfountain.com/2013/03/raspberry-pi-text-to-speech/
# Version 1.0 (12/07/14)
# Process some text input from our arguments, and then pass them to the Google translate engine
# for Text-To-Speech translation in nicely formatted chunks (the API cannot handle more than 100
# characters at a time).
# Splitting is done first by any punctuation (.,;:) and then by splitting by the MAX_LEN defined below.
# mpg123 is required for playing the resultant MP3 file that is returned by Google TTS
import re
import sys
from subprocess import call
from urllib.parse import quote

MAX_LEN = 100 # Maximum length of a segment to send to Google for TTS
LANGUAGE = "en" # Language to use with TTS - this won't do any translation, just the voice it's spoken with

# Read our system arguments and join them into a single string
fullMsg = " ".join(sys.argv[1:])

# Split our full text by any available punctuation
parts = re.split(r"[.,;:]", fullMsg)

# The final list of parts to send to Google TTS
processedParts = []

while len(parts) > 0: # While we have parts to process
    part = parts.pop(0) # Get first entry from our list
    if len(part) > MAX_LEN:
        # We need to do some cutting
        cutAt = part.rfind(" ", 0, MAX_LEN) # Find the last space within the bounds of our MAX_LEN
        if cutAt <= 0:
            # No usable space (or only a leading one): hard-cut at MAX_LEN to avoid looping forever
            cutAt = MAX_LEN
        cut = part[:cutAt]
        # We need to process the remainder of this part next, so put it back at the front of the queue
        parts.insert(0, part[cutAt:])
    else:
        # No cutting needed
        cut = part
    cut = cut.strip() # Strip any whitespace
    if cut != "": # Make sure there's something left to read
        # Add into our final list
        processedParts.append(cut)

for part in processedParts:
    # Use mpg123 to play the resultant MP3 file from Google TTS (text is URL-encoded for the query string)
    call(["mpg123", "-q", "http://translate.google.com/translate_tts?tl=%s&q=%s" % (LANGUAGE, quote(part))])
```
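The chunking loop is the interesting part of the script, and it is easier to check when it is separated from the network and the speakers. The sketch below mirrors that loop as a pure function; the name `split_text` is hypothetical and not part of the published script. It also guards against the case where `rfind` finds no space (or only a leading one), which would otherwise produce an empty cut and re-queue the same text forever.

```python
import re

MAX_LEN = 100  # Maximum characters per request, as in the script above


def split_text(text, max_len=MAX_LEN):
    """Split text at punctuation, then at the last space before max_len.

    Hypothetical helper mirroring the googletts chunking loop, written as a
    pure function so it can be tested without network access or speakers.
    """
    parts = re.split(r"[.,;:]", text)
    chunks = []
    while parts:
        part = parts.pop(0)
        if len(part) > max_len:
            cut_at = part.rfind(" ", 0, max_len)
            if cut_at <= 0:
                cut_at = max_len  # No usable space: hard-cut mid-word
            cut = part[:cut_at]
            parts.insert(0, part[cut_at:])  # Remainder is processed next
        else:
            cut = part
        cut = cut.strip()
        if cut:
            chunks.append(cut)
    return chunks


if __name__ == "__main__":
    print(split_text("Hello world, the installation is now complete."))
    # ['Hello world', 'the installation is now complete']
```

Note that the punctuation itself is discarded by `re.split`, so each chunk reaches the TTS engine without its trailing comma or full stop.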
This can also be downloaded from my projects repository at http://projects.mattdyson.org/projects/speech/googletts, where updated versions may be available. The mpg123 package is required to play the MP3 file that Google Translate returns. The easiest way to install the script is with the following commands (run as root on your Raspberry Pi):
```shell
$ apt-get install mpg123 subversion
$ cd /usr/bin/
$ svn co http://projects.mattdyson.org/projects/speech speech
$ chmod +x speech/googletts
$ ln -s speech/googletts
$ googletts "Hello world, the installation of the text to speech script is now complete"
```
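Each run of `googletts` ends up making one HTTP request and one mpg123 process per chunk. Because the text goes into a query string, it has to be URL-encoded; `urllib.parse.urlencode` takes care of spaces, commas and non-ASCII characters. The sketch below shows that playback step in isolation. The function names `build_tts_url` and `play` are hypothetical, and since the endpoint now redirects to a CAPTCHA page (see the note above), treat this as an illustration of the request shape rather than something that will still produce audio.

```python
import shutil
import subprocess
from urllib.parse import urlencode

TTS_URL = "http://translate.google.com/translate_tts"  # Endpoint used by the original script
LANGUAGE = "en"


def build_tts_url(text, language=LANGUAGE):
    # urlencode escapes spaces (as '+'), commas and non-ASCII characters (as UTF-8 %XX)
    return TTS_URL + "?" + urlencode({"tl": language, "q": text})


def play(chunks, language=LANGUAGE):
    # Fail early with a clear message if mpg123 is not on the PATH
    if shutil.which("mpg123") is None:
        raise RuntimeError("mpg123 not found; install it with apt-get install mpg123")
    for chunk in chunks:
        # check=True raises CalledProcessError if mpg123 exits with a non-zero status
        subprocess.run(["mpg123", "-q", build_tts_url(chunk, language)], check=True)
```

For example, `build_tts_url("Hello world, hi")` gives `http://translate.google.com/translate_tts?tl=en&q=Hello+world%2C+hi`.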
Unfortunately, if a clause of a sentence is longer than 100 characters there will still be an unwanted pause in the middle of it, as the script has no better place to split than the last space before the limit. And if the text contains a lot of punctuation, it can take a long time to read back, since every clause is fetched and played by a separate mpg123 call. I’d welcome any improvements people may suggest!
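One possible improvement for the heavy-punctuation case, sketched here as a suggestion rather than anything in the published script, is to pack consecutive short clauses into a single request while keeping their punctuation, so the voice can still pause naturally at each comma but there are fewer gaps between mpg123 calls. The name `pack_clauses` is hypothetical, and the zero-width `re.split` used here needs Python 3.7 or later.

```python
import re

MAX_LEN = 100


def pack_clauses(text, max_len=MAX_LEN):
    """Greedily join punctuation-delimited clauses into chunks of at most max_len.

    Punctuation stays attached to each clause. A single clause longer than
    max_len is left whole here; it would still need space-based cutting.
    """
    # Split just after each punctuation mark, keeping the mark with its clause
    clauses = [c.strip() for c in re.split(r"(?<=[.,;:])", text) if c.strip()]
    chunks = []
    current = ""
    for clause in clauses:
        candidate = (current + " " + clause).strip()
        if len(candidate) <= max_len:
            current = candidate  # Clause still fits in the current request
        else:
            if current:
                chunks.append(current)
            current = clause  # Start a new request with this clause
    if current:
        chunks.append(current)
    return chunks
```

With this approach, `"Hello world, the installation is now complete."` becomes a single request instead of two, and longer passages are sent in as few requests as the 100-character limit allows.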