Using gTTS with Python (Text-to-Speech, Node-RED)
Introduction
In this article, I try out gTTS (Google Text-to-Speech) with Python.
I have used VoiceVox before, but I was looking for something that could also speak English and generate voice quickly in a local environment. gTTS was easy to use and generated audio quickly, so I would like to keep using it in future projects.
▼Previous articles are here.
Related Information
▼GitHub Repository
https://github.com/pndurette/gTTS
▼The documentation includes examples.
https://gtts.readthedocs.io/en/latest/module.html#examples
▼PyPI Page
Running with Python
I created a Python virtual environment and installed gTTS. The environment used was Windows 11 with Python 3.10.
▼Commands to build the environment are here
python -m venv pyenv
cd pyenv
.\Scripts\activate
pip install gTTS
▼For more information on creating a Python virtual environment, see the following article; you can also create a Python virtual environment by specifying a Python version.
I tried a sample program from the Examples page of gTTS.
▼I ran the following code. It looks like you can select the language as well.
from gtts import gTTS

# Convert the English text 'hello' to speech and save it as an MP3 file.
tts = gTTS('hello', lang='en')
tts.save('hello.mp3')
▼The following audio file was generated.
▼An executable file named gtts-cli.exe, which will be used later, is added to the Scripts folder.
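The language selection mentioned above can be sketched as follows. This is a minimal sketch rather than code from the gTTS examples page: the helper names `output_name` and `synthesize` and the file-naming scheme are my own assumptions, while `gTTS(text, lang=...)` and `save()` are the same calls used in the sample. Note that gTTS sends the text to Google's service, so an internet connection is needed when `synthesize` runs.

```python
def output_name(stem: str, lang: str) -> str:
    """Build a file name such as 'hello_en.mp3' (hypothetical naming scheme)."""
    return f"{stem}_{lang}.mp3"


def synthesize(text: str, lang: str, path: str) -> None:
    """Generate speech for `text` in language `lang` and save it to `path`."""
    # Imported here so output_name() can be used without gTTS installed.
    from gtts import gTTS

    gTTS(text, lang=lang).save(path)


# Example usage (requires gTTS and network access):
#   synthesize("hello", "en", output_name("hello", "en"))
#   synthesize("こんにちは", "ja", output_name("hello", "ja"))
```

The language codes `en` and `ja` are the ones used in this article; other codes should be checked against the gTTS documentation.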
Using the python-venv Node
I used the python-venv node in Node-RED to run Python in a virtual environment and tested gTTS.
▼This Node-RED Advent Calendar article describes the history of the node's development.
https://qiita.com/background/items/d2e05e8d85427761a609
▼You can also read more about node development in the following articles.
In October, I added the venv-exec node, which allows executing files within a virtual environment. A sample flow using gTTS is available in the GitHub repository.
▼First, I tried the following flow.
[{"id":"3eaf61a6234b295e","type":"pip","z":"22eb2b8f4786695c","venvconfig":"36c2cf6f351fdc6e","name":"","arg":"gTTS","action":"install","tail":false,"x":630,"y":3900,"wires":[["adc2bd3e14623eac"]]},{"id":"0329d8a408d93a42","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":480,"y":3900,"wires":[["3eaf61a6234b295e"]]},{"id":"adc2bd3e14623eac","type":"debug","z":"22eb2b8f4786695c","name":"debug 150","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":790,"y":3900,"wires":[]},{"id":"ad3e228c832881fd","type":"venv-exec","z":"22eb2b8f4786695c","name":"","venvconfig":"36c2cf6f351fdc6e","mode":"execute","executable":"gtts-cli.exe","arguments":"'hello' --output hello.mp3","x":630,"y":3960,"wires":[["d2b90b2d03a26252"]]},{"id":"faf05570b1bc51af","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":480,"y":3960,"wires":[["ad3e228c832881fd"]]},{"id":"d2b90b2d03a26252","type":"debug","z":"22eb2b8f4786695c","name":"debug 155","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":790,"y":3960,"wires":[]},{"id":"36c2cf6f351fdc6e","type":"venv-config","venvname":"AI","version":"3.8"}]
After installing gTTS with the pip node, I used the venv-exec node to run gtts-cli.exe.
▼The venv-exec node runs gtts-cli.exe.
▼The following audio file was generated.
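For comparison, roughly the same call can be made from plain Python with `subprocess`. This is only a sketch of an equivalent, not how the venv-exec node is implemented. The function names are hypothetical, and the `Scripts` folder layout assumes a Windows virtual environment such as the `pyenv` folder created earlier.

```python
import subprocess
from pathlib import Path


def gtts_cli_command(venv_dir, text, output, lang=None):
    """Return the argument list for gtts-cli.exe inside a Windows venv."""
    exe = Path(venv_dir) / "Scripts" / "gtts-cli.exe"
    cmd = [str(exe), text, "--output", output]
    if lang is not None:
        # --lang selects the language, e.g. 'ja' for Japanese.
        cmd += ["--lang", lang]
    return cmd


def run_gtts_cli(venv_dir, text, output, lang=None):
    """Run gtts-cli and raise CalledProcessError if it exits with an error."""
    subprocess.run(gtts_cli_command(venv_dir, text, output, lang), check=True)


# Example usage (requires the virtual environment and network access):
#   run_gtts_cli("pyenv", "hello", "hello.mp3")
```

Passing the arguments as a list means that text containing spaces stays a single argument, so no extra quoting is needed.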
If the Arguments field is blank, the venv-exec node uses the value of msg.payload as the arguments. I tried using a template node so that the arguments could be passed in as a variable.
When I simply placed a template node between the inject node and the venv-exec node, I got the following error.
▼When the string contained a space, as in "Hello, World!", the error looked as if the text had been split at the space into separate arguments.
▼When checked in the debug node, the string is enclosed in double quotes.
I removed the double quotation marks with a change node. With that, the following flow ran successfully.
▼Here is the overall flow
[{"id":"5def8c7caba7d4da","type":"venv-exec","z":"22eb2b8f4786695c","name":"gTTS","venvconfig":"36c2cf6f351fdc6e","mode":"execute","executable":"gtts-cli.exe","arguments":"","x":860,"y":4160,"wires":[["2211e3a3be80c988"]]},{"id":"44c0cb2499027b61","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"Hello, World!","payloadType":"str","x":450,"y":4100,"wires":[["68c9da00af3ff5d4"]]},{"id":"2211e3a3be80c988","type":"debug","z":"22eb2b8f4786695c","name":"debug 324","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1020,"y":4160,"wires":[]},{"id":"68c9da00af3ff5d4","type":"template","z":"22eb2b8f4786695c","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"'{{payload}}' --output voice.mp3","output":"str","x":600,"y":4100,"wires":[["e63d63cb8887f452","1b16618e69c8cb72"]]},{"id":"ed0419f2c389e0c2","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"Nice to meet you.","payloadType":"str","x":460,"y":4180,"wires":[["68c9da00af3ff5d4"]]},{"id":"89f3ba3345fa9899","type":"template","z":"22eb2b8f4786695c","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"'{{payload}}' --output voice_ja.mp3 --lang ja","output":"str","x":620,"y":4220,"wires":[["5def8c7caba7d4da"]]},{"id":"12d9b5c47a80a876","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"こんにちは","payloadType":"str","x":440,"y":4220,"wires":[["89f3ba3345fa9899"]]},{"id":"7be349c8c215c65b","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"はじめまして","payloadType":"str","x":450,"y":4260,"wires":[["89f3ba3345fa9899"]]},{"id":"e63d63cb8887f452","type":"debug","z":"22eb2b8f4786695c","name":"debug 325","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":770,"y":4060,"wires":[]},{"id":"1b16618e69c8cb72","type":"change","z":"22eb2b8f4786695c","name":"","rules":[{"t":"change","p":"payload","pt":"msg","from":"\"","fromt":"str","to":"","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":800,"y":4100,"wires":[["5def8c7caba7d4da"]]},{"id":"36c2cf6f351fdc6e","type":"venv-config","venvname":"AI","version":"3.8"}]
▼The template node receives only the text and adds the required arguments.
▼For Japanese text, the language is specified in the arguments (--lang ja).
▼The change node removes the double quotation marks.
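To illustrate why a string with spaces can end up as several arguments, here is a small sketch using Python's standard `shlex` module. It follows POSIX shell splitting rules, which is an assumption for the sake of illustration; the venv-exec node and Windows may parse the argument string differently.

```python
import shlex

quoted = "'Hello, World!' --output voice.mp3"
unquoted = "Hello, World! --output voice.mp3"

# With quotes, the text stays together as one argument:
# ['Hello, World!', '--output', 'voice.mp3']
print(shlex.split(quoted))

# Without quotes, the text is split at the space into two arguments:
# ['Hello,', 'World!', '--output', 'voice.mp3']
print(shlex.split(unquoted))
```

In the second case a command-line tool would see "World!" as an extra argument, which matches the kind of error described earlier.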
I executed it.
▼The audio is generated at about the speed shown in the video. The short text may be part of the reason, but generation takes less than a second.
▼The following audio files were generated.
Conclusion
The Japanese speech sounds somewhat slow. As a next step, I could run Python code in the venv node to play the generated audio file and change the playback speed.
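One possible way to change the speed, sketched here as an assumption rather than something tested in this article, is to call ffmpeg with its `atempo` audio filter from Python. This assumes ffmpeg is installed and on the PATH. The function names and the output file name are hypothetical, and the 0.5 to 2.0 range is a conservative limit chosen for this sketch.

```python
import subprocess


def atempo_command(src, dst, speed):
    """Return an ffmpeg command that changes the audio tempo by `speed`."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0 in this sketch")
    # -y overwrites dst if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={speed}", dst]


def change_speed(src, dst, speed):
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(atempo_command(src, dst, speed), check=True)


# Example usage (requires ffmpeg):
#   change_speed("voice_ja.mp3", "voice_ja_fast.mp3", 1.25)
```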
Since the template node is designed to receive only text, it can be used in combination with other nodes. For example, it could be used in combination with a dashboard node to generate an audio file of the input text.