Building a Program to Practice English Conversation, Part 1 (Node-RED, Python)
Introduction
Just recently I had the chance to attend an international conference. Giving a talk in English is only a matter of practice, but I wasn't confident I could answer questions in English on the spot during the Q&A, so I wrote a program for practicing English conversation.
It combines the pieces I had been experimenting with: speech recognition, chatting with an LLM, and speech synthesis. When I actually tried it, I found it hard to say what I wanted to say in English on the spur of the moment.
▼Incidentally, on a site that judges the accent of your English pronunciation, mine was rated 100% Japanese.
https://start.boldvoice.com/accent-oracle
▼The earlier articles are here:
Creating the flow
I use Python to run the AI-related tools from Node-RED. The python-venv node, which I develop, lets Node-RED use a Python virtual environment to install packages and execute code.
▼I've written about the python-venv node several times:
https://qiita.com/background/items/d2e05e8d85427761a609
https://flows.nodered.org/node/@background404/node-red-contrib-python-venv
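As a quick reference, the body of a venv node is plain Python: the incoming Node-RED message is exposed as msg, and, as the flow later in this article relies on, whatever the script prints becomes the outgoing payload. A minimal sketch of a node body (a hypothetical example, not taken from the flow below):

```python
# Body of a python-venv node: `msg` is the incoming Node-RED message.
text = msg['payload']

# Whatever is printed to stdout is sent downstream as the output payload.
print(text.upper())
```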
I had previously built and published a voice-translation flow in Node-RED that uses the python-venv node.
▼It's on this page:
https://github.com/404background/node-red-contrib-python-venv/discussions/48
▼This is the flow I had built:

Each node handled one step: recording, speech recognition, translation, speech synthesis, or playback. Since the processing is split up per node, swapping the node for a given function makes the pipeline easy to rearrange. I built the new flow on top of this one.
The conversation itself happens through an LLM served by Ollama, and I had already verified exchanging messages with Ollama from Node-RED. I personally use gemma3:4b a lot because it's fast and its Japanese replies don't feel off.
▼I tried it in this article:
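The core of the exchange is a single ollama.chat() call. Below is a minimal sketch of the same pattern the server node in this article's flow uses, with the chat history capped at a fixed length:

```python
import ollama

MAX_HISTORY = 20   # cap the context so it doesn't grow without bound
chat_history = []

def chat_with_ollama(user_input):
    if len(chat_history) >= MAX_HISTORY:
        chat_history.pop(0)  # drop the oldest turn
    chat_history.append({'role': 'user', 'content': user_input})
    response = ollama.chat(model='gemma3:4b', messages=chat_history)
    reply = response['message']['content']
    chat_history.append({'role': 'assistant', 'content': reply})
    return reply

print(chat_with_ollama('Hello'))
```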
For speech recognition I used Faster Whisper, as before.
▼I tried it in this article. Back then the target was Japanese; this time it's English.
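For reference, the recognition step reduces to the following, using the same settings as the Record node in the flow below: English, beam size 5, Whisper's VAD filter, and GPU with a CPU fallback:

```python
import numpy as np
from faster_whisper import WhisperModel

# Try GPU first, fall back to CPU (int8) if CUDA isn't available.
try:
    model = WhisperModel("medium", device="cuda", compute_type="float16")
except Exception:
    model = WhisperModel("medium", device="cpu", compute_type="int8")

def transcribe(audio_f32: np.ndarray) -> str:
    """audio_f32: 16 kHz mono float32 samples in [-1.0, 1.0]."""
    segments, info = model.transcribe(
        audio_f32, beam_size=5, language="en", vad_filter=True
    )
    return "".join(seg.text for seg in segments).strip()
```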
For English speech synthesis I again went with gTTS. Its response time is fast.
▼I tried it in this article:
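The synthesis and playback steps are short; this sketch matches what the flow's gTTS and playback nodes do (save to an MP3, then play it with pygame):

```python
from gtts import gTTS
import pygame

# Synthesize English speech and save it as an MP3 file.
tts = gTTS(text="Hello, how are you?", lang="en")
tts.save("voice_en.mp3")

# Play the file back and block until playback finishes.
pygame.mixer.init()
pygame.mixer.music.load("voice_en.mp3")
pygame.mixer.music.play()
while pygame.mixer.music.get_busy():
    pygame.time.Clock().tick(10)
```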
I combined the flows from those earlier experiments into a flow for English conversation.
▼Here is the flow:

[{"id":"aaf1d6d449c94b50","type":"venv","z":"3852c69e65b57289","venvconfig":"9f6da0060d06a240","name":"","code":"import socket\nimport ollama\nimport threading\nimport sys\n\nHOST = '127.0.0.1'\nPORT = 5001\nMAX_HISTORY = 20\nchat_history = []\nserver_socket = None\nrunning = True\n\ndef chat_with_ollama(user_input):\n global chat_history\n \n # 履歴の長さを制限\n if len(chat_history) >= MAX_HISTORY:\n chat_history.pop(0)\n\n chat_history.append({'role': 'user', 'content': user_input})\n \n response = ollama.chat(model='gemma3:4b', messages=chat_history)\n chat_reply = response['message']['content']\n \n chat_history.append({'role': 'assistant', 'content': chat_reply})\n print(chat_history)\n return chat_reply\n\ndef handle_client(client_socket):\n \"\"\" クライアントごとの処理 \"\"\"\n global running\n with client_socket:\n print(f\"Client connected: {client_socket.getpeername()}\")\n while True:\n try:\n data = client_socket.recv(1024).decode().strip()\n if not data:\n break\n if data.lower() == \"bye\":\n print(\"Received 'bye', shutting down server...\")\n client_socket.sendall(\"Ollama Shutdown\".encode())\n running = False # サーバーの実行を停止\n break\n\n response = chat_with_ollama(data)\n client_socket.sendall(response.encode())\n except (ConnectionResetError, BrokenPipeError):\n break\n print(\"Client disconnected.\")\n\ndef server_thread():\n \"\"\" サーバーをスレッドで実行 \"\"\"\n global server_socket, running\n\n server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)\n server_socket.bind((HOST, PORT))\n server_socket.listen()\n server_socket.settimeout(1.0) # 1秒ごとに `Ctrl+C` チェック\n\n print(f\"Listening on {HOST}:{PORT}...\")\n\n try:\n while running:\n try:\n client_socket, addr = server_socket.accept()\n client_thread = threading.Thread(target=handle_client, args=(client_socket,))\n client_thread.start()\n except socket.timeout:\n continue # タイムアウトしてもループを継続\n except Exception as e:\n print(f\"Server error: {e}\")\n finally:\n server_socket.close()\n print(\"Server shut down.\")\n\ndef main():\n global running\n server = threading.Thread(target=server_thread)\n server.start()\n\n try:\n while running:\n pass # メインスレッドを維持して `Ctrl+C` を待つ\n except KeyboardInterrupt:\n print(\"\\nShutting down server...\")\n running = False # サーバーの実行を停止\n server.join() # スレッドを待機\n sys.exit(0)\n\nif __name__ == \"__main__\":\n main()\n","continuous":true,"x":700,"y":1120,"wires":[["bfe2ca92d58b4aab"]]},{"id":"ab187d6e8aa28dc5","type":"pip","z":"3852c69e65b57289","venvconfig":"9f6da0060d06a240","name":"ollama","arg":"ollama","action":"install","tail":false,"x":700,"y":1060,"wires":[["96f1feaa88df4420","aaf1d6d449c94b50"]]},{"id":"aba8c6f5265613f1","type":"inject","z":"3852c69e65b57289","name":"","props":[],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","x":560,"y":1060,"wires":[["ab187d6e8aa28dc5"]]},{"id":"96f1feaa88df4420","type":"debug","z":"3852c69e65b57289","name":"debug 462","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":860,"y":1060,"wires":[]},{"id":"772bbe230ffb89db","type":"inject","z":"3852c69e65b57289","name":"","props":[],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","x":560,"y":1120,"wires":[["aaf1d6d449c94b50"]]},{"id":"bfe2ca92d58b4aab","type":"debug","z":"3852c69e65b57289","name":"debug 
463","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":860,"y":1120,"wires":[]},{"id":"a289d789c6353cb4","type":"comment","z":"3852c69e65b57289","name":"Ollama Server","info":"","x":560,"y":1020,"wires":[]},{"id":"7dac2b20f9756edb","type":"inject","z":"3852c69e65b57289","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"bye","payloadType":"str","x":680,"y":1320,"wires":[["cffc835148198023"]]},{"id":"e5b3b8a4b272df67","type":"inject","z":"3852c69e65b57289","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"こんにちは","payloadType":"str","x":670,"y":1240,"wires":[["cffc835148198023"]]},{"id":"cffc835148198023","type":"tcp request","z":"3852c69e65b57289","name":"","server":"127.0.0.1","port":"5001","out":"time","ret":"string","splitc":"0","newline":"","trim":false,"tls":"","x":860,"y":1240,"wires":[["3004d51474ae3ac7","37a414a63432014a"]]},{"id":"3004d51474ae3ac7","type":"debug","z":"3852c69e65b57289","name":"debug 464","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1060,"y":1240,"wires":[]},{"id":"6add9796c6edf4ff","type":"inject","z":"3852c69e65b57289","name":"Start","props":[],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","x":1060,"y":1060,"wires":[["20f2c93987dbd35f"]]},{"id":"75f195108c82622d","type":"debug","z":"3852c69e65b57289","name":"debug 4","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1360,"y":1060,"wires":[]},{"id":"20f2c93987dbd35f","type":"venv","z":"3852c69e65b57289","venvconfig":"9f6da0060d06a240","name":"Record","code":"import pyaudio\nimport numpy as np\nfrom faster_whisper import WhisperModel\n\n# ==============================\n# ユーザー設定\n# ==============================\n\nMIC_DEVICE_INDEX = 2 # ★★★ ここでマイクのIndexを指定 ★★★\nMODEL_SIZE = \"medium\" # base\n\n# 音声設定\nFORMAT = pyaudio.paInt16\nCHANNELS = 1\nRATE = 16000\nCHUNK = 1024\n\nSILENCE_LIMIT = 2.0 # 無音判定時間(秒)\nNOISE_ESTIMATION_TIME = 1.0 # 環境ノイズ測定時間(秒)\nNOISE_THRESHOLD_MULT = 3.0 # ノイズ倍率\n\n# ==============================\n# Whisperモデル初期化\n# ==============================\n\ntry:\n model = WhisperModel(MODEL_SIZE, device=\"cuda\", compute_type=\"float16\")\n print(\"[DEBUG] GPU (CUDA) モードで起動しました。\")\nexcept Exception as e:\n print(f\"[DEBUG] GPU起動失敗 ({e}) → CPUモード\")\n model = WhisperModel(MODEL_SIZE, device=\"cpu\", compute_type=\"int8\")\n\n# ==============================\n# Audio 初期化\n# ==============================\n\naudio = pyaudio.PyAudio()\n\nprint(f\"[DEBUG] 使用するマイクIndex: {MIC_DEVICE_INDEX}\")\ninfo = audio.get_device_info_by_index(MIC_DEVICE_INDEX)\nprint(f\"[DEBUG] マイク名: {info['name']}\")\n\nstream = audio.open(\n format=FORMAT,\n channels=CHANNELS,\n rate=RATE,\n input=True,\n input_device_index=MIC_DEVICE_INDEX,\n frames_per_buffer=CHUNK\n)\n\n# ==============================\n# 状態変数\n# ==============================\n\nSILENCE_THRESHOLD = None\nnoise_rms_values = []\nelapsed_noise_time = 0.0\n\nall_frames = []\nis_speaking = False\nsilent_duration = 0.0\n\nprint(\"[DEBUG] 環境ノイズ測定開始...\")\n\n# ==============================\n# メインループ\n# ==============================\n\ntry:\n while True:\n data = stream.read(CHUNK, exception_on_overflow=False)\n audio_chunk = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0\n rms = 
np.sqrt(np.mean(audio_chunk ** 2))\n chunk_time = CHUNK / RATE\n\n # ------------------------------\n # ノイズフロア測定フェーズ\n # ------------------------------\n if SILENCE_THRESHOLD is None:\n noise_rms_values.append(rms)\n elapsed_noise_time += chunk_time\n\n if elapsed_noise_time >= NOISE_ESTIMATION_TIME:\n noise_floor = float(np.mean(noise_rms_values))\n SILENCE_THRESHOLD = noise_floor * NOISE_THRESHOLD_MULT\n print(f\"[DEBUG] ノイズフロア: {noise_floor:.6f}\")\n print(f\"[DEBUG] ノイズ閾値設定: {SILENCE_THRESHOLD:.6f}\")\n print(\"[DEBUG] 音声認識待機中...\")\n continue\n\n # ------------------------------\n # 発話検出\n # ------------------------------\n if rms > SILENCE_THRESHOLD:\n if not is_speaking:\n is_speaking = True\n # print(\"[DEBUG] 録音開始\")\n\n all_frames.append(data)\n silent_duration = 0.0\n\n else:\n if is_speaking:\n all_frames.append(data)\n silent_duration += chunk_time\n\n if silent_duration >= SILENCE_LIMIT:\n # ------------------------------\n # Whisper 推論\n # ------------------------------\n audio_data = np.frombuffer(\n b''.join(all_frames), dtype=np.int16\n ).astype(np.float32) / 32768.0\n\n segments, info = model.transcribe(\n audio_data,\n beam_size=5,\n language=\"en\",\n vad_filter=True\n )\n\n text = \"\".join(seg.text for seg in segments).strip()\n if text:\n print(text)\n\n # リセット\n all_frames = []\n is_speaking = False\n silent_duration = 0.0\n\nexcept KeyboardInterrupt:\n print(\"\\n[DEBUG] 終了します\")\n\nfinally:\n stream.stop_stream()\n stream.close()\n audio.terminate()\n","continuous":true,"x":1210,"y":1060,"wires":[["75f195108c82622d","9dedaffe03177259"]]},{"id":"b70eebee1b0f8f26","type":"comment","z":"3852c69e65b57289","name":"en","info":"","x":1040,"y":1020,"wires":[]},{"id":"3ee9600cc276eca6","type":"inject","z":"3852c69e65b57289","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"Hello","payloadType":"str","x":680,"y":1280,"wires":[["cffc835148198023"]]},{"id":"c1a45a7468958355","type":"inject","z":"3852c69e65b57289","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"Hello","payloadType":"str","x":740,"y":1360,"wires":[["3a9f8f31fa8361ff"]]},{"id":"3a9f8f31fa8361ff","type":"template","z":"3852c69e65b57289","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"{{payload}} --output voice_en.mp3 --lang en","output":"str","x":890,"y":1360,"wires":[["1d08214cc1cf0138","37a414a63432014a"]]},{"id":"1d08214cc1cf0138","type":"debug","z":"3852c69e65b57289","name":"debug 5","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":890,"y":1420,"wires":[]},{"id":"5cde7328496c37d1","type":"venv","z":"3852c69e65b57289","venvconfig":"9f6da0060d06a240","name":"","code":"from gtts import gTTS\n\n# 音声化したいテキスト\ntext = msg['payload']\n\n# gTTSの設定\ntts = gTTS(text=text, lang='en')\n\n# 指定されたファイル名で保存\nfilename = \"voice_en.mp3\"\ntts.save(filename)\n","continuous":false,"x":1100,"y":1360,"wires":[["a4fb54aa5a71adda","541fec172c0f1263"]]},{"id":"a4fb54aa5a71adda","type":"debug","z":"3852c69e65b57289","name":"debug 
6","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1270,"y":1320,"wires":[]},{"id":"9dedaffe03177259","type":"switch","z":"3852c69e65b57289","name":"","property":"payload","propertyType":"msg","rules":[{"t":"cont","v":"[DEBUG]","vt":"str"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":1100,"y":1120,"wires":[[],["a49256832d2fbc9f","a4e5925b81f71973"]]},{"id":"a49256832d2fbc9f","type":"debug","z":"3852c69e65b57289","name":"debug 7","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1270,"y":1140,"wires":[]},{"id":"a4e5925b81f71973","type":"template","z":"3852c69e65b57289","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"You are system chat assistant.\nDo not use special character.\nAnswer short conversations.\n\nUser:{{payload}}","output":"str","x":1050,"y":1160,"wires":[["cffc835148198023"]]},{"id":"c8a20dcb7dae1a9a","type":"debug","z":"3852c69e65b57289","name":"debug 8","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1470,"y":1360,"wires":[]},{"id":"541fec172c0f1263","type":"venv","z":"3852c69e65b57289","venvconfig":"9f6da0060d06a240","name":"Constant Speed","code":"import os\nimport sys\nimport warnings\n\n# UserWarningを抑制(pygameのインポート前に実行)\nwarnings.filterwarnings(\"ignore\", category=UserWarning, module=\"pygame.pkgdata\")\n\nimport pygame\n\npygame.mixer.init()\npygame.mixer.music.load('voice_en.mp3')\npygame.mixer.music.play()\n\nwhile pygame.mixer.music.get_busy():\n pygame.time.Clock().tick(10)","continuous":true,"x":1280,"y":1360,"wires":[["c8a20dcb7dae1a9a"]]},{"id":"feebc8ed2dab31df","type":"inject","z":"3852c69e65b57289","name":"","props":[],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","x":550,"y":960,"wires":[["116bc5d1c1f2eac8"]]},{"id":"4a532c2b85883bfc","type":"debug","z":"3852c69e65b57289","name":"debug 30","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1030,"y":960,"wires":[]},{"id":"98464f0e6daace38","type":"debug","z":"3852c69e65b57289","name":"debug 31","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1000,"y":920,"wires":[]},{"id":"91c2295a08dc2eae","type":"inject","z":"3852c69e65b57289","name":"","props":[],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","x":550,"y":920,"wires":[["186c305dc3f80c02"]]},{"id":"ebc6fc3958c3902e","type":"comment","z":"3852c69e65b57289","name":"Install Packages","info":"","x":560,"y":860,"wires":[]},{"id":"e314fd8f1fa97209","type":"debug","z":"3852c69e65b57289","name":"debug 32","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":890,"y":1000,"wires":[]},{"id":"116bc5d1c1f2eac8","type":"pip","z":"3852c69e65b57289","venvconfig":"9f6da0060d06a240","name":"googletrans","arg":"googletrans","action":"install","tail":false,"x":710,"y":960,"wires":[["e314fd8f1fa97209","ce5e275ce1a82ce3"]]},{"id":"186c305dc3f80c02","type":"pip","z":"3852c69e65b57289","venvconfig":"9f6da0060d06a240","name":"pyaudio faster-whisper","arg":"pyaudio 
faster-whisper","action":"install","tail":false,"x":740,"y":920,"wires":[["98464f0e6daace38"]]},{"id":"ce5e275ce1a82ce3","type":"pip","z":"3852c69e65b57289","venvconfig":"9f6da0060d06a240","name":"Audio","arg":"pyaudio numpy pydub gTTS pygame","action":"install","tail":false,"x":870,"y":960,"wires":[["4a532c2b85883bfc"]]},{"id":"37a414a63432014a","type":"junction","z":"3852c69e65b57289","x":1010,"y":1360,"wires":[["5cde7328496c37d1"]]},{"id":"9f6da0060d06a240","type":"venv-config","venvname":"pyenv","version":"default"},{"id":"8919d94f916ad5dd","type":"global-config","env":[],"modules":{"@background404/node-red-contrib-python-venv":"0.5.6"}}]Ollamaの処理用サーバーを立てて、TCP通信でメッセージを受け取ります。音声認識の結果をTCP通信で送信し、LLMの処理結果を音声合成して、再生するような流れです。
The node in charge of recording and speech recognition, by the way, went through many rounds of trial and adjustment. It uses a threshold to decide that an utterance has ended, and only then runs recognition. Previously the code simply recorded for a few seconds after being triggered; this time the node stays running and processes the audio as a continuous stream, as sketched below.
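Distilled from the Record node's code, the end-of-utterance logic works like this: measure the ambient noise for one second, set the threshold to three times that noise floor, and treat two seconds of audio below the threshold as the end of an utterance. A sketch of just that logic, fed the byte buffers read from PyAudio:

```python
import numpy as np

RATE, CHUNK = 16000, 1024
CHUNK_TIME = CHUNK / RATE   # about 0.064 s of audio per buffer
NOISE_TIME = 1.0            # calibration period (s)
NOISE_MULT = 3.0            # threshold = noise floor * 3
SILENCE_LIMIT = 2.0         # this much quiet ends the utterance

def rms(data):
    x = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
    return float(np.sqrt(np.mean(x ** 2)))

def segment_utterances(chunks):
    """Yield one raw byte buffer per detected utterance."""
    threshold = None
    noise, elapsed = [], 0.0
    frames, speaking, silent = [], False, 0.0
    for data in chunks:
        level = rms(data)
        if threshold is None:          # still measuring the noise floor
            noise.append(level)
            elapsed += CHUNK_TIME
            if elapsed >= NOISE_TIME:
                threshold = float(np.mean(noise)) * NOISE_MULT
            continue
        if level > threshold:          # speech: record and reset the timer
            speaking = True
            silent = 0.0
            frames.append(data)
        elif speaking:                 # quiet after speech: keep counting
            frames.append(data)
            silent += CHUNK_TIME
            if silent >= SILENCE_LIMIT:
                yield b"".join(frames)  # utterance finished
                frames, speaking, silent = [], False, 0.0
```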
Checking the behavior
I tested whether I could actually hold an English conversation with the flow I built. This time I ran it on my laptop.
▼The PC is a gaming laptop that cost around 100,000 yen, running Windows 11.
Since I wanted to practice English conversation, I asked whether it could suggest a topic.
▼Despite being an LLM, the first thing it asked me about was the weather.

Once I stopped speaking, recognition ran and I could hold a conversation with the LLM. Partly because I had told it to answer briefly, the responses were fast.
Occasionally it recognized words I didn't intend, but I think that's simply my pronunciation's fault.
▼Here it confused cloudy and crowded.

▼I tried to ask it for ideas for a robot design, but the words wouldn't come to me.

When a reply comes to mind in Japanese but I don't know how to say it in English, I'd want a feature that shows me a translation. I'm thinking of building that part as a Node-RED dashboard screen; a rough sketch of the translation step follows.
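Since the pip node in the flow already installs googletrans, the translation display could start from something like this (a sketch assuming the synchronous API of googletrans 3.x / 4.0.0rc1; recent releases moved to an async API):

```python
from googletrans import Translator

translator = Translator()

def ja_to_en(text):
    # Translate a Japanese draft reply into English for the dashboard.
    return translator.translate(text, src='ja', dest='en').text

print(ja_to_en('こんにちは'))
```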
▼"six hours" was recognized as "six of us", but, perhaps from context, the reply still treated it as six hours.

My pronunciation isn't fit for other people's ears, so there's no video, but the responses come back fast enough that the exchange works as a conversation. Lightweight LLMs sometimes reply so instantly that I doubt whether any thinking happened at all; this is a little slower than that, which feels just right.
Conclusion
For now I can at least hold an English conversation with the LLM, so I plan to improve it while actually using it. As for processing time, which was my biggest concern, it depends on the PC's performance, but it has reached the point where it doesn't bother me.
I'd also like to try other English speech-synthesis software. And the LLM prompt will probably need some more thought.
▼Since Japanese speech synthesis works as well, I think nearly the same flow could become a dialogue system for a robot.

