Pythonで検索してローカルLLMで要約してみる(Ollama、Node-RED)

はじめに

 今回は情報収集を自動化したかったので、Pythonで検索するプログラムを試してみました。

 いつも通りNode-REDで実行できるようにして、他のノードと組み合わせることで、ローカルLLMの処理にもつなげています。

▼以前の記事はこちら

Pythonで論文を収集する(arXiv、Node-RED)

はじめに  今回はarXiv APIを使って、Pythonで論文を収集してみました。  普段は論文を検索するときにGoogle Scholarを使っていたのですが、プログラムで自動化したかっ…

Pythonでテキストを翻訳する(Googletrans、Node-RED)

はじめに  今回はPythonでGoogletransを利用した翻訳を試してみました。  書いてはいないのですが、これまで翻訳するのにdeep-translatorも使ったことがあります。他に…

検索ワードで検索した結果を取得する

 Pythonの実行は、私が開発したpython-venvノードを利用しています。Pythonの仮想環境を作成して、Node-REDのノードとしてコードを実行できます。

▼年末に開発の変遷を書きました。

https://qiita.com/background/items/d2e05e8d85427761a609

 ChatGPTにコードを書いてもらって、実行するためのフローを作成しました。いくつかの方法が提示されていたのですが、duckduckgo_searchというパッケージを利用したもので実行できました。

▼フローはこちら

[{"id":"d8a5bb0727beae2a","type":"pip","z":"22eb2b8f4786695c","venvconfig":"015784e9e3e0310a","name":"","arg":"duckduckgo-search","action":"install","tail":false,"x":1550,"y":280,"wires":[["1799d5abf8328a4f"]]},{"id":"460611e4d97c1167","type":"inject","z":"22eb2b8f4786695c","name":"","props":[],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","x":1410,"y":280,"wires":[["d8a5bb0727beae2a"]]},{"id":"1799d5abf8328a4f","type":"debug","z":"22eb2b8f4786695c","name":"debug 434","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1710,"y":280,"wires":[]},{"id":"f9d45ebf97fdcc05","type":"venv","z":"22eb2b8f4786695c","venvconfig":"015784e9e3e0310a","name":"","code":"import json\nimport os\nimport time\nfrom duckduckgo_search import DDGS\n\ndef duckduckgo_search(query):\n    \"\"\"DuckDuckGoで検索\"\"\"\n    results = []\n    with DDGS() as ddgs:\n        for result in ddgs.text(query, max_results=5):\n            results.append({\n                \"title\": result[\"title\"],\n                \"url\": result[\"href\"],\n                \"content\": result[\"body\"]\n            })\n            time.sleep(1)\n\n    return results\n\ndef collect_information(command):\n    \"\"\"自然言語の命令を受け取り、DuckDuckGoで情報を収集してJSONファイルに保存\"\"\"\n    results = duckduckgo_search(command)\n\n    collected_data = {\n        \"command\": command,\n        \"results\": results\n    }\n\n    filename = f\"data.json\"\n\n    # JSONとして保存\n    with open(filename, \"w\", encoding=\"utf-8\") as f:\n        json.dump(collected_data, f, indent=4, ensure_ascii=False)\n\n    print(f\"{filename}\")\n\n    return filename\n\n# コマンドを入力\nuser_command = msg['payload']\ncollect_information(user_command)\n","continuous":true,"x":1550,"y":360,"wires":[["8ab8688f5429b0c8"]]},{"id":"909b52bae01cbd91","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"ROSとは何ですか?","payloadType":"str","x":1370,"y":360,"wires":[["f9d45ebf97fdcc05"]]},{"id":"8ab8688f5429b0c8","type":"debug","z":"22eb2b8f4786695c","name":"debug 435","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1710,"y":360,"wires":[]},{"id":"763ecd66d84d5cb7","type":"comment","z":"22eb2b8f4786695c","name":"DuckDuckGo","info":"","x":1410,"y":220,"wires":[]},{"id":"015784e9e3e0310a","type":"venv-config","venvname":"AI","version":"3.10"}]

 パッケージをインストールし、実行してみました。

▼rosという単語で検索すると、検索結果が保存されていました。

▼ROSとは何ですか?という自然言語での質問に対しても、検索結果が出力されました。日本語で聞いたからなのか、日本語の記事の検索結果が多いです。

▼何回か実行していると、Ratelimitのエラーが起きていました。

 時間を置くと検索できるようになっていたのですが、このエラーの確実な回避方法は分かっていません。解決策が分かれば、追記しようかと思います。

URLの情報を抽出する

 検索結果に情報元のURLがあったので、そのURLの情報を抽出してみました。これもプログラムはChatGPTに書いてもらいました。

▼フローはこちら

[{"id":"5fd33fd2543212cd","type":"pip","z":"22eb2b8f4786695c","venvconfig":"015784e9e3e0310a","name":"","arg":"requests beautifulsoup4 html2text","action":"install","tail":false,"x":2090,"y":280,"wires":[["33cf763f5da32f69"]]},{"id":"a948a8bac57fad47","type":"inject","z":"22eb2b8f4786695c","name":"","props":[],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","x":1950,"y":280,"wires":[["5fd33fd2543212cd"]]},{"id":"33cf763f5da32f69","type":"debug","z":"22eb2b8f4786695c","name":"debug 436","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":2250,"y":280,"wires":[]},{"id":"929be7a9bdb6fbf0","type":"venv","z":"22eb2b8f4786695c","venvconfig":"015784e9e3e0310a","name":"","code":"import requests\nfrom bs4 import BeautifulSoup\nimport html2text\n\ndef fetch_url_content(url):\n    \"\"\"指定されたURLからHTMLを取得して、タイトル、見出し、本文を抽出\"\"\"\n    # URLのコンテンツを取得\n    response = requests.get(url)\n    if response.status_code != 200:\n        raise Exception(f\"Failed to fetch URL: {url}, status code: {response.status_code}\")\n\n    # BeautifulSoupでHTMLを解析\n    soup = BeautifulSoup(response.text, 'html.parser')\n\n    # 不要なタグを除去(広告やスクリプト、スタイルタグなど)\n    for tag in soup(['script', 'style', 'header', 'footer', 'nav', 'aside', 'form', 'input', 'button']):\n        tag.decompose()\n\n    # タイトルを取得\n    title = soup.title.get_text() if soup.title else \"No Title\"\n\n    # 見出しタグ(h1, h2, h3, ...)を取得\n    headings = []\n    for level in range(1, 7):  # h1, h2, ..., h6\n        headings += [h.get_text(strip=True) for h in soup.find_all(f'h{level}')]\n\n    # 本文(pタグやarticleタグ、sectionタグなど)を取得\n    paragraphs = []\n    content_tags = ['article', 'main', 'section', 'p']\n    \n    content = []\n    for tag in content_tags:\n        content += [element.get_text(separator=\"\\n\", strip=True) for element in soup.find_all(tag)]\n\n    # <a>タグ内のURLはそのまま出力に保持する\n    links = [a['href'] for a in soup.find_all('a', href=True)]\n    \n    # 重複を避けるためにセットで管理\n    content = list(set(content))  # 重複する内容を削除\n\n    # 行ごとにテキストを結合\n    formatted_text = \"\\n\\n\".join(content)\n\n    # html2textを使ってさらに整形(オプション)\n    readable_text = html2text.html2text(formatted_text)\n\n    # 結果を整理して戻す\n    result = f\"Title: {title}\\n\\n\"\n\n    if headings:\n        result += \"Headings:\\n\" + \"\\n\".join(headings) + \"\\n\\n\"\n\n    result += \"Content:\\n\" + readable_text\n\n    # URLをそのまま追加\n    if links:\n        result += \"\\n\\nLinks:\\n\" + \"\\n\".join(links)\n\n    return result\n\ndef save_to_file(content, filename=\"output.txt\"):\n    \"\"\"取得したテキストをファイルに保存\"\"\"\n    with open(filename, 'w', encoding='utf-8') as f:\n        f.write(content)\n\n    print(f\"{filename}\")\n\n# 実行例\nurl = msg['payload']\ncontent = fetch_url_content(url)\nsave_to_file(content)\n","continuous":true,"x":2090,"y":360,"wires":[["5243e38590f60e87"]]},{"id":"e8176f7b45fb6c6e","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"https://404background.com/program/esp32c3-6/","payloadType":"str","x":1950,"y":360,"wires":[["929be7a9bdb6fbf0"]]},{"id":"5243e38590f60e87","type":"debug","z":"22eb2b8f4786695c","name":"debug 437","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":2250,"y":360,"wires":[]},{"id":"bcc189344e12d2c4","type":"comment","z":"22eb2b8f4786695c","name":"URL","info":"","x":1930,"y":220,"wires":[]},{"id":"015784e9e3e0310a","type":"venv-config","venvname":"AI","version":"3.10"}]

▼このサイトの記事を対象にしてみました。

XIAO ESP32C3を使ってみる その6(DUALSHOCK 4との通信、Node-RED)

はじめに  今回はDUALSHOCK 4でXIAO ESP32C3を用いた小型ロボットを操作してみました。  以前調べていたときに、DUALSHOCK 4とXIAO ESP32C3はBluetoothの規格が違うので…

▼以下のように取得できました。

 タイトルや記事の内容、記事に含まれているURLを抽出することができていました。

ローカルLLMを利用する

 これまでも利用したことのあるOllamaノードを使って、検索結果の入力と出力にローカルLLMを利用してみました。

▼こちらの記事でも利用したことがあります。

Ollamaを使ってみる その1(Gemma2、Node-RED)

はじめに  今回はローカル環境でLLMを利用できるOllamaを使ってみました。様々な言語モデルをインストールして、文章を生成することができます。  これまで音声の文字起…

▼全体のフローはこちら

[{"id":"f9d45ebf97fdcc05","type":"venv","z":"22eb2b8f4786695c","venvconfig":"015784e9e3e0310a","name":"","code":"import json\nimport os\nimport time\nfrom duckduckgo_search import DDGS\n\ndef duckduckgo_search(query):\n    \"\"\"DuckDuckGoで検索\"\"\"\n    results = []\n    with DDGS() as ddgs:\n        for result in ddgs.text(query, max_results=5):\n            results.append({\n                \"title\": result[\"title\"],\n                \"url\": result[\"href\"],\n                \"content\": result[\"body\"]\n            })\n            time.sleep(1)\n\n    return results\n\ndef collect_information(command):\n    \"\"\"自然言語の命令を受け取り、DuckDuckGoで情報を収集してJSONファイルに保存\"\"\"\n    results = duckduckgo_search(command)\n\n    collected_data = {\n        \"command\": command,\n        \"results\": results\n    }\n\n    filename = f\"data.json\"\n\n    # JSONとして保存\n    with open(filename, \"w\", encoding=\"utf-8\") as f:\n        json.dump(collected_data, f, indent=4, ensure_ascii=False)\n\n    print(f\"{filename}\")\n\n    return filename\n\n# コマンドを入力\nuser_command = msg['payload']\ncollect_information(user_command)\n","continuous":true,"x":1750,"y":680,"wires":[["8df6f60cdf131317"]]},{"id":"b677330fdf7a781f","type":"template","z":"22eb2b8f4786695c","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"{\n    \"model\": \"llama3.2:3b\",\n    \"messages\": [\n        {\n            \"role\": \"user\",\n            \"content\": \"{{payload}}\"\n        }\n    ]\n}","output":"json","x":1700,"y":620,"wires":[["f1a3fff94e8ca7f6"]]},{"id":"8b8966839246b073","type":"template","z":"22eb2b8f4786695c","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"命令をもとに、Webで検索するためのワードを考えてください。\nあなたの回答をもとにプログラムで検索するので、そのワードだけ答えてください。\n\n命令:{{payload}}","output":"str","x":1540,"y":620,"wires":[["b677330fdf7a781f"]]},{"id":"f1a3fff94e8ca7f6","type":"ollama-chat","z":"22eb2b8f4786695c","name":"Chat","server":"","model":"","modelType":"str","messages":"","messagesType":"msg","format":"","stream":false,"keepAlive":"","keepAliveType":"str","tools":"","options":"","x":1850,"y":620,"wires":[["c6131e4553e5be27"]]},{"id":"c6131e4553e5be27","type":"change","z":"22eb2b8f4786695c","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"payload.message.content","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":2020,"y":620,"wires":[["f8fed00677b951a2","7808af3273d5acc9"]]},{"id":"e289912f42633e7e","type":"inject","z":"22eb2b8f4786695c","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"ROSとは何ですか?","payloadType":"str","x":1510,"y":560,"wires":[["a26ee4017359fea0"]]},{"id":"f8fed00677b951a2","type":"debug","z":"22eb2b8f4786695c","name":"debug 438","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":2210,"y":620,"wires":[]},{"id":"9d063052a8959a74","type":"template","z":"22eb2b8f4786695c","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"{\n    \"model\": \"llama3.2:3b\",\n    \"messages\": [\n        {\n            \"role\": \"user\",\n            \"content\": \"{{payload}}\"\n        }\n    ]\n}","output":"json","x":1700,"y":740,"wires":[["0ced2912fbb33769"]]},{"id":"9f6412eb269068c8","type":"template","z":"22eb2b8f4786695c","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"他のプログラムで質問内容をもとに検索しました。\nあなたの知識を踏まえて要約し、回答してください。\nあなたの回答文を音声に変換して再生するので、改行を入れず短く回答してください。\n\n質問内容:{{flow.question}}\n検索結果:{{payload}}","output":"str","x":1540,"y":740,"wires":[["9d063052a8959a74"]]},{"id":"0ced2912fbb33769","type":"ollama-chat","z":"22eb2b8f4786695c","name":"Chat","server":"","model":"","modelType":"str","messages":"","messagesType":"msg","format":"","stream":false,"keepAlive":"","keepAliveType":"str","tools":"","options":"","x":1850,"y":740,"wires":[["d39d0a6847e35ac6"]]},{"id":"d39d0a6847e35ac6","type":"change","z":"22eb2b8f4786695c","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"payload.message.content","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":2020,"y":740,"wires":[["e39e70aeb8969fbd","86665d76c26027dd"]]},{"id":"e39e70aeb8969fbd","type":"debug","z":"22eb2b8f4786695c","name":"debug 439","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":2210,"y":740,"wires":[]},{"id":"8df6f60cdf131317","type":"file in","z":"22eb2b8f4786695c","name":"","filename":"payload","filenameType":"msg","format":"utf8","chunk":false,"sendError":false,"encoding":"none","allProps":false,"x":1900,"y":680,"wires":[["9f6412eb269068c8"]]},{"id":"f4cbb6c043953de3","type":"change","z":"22eb2b8f4786695c","name":"","rules":[{"t":"set","p":"question","pt":"flow","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":1790,"y":560,"wires":[[]]},{"id":"cf8b5d6b47f9c9aa","type":"venv-exec","z":"22eb2b8f4786695c","name":"","venvconfig":"015784e9e3e0310a","mode":"execute","executable":"gtts-cli.exe","arguments":"","x":1850,"y":800,"wires":[["6cda5a40f86de916"]]},{"id":"86665d76c26027dd","type":"template","z":"22eb2b8f4786695c","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"\"{{payload}}\" --output voice.mp3 --lang ja","output":"str","x":1700,"y":800,"wires":[["cf8b5d6b47f9c9aa"]]},{"id":"6cda5a40f86de916","type":"venv","z":"22eb2b8f4786695c","venvconfig":"015784e9e3e0310a","name":"Speed Change","code":"import pyaudio\nimport numpy as np\nfrom pydub import AudioSegment\n\n# MP3ファイルの読み込み\nfile_path = 'voice.mp3'\naudio = AudioSegment.from_mp3(file_path)\n\n# 再生速度を変更\naudio_speed = 1.4\naudio = audio.speedup(playback_speed=audio_speed)\n\n# 音声データをnumpy配列に変換\nsamples = np.array(audio.get_array_of_samples())\n\n# サンプリングレートを取得\nframerate = audio.frame_rate\n\n# pyaudioで音声再生\np = pyaudio.PyAudio()\nstream = p.open(format=pyaudio.paInt16, channels=1, rate=framerate, output=True)\n\n# 音声データを再生\nstream.write(samples.tobytes())\nstream.stop_stream()\nstream.close()\np.terminate()\n","continuous":true,"x":2020,"y":800,"wires":[["6887802bf362bec7"]]},{"id":"6887802bf362bec7","type":"debug","z":"22eb2b8f4786695c","name":"debug 440","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":2210,"y":800,"wires":[]},{"id":"7808af3273d5acc9","type":"delay","z":"22eb2b8f4786695c","name":"","pauseType":"rate","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"10","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":1560,"y":680,"wires":[["f9d45ebf97fdcc05"]]},{"id":"d5ed7a966d03556c","type":"comment","z":"22eb2b8f4786695c","name":"Search","info":"","x":1450,"y":500,"wires":[]},{"id":"a26ee4017359fea0","type":"junction","z":"22eb2b8f4786695c","x":1640,"y":560,"wires":[["f4cbb6c043953de3","8b8966839246b073"]]},{"id":"015784e9e3e0310a","type":"venv-config","venvname":"AI","version":"3.10"}]

▼最後の検索結果を音声で再生するために、gTTSを利用しています。

PythonでgTTSを使ってみる(音声合成、Node-RED)

はじめに  今回はPythonでgTTS(Google Text-to-Speech)を使ってみました。  以前VoiceVoxも使ったことがあるのですが、英語も話すことができて、ローカル環境での音声の…

▼LLMのモデルはllama3.2:3bを利用しています。

▼自然言語から検索ワードを考えるのにもローカルLLMを利用してみました。

▼検索後、質問内容と検索結果から要約するように指定しました。

 実行してみました。

▼以下のような回答が返ってきて、音声が再生されました。

 ROSの説明としては良さそうな回答です。

▼検索結果の内容はこちら

{
    "command": "ROS(Robot Operating System)\n\nまたは\n ROS(Reactive Object-Oriented Software)",
    "results": [
        {
            "title": "ROS - Robot Operating System - ROS: Home",
            "url": "https://www.ros.org/",
            "content": "ROS - Robot Operating System. The Robot Operating System (ROS) is a set of software libraries and tools that help you build robot applications. From drivers to state-of-the-art algorithms, and with powerful developer tools, ROS has what you need for your next robotics project. And it's all open source."
        },
        {
            "title": "Why ROS? - Robot Operating System",
            "url": "https://www.ros.org/blog/why-ros/",
            "content": "ROS (Robot Operating System) is an open source software development kit for robotics applications. ... Moreover, ROS isn't exclusive, you don't need to choose between ROS or some other software stack; ROS easily integrates with your existing software to bring its tools to your problem. Multi-domain. ROS is ready for use across a wide array ..."
        },
        {
            "title": "PDF The Robot Operating System - GitHub Pages",
            "url": "https://stanfordasl.github.io/PoRA-I/aa274a_aut2122/pdfs/notes/lecture2.pdf",
            "content": "This chapter introduces the fundamentals of the Robot Operating System (ROS)1,2, a popular framework for creating robot software. Unlike what its 1 L. Joseph. Robot Operating System ... Nodes are the basic building block of ROS that enables object-oriented robot software development. Each robot component is developed as an individual ..."
        },
        {
            "title": "Robot Operating System (ROS): Working, Uses, and Benefits - Spiceworks",
            "url": "https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-robot-operating-system/",
            "content": "The robot operating system (ROS) is defined as a flexible and powerful framework designed for robotics software development. ROS Does not function as a standalone operating system but as a middleware, leveraging conventional operating systems such as Linux and furnishing developers with a suite of libraries and tools to craft sophisticated and resilient robot applications."
        },
        {
            "title": "Introduction to ROS (Robot Operating System) - GeeksforGeeks",
            "url": "https://www.geeksforgeeks.org/introduction-to-ros-robot-operating-system/",
            "content": "Robot Operating System or simply ROS is a framework which is used by hundreds of Companies and techies of various fields all across the globe in the field of Robotics and Automation. It provides a painless entry point for nonprofessionals in the field of programming Robots. So first of all What is a Robot ? A robot is any system that can perceive the environment that is its surroundings, take ..."
        }
    ]
}

▼生成された音声ファイルはこちら

 検索ワードを抽出しなくても、自然言語のまま検索できそうです。

▼dashboardノードと組み合わせてみました。

▼簡単な入出力の画面になっています。

▼検索エンジンで検索したときの結果と似たような感じです。

 なお、ローカルLLMの出力は4秒ぐらいで、検索の方が速度を制限しているのもあって遅かったです。

 今回は利用していませんが、さらに詳しい情報が必要であれば、URLの情報を抽出すると良さそうです。

最後に

 よく考えたら検索しなくても、ローカルLLMがある程度の知識を持っています。LLMが知らないであろう情報を検索するのに利用できそうだなと思っています。

 個人名や組織名で検索して試してみたのですが、自然言語から検索ワードをローカルLLMで考えるのは苦手なようでした。JSON形式のデータから要約するのはよくできていました。

▼今回はinjectノードのmsg.payloadに検索文を入れていましたが、音声ファイルからテキストを抽出するSpeech to Text系のソフトウェアを併用できそうです。

Whisperを使ってみる(音声認識、OpenAI、Python)

はじめに  今回はOpenAIのWhisperを使ってみました。  OpenAIのサービスはAPIキーを使って有料で利用するイメージがあったのですが、ソースコードはMIT Licenseで公開さ…

Faster Whisperを使ってみる(GPUでの実行、Python、Node-RED)

はじめに  今回はFaster Whisperを利用して文字起こしをしてみました。  Open AIのWhisperによる文字起こしよりも高速ということで試したことがあったのですが、以前はC…

コメントを残す

メールアドレスが公開されることはありません。 が付いている欄は必須項目です