從 raise-a-bull 到 Hermes：一晚上換掉個人 AI 的工程決策

起點：raise-a-bull 開始燒錢

我有個自己寫的 personal AI bot 叫 raise-a-bull。它跑在 samantha-wsl（我家裡的 always-on Linux 機），用 LINE + Discord 接訊息，背後吃我的 Claude Max subscription（$20/mo）。功能完整：multi-channel、workspace 概念、11 個內建 skills、Gemini Flash 描述圖片、台灣電子發票 QR 解析。

用了快一年，最近發現一件事：它越用越燒。看 token 統計，每個 chat turn 都是滿 cache miss。

為什麼 cache 一直 miss

raise-a-bull 的核心是 Claude Code SDK 的 claude -p（headless 一次性執行模式）。每收到一個訊息：

spawn 新 claude -p process
把整個 system prompt + workspace context + chat history 送進去
Claude 回應
process 結束，記憶體釋放

問題出在 Anthropic prompt cache 的 5 分鐘 TTL。如果你 10 分鐘沒講話再講第二句，前面的 prompt 已經從 cache 退出了，這次得從頭 re-tokenize 全部 context。對 chatbot 場景這超致命 — bot 的對話頻率天然 sporadic。

我看著 token 帳單跟回應延遲，意識到 這不是寫法問題，是 architectural 問題。需要常駐 process 才能持續 hold 同一個 SDK client，讓 cache stay warm。

替代方案：NousResearch Hermes Agent

@NousResearch/hermes-agent 是個 Python LLM agent framework：

內建 MCP client（first-class）
Multi-platform gateway（Discord / Telegram / WhatsApp / Slack / LINE / etc）
自帶 skills system
提供 OpenAI-compat HTTP API gateway

最重要的：它是常駐 daemon process。同個 Anthropic SDK client 持續活著，prompt cache 不會被 process 死亡 evict。

我已經在 mojobot（夢酒館 bar 內部 AI 助手）用 mojo-hermes 服務員工版，運作穩。對 personal use case 一樣 work。

設計核心決策（brainstorm 過程整理）

1. 完全跨 fail domain 隔離

mojo-hermes 服務員工生產，敏感不能動。bot-personal-hermes 是我個人實驗環境，要能裝任何奇怪 skill / 試新 model / 跑 destructive command 而不影響員工。

→ 不同 stack、不同 docker network、不同 container、獨立 storage volume、獨立 model API key。

唯一共享：mojobot MCP source code（/opt/mcp/mojobot/server.py read-only mount）— 這樣兩個 Hermes 都能用同一份 mojobot 工具，但 personal-hermes 寫不壞它。

2. 跨 stack 通訊：Tailscale URL 而非 docker network

兩個方案：

方案	操作	缺點
α. cross-stack docker network	personal stack 加 `networks: [mojo-net (external)]`	耦合藏在 network 裡，且 mojo stack `down` 時 network 砍掉 personal 連不到
β. Tailscale URL	mojo outline 改 bind `100.77.154.72:3015`（30秒重啟），personal-hermes 用 `OUTLINE_API_URL=http://100.77.154.72:3015`	動 mojo stack 一行；但永久解耦

選 β。看似「動到生產」是違背原則，但實際上這個改動讓所有未來想 access outline 的 stack 都簡單（任何機器、Tailscale on 即可）。耦合明擺在 URL 上比藏在 network 裡好維護。

3. Model：MiniMax-M2.7 + MiniMax-VL-01 (vision aux)

Hermes 有個 first-class concept 叫 auxiliary models — 不同 task 走不同 model：

model:
  default: MiniMax-M2.7              # text 主力

auxiliary:
  vision:
    model: MiniMax-VL-01             # 收到圖自動切過去
    timeout: 30

主流程文字便宜跑 M2.7，圖片進來自動 route 到 VL-01。這比寫 skill 自己去呼叫 vision API 乾淨很多 — Hermes 自己處理 content-based routing。

4. UI：hermes-workspace（PWA）而非 Hermes 內建 dashboard

Hermes 內建 dashboard 是 11 個 admin pages 的 React SPA，不是 PWA（沒 manifest、沒 service worker）。

hermes-workspace 是社群第三方專案，是 PWA：可裝 iPhone 主畫面、離線可用、Monaco editor、PTY terminal、2000+ skill registry browse、8 themes、mobile-first 設計、Tailscale 內網手機可開。

對「power-user daily driver」全面勝出。

5. MVP pull mode：先不接 Discord

我的 use case：scheduled tasks (heartbeat / daily review / digest) + 圖片視覺 + Trello/Outline/mojobot 查詢。

最後一刻發現：hermes-workspace PWA 沒有 push notification。這代表 bot 排程跑完只能累積在 session log，要我自己主動開 PWA 看（pull mode）。要 push 訊息到手機需要 Discord/LINE channel。

決策：MVP 接受 pull mode。用一週看看「我會不會忘記去看 daily review」。如果會，再加 Discord adapter（phase 2）。否則維持 pull mode 更乾淨。

不要過度設計。

6. Memory：兩個人格，不共享

我 Claude Code automemory 在 ~/sync/pw-os/system/memory/（syncthing 跨機同步），裡面 32 條 user/feedback/project memories。

選擇題：bot-personal-hermes 要不要 mount 同 dir 共享記憶？

✅ 共享 = 兩邊都很懂我
✅ 不共享 = 兩個 genuine personality

選不共享。理由：

「兩個人格」是 feature — Claude Code 是 coding 副手，bot-personal-hermes 是 personal AI；distinct personality 可能比「兩邊都很懂我」有趣
沒有 frontmatter lint harness 之前共享一定會 corrupt — Hermes 寫的 memory 格式 100% 會飄掉（type 用錯、name 重複、description 太長），污染 Claude MEMORY.md index

開了 backlog card：未來想共享要先寫 lint harness（code 驗證，不是寄望 LLM 自律）。

一個 enabling refactor：mojobot MCP 的 PEP 723

原本 mojo-hermes 用 custom Dockerfile：

FROM nousresearch/hermes-agent:latest
USER root
RUN cd /opt/hermes && uv pip install fastmcp httpx

純粹為了在 Hermes venv 裡裝 mojobot MCP 用的兩個 dep（因為 mojobot MCP 是 volume mount 進來的，不會自帶 venv）。每次 container recreate 都要手動 re-install。

簡單 fix：在 services/mcp/server.py 頂端加 PEP 723 inline metadata：

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "fastmcp>=2.0",
#     "httpx>=0.27",
# ]
# ///

然後 Hermes config 把 mojobot MCP 的 command 從 /opt/hermes/.venv/bin/python 改成 uv run：

mcp_servers:
  mojobot:
    command: uv
    args: [run, /opt/mcp/mojobot/server.py]

uv run 看到 PEP 723 metadata 自動建 ephemeral env 裝 deps。第一次 ~5s（uv 下 deps 到 cache），之後 cache hit ~50ms。Custom Dockerfile 完全可以丟。 Vanilla nousresearch/hermes-agent:latest 直接用。

PR: mojobot#44

最終架構

┌────────────────── samantha-wsl ──────────────────────────────────┐
│                                                                   │
│  Stack: mojo (員工生產)              Stack: bot-personal-hermes   │
│  ─────────────────────────────       ───────────────────────────  │
│  ┌─ mojo-redis ─────────────┐                                     │
│  ├─ mojo-outline (Tailscale │                                     │
│  │  100.77.154.72:3015) ←───┼── Tailscale URL ─┐                 │
│  ├─ mojo-hermes (vanilla)   │                  │                  │
│  └─ mojo-openwebui          │                  │                  │
│     chat.mojokm.com         │                  │                  │
│  ┌──────────────────────────┘                  │                  │
│                                                ▼                  │
│                              ┌────────────────────────────────┐   │
│                              │ bot-personal-hermes            │   │
│                              │ (vanilla nousresearch/         │   │
│                              │  hermes-agent:latest)          │   │
│                              │                                │   │
│                              │ Model: MiniMax-M2.7            │   │
│                              │ Vision aux: MiniMax-VL-01      │   │
│                              │                                │   │
│                              │ MCPs:                          │   │
│                              │  • outline (Tailscale URL)     │   │
│                              │  • mojobot (volume mount, ro)  │   │
│                              │  • trello                      │   │
│                              │                                │   │
│                              │ Gateway: 100.77.154.72:8646    │   │
│                              └────────┬───────────────────────┘   │
│                                       │                           │
│                              ┌────────▼───────────────────────┐   │
│                              │ bot-personal-hermes-workspace  │   │
│                              │ (PWA UI)                       │   │
│                              │ http://100.77.154.72:9120      │   │
│                              └────────────────────────────────┘   │
└──────────────────────────────────────┬────────────────────────────┘
                                       │ Tailscale only
                                       ▼
                          Mac / iPhone PWA installed to home screen

部署完整流程

samantha-wsl 上：

# 1. 動 mojo stack（β plan）— 一次性
ssh samantha-wsl
cd ~/docker/mojo
sed -i 's|127.0.0.1:3015:3000|100.77.154.72:3015:3000|' docker-compose.yml
docker compose up -d outline   # 30 秒重啟

# 2. clone samantha-infra repo
cd ~/Github
git clone https://github.com/leepoweii/samantha-infra.git
cd samantha-infra/stacks/bot-personal-hermes

# 3. 寫 secrets
cp .env.example .env
nano .env   # 填 9 個 secrets — 看 .env.example 註解

# 4. pre-write hermes-data/config.yaml
mkdir -p hermes-data
# 直接抄 config.yaml.example 內容，把 <PLACEHOLDER> 換成 .env 真實值
nano hermes-data/config.yaml

# 5. 起
docker compose pull
docker compose up -d

完。兩個 container 跑起來，Hermes 自己讀 config.yaml 載入 3 個 MCPs。

5 個 smoke test 驗證

#	Test	怎麼測	結果
1	Gateway alive	`curl -H "Authorization: Bearer $KEY" http://100.77.154.72:8646/v1/models`	✅ 401 no-auth, 200 + model `hermes-agent`
2	outline MCP cross-stack	curl chat 問「列出 Outline 上 collections」	✅ 回 “Welcome” collection
3	mojobot MCP	curl chat 問「庫存最低 3 個料」	✅ 回真實庫存（安格仕橙味苦精/BIOES紅葡萄汁/BIOES覆盆莓汁）
4	trello MCP	curl chat 問「我的 boards」	⚠️ 回了 template boards 不是真實的 — workspace ID 設定 TBD
5	Vision aux	curl 上傳貓圖 via image_url	✅ 回「橘色虎斑貓」 — MiniMax-VL routing 確認

5 條中 4 ✅ 1 ⚠️。Trello workspace ID 是已知 followup（不影響其他功能）。

踩到的雷 / 學到的事

1. mojobot MCP env var 不是 `MOJOBOT_API_KEY`

我 spec 一開始猜是 MOJOBOT_API_KEY=xxx 單一 token。實際看 source code (services/mcp/client.py) 才發現是 PIN-based JWT auth：

env:
  MOJOBOT_URL: https://internal.mojokm.com
  MOJOBOT_BOT_ID: <employee_id UUID>
  MOJOBOT_BOT_PIN: <4-digit PIN>

MCP server 啟動 → 第一次 call 時用 BOT_ID + PIN POST /auth/login 換 JWT → cache JWT → 之後 call 帶 Bearer JWT → 401 自動 re-login。

JWT lifecycle 完全在 MCP server 內部處理，不用每次 call 重新 PIN。Lesson: 別猜 schema，去讀 source code。

2. Hermes 不支援 bind-mount config.yaml

我設計 spec 時想：把 hermes-config.yaml 用 :ro mount 進 /opt/data/profiles/samantha/config.yaml。讀完 hermes_cli/config.py 才發現 Hermes 認 HERMES_HOME/config.yaml（容器內 /opt/data/config.yaml）為單一 source of truth，不支援外部 bind mount + merge。

正解：把 config.yaml 寫進 hermes-data/ bind mount 那層（host 端可改 file，container 看到變更）。或跑 hermes setup wizard 讓 Hermes 自己寫。

3. mojo-hermes 的 vision 用 Gemini Flash，不是 MiniMax-VL

我 spec 寫 auxiliary.vision.model: MiniMax-VL-01。後來 SSH 進 mojo-hermes 看實際 /opt/data/config.yaml 才發現 mojo 已經用 Gemini 2.5 Flash 當 vision aux：

auxiliary:
  vision:
    base_url: https://generativelanguage.googleapis.com/v1beta/openai/
    model: gemini-2.5-flash
    api_key: AIzaSy...

而且 raise-a-bull 也用 Gemini Flash。我問 Powei 要不要 personal-hermes 也跟著統一 → 他選「不，故意 diverge 用 MiniMax-VL 試試看，反正是 personal 實驗環境」。

Lesson: 設計時假設「跟現有系統一致」是合理 default，但要主動 surface 給 user 確認。

4. hermes-workspace 的 env var 名要查 README

我 docker-compose.yml 用 WORKSPACE_PASSWORD env var。container 起來 refuse to start：

HOST is set to "0.0.0.0" but CLAUDE_PASSWORD is unset.

正確 var 名是 CLAUDE_PASSWORD（不是 WORKSPACE_PASSWORD）。也需要 COOKIE_SECURE=0（因為走 plain HTTP over Tailscale，Secure flag cookie 會被瀏覽器 drop）。

也是 trial-and-error 才知道。Lesson: 第三方 image 的 env var 慣例別猜，read README first。

5. Hermes 需要 ANTHROPIC_API_KEY env var

config.yaml 裡寫 model.api_key: sk-cp-... 不夠。Hermes runtime 還是抓 ANTHROPIC_API_KEY env var：

Internal server error: No Anthropic credentials found. 
Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY...

要在 docker-compose env_file: .env 裡也設：

ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
ANTHROPIC_API_KEY=<same as MINIMAX_API_KEY>

雙重設定（config.yaml + env var）才能跑起來。冗，但是 Hermes 內部行為。

第一次 login（手動，UI 自動化失敗）

我本來想用 chrome MCP 自動 login + 截圖做 tutorial GIF，但 hermes-workspace 的 React 表單對 automation 不友善：

form_input 設 value 但 React 不認（state 沒更新，submit button 維持 disabled）
computer.type 寫對了但 button 還沒 enable，按 Return 沒反應
反覆嘗試讓 chrome tab 卡住沒回應

最後 chrome MCP 卡住沒辦法繼續。承認 limit：

→ 第一次 login 手動做：

開 http://100.77.154.72:9120
貼 WORKSPACE_PASSWORD（從你 .env 抓）
按 Continue
進入 chat UI，按右上「+」開新 chat 試問「我有什麼 boards」測試 trello MCP

之後 cookie 在 browser 裡，PWA 加主畫面後一直保持登入。

完成後的日常使用

場景	怎麼做
早上想知道庫存	PWA 問「金米麥今天庫存最低的 5 樣是什麼」
想找 wiki 上的東西	PWA 問「Outline 上有沒有寫過 SOP for X」
Trello 整理	PWA 問「我 TODO 上最舊的卡是什麼」
路上拍菜單	PWA 上傳照片問「這個有什麼」
不在電腦前想 chat	iPhone 點 PWA icon，跟 Mac 同樣使用體驗

獨立人格、跨 fail domain 不影響員工 mojo-hermes、激進升級策略（試新 Hermes 版我先當白老鼠，員工版穩定後再追）。

下一步（forward-compat backlog cards）

Backup：hermes-data/ 含 sessions + skills + memory，加進 weekly-backup.sh
kuma monitor：兩條 monitor 加進既有 uptime-kuma（gateway alive + workspace UI）
Trello workspace ID：fix 那個 ⚠️ smoke test，研究 @delorenj/mcp-server-trello 的 workspace 設定
Memory bridge (long-term)：寫 frontmatter lint harness 後再考慮共享 Claude automemory
mojobot MCP → uvx package：等 API 穩了發 git-installable package，連 volume mount 都丟
CF Tunnel samantha.pwlee.xyz：出門想看 dashboard / 接 inbound webhook 時做
Discord adapter (phase 2)：用一週發現需要 push 才加

工程節奏記錄

整個過程由 Claude Code 主導 — brainstorm + spec + plan + 部署 + smoke + 寫這篇 blog，全部一個晚上完成。我做的事：回答 brainstorm 問題、選技術方向、提供 secrets、起床看 push notification。

Spec + plan + 所有 deployment artifacts 都在 leepoweii/samantha-infra（private repo）。

工程速度的 leverage 不在「AI 寫 code」，而在 「AI 把整個 brainstorm → 設計 → 實作 → 驗證 → 文件化的閉環跑完」。我只需要做架構級判斷。

關鍵 takeaways

架構問題用 architectural fix：raise-a-bull 的 cache miss 不是寫法 bug 是 process 模型 bug，換 framework 才解
跨 stack 用 URL 不用 docker network：耦合擺出來比藏起來好
Auxiliary model > skill-based vision：first-class 的 framework feature 比自己 reinventing 乾淨
PWA + pull mode 是合理 MVP：不要為了 push 馬上接 Discord，先跑一陣子再決定
兩個 AI 人格可以是 feature：Claude Code（coding 副手）vs Samantha（personal AI）獨立累積，比「兩邊都很懂我」有趣
設計時讀 source code，不要猜 schema：mojobot MCP env vars / hermes config 載入 / hermes-workspace env names 都是踩過才知道

常見問題

為什麼從 raise-a-bull 換成 Hermes？

raise-a-bull 用 Claude Code SDK (`claude -p`) — 每個訊息都 spawn 新 process，prompt cache 5 分鐘 TTL 內才 hit。bot 訊息 sporadic（隔 10 分鐘才講第二句）→ 每句都 cache miss → 慢且燒 token。Hermes 是常駐 process，同個 SDK client 持續活著，cache 維持得住。這是 architectural advantage，不是 raise-a-bull 寫法問題。

為什麼用 MiniMax 而不是 Anthropic 直連？

Anthropic Pro/Max subscription 不能正路接 backend agent (ToS 灰色 + ban 風險)。要用 Anthropic 就得另外付 API 錢 ~$30-100/mo。MiniMax Token Plan 已付（mojo 員工版用同一條 key），personal-hermes 邊際成本 0。視覺另設 MiniMax-VL-01 為 auxiliary model，圖片內容自動 route 過去。

為什麼是 hermes-workspace 而不是 Hermes 內建 dashboard？

Hermes 內建 dashboard 不是 PWA — 純 React SPA 沒 manifest 沒 service worker。hermes-workspace（社群第三方）是 PWA、mobile-first、可裝 iPhone 主畫面、有 Monaco editor + PTY terminal + skill registry browse 2000+ + 8 themes。對「power-user daily driver」場景全面勝出。

為什麼跨 stack 不直接共用 docker network 而用 Tailscale URL？

看起來「共用 mojo-net」比較簡單（personal stack 加 `networks: [mojo-net]` 就好），但那把耦合藏在 docker network 裡。用 Tailscale URL 反而把耦合明擺在 service URL 上、看得見、好 debug、好擴展。動 mojo stack 一行 port bind 從 `127.0.0.1:3015:3000` 改 `100.77.154.72:3015:3000`，30 秒 outline 重啟，永久解耦。

為什麼 MVP 不接 Discord/LINE，只走 PWA pull mode？

PWA chat 是「我主動跟 bot 講話」場景，這個 hermes-workspace 完美。bot 主動 push（heartbeat / daily review）才需要 Discord/LINE channel。MVP 先用 pull mode 跑一週，真覺得「我都忘記去看排程結果」再加 Discord adapter — 不要過度設計。

Powei Lee · 李柏緯

@pwlee.xyz

在金門用 AI 協作建造產品的實踐者。
Build in Public — 誠實記錄每個過程，包含失敗。

↗ GitHub ↗ Instagram ✉ Email