OpenRouter Best Model for Mem0

Aiko · May 11, 2026, 3:09am

場景	首選模型	結論
最推薦的便宜 production default	`deepseek/deepseek-v4-flash`	價格低、1M context、支援 `response_format` / `structured_outputs` / tools，適合 Mem0 做 memory extraction、去重判斷與 entity / relation 類任務。
最便宜、仍可認真測 production 的 paid model	`arcee-ai/trinity-mini`	單價非常低，支援 structured outputs / tools；適合大量 memory ingestion，但上線前一定要用你的實際 memory prompt 跑驗收。
最穩的 mainstream cheap fallback	`google/gemini-3.1-flash-lite` 或 `openai/gpt-4.1-nano`	比上面更貴，但品牌與模型穩定性較好；`gpt-4.1-nano` 也接近 Mem0 self-hosted server 預設模型。
最便宜但不建議 production	`openrouter/free`	$0，但會路由到免費模型，且免費層有 request limit；只適合開發、測試、低風險場景。

OpenRouter 的價格是按各模型的 input / output token 公示費率計算，且其 pricing page 說模型目錄價格不加價；下面表格把 API 中的 per-token 價格換算成 USD / 1M tokens。

先釐清 Mem0 任務型態

Mem0 OSS 目前的新演算法重點是 single-pass ADD-only extraction：先取 top-10 related existing memories 作為 dedup context，再用一次 LLM call 抽出 distinct new facts，之後做 hash-based dedup、insert、entity extraction + linking。

另外，如果你說的「生成關係」是舊版 graph memory，Mem0 文件說 OSS 的 enable_graph / graph_store 已移除，改成內建 entity linking；舊 relations field 不再作為可查詢 graph relationship 暴露。
所以模型選擇上，重點不是「最強 reasoning」，而是：低成本、穩定 JSON / schema、長 context、精準抽取與 entity linking。

Mem0 文件也顯示可以覆寫 LLM config；OpenRouter 可透過 OpenAI provider 相容方式使用，並以 OPENROUTER_API_KEY 與 OpenRouter model id 指定模型。

候選模型總表

排名	OpenRouter model ID	價格 input / output，USD per 1M tokens	Context	結構化輸出能力	適合 Mem0 的 insight	建議
1	`deepseek/deepseek-v4-flash`	$0.14 / $0.28	1,048,576	`response_format` 、`structured_outputs` 、tools	價格非常低，context 很大，適合把 conversation + related memories 一起放進去做 extraction / dedup / entity linking。	首選 production default 。
2	`arcee-ai/trinity-mini`	$0.045 / $0.15	131,072	`response_format` 、`structured_outputs` 、tools	這是我會認真測的最低價 paid option；比多數 mainstream model 便宜很多。	最便宜 paid 候選，需實測準確率。
3	`openai/gpt-4.1-nano`	$0.10 / $0.40	1,047,576	`response_format` 、`structured_outputs` 、tools	Mem0 OSS 文件列出 self-hosted server default LLM 是 `gpt-4.1-nano-2025-04-14`；若你想接近 Mem0 預設路線，這是低風險選擇。	穩定 baseline 。
4	`qwen/qwen3-30b-a3b-instruct-2507`	$0.09 / $0.30	262,144	`response_format` 、`structured_outputs` 、tools	價格低、context 夠大；若你的 memory 內容有中英文混合或中文比例高，值得測。	中文 / 多語便宜首選之一。
5	`bytedance-seed/seed-2.0-mini`	$0.10 / $0.40	262,144	`response_format` 、`structured_outputs` 、tools	定位偏 latency-sensitive / high-concurrency / cost-sensitive，適合大量 memory add。	高併發便宜候選。
6	`google/gemini-3.1-flash-lite`	$0.25 / $1.50	1,048,576	`response_format` 、`structured_outputs` 、tools	價格不是最低，但 OpenRouter 描述它是 low-latency、high-volume workload 的高效率模型；適合你想要 mainstream 穩定度。	穩定 fallback / mainstream choice 。
7	`z-ai/glm-4.7-flash`	$0.06 / $0.40	202,752	`response_format` 、`structured_outputs` 、tools	input 很便宜，適合長輸入、短輸出的 memory extraction；但 default temperature 較高，建議手動壓低。	低價可測。
8	`ibm-granite/granite-4.1-8b`	$0.05 / $0.10	131,072	`response_format` 、`structured_outputs` 、tools	單價很低，適合簡單事實抽取；但 8B 級模型做 subtle dedup / relationship 可能需要更嚴格 prompt。	極低價 simple extraction 。
9	`mistralai/ministral-8b-2512`	$0.15 / $0.15	262,144	`response_format` 、`structured_outputs` 、tools	output 很便宜，適合每次輸出較多 memory JSON 的情境。	便宜穩定候選。
10	`mistralai/ministral-14b-2512`	$0.20 / $0.20	262,144	`response_format` 、`structured_outputs` 、tools	比 8B 稍強，仍維持低 output 成本；適合需要較完整 JSON 的 memory pipeline。	比 8B 更穩一點。
11	`stepfun/step-3.5-flash`	$0.10 / $0.30	262,144	`response_format` 、tools	價格不錯，但 API 顯示未列 `structured_outputs`；若你只要求 JSON mode 而非 strict schema，可測。	可測，但不是首選。
12	`bytedance-seed/seed-1.6-flash`	$0.075 / $0.30	262,144	`response_format` 、`structured_outputs` 、tools	很便宜；若 2.0 mini 的品質沒有明顯優勢，可拿來 A/B test。	低價替代。
13	`deepseek/deepseek-v3.2`	$0.252 / $0.378	131,072	`response_format` 、`structured_outputs` 、tools	output 價格低，適合需要較好 reasoning / agentic tool-use 的 memory pipeline。	DeepSeek fallback 。
14	`qwen/qwen3.6-35b-a3b`	$0.15 / $1.00	262,144	`response_format` 、`structured_outputs` 、tools	input 便宜，中文與多模態場景可測；output 價格高於 DeepSeek V4 Flash。	中文關係抽取可測。
15	`qwen/qwen3.6-flash`	$0.25 / $1.50	1,000,000	`response_format` 、`structured_outputs` 、tools	1M context，和 Gemini 3.1 Flash Lite 價格接近；若 Qwen 在你的中文 memory 表現更好，可選。	中文長 context fallback 。
16	`minimax/minimax-m2.1`	$0.29 / $0.95	196,608	`response_format` 、`structured_outputs` 、tools	OpenRouter 描述它偏 coding / agentic workflows；對 structured memory 也可能有用，但不是最低價。	品質候選，不是成本首選。
17	`mistralai/mistral-large-2512`	$0.50 / $1.50	262,144	`response_format` 、`structured_outputs` 、tools	比上面多數模型貴；適合你發現便宜模型在 dedup 或 relation extraction 出錯時作為升級。	hard-case fallback 。
18	`~openai/gpt-mini-latest`	$0.75 / $4.50	400,000	`response_format` 、`structured_outputs` 、tools	Mem0 library default 是 OpenAI `gpt-5-mini`；這個 OpenRouter alias 接近「GPT mini family」路線，但成本不低。	接近預設但不便宜。
19	`~anthropic/claude-haiku-latest`	$1.00 / $5.00	200,000	`response_format` 、`structured_outputs` 、tools	穩定但對 memory ingestion 來說偏貴；除非你特別信任 Claude 的抽取品質。	高價穩定 fallback 。
20	`openrouter/free`	$0 / $0	200,000	`response_format` 、`structured_outputs` 、tools	OpenRouter free router 會從免費模型中路由；免費層有 50 req/day、20 rpm，熱門免費模型也可能限流。	只建議 dev / test 。
21	`google/gemini-2.0-flash-lite-001`	$0.075 / $0.30	1,048,576	`response_format` 、`structured_outputs` 、tools	價格很漂亮、context 很大；但 OpenRouter API 顯示 expiration date 是 2026-06-01，現在不適合新 production 依賴。	不建議新部署。
22	`x-ai/grok-4.1-fast`	$0.20 / $0.50	2,000,000	`response_format` 、`structured_outputs` 、tools	2M context 很吸引人；但 API 顯示 expiration date 是 2026-05-15，除非確認續用，否則不建議當 default。	暫不建議 default 。
23	`qwen/qwen-2.5-7b-instruct`	$0.04 / $0.10	32,768	`response_format` 、tools	幾乎最低價，但 context 小、模型舊；適合很簡單的 memory extraction，不適合細緻去重或關係生成。	只做低風險測試。

最佳實務配置建議

用途	建議
單一 default model	用 `deepseek/deepseek-v4-flash` 。
成本壓到最低	先測 `arcee-ai/trinity-mini` ；若 dedup / relation 錯誤率太高，升到 `deepseek/deepseek-v4-flash` 。
想走 OpenAI / Mem0 預設附近	用 `openai/gpt-4.1-nano` ；若需要更高品質再測 `~openai/gpt-mini-latest` 。
中文 memory 比例高	先測 `qwen/qwen3-30b-a3b-instruct-2507` 、`qwen/qwen3.6-35b-a3b` 、`deepseek/deepseek-v4-flash` 。
fallback chain	`deepseek/deepseek-v4-flash` → `google/gemini-3.1-flash-lite` → `openai/gpt-4.1-nano`。OpenRouter routing / fallback 失敗時只收 successful model run 的費用。
參數	`temperature: 0` 或 `0.1`；限制 `max_tokens`；強制 JSON schema / `structured_outputs`；不要開 web search；reasoning 設 minimal / off，除非你發現關係抽取品質不足。
production 注意	儘量 pin explicit model id，不要只用 `latest` alias；OpenRouter 文件也建議可選 explicit model ID/version 以避免變更。