Core Philosophy: Data-Driven Persona
Most persona chatbots rely on handwritten prompts: "You are a friendly assistant with a lively style..." This approach has a fundamental problem: how do you define "lively"?
Aime takes a different approach. It doesn't describe style; it measures it:
Specifically: instead of saying "you write short messages", we say "your messages average 14.2 characters, with 44.4% under 10 characters". Instead of "you occasionally use emoji", we say "18.2% of messages contain emoji; most used: 😂😭🤮🙄😏".
Training Pipeline: From WeChat to Prompt
The entire process involves zero model training: five scripts, run once.
Step 1: Parsing WeChat History
WeChat history is exported as HTML via WeChatExporter. Parsing faces several challenges:
- Paginated loading: the first page holds ~1,000 messages; the rest live in JS files (`msg-1.js`, `msg-2.js`...)
- Emoji tags: WeChat emojis are `<img class="wxemoji">` tags that must be converted to Unicode
- Consecutive-message merging: multiple messages from the same person within 2 minutes are merged into one (simulating WeChat's multi-line sending)
- Context pairs: extract (context, response) pairs, with each reply given up to 5 preceding messages
Step 2: Statistical Style Analysis
This is the soul of the entire system, covered in detail in the next section.
Step 3: Building the Vector Database (Optional)
The Gemini Embedding API converts all conversation pairs into 768-dimensional vectors, stored as .npz files. At runtime, a cosine-similarity search retrieves the closest matches. The Python version uses this; the Web version does not.
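At runtime the retrieval reduces to a cosine-similarity top-k over the stored matrix. A minimal NumPy sketch (function name and shapes are assumptions):

```python
import numpy as np

def top_k_similar(query_vec, matrix, k=5):
    """Return indices of the k rows of `matrix` most similar to
    `query_vec` by cosine similarity. `matrix` is (n, 768) as loaded
    from the .npz file; `query_vec` is a single (768,) embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity per row
    return np.argsort(sims)[::-1][:k]  # best matches first
```

The retrieved (context, response) pairs are then appended to the prompt as extra reference examples.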
Statistical Analysis: Quantifying Chat Style
Seven dimensions of statistical data are extracted from 8,394 real messages:
| Dimension | What is analyzed | Actual data |
|---|---|---|
| Message length | Avg character count, short/mid/long ratio | Avg 14.2 chars, 44.4% < 10 chars |
| Punctuation | Per-mark frequency, no-punctuation rate | 97% of messages have no punctuation |
| Catchphrases | Top 15 frequent words with counts | 哈哈 (342), 嗯 (128), kk... |
| Emoji | Usage rate, Top 10 WeChat/Unicode emoji | 18.2% contain emoji; favorites: 😂😭🤮 |
| Line breaks | Multi-line message ratio | Frequently splits into multiple short messages |
| Language mixing | Pure CN / pure EN / mixed ratio | 29.3% mix Chinese and English |
| Style summary | AI-generated overall description | Short sentences, casual, laid-back |
These numbers are encoded directly into the system prompt. When the AI sees "average 14.2 characters", it knows not to output long paragraphs; when it sees "97% no punctuation", it naturally omits periods.
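Several of these dimensions fall out of plain counting. A minimal sketch (the real analysis script covers all seven dimensions and a wider punctuation set):

```python
import re

# Rough emoji range; the real script also maps WeChat <img> emojis.
EMOJI_RE = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")
PUNCT_RE = re.compile(r"[。,.!?!?]")

def style_stats(messages):
    """Compute a few of the style dimensions for a non-empty list of
    message strings. A sketch, not the project's actual script."""
    n = len(messages)
    lengths = [len(m) for m in messages]
    return {
        "avg_len": sum(lengths) / n,
        "short_ratio": sum(1 for l in lengths if l < 10) / n,
        "no_punct_ratio": sum(1 for m in messages if not PUNCT_RE.search(m)) / n,
        "emoji_ratio": sum(1 for m in messages if EMOJI_RE.search(m)) / n,
    }
```

The resulting numbers are formatted straight into the style section of the system prompt.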
Selecting Representative Examples
From thousands of conversations, 20 are selected as few-shot examples using a three-way split strategy:
1/3 Catchphrase Conversations
Messages containing high-frequency words (哈哈, 嗯, kk), showcasing the most typical speaking habits.
1/3 Length-Balanced
A mix of short, medium, and long messages, so the AI knows how much to write in different scenarios.
1/3 Random Sampling
Fills the remaining slots for diversity, preventing the examples from skewing toward one topic.
Formatted Output
Each example includes context, formatted as Friend (小屎蛋儿): xxx → You: xxx, helping the AI understand conversation rhythm.
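The three-way split might look like this (a sketch; the length thresholds and fill logic are assumptions, not the project's exact rules):

```python
import random

def select_examples(pairs, catchphrases, n=20, seed=42):
    """Pick n few-shot examples from (context, response) pairs:
    1/3 catchphrase hits, 1/3 length-balanced, 1/3 random fill."""
    rng = random.Random(seed)
    third = n // 3

    # 1/3: responses containing a high-frequency catchphrase
    catch = [p for p in pairs if any(w in p[1] for w in catchphrases)][:third]

    # 1/3: balance short / medium / long responses
    short = [p for p in pairs if len(p[1]) < 10]
    mid = [p for p in pairs if 10 <= len(p[1]) < 30]
    long_ = [p for p in pairs if len(p[1]) >= 30]
    per_bucket = third // 3 + 1
    balanced = (short[:per_bucket] + mid[:per_bucket] + long_[:per_bucket])[:third]

    # remainder: random fill for topical diversity
    chosen = catch + balanced
    pool = [p for p in pairs if p not in chosen]
    chosen += rng.sample(pool, min(n - len(chosen), len(pool)))
    return chosen[:n]
```

Each chosen pair is then rendered into the Friend/You transcript format before going into the prompt.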
System Prompt Structure
The final system prompt runs to about 4,000 characters, composed of six modules.
Dynamic Time Injection
The system prompt is regenerated on every request, injecting the current time and the age computed from it:
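A minimal sketch of that per-request regeneration, with a placeholder birthday standing in for the persona's real one (an assumption, not actual data):

```python
from datetime import date

BIRTHDAY = date(1995, 6, 1)  # placeholder; the real value is the persona's

def build_time_block(today=None):
    """Return the time/age fragment injected into the system prompt.
    Recomputed on every request, so the prompt never goes stale."""
    today = today or date.today()
    # Subtract one if this year's birthday hasn't happened yet.
    age = today.year - BIRTHDAY.year - (
        (today.month, today.day) < (BIRTHDAY.month, BIRTHDAY.day))
    return f"今天是 {today.isoformat()},你今年 {age} 岁。"
```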
This way, the AI never states the wrong age or gets the current time wrong.
Dual Mode Design
Casual Mode (Default)
Short messages, colloquial, sprinkled with catchphrases. 5-20 characters per message, possibly split into 2-3 bubbles.
Knowledge Mode
Triggered by technical questions and professional topics. Allows longer replies while keeping the personal voice.
Runtime Architecture
From the moment a user sends a message to the moment a reply arrives, the following pipeline executes:
Key Parameters
| Parameter | Value | Rationale |
|---|---|---|
| maxOutputTokens | 128 | Forces short replies, matching WeChat message length |
| temperature | 0.8 | Natural without going off the rails; casual chat needs randomness |
| Context window | 20 messages | Balances coherence against token cost |
| Max reply bubbles | 3 | Simulates WeChat multi-message replies without overdoing it |
| Redis TTL | 90 days | Long enough for review without unbounded growth |
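As a config sketch, these could land in the backend as a small constants module (names are illustrative, not the project's actual source):

```python
# Generation settings passed to the Gemini API call (sketch).
GENERATION_CONFIG = {
    "maxOutputTokens": 128,  # forces short, WeChat-length replies
    "temperature": 0.8,      # natural but not off the rails
}
CONTEXT_WINDOW = 20          # history messages sent with each request
MAX_REPLY_BUBBLES = 3        # cap on multi-bubble replies
REDIS_TTL_SECONDS = 90 * 24 * 60 * 60  # 90-day review retention
```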
Multi-Bubble Replies: Simulating Real Chat Rhythm
Real people don't send one big paragraph on WeChat. They split it into 2-3 short messages with natural typing pauses in between. Aime simulates this behavior in full.
Backend: Splitting Replies
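A sketch of the backend split, assuming the model is prompted to separate bubbles with newlines (the delimiter choice is an assumption):

```python
def split_reply(text, max_bubbles=3):
    """Split one model reply into up to `max_bubbles` chat bubbles.
    Overflow is merged into the last bubble rather than dropped."""
    parts = [p.strip() for p in text.split("\n") if p.strip()]
    if len(parts) > max_bubbles:
        parts = parts[:max_bubbles - 1] + [" ".join(parts[max_bubbles - 1:])]
    return parts
```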
Frontend: Simulating Typing Delay
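The frontend logic itself lives in JavaScript, but the delay calculation can be sketched in Python (the base, per-character, and jitter constants are assumptions):

```python
import random

def typing_delay_ms(text, base=600, per_char=80, jitter=0.2):
    """Delay before showing a bubble, scaled to its length with some
    jitter so the pacing feels human rather than mechanical."""
    raw = base + per_char * len(text)
    return int(raw * random.uniform(1 - jitter, 1 + jitter))
```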
Combined with the frontend's typing indicator (three bouncing dots), the whole experience closely mimics a real person sending messages.
RAG Analysis: Is It Worth the Complexity?
The Python version includes a complete RAG pipeline: it vectorizes 5,000+ historical conversations and, at runtime, retrieves the 5 most similar ones as reference. But the Web version doesn't use it. Why?
Where RAG Helps
Niche topics. When users bring up specific shared memories, obscure inside jokes, or rare catchphrases, a static prompt can't cover them; RAG can retrieve the relevant conversations from history.
Where RAG Is Over-Engineering
Daily chat. The statistical style rules already handle 90% of common conversations. RAG adds ~100ms of latency and infrastructure complexity for limited gain.
Review System: Closed-Loop Feedback
Prompt engineering isn't a one-time task. You need to continuously observe the results, find problems, and iterate. Aime ships with a built-in Review page to close this loop.
Data Collection
Every conversation interaction is saved to Redis automatically: user message, AI reply, context history, and session ID. The write is non-blocking, fire-and-forget.
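A fire-and-forget save can be sketched with a background thread; the `review:<session_id>` key layout is an assumption:

```python
import json
import threading

def save_interaction(redis_client, session_id, record, ttl=90 * 24 * 3600):
    """Push the interaction onto a per-session Redis list on a background
    thread, so saving never blocks the chat response. The list expires
    after `ttl` seconds (90 days by default)."""
    def _write():
        key = f"review:{session_id}"
        redis_client.rpush(key, json.dumps(record, ensure_ascii=False))
        redis_client.expire(key, ttl)
    threading.Thread(target=_write, daemon=True).start()
```

In a serverless Web backend the same idea would typically use the platform's background-task hook instead of a raw thread.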
Review Features
Dual View Mode
Session View: grouped by session, showing the full conversation flow. Flat View: one interaction at a time, ideal for quick scanning.
Rating System
Each interaction can be marked Good or Bad, used to track quality changes before and after prompt adjustments.
Filters
Four filters (All / Unrated / Good / Bad) with live counts, for quickly locating problematic replies.
JSON Export
One-click export of all conversation data, for further analysis or as a fine-tuning training set.
Iteration Cycle
Safety Boundaries
Persona chatbots carry a unique risk: they impersonate a real person. If the AI fabricates personal information that doesn't exist, or leaks sensitive data, the consequences are far more serious than for an ordinary chatbot.
The system prompt has an explicit "most important" rules section:
Whitelist > Blacklist. Not "don't say X", but "you can only say these things". The AI only knows the personal information provided in the prompt; anything beyond that is deflected with humor: "这个我不太方便说哈" ("I'd rather not say, haha").
Conclusion
Aime's core formula:
No fine-tuning. No training data. No GPU.
A BeautifulSoup parsing script, a statistical analysis script, and a well-crafted system prompt: three components are all it takes to teach an AI a real person's chat style, precise down to the average character count per message and the emoji usage frequency.
If you want to build a persona chatbot of your own, don't start with model fine-tuning. Start with data. Quantify your speaking habits first, then let the AI execute those numbers.