Core Idea: Why Use an LLM for Recommendations
The traditional recommendation system pipeline: collect massive user behavior data → feature engineering → train models → recall → ranking → recommend. This makes sense for big companies, but it's overkill for a personal project.
The LLM route works because of three key insights:
- LLMs inherently know music. Gemini has "seen" vast amounts of music reviews and playlists during training — it knows LANY's style is closer to The 1975 than to Metallica.
- Natural language is the most flexible feature encoding. No one-hot encoding or vectorization needed — just write "dreamy indie pop with nostalgic vibes" and it's more precise than any embedding.
- Prompts are adjustable in real-time. No retraining needed — change one line and the recommendation strategy shifts.
Defining the Data Model
The entire recommendation system needs just three core types. No more, no less.
TasteProfile
Multi-dimensional modeling (genre, mood, era, energy) is far more precise than single labels. The free-text summary gives AI room to express complex taste patterns. Bilingual summaries support both CN/EN UI.
SongFeedback
No explicit ratings (1-5 stars). Entirely implicit signals. Listen duration ratio is the core signal: ≥50% = enjoyed, <50% = skipped. liked is a bonus strong signal. Zero-friction data collection.
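As a sketch, the two types above might look like this in TypeScript. Only the dimensions named in the text (genre, mood, era, energy, bilingual summary, duration ratio, liked flag) come from the article; the exact field names are assumptions:

```typescript
// Hypothetical sketch of the two core record types. Field names are
// illustrative; only the dimensions themselves are from the article.

interface TasteProfile {
  genres: string[];   // e.g. ["indie pop", "dream pop"]
  moods: string[];    // e.g. ["nostalgic", "dreamy"]
  eras: string[];     // e.g. ["2010s"]
  energy: string;     // free text, e.g. "low-to-medium"
  summaryEn: string;  // free-text taste summary for the English UI
  summaryZh: string;  // same summary for the Chinese UI
}

interface SongFeedback {
  artist: string;
  title: string;
  listenDurationSec: number; // how long the user actually listened
  songDurationSec: number;   // full track length (0 if unreported)
  liked: boolean;            // explicit heart = bonus strong signal
}
```

The duration pair is stored raw rather than as a precomputed ratio, so the classification threshold can change later without migrating data.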
Taste Extraction: The Information in a Playlist
User input is dead simple — copy-paste a playlist from Spotify, Apple Music, or YouTube Music. The text may be messy and inconsistently formatted. The AI handles parsing.
During initialization, a single Gemini call does two things at once: analyze taste + generate recommendations. The key technique is Structured Output — using responseSchema to enforce output format:
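A minimal sketch of what that combined call could look like against the Gemini REST API. The schema fields, prompt wording, and model name are illustrative assumptions, not Tunes' exact schema; responseSchema uses the API's OpenAPI-style type enum:

```typescript
// Sketch of a Structured Output request. Schema fields, prompt text,
// and model name are assumptions for illustration only.

const responseSchema = {
  type: "OBJECT",
  properties: {
    profile: {
      type: "OBJECT",
      properties: {
        genres: { type: "ARRAY", items: { type: "STRING" } },
        summary: { type: "STRING" }, // free-text taste summary
      },
      required: ["genres", "summary"],
    },
    recommendations: {
      type: "ARRAY",
      items: {
        type: "OBJECT",
        properties: {
          artist: { type: "STRING" },
          title: { type: "STRING" },
          searchQuery: { type: "STRING" }, // "Artist - Song Title"
        },
        required: ["artist", "title", "searchQuery"],
      },
    },
  },
  required: ["profile", "recommendations"],
};

// One Gemini call parses the messy pasted playlist AND returns the
// first batch of recommendations, constrained to the schema above.
async function analyzeAndRecommend(playlistText: string, apiKey: string) {
  const res = await fetch(
    "https://generativelanguage.googleapis.com/v1beta/models/" +
      `gemini-2.0-flash:generateContent?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        contents: [{ parts: [{ text:
          `Analyze this playlist, extract a taste profile, and recommend 20 songs:\n${playlistText}` }] }],
        generationConfig: {
          responseMimeType: "application/json",
          responseSchema,
        },
      }),
    },
  );
  const data = await res.json();
  // With responseSchema enforced, this text part is valid JSON.
  return JSON.parse(data.candidates[0].content.parts[0].text);
}
```

Because the schema marks `searchQuery` as required, every recommendation arrives ready to resolve against YouTube — no post-hoc parsing of free-form model output.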
The Rec Engine: Prompt as Algorithm
The core innovation: encode user behavior directly as natural language and inject it into the prompt. This turns a stateless LLM into a stateful recommendation engine.
Immutable Profile
The taste profile is extracted once and never updated. Letting AI modify it each round causes "taste drift" — skip a few slow songs and the "Chill" tag disappears forever. Behavior steers via feedback, not profile changes.
Full Played List
All played songs are sent in the prompt. Tried sending only the last N, but the AI would repeat songs from earlier rounds. For personal use (<100 songs), token cost is negligible.
Duration Ratio as Signal
(240s/240s, liked) carries far more information than just "enjoyed". The AI can distinguish "listened through" from "listened through AND hearted".
Graceful Fallback
When songDurationSec === 0 (YouTube sometimes doesn't report duration), default to "enjoyed". False positives hurt less than false negatives.
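The four notes above can be sketched as a prompt builder. Function names and exact wording are hypothetical; the point is that the profile is injected verbatim (never rewritten), the full history is serialized, and raw durations are passed through so the model sees ratio and liked together:

```typescript
// Hypothetical sketch of behavior-as-natural-language. Wording and
// names are mine; the structure follows the four design notes above.

interface PlayedSong {
  artist: string;
  title: string;
  listenDurationSec: number;
  songDurationSec: number; // 0 when YouTube doesn't report it
  liked: boolean;
}

// One line per song, e.g. "- LANY - ILYSB (240s/240s, liked)"
function formatFeedbackLine(fb: PlayedSong): string {
  const flags = [
    `${fb.listenDurationSec}s/${fb.songDurationSec}s`,
    fb.liked ? "liked" : "",
  ].filter(Boolean).join(", ");
  return `- ${fb.artist} - ${fb.title} (${flags})`;
}

function buildRecPrompt(profileSummary: string, history: PlayedSong[]): string {
  return [
    `Taste profile (fixed, do not revise): ${profileSummary}`,
    ``,
    `Already played (never recommend these again):`,
    ...history.map(formatFeedbackLine), // FULL list, not last N
    ``,
    `Recommend 10 new songs. Adapt and surprise them.`,
  ].join("\n");
}
```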
The Feedback Loop: Getting Smarter Over Time
A recommendation system's value isn't in the first batch — it's in continuous learning. Tunes' feedback loop is fully automated — users just listen to music.
Why the 50% Threshold
The classification threshold was calibrated through real usage:
- < 30%: Too strict. Many "listened halfway and thought it was fine" songs get classified as skipped
- 50%: Intuitively, "listened to half" means at least tolerable. Best results in practice
- > 70%: Too lenient. Some songs just weren't skipped in time and still got counted as enjoyed
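As a sketch, the classifier is a few lines. The 50% threshold and the zero-duration fallback are from the text; treating an explicit like as an override of the ratio is an assumption:

```typescript
// Sketch of the implicit-feedback classifier. Threshold and fallback
// are from the article; the liked-overrides-ratio rule is an assumption.

function classifyListen(
  listenSec: number,
  songSec: number,
  liked: boolean,
): "enjoyed" | "skipped" {
  // Graceful fallback: YouTube sometimes reports 0 duration. Default
  // to "enjoyed" — a false positive hurts less than a false negative.
  if (songSec === 0) return "enjoyed";
  // Assumption: an explicit heart counts as enjoyed regardless of ratio.
  if (liked) return "enjoyed";
  return listenSec / songSec >= 0.5 ? "enjoyed" : "skipped";
}
```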
Queue Pre-fetching
When the queue drops below 10 songs, a new Gemini call fires automatically. Users never wait — the next batch is ready before the current one finishes.
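The trigger condition is tiny. The threshold of 10 is from the text; the in-flight guard is an assumption, added so a slow Gemini response doesn't fire duplicate requests:

```typescript
// Sketch of the pre-fetch trigger. Threshold per the article; the
// in-flight guard is an assumed detail to prevent duplicate calls.

const PREFETCH_THRESHOLD = 10;

function shouldPrefetch(queueLength: number, requestInFlight: boolean): boolean {
  return queueLength < PREFETCH_THRESHOLD && !requestInFlight;
}

// Called after every song transition, e.g.:
//   if (shouldPrefetch(queue.length, fetching)) { fetching = true; fetchNextBatch(); }
```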
Playback & Search
Each song from Gemini includes a searchQuery (format: "Artist - Song Title"). Before playback, this needs to be resolved to a YouTube video ID.
YouTube Data API Search
Send searchQuery, filter by videoCategoryId: "10" (Music) + videoEmbeddable: true.
Cache videoId
After first resolution, write videoId back to the song object. Subsequent plays reuse it.
IFrame Playback
YouTube IFrame API plays the video, onStateChange events report play state and duration.
Collect Feedback
When song ends or user skips, record listenDurationSec and songDurationSec.
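Steps 1-2 might be sketched like this. The endpoint and query parameters come from YouTube Data API v3 (`videoCategoryId` and `videoEmbeddable` both require `type=video`); the `Song` shape and helper names are assumptions:

```typescript
// Sketch of search + videoId caching. API parameters are from
// YouTube Data API v3; the Song shape is an assumed detail.

interface Song {
  searchQuery: string; // "Artist - Song Title", from Gemini
  videoId?: string;    // cached after first resolution
}

function buildSearchUrl(query: string, apiKey: string): string {
  const params = new URLSearchParams({
    part: "snippet",
    q: query,
    type: "video",           // required by the two filters below
    videoCategoryId: "10",   // Music
    videoEmbeddable: "true", // playable in the IFrame player
    maxResults: "1",
    key: apiKey,
  });
  return `https://www.googleapis.com/youtube/v3/search?${params}`;
}

async function resolveVideoId(song: Song, apiKey: string): Promise<string> {
  if (song.videoId) return song.videoId; // cache hit: no quota spent
  const res = await fetch(buildSearchUrl(song.searchQuery, apiKey));
  const data = await res.json();
  const id: string = data.items[0].id.videoId;
  song.videoId = id; // write back so replays skip the API call
  return id;
}
```

Caching matters here: search.list is one of the most quota-expensive calls in the Data API, so each song should pay that cost at most once.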
Data Persistence: Never Lose a Beat
Driving means unpredictable connectivity. The sync strategy must be robust enough to handle tab closes, screen locks, and network drops.
Debounced Save
Each feedback event triggers a 2-second debounce. Rapid skips batch into a single Redis write.
visibilitychange
Immediate save when switching tabs or locking the screen. Covers the "phone in pocket" scenario.
sendBeacon Fallback
On beforeunload, sendBeacon() fires a final save. Works even when the page is being destroyed.
Cached Queue
Upcoming songs persist to Redis. Next session loads instantly from cache — no Gemini call needed.
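The three save paths can be sketched together, assuming a hypothetical `/api/feedback` route that performs the actual Redis write:

```typescript
// Sketch of the save paths above. The endpoint path and payload shape
// are assumptions; the 2s debounce, visibilitychange flush, and
// sendBeacon fallback follow the article.

function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

let pending: object[] = []; // feedback not yet written to Redis

const flushNow = () => {
  if (pending.length === 0) return;
  fetch("/api/feedback", { method: "POST", body: JSON.stringify(pending) });
  pending = [];
};

// Path 1: rapid skips collapse into one write, 2s after the last event.
const scheduleSave = debounce(flushNow, 2000);

const g = globalThis as any; // avoid DOM typings in this sketch
if (typeof g.document !== "undefined") {
  // Path 2: flush immediately when the tab hides (phone goes in pocket).
  g.document.addEventListener("visibilitychange", () => {
    if (g.document.visibilityState === "hidden") flushNow();
  });
  // Path 3: last-chance beacon — survives page teardown where fetch may not.
  g.window.addEventListener("beforeunload", () => {
    g.navigator.sendBeacon("/api/feedback", JSON.stringify(pending));
  });
}
```

A new feedback event pushes onto `pending` and calls `scheduleSave()`; the three paths then compete to get it to Redis first, and whichever wins clears the buffer.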
Prompt Engineering Notes
Lessons Learned
Don't let AI update the profile
Causes taste drift. Skip a few slow songs → "Chill" tag disappears → never recommends slow songs again.
Don't use temperature=0
Recommendations become monotonous — same style every time. 0.9 is the sweet spot.
Don't omit already played
The AI has no cross-call memory. Without the full list, it starts repeating by round 3.
Don't over-specify rules
"Must include one 2000s song" makes recommendations feel mechanical. "Adapt and surprise them" works better.
Trade-offs
Using an LLM as the recommendation engine is unconventional. Here's an honest comparison:
| Dimension | LLM (Tunes) | Traditional RecSys |
|---|---|---|
| Cold start | Zero — works from 1 playlist | Needs massive user data |
| Explainability | Natural language reasons | Opaque similarity scores |
| Adaptability | Real-time via prompt context | Requires model retraining |
| Latency | ~2s per batch | Sub-millisecond lookups |
| Consistency | Non-deterministic | Reproducible rankings |
| Music knowledge | World knowledge built in | Only knows training data |
Conclusion
The complexity of traditional recommendation systems — collaborative filtering, embeddings, feature engineering, model training — is all over-engineering for a personal use case. A good prompt, a structured output schema, and a behavioral feedback loop. That's all you need.
If you want to build your own recommendation system, don't start with TensorFlow. Start with a prompt.