Core Idea: Why Use an LLM for Recommendations
The traditional recommendation system pipeline: collect massive user behavior data → feature engineering → train models → recall → ranking → recommend. This makes sense for big companies, but it's overkill for a personal project.
The LLM route works because of three key insights:
- LLMs inherently know music. Gemini has "seen" vast amounts of music reviews and playlists during training — it knows LANY's style is closer to The 1975 than to Metallica.
- Natural language is the most flexible feature encoding. No one-hot encoding or vectorization needed — just write "dreamy indie pop with nostalgic vibes" and it's more precise than any embedding.
- Prompts are adjustable in real-time. No retraining needed — change one line and the recommendation strategy shifts.
Defining the Data Model
The entire recommendation system needs just three core types. No more, no less.
TasteProfile
Multi-dimensional modeling (genre, mood, era, energy) is far more precise than single labels. The free-text summary gives AI room to express complex taste patterns. Bilingual summaries support both CN/EN UI.
SongFeedback
No explicit ratings (1-5 stars). Entirely implicit signals. Listen duration ratio is the core signal: ≥50% = enjoyed, <50% = skipped. liked is a bonus strong signal. Zero-friction data collection.
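As a sketch, the two types above might look like this in TypeScript. Only the dimensions named in the text (genre, mood, era, energy, bilingual summary, duration ratio, liked flag) come from the article; the exact field names are assumptions:

```typescript
// Hypothetical sketch of the two core record types. Field names are
// illustrative; only the dimensions themselves are from the article.

interface TasteProfile {
  genres: string[];   // e.g. ["indie pop", "dream pop"]
  moods: string[];    // e.g. ["nostalgic", "dreamy"]
  eras: string[];     // e.g. ["2010s"]
  energy: string;     // free text, e.g. "low-to-medium"
  summaryEn: string;  // free-text taste summary for the English UI
  summaryZh: string;  // same summary for the Chinese UI
}

interface SongFeedback {
  artist: string;
  title: string;
  listenDurationSec: number; // how long the user actually listened
  songDurationSec: number;   // full track length (0 if unreported)
  liked: boolean;            // explicit heart = bonus strong signal
}
```

The duration pair is stored raw rather than as a precomputed ratio, so the classification threshold can change later without migrating data.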
Taste Extraction: The Information in a Playlist
User input is dead simple — copy-paste a playlist from Spotify, Apple Music, or YouTube Music. The text may be messy and inconsistently formatted. The AI handles parsing.
During initialization, a single Gemini call does two things at once: analyze taste + generate recommendations. The key technique is Structured Output — using responseSchema to enforce output format:
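A minimal sketch of what that combined call could look like against the Gemini REST API. The schema fields, prompt wording, and model name are illustrative assumptions, not Tunes' exact schema; responseSchema uses the API's OpenAPI-style type enum:

```typescript
// Sketch of a Structured Output request. Schema fields, prompt text,
// and model name are assumptions for illustration only.

const responseSchema = {
  type: "OBJECT",
  properties: {
    profile: {
      type: "OBJECT",
      properties: {
        genres: { type: "ARRAY", items: { type: "STRING" } },
        summary: { type: "STRING" }, // free-text taste summary
      },
      required: ["genres", "summary"],
    },
    recommendations: {
      type: "ARRAY",
      items: {
        type: "OBJECT",
        properties: {
          artist: { type: "STRING" },
          title: { type: "STRING" },
          searchQuery: { type: "STRING" }, // "Artist - Song Title"
        },
        required: ["artist", "title", "searchQuery"],
      },
    },
  },
  required: ["profile", "recommendations"],
};

// One Gemini call parses the messy pasted playlist AND returns the
// first batch of recommendations, constrained to the schema above.
async function analyzeAndRecommend(playlistText: string, apiKey: string) {
  const res = await fetch(
    "https://generativelanguage.googleapis.com/v1beta/models/" +
      `gemini-2.0-flash:generateContent?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        contents: [{ parts: [{ text:
          `Analyze this playlist, extract a taste profile, and recommend 20 songs:\n${playlistText}` }] }],
        generationConfig: {
          responseMimeType: "application/json",
          responseSchema,
        },
      }),
    },
  );
  const data = await res.json();
  // With responseSchema enforced, this text part is valid JSON.
  return JSON.parse(data.candidates[0].content.parts[0].text);
}
```

Because the schema marks `searchQuery` as required, every recommendation arrives ready to resolve against YouTube — no post-hoc parsing of free-form model output.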
The Rec Engine: Prompt as Algorithm
The core innovation: encode user behavior directly as natural language and inject it into the prompt. This turns a stateless LLM into a stateful recommendation engine.
Immutable Profile
The taste profile is extracted once and never updated. Letting AI modify it each round causes "taste drift" — skip a few slow songs and the "Chill" tag disappears forever. Behavior steers via feedback, not profile changes.
Full Played List
All played songs are sent in the prompt. Tried sending only the last N, but the AI would repeat songs from earlier rounds. For personal use (<100 songs), token cost is negligible.
Duration Ratio as Signal
(240s/240s, liked) carries far more information than just "enjoyed". The AI can distinguish "listened through" from "listened through AND hearted".
Graceful Fallback
When songDurationSec === 0 (YouTube sometimes doesn't report duration), default to "enjoyed". False positives hurt less than false negatives.
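The four notes above can be sketched as a prompt builder. Function names and exact wording are hypothetical; the point is that the profile is injected verbatim (never rewritten), the full history is serialized, and raw durations are passed through so the model sees ratio and liked together:

```typescript
// Hypothetical sketch of behavior-as-natural-language. Wording and
// names are mine; the structure follows the four design notes above.

interface PlayedSong {
  artist: string;
  title: string;
  listenDurationSec: number;
  songDurationSec: number; // 0 when YouTube doesn't report it
  liked: boolean;
}

// One line per song, e.g. "- LANY - ILYSB (240s/240s, liked)"
function formatFeedbackLine(fb: PlayedSong): string {
  const flags = [
    `${fb.listenDurationSec}s/${fb.songDurationSec}s`,
    fb.liked ? "liked" : "",
  ].filter(Boolean).join(", ");
  return `- ${fb.artist} - ${fb.title} (${flags})`;
}

function buildRecPrompt(profileSummary: string, history: PlayedSong[]): string {
  return [
    `Taste profile (fixed, do not revise): ${profileSummary}`,
    ``,
    `Already played (never recommend these again):`,
    ...history.map(formatFeedbackLine), // FULL list, not last N
    ``,
    `Recommend 10 new songs. Adapt and surprise them.`,
  ].join("\n");
}
```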
The Feedback Loop: Getting Smarter Over Time
A recommendation system's value isn't in the first batch — it's in continuous learning. Tunes' feedback loop is fully automated — users just listen to music.
Why the 50% Threshold
The classification threshold was calibrated through real usage:
- < 30%: Too strict. Many "listened halfway and thought it was fine" songs get classified as skipped
- 50%: Intuitively, "listened to half" means at least tolerable. Best results in practice
- > 70%: Too lenient. Some songs just weren't skipped in time and still got counted as enjoyed
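As a sketch, the classifier is a few lines. The 50% threshold and the zero-duration fallback are from the text; treating an explicit like as an override of the ratio is an assumption:

```typescript
// Sketch of the implicit-feedback classifier. Threshold and fallback
// are from the article; the liked-overrides-ratio rule is an assumption.

function classifyListen(
  listenSec: number,
  songSec: number,
  liked: boolean,
): "enjoyed" | "skipped" {
  // Graceful fallback: YouTube sometimes reports 0 duration. Default
  // to "enjoyed" — a false positive hurts less than a false negative.
  if (songSec === 0) return "enjoyed";
  // Assumption: an explicit heart counts as enjoyed regardless of ratio.
  if (liked) return "enjoyed";
  return listenSec / songSec >= 0.5 ? "enjoyed" : "skipped";
}
```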
Queue Pre-fetching
When the queue drops below 10 songs, a new Gemini call fires automatically. Users never wait — the next batch is ready before the current one finishes.
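The trigger condition is tiny. The threshold of 10 is from the text; the in-flight guard is an assumption, added so a slow Gemini response doesn't fire duplicate requests:

```typescript
// Sketch of the pre-fetch trigger. Threshold per the article; the
// in-flight guard is an assumed detail to prevent duplicate calls.

const PREFETCH_THRESHOLD = 10;

function shouldPrefetch(queueLength: number, requestInFlight: boolean): boolean {
  return queueLength < PREFETCH_THRESHOLD && !requestInFlight;
}

// Called after every song transition, e.g.:
//   if (shouldPrefetch(queue.length, fetching)) { fetching = true; fetchNextBatch(); }
```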
Playback & Search
Each song from Gemini includes a searchQuery (format: "Artist - Song Title"). Before playback, this needs to be resolved to a YouTube video ID.
YouTube Data API Search
Send searchQuery, filter by videoCategoryId: "10" (Music) + videoEmbeddable: true.
Cache videoId
After first resolution, write videoId back to the song object. Subsequent plays reuse it.
IFrame Playback
YouTube IFrame API plays the video, onStateChange events report play state and duration.
Collect Feedback
When song ends or user skips, record listenDurationSec and songDurationSec.
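Steps 1-2 might be sketched like this. The endpoint and query parameters come from YouTube Data API v3 (`videoCategoryId` and `videoEmbeddable` both require `type=video`); the `Song` shape and helper names are assumptions:

```typescript
// Sketch of search + videoId caching. API parameters are from
// YouTube Data API v3; the Song shape is an assumed detail.

interface Song {
  searchQuery: string; // "Artist - Song Title", from Gemini
  videoId?: string;    // cached after first resolution
}

function buildSearchUrl(query: string, apiKey: string): string {
  const params = new URLSearchParams({
    part: "snippet",
    q: query,
    type: "video",           // required by the two filters below
    videoCategoryId: "10",   // Music
    videoEmbeddable: "true", // playable in the IFrame player
    maxResults: "1",
    key: apiKey,
  });
  return `https://www.googleapis.com/youtube/v3/search?${params}`;
}

async function resolveVideoId(song: Song, apiKey: string): Promise<string> {
  if (song.videoId) return song.videoId; // cache hit: no quota spent
  const res = await fetch(buildSearchUrl(song.searchQuery, apiKey));
  const data = await res.json();
  const id: string = data.items[0].id.videoId;
  song.videoId = id; // write back so replays skip the API call
  return id;
}
```

Caching matters here: search.list is one of the most quota-expensive calls in the Data API, so each song should pay that cost at most once.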
Data Persistence: Never Lose a Beat
Driving means unpredictable connectivity. The sync strategy must be robust enough to handle tab closes, screen locks, and network drops.
Debounced Save
Each feedback event triggers a 2-second debounce. Rapid skips batch into a single Redis write.
visibilitychange
Immediate save when switching tabs or locking the screen. Covers the "phone in pocket" scenario.
sendBeacon Fallback
On beforeunload, sendBeacon() fires a final save. Works even when the page is being destroyed.
Cached Queue
Upcoming songs persist to Redis. Next session loads instantly from cache — no Gemini call needed.
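The three save paths can be sketched together, assuming a hypothetical `/api/feedback` route that performs the actual Redis write:

```typescript
// Sketch of the save paths above. The endpoint path and payload shape
// are assumptions; the 2s debounce, visibilitychange flush, and
// sendBeacon fallback follow the article.

function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

let pending: object[] = []; // feedback not yet written to Redis

const flushNow = () => {
  if (pending.length === 0) return;
  fetch("/api/feedback", { method: "POST", body: JSON.stringify(pending) });
  pending = [];
};

// Path 1: rapid skips collapse into one write, 2s after the last event.
const scheduleSave = debounce(flushNow, 2000);

const g = globalThis as any; // avoid DOM typings in this sketch
if (typeof g.document !== "undefined") {
  // Path 2: flush immediately when the tab hides (phone goes in pocket).
  g.document.addEventListener("visibilitychange", () => {
    if (g.document.visibilityState === "hidden") flushNow();
  });
  // Path 3: last-chance beacon — survives page teardown where fetch may not.
  g.window.addEventListener("beforeunload", () => {
    g.navigator.sendBeacon("/api/feedback", JSON.stringify(pending));
  });
}
```

A new feedback event pushes onto `pending` and calls `scheduleSave()`; the three paths then compete to get it to Redis first, and whichever wins clears the buffer.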
Prompt Engineering Notes
Lessons Learned
Don't let AI update the profile
Causes taste drift. Skip a few slow songs → "Chill" tag disappears → never recommends slow songs again.
Don't use temperature=0
Recommendations become monotonous — same style every time. 0.9 is the sweet spot.
Don't omit already played
The AI has no cross-call memory. Without the full list, it starts repeating by round 3.
Don't over-specify rules
"Must include one 2000s song" makes recommendations feel mechanical. "Adapt and surprise them" works better.
Trade-offs
Using an LLM as the recommendation engine is unconventional. Here's an honest comparison:
| Dimension | LLM (Tunes) | Traditional RecSys |
|---|---|---|
| Cold start | Zero — works from 1 playlist | Needs massive user data |
| Explainability | Natural language reasons | Opaque similarity scores |
| Adaptability | Real-time via prompt context | Requires model retraining |
| Latency | ~2s per batch | Sub-millisecond lookups |
| Consistency | Non-deterministic | Reproducible rankings |
| Music knowledge | World knowledge built in | Only knows training data |
Conclusion
The complexity of traditional recommendation systems — collaborative filtering, embeddings, feature engineering, model training — is all over-engineering for a personal use case. A good prompt, a structured output schema, and a behavioral feedback loop. That's all you need.
If you want to build your own recommendation system, don't start with TensorFlow. Start with a prompt.