diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..6693684
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,159 @@
+# Bilibili Follow-List Cleanup Tool (Optimized Edition)
+
+This project keeps and focuses on a single working pipeline:
+
+1. Fetch video titles
+2. Analyze them with AI in batches
+3. Generate the UIDs to unfollow (optionally split into files of 100)
+4. Generate the keep-following report
+
+## Directory structure
+
+```text
+source/
+  resources/                          # Resource files
+    export_uids.json
+    export_uids.txt
+
+  output/                             # Generated artifacts
+    reports/                          # Report files
+      up_titles_report.md
+      up_analysis_full_auto.md
+      up_keep_follow_only.md
+    uids/                             # Unfollow UID results
+      unfollow_mids_list.txt
+      unfollow_mids_list_1.txt
+      unfollow_mids_list_2.txt
+      ...
+
+  analyze_up_content.py               # Step 1: fetch titles
+  batch_ai_summary_from_report.py     # Step 2: batch analysis
+  extract_keep_follow_doc.py          # Step 3: keep-following report
+  extract_unfollow_list.py            # Step 4: unfollow UIDs
+  run_pipeline.py                     # One-command pipeline
+  README_up_analysis.md
+```
+
+## Configure the API first
+
+Edit the configuration at the top of [source/analyze_up_content.py](source/analyze_up_content.py):
+
+```python
+VOLCENGINE_API_KEY = "your Volcengine API key"
+VOLCENGINE_MODEL = "deepseek-v3-1-terminus"
+VOLCENGINE_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
+```
+
+`batch_ai_summary_from_report.py` reads this configuration automatically.
+
+## Recommended one-command usage
+
+Run from the project root:
+
+```powershell
+python source/run_pipeline.py
+```
+
+By default this will:
+
+1. Fetch titles for the UIDs in [source/resources/export_uids.json](source/resources/export_uids.json) into [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
+2. Run the batch analysis into [source/output/reports/up_analysis_full_auto.md](source/output/reports/up_analysis_full_auto.md)
+3. Generate the keep-following report [source/output/reports/up_keep_follow_only.md](source/output/reports/up_keep_follow_only.md)
+4. Generate the unfollow UIDs in [source/output/uids/unfollow_mids_list.txt](source/output/uids/unfollow_mids_list.txt) and split them into files of 100
+
+## Common options
+
+```powershell
+# Speed things up
+python source/run_pipeline.py --workers 8 --batch-size 30 --sleep-seconds 0
+
+# Trial run: fetch only the first 50
+python source/run_pipeline.py --max-ups 50
+
+# Process only uploaders (UPs) with the given tag ("准备取关" means "planning to unfollow")
+python source/run_pipeline.py --only-tag "准备取关"
+
+# Skip fetching (reuse the existing titles report)
+python source/run_pipeline.py --skip-fetch
+
+# Skip analysis (reuse the existing analysis report and only generate the artifacts)
+python source/run_pipeline.py --skip-analyze
+
+# Change the UID split size
+python source/run_pipeline.py --split-size 200
+```
+
+## Step-by-step execution (optional)
+
+### Step 1: Fetch titles
+
+```powershell
+python source/analyze_up_content.py --skip-ai
+```
+
+Default output:
+- [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
+
+### Step 2: Batch AI analysis
+
+```powershell
+python source/batch_ai_summary_from_report.py --run-all-batches
+# Small-batch test
+python source/batch_ai_summary_from_report.py
+
+
+python source/batch_ai_summary_from_report.py --input source\output\reports\up_titles_report.md --output source\18_12.md --force
+
+python source/batch_ai_summary_from_report.py --input source\output\reports\up_titles_report.md --output source\19_06_all.md --force --run-all-batches
+```
+
+Default input/output:
+- Input: [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
+- Output: [source/output/reports/up_analysis_full_auto.md](source/output/reports/up_analysis_full_auto.md)
+
+### Step 3: Generate the keep-following report
+
+```powershell
+python source/extract_keep_follow_doc.py
+
+python source/extract_keep_follow_doc.py --input source/19_06_all.md --output source/19_30_keep_follow.md
+```
+
+Output:
+- [source/output/reports/up_keep_follow_only.md](source/output/reports/up_keep_follow_only.md)
+
+### Step 4: Generate the unfollow UIDs
+
+```powershell
+python source/extract_unfollow_list.py --format mid-only --split-size 100
+```
+
+Output:
+- Main file: [source/output/uids/unfollow_mids_list.txt](source/output/uids/unfollow_mids_list.txt)
+- Split files: [source/output/uids/unfollow_mids_list_1.txt](source/output/uids/unfollow_mids_list_1.txt) and so on
+
+## Understanding the results
+
+- `up_analysis_full_auto.md`: the full analysis report (covers both unfollow and keep decisions)
+- `up_keep_follow_only.md`: AI analysis and grouping suggestions for the UPs to keep following only
+- `unfollow_mids_list.txt`: comma-separated list of UIDs that can be unfollowed (ready to paste as-is)
+
+## Suggested options
+
+- Stability first: `--workers 4 --max-retries 2 --request-timeout 60`
+- Speed first: `--workers 8 --batch-size 30 --sleep-seconds 0`
+- Low-risk trial run: `--max-ups 30` to validate before running the full set
+
+
+
+### Sort results alphabetically
+
+```
+python sort_up_main.py
+```
+
+
+### Extract groups
+```
+python source/extract_group_info.py --input source/19_53_no_titles.md --output source/group_only.md
+```
\ No newline at end of file
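
Step 4 of the README added in this diff describes one comma-separated main file plus numbered split files, but the diff does not include `extract_unfollow_list.py` itself. The following is a minimal sketch of how such a split could work, assuming UIDs are plain strings and reusing the `unfollow_mids_list_N.txt` naming from the directory tree. The function names `split_uids` and `write_split_files` are hypothetical and are not taken from the project.

```python
from pathlib import Path


def split_uids(uids, split_size=100):
    """Split a list of UIDs into consecutive chunks of at most split_size items."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [uids[i:i + split_size] for i in range(0, len(uids), split_size)]


def write_split_files(uids, out_dir, stem="unfollow_mids_list", split_size=100):
    """Write the full comma-separated list, then one numbered file per chunk.

    Hypothetical helper: the file naming follows the README's directory tree,
    but the real script may behave differently.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Main file: every UID on a single comma-separated line.
    main_file = out_dir / f"{stem}.txt"
    main_file.write_text(",".join(uids), encoding="utf-8")

    # Split files: <stem>_1.txt, <stem>_2.txt, ... numbered from 1.
    paths = [main_file]
    for index, chunk in enumerate(split_uids(uids, split_size), start=1):
        path = out_dir / f"{stem}_{index}.txt"
        path.write_text(",".join(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

Under these assumptions, calling `write_split_files(uids, "source/output/uids", split_size=200)` would correspond to running the pipeline with `--split-size 200`; the last file simply holds whatever remainder is left over.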