Files
bili_follow_group/source/README_up_analysis.md
2026-04-26 19:40:24 +08:00

153 lines
4.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# B站关注清理工具优化版
本项目保留并聚焦一条可用功能链:
1. 抓取视频标题
2. 分批AI分析
3. 生成取关UID支持按100拆分
4. 生成保留关注报告
## 目录结构
```text
source/
resources/ # 资源文件
export_uids.json
export_uids.txt
output/ # 产物目录
reports/ # 报告文件
up_titles_report.md
up_analysis_full_auto.md
up_keep_follow_only.md
uids/ # 取关UID结果
unfollow_mids_list.txt
unfollow_mids_list_1.txt
unfollow_mids_list_2.txt
...
analyze_up_content.py # 步骤1抓取标题
batch_ai_summary_from_report.py# 步骤2分批分析
extract_keep_follow_doc.py # 步骤3保留关注报告
extract_unfollow_list.py # 步骤4取关UID
run_pipeline.py # 一键流水线
README_up_analysis.md
```
## 先配置 API
编辑 [source/analyze_up_content.py](source/analyze_up_content.py) 顶部配置:
```python
VOLCENGINE_API_KEY = "你的火山引擎API Key"
VOLCENGINE_MODEL = "deepseek-v3-1-terminus"
VOLCENGINE_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
```
`batch_ai_summary_from_report.py` 会自动读取该配置。
## 一键推荐用法
在项目根目录运行:
```powershell
python source/run_pipeline.py
```
默认会完成:
1. 从 [source/resources/export_uids.json](source/resources/export_uids.json) 抓取标题到 [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
2. 分批分析到 [source/output/reports/up_analysis_full_auto.md](source/output/reports/up_analysis_full_auto.md)
3. 生成保留关注报告 [source/output/reports/up_keep_follow_only.md](source/output/reports/up_keep_follow_only.md)
4. 生成取关UID [source/output/uids/unfollow_mids_list.txt](source/output/uids/unfollow_mids_list.txt) 并按100拆分
## 常用参数
```powershell
# 提升速度
python source/run_pipeline.py --workers 8 --batch-size 30 --sleep-seconds 0
# 只先抓取前50个做试跑
python source/run_pipeline.py --max-ups 50
# 仅处理带标签UP
python source/run_pipeline.py --only-tag "准备取关"
# 跳过抓取(复用已有标题报告)
python source/run_pipeline.py --skip-fetch
# 跳过分析(复用已有分析报告,仅生成产物)
python source/run_pipeline.py --skip-analyze
# 修改UID拆分粒度
python source/run_pipeline.py --split-size 200
```
## 分步执行(可选)
### 步骤1抓取标题
```powershell
python source/analyze_up_content.py --skip-ai
```
默认输出:
- [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
### 步骤2分批AI分析
```powershell
python source/batch_ai_summary_from_report.py --run-all-batches
# 小批量测试
python source/batch_ai_summary_from_report.py
python source/batch_ai_summary_from_report.py --input source\output\reports\up_titles_report.md --output source\18_12.md --force
python source/batch_ai_summary_from_report.py --input source\output\reports\up_titles_report.md --output source\19_06_all.md --force --run-all-batches
```
默认输入/输出:
- 输入 [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
- 输出 [source/output/reports/up_analysis_full_auto.md](source/output/reports/up_analysis_full_auto.md)
### 步骤3生成保留关注报告
```powershell
python source/extract_keep_follow_doc.py
python source/extract_keep_follow_doc.py --input source/19_06_all.md --output source/19_30_keep_follow.md
```
输出:
- [source/output/reports/up_keep_follow_only.md](source/output/reports/up_keep_follow_only.md)
### 步骤4生成取关UID
```powershell
python source/extract_unfollow_list.py --format mid-only --split-size 100
```
输出:
- 主文件 [source/output/uids/unfollow_mids_list.txt](source/output/uids/unfollow_mids_list.txt)
- 拆分文件 [source/output/uids/unfollow_mids_list_1.txt](source/output/uids/unfollow_mids_list_1.txt) 等
## 结果解释
- `up_analysis_full_auto.md`:完整分析报告(含取关/保留)
- `up_keep_follow_only.md`仅保留关注UP的AI分析与分组建议
- `unfollow_mids_list.txt`可取关UID逗号分隔列表可直接粘贴使用
## 建议参数
- 稳定优先:`--workers 4 --max-retries 2 --request-timeout 60`
- 速度优先:`--workers 8 --batch-size 30 --sleep-seconds 0`
- 低风险试跑:`--max-ups 30` 先验证再全量
### 结果按首字母排序
```
python sort_up_main.py
```