Files
bili_follow_group/readme.md

116 lines
3.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# B站关注清理工具 - Scripts 版
> 一键命令运行全流程:`python source/scripts/run_pipeline.py`
python source/scripts/run_pipeline.py --input-json source/resources/export_uids_test5.json
本工具包含7个步骤的完整流水线
1. 抓取视频标题
2. 分批AI分析
3. 生成保留关注报告
4. 生成取关UID列表
5. 按首字母排序
6. 提取分组信息
7. 删除最近10条标题
## 快速开始
```powershell
# 完整流程(推荐)
python source/scripts/run_pipeline.py
# 速度优先
python source/scripts/run_pipeline.py --workers 8 --batch-size 30 --sleep-seconds 0
# 试跑30个UP
python source/scripts/run_pipeline.py --max-ups 30
# 跳过抓取,使用已有标题报告
python source/scripts/run_pipeline.py --skip-fetch
# 跳过分析,仅生成产物
python source/scripts/run_pipeline.py --skip-analyze
# 跳过排序/分组/删除
python source/scripts/run_pipeline.py --skip-sort --skip-group --skip-remove
```
## 输出文件
| 文件 | 说明 |
|------|------|
| `source/output/reports/1_up_titles_report.md` | 标题抓取报告 |
| `source/output/reports/2_up_analysis_full_auto.md` | AI分析报告完整 |
| `source/output/reports/3_up_keep_follow_only.md` | 保留关注报告 |
| `source/output/uids/4_unfollow_mids_list.txt` | 取关UID列表 |
| `source/output/reports/5_sorted_up_analysis.md` | 按首字母排序报告 |
| `source/output/reports/6_group_info.md` | 提取分组信息报告 |
| `source/output/reports/7_no_titles.md` | 最终报告删除最近10条 |
## 常用参数
| 参数 | 默认值 | 说明 |
|------|--------|------|
| `--workers` | 6 | 并发请求数 |
| `--batch-size` | 20 | 每批分析条数 |
| `--max-ups` | 0全部 | 限制处理UP数量 |
| `--split-size` | 100 | UID拆分大小 |
| `--sleep-seconds` | 0 | 任务间隔秒数 |
### 跳过参数
| 参数 | 说明 |
|------|------|
| `--skip-fetch` | 跳过抓取阶段 |
| `--skip-analyze` | 跳过分析阶段 |
| `--skip-sort` | 跳过排序阶段 |
| `--skip-group` | 跳过提取分组阶段 |
| `--skip-remove` | 跳过删除最近10条阶段 |
## 分步执行
### 步骤1抓取标题
```powershell
python source/scripts/analyze_up_content.py --skip-ai
```
### 步骤2分批AI分析
```powershell
python source/scripts/batch_ai_summary_from_report.py --run-all-batches
```
### 步骤3生成保留关注报告
```powershell
python source/scripts/extract_keep_follow_doc.py
```
### 步骤4生成取关UID
```powershell
python source/scripts/extract_unfollow_list.py --format mid-only --split-size 100
```
### 步骤5按首字母排序
```powershell
python source/scripts/sort_up_main.py
```
### 步骤6提取分组信息
```powershell
python source/scripts/extract_group_info.py
```
### 步骤7删除最近10条标题
```powershell
python source/scripts/remove_10content.py
```
## 先配置API
编辑 [source/scripts/analyze_up_content.py](source/scripts/analyze_up_content.py) 顶部配置:
```python
VOLCENGINE_API_KEY = "你的火山引擎API Key"
VOLCENGINE_MODEL = "deepseek-v3-1-terminus"
VOLCENGINE_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
```