添加readme文件
This commit is contained in:
159
readme.md
Normal file
159
readme.md
Normal file
@@ -0,0 +1,159 @@
|
||||
# B站关注清理工具(优化版)
|
||||
|
||||
本项目保留并聚焦一条可用功能链:
|
||||
|
||||
1. 抓取视频标题
|
||||
2. 分批AI分析
|
||||
3. 生成取关UID(支持按100拆分)
|
||||
4. 生成保留关注报告
|
||||
|
||||
## 目录结构
|
||||
|
||||
```text
|
||||
source/
|
||||
resources/ # 资源文件
|
||||
export_uids.json
|
||||
export_uids.txt
|
||||
|
||||
output/ # 产物目录
|
||||
reports/ # 报告文件
|
||||
up_titles_report.md
|
||||
up_analysis_full_auto.md
|
||||
up_keep_follow_only.md
|
||||
uids/ # 取关UID结果
|
||||
unfollow_mids_list.txt
|
||||
unfollow_mids_list_1.txt
|
||||
unfollow_mids_list_2.txt
|
||||
...
|
||||
|
||||
analyze_up_content.py # 步骤1:抓取标题
|
||||
batch_ai_summary_from_report.py# 步骤2:分批分析
|
||||
extract_keep_follow_doc.py # 步骤3:保留关注报告
|
||||
extract_unfollow_list.py # 步骤4:取关UID
|
||||
run_pipeline.py # 一键流水线
|
||||
README_up_analysis.md
|
||||
```
|
||||
|
||||
## 先配置 API
|
||||
|
||||
编辑 [source/analyze_up_content.py](source/analyze_up_content.py) 顶部配置:
|
||||
|
||||
```python
|
||||
VOLCENGINE_API_KEY = "你的火山引擎API Key"
|
||||
VOLCENGINE_MODEL = "deepseek-v3-1-terminus"
|
||||
VOLCENGINE_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
|
||||
```
|
||||
|
||||
`batch_ai_summary_from_report.py` 会自动读取该配置。
|
||||
|
||||
## 一键推荐用法
|
||||
|
||||
在项目根目录运行:
|
||||
|
||||
```powershell
|
||||
python source/run_pipeline.py
|
||||
```
|
||||
|
||||
默认会完成:
|
||||
|
||||
1. 从 [source/resources/export_uids.json](source/resources/export_uids.json) 抓取标题到 [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
|
||||
2. 分批分析到 [source/output/reports/up_analysis_full_auto.md](source/output/reports/up_analysis_full_auto.md)
|
||||
3. 生成保留关注报告 [source/output/reports/up_keep_follow_only.md](source/output/reports/up_keep_follow_only.md)
|
||||
4. 生成取关UID [source/output/uids/unfollow_mids_list.txt](source/output/uids/unfollow_mids_list.txt) 并按100拆分
|
||||
|
||||
## 常用参数
|
||||
|
||||
```powershell
|
||||
# 提升速度
|
||||
python source/run_pipeline.py --workers 8 --batch-size 30 --sleep-seconds 0
|
||||
|
||||
# 只先抓取前50个做试跑
|
||||
python source/run_pipeline.py --max-ups 50
|
||||
|
||||
# 仅处理带标签UP
|
||||
python source/run_pipeline.py --only-tag "准备取关"
|
||||
|
||||
# 跳过抓取(复用已有标题报告)
|
||||
python source/run_pipeline.py --skip-fetch
|
||||
|
||||
# 跳过分析(复用已有分析报告,仅生成产物)
|
||||
python source/run_pipeline.py --skip-analyze
|
||||
|
||||
# 修改UID拆分粒度
|
||||
python source/run_pipeline.py --split-size 200
|
||||
```
|
||||
|
||||
## 分步执行(可选)
|
||||
|
||||
### 步骤1:抓取标题
|
||||
|
||||
```powershell
|
||||
python source/analyze_up_content.py --skip-ai
|
||||
```
|
||||
|
||||
默认输出:
|
||||
- [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
|
||||
|
||||
### 步骤2:分批AI分析
|
||||
|
||||
```powershell
|
||||
python source/batch_ai_summary_from_report.py --run-all-batches
|
||||
# 小批量测试
|
||||
python source/batch_ai_summary_from_report.py
|
||||
|
||||
|
||||
python source/batch_ai_summary_from_report.py --input source\output\reports\up_titles_report.md --output source\18_12.md --force
|
||||
|
||||
python source/batch_ai_summary_from_report.py --input source\output\reports\up_titles_report.md --output source\19_06_all.md --force --run-all-batches
|
||||
```
|
||||
|
||||
默认输入/输出:
|
||||
- 输入 [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
|
||||
- 输出 [source/output/reports/up_analysis_full_auto.md](source/output/reports/up_analysis_full_auto.md)
|
||||
|
||||
### 步骤3:生成保留关注报告
|
||||
|
||||
```powershell
|
||||
python source/extract_keep_follow_doc.py
|
||||
|
||||
python source/extract_keep_follow_doc.py --input source/19_06_all.md --output source/19_30_keep_follow.md
|
||||
```
|
||||
|
||||
输出:
|
||||
- [source/output/reports/up_keep_follow_only.md](source/output/reports/up_keep_follow_only.md)
|
||||
|
||||
### 步骤4:生成取关UID
|
||||
|
||||
```powershell
|
||||
python source/extract_unfollow_list.py --format mid-only --split-size 100
|
||||
```
|
||||
|
||||
输出:
|
||||
- 主文件 [source/output/uids/unfollow_mids_list.txt](source/output/uids/unfollow_mids_list.txt)
|
||||
- 拆分文件 [source/output/uids/unfollow_mids_list_1.txt](source/output/uids/unfollow_mids_list_1.txt) 等
|
||||
|
||||
## 结果解释
|
||||
|
||||
- `up_analysis_full_auto.md`:完整分析报告(含取关/保留)
|
||||
- `up_keep_follow_only.md`:仅保留关注UP的AI分析与分组建议
|
||||
- `unfollow_mids_list.txt`:可取关UID逗号分隔列表(可直接粘贴使用)
|
||||
|
||||
## 建议参数
|
||||
|
||||
- 稳定优先:`--workers 4 --max-retries 2 --request-timeout 60`
|
||||
- 速度优先:`--workers 8 --batch-size 30 --sleep-seconds 0`
|
||||
- 低风险试跑:`--max-ups 30` 先验证再全量
|
||||
|
||||
|
||||
|
||||
### 结果按首字母排序
|
||||
|
||||
```
|
||||
python sort_up_main.py
|
||||
```
|
||||
|
||||
|
||||
### 提取分组
|
||||
```
|
||||
python source/extract_group_info.py --input source/19_53_no_titles.md --output source/group_only.md
|
||||
```
|
||||
Reference in New Issue
Block a user