删除md文件
This commit is contained in:
16075
source/17_46.md
16075
source/17_46.md
File diff suppressed because it is too large
Load Diff
14919
source/18_04.md
14919
source/18_04.md
File diff suppressed because it is too large
Load Diff
14873
source/18_12.md
14873
source/18_12.md
File diff suppressed because it is too large
Load Diff
14871
source/19_02.md
14871
source/19_02.md
File diff suppressed because it is too large
Load Diff
16455
source/19_06_all.md
16455
source/19_06_all.md
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@@ -1,159 +0,0 @@
|
|||||||
# B站关注清理工具(优化版)
|
|
||||||
|
|
||||||
本项目保留并聚焦一条可用功能链:
|
|
||||||
|
|
||||||
1. 抓取视频标题
|
|
||||||
2. 分批AI分析
|
|
||||||
3. 生成取关UID(支持按100拆分)
|
|
||||||
4. 生成保留关注报告
|
|
||||||
|
|
||||||
## 目录结构
|
|
||||||
|
|
||||||
```text
|
|
||||||
source/
|
|
||||||
resources/ # 资源文件
|
|
||||||
export_uids.json
|
|
||||||
export_uids.txt
|
|
||||||
|
|
||||||
output/ # 产物目录
|
|
||||||
reports/ # 报告文件
|
|
||||||
up_titles_report.md
|
|
||||||
up_analysis_full_auto.md
|
|
||||||
up_keep_follow_only.md
|
|
||||||
uids/ # 取关UID结果
|
|
||||||
unfollow_mids_list.txt
|
|
||||||
unfollow_mids_list_1.txt
|
|
||||||
unfollow_mids_list_2.txt
|
|
||||||
...
|
|
||||||
|
|
||||||
analyze_up_content.py # 步骤1:抓取标题
|
|
||||||
batch_ai_summary_from_report.py# 步骤2:分批分析
|
|
||||||
extract_keep_follow_doc.py # 步骤3:保留关注报告
|
|
||||||
extract_unfollow_list.py # 步骤4:取关UID
|
|
||||||
run_pipeline.py # 一键流水线
|
|
||||||
README_up_analysis.md
|
|
||||||
```
|
|
||||||
|
|
||||||
## 先配置 API
|
|
||||||
|
|
||||||
编辑 [source/analyze_up_content.py](source/analyze_up_content.py) 顶部配置:
|
|
||||||
|
|
||||||
```python
|
|
||||||
VOLCENGINE_API_KEY = "你的火山引擎API Key"
|
|
||||||
VOLCENGINE_MODEL = "deepseek-v3-1-terminus"
|
|
||||||
VOLCENGINE_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
|
|
||||||
```
|
|
||||||
|
|
||||||
`batch_ai_summary_from_report.py` 会自动读取该配置。
|
|
||||||
|
|
||||||
## 一键推荐用法
|
|
||||||
|
|
||||||
在项目根目录运行:
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python source/run_pipeline.py
|
|
||||||
```
|
|
||||||
|
|
||||||
默认会完成:
|
|
||||||
|
|
||||||
1. 从 [source/resources/export_uids.json](source/resources/export_uids.json) 抓取标题到 [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
|
|
||||||
2. 分批分析到 [source/output/reports/up_analysis_full_auto.md](source/output/reports/up_analysis_full_auto.md)
|
|
||||||
3. 生成保留关注报告 [source/output/reports/up_keep_follow_only.md](source/output/reports/up_keep_follow_only.md)
|
|
||||||
4. 生成取关UID [source/output/uids/unfollow_mids_list.txt](source/output/uids/unfollow_mids_list.txt) 并按100拆分
|
|
||||||
|
|
||||||
## 常用参数
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
# 提升速度
|
|
||||||
python source/run_pipeline.py --workers 8 --batch-size 30 --sleep-seconds 0
|
|
||||||
|
|
||||||
# 只先抓取前50个做试跑
|
|
||||||
python source/run_pipeline.py --max-ups 50
|
|
||||||
|
|
||||||
# 仅处理带标签UP
|
|
||||||
python source/run_pipeline.py --only-tag "准备取关"
|
|
||||||
|
|
||||||
# 跳过抓取(复用已有标题报告)
|
|
||||||
python source/run_pipeline.py --skip-fetch
|
|
||||||
|
|
||||||
# 跳过分析(复用已有分析报告,仅生成产物)
|
|
||||||
python source/run_pipeline.py --skip-analyze
|
|
||||||
|
|
||||||
# 修改UID拆分粒度
|
|
||||||
python source/run_pipeline.py --split-size 200
|
|
||||||
```
|
|
||||||
|
|
||||||
## 分步执行(可选)
|
|
||||||
|
|
||||||
### 步骤1:抓取标题
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python source/analyze_up_content.py --skip-ai
|
|
||||||
```
|
|
||||||
|
|
||||||
默认输出:
|
|
||||||
- [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
|
|
||||||
|
|
||||||
### 步骤2:分批AI分析
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python source/batch_ai_summary_from_report.py --run-all-batches
|
|
||||||
# 小批量测试
|
|
||||||
python source/batch_ai_summary_from_report.py
|
|
||||||
|
|
||||||
|
|
||||||
python source/batch_ai_summary_from_report.py --input source\output\reports\up_titles_report.md --output source\18_12.md --force
|
|
||||||
|
|
||||||
python source/batch_ai_summary_from_report.py --input source\output\reports\up_titles_report.md --output source\19_06_all.md --force --run-all-batches
|
|
||||||
```
|
|
||||||
|
|
||||||
默认输入/输出:
|
|
||||||
- 输入 [source/output/reports/up_titles_report.md](source/output/reports/up_titles_report.md)
|
|
||||||
- 输出 [source/output/reports/up_analysis_full_auto.md](source/output/reports/up_analysis_full_auto.md)
|
|
||||||
|
|
||||||
### 步骤3:生成保留关注报告
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python source/extract_keep_follow_doc.py
|
|
||||||
|
|
||||||
python source/extract_keep_follow_doc.py --input source/19_06_all.md --output source/19_30_keep_follow.md
|
|
||||||
```
|
|
||||||
|
|
||||||
输出:
|
|
||||||
- [source/output/reports/up_keep_follow_only.md](source/output/reports/up_keep_follow_only.md)
|
|
||||||
|
|
||||||
### 步骤4:生成取关UID
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python source/extract_unfollow_list.py --format mid-only --split-size 100
|
|
||||||
```
|
|
||||||
|
|
||||||
输出:
|
|
||||||
- 主文件 [source/output/uids/unfollow_mids_list.txt](source/output/uids/unfollow_mids_list.txt)
|
|
||||||
- 拆分文件 [source/output/uids/unfollow_mids_list_1.txt](source/output/uids/unfollow_mids_list_1.txt) 等
|
|
||||||
|
|
||||||
## 结果解释
|
|
||||||
|
|
||||||
- `up_analysis_full_auto.md`:完整分析报告(含取关/保留)
|
|
||||||
- `up_keep_follow_only.md`:仅保留关注UP的AI分析与分组建议
|
|
||||||
- `unfollow_mids_list.txt`:可取关UID逗号分隔列表(可直接粘贴使用)
|
|
||||||
|
|
||||||
## 建议参数
|
|
||||||
|
|
||||||
- 稳定优先:`--workers 4 --max-retries 2 --request-timeout 60`
|
|
||||||
- 速度优先:`--workers 8 --batch-size 30 --sleep-seconds 0`
|
|
||||||
- 低风险试跑:`--max-ups 30` 先验证再全量
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### 结果按首字母排序
|
|
||||||
|
|
||||||
```
|
|
||||||
python sort_up_main.py
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
### 提取分组
|
|
||||||
```
|
|
||||||
python source/extract_group_info.py --input source/19_53_no_titles.md --output source/group_only.md
|
|
||||||
```
|
|
||||||
@@ -1,2 +0,0 @@
|
|||||||
不要让agent直接处理内容
|
|
||||||
让agent编写脚本,自行测试,不然额度会不够
|
|
||||||
File diff suppressed because it is too large
Load Diff
@@ -1,54 +0,0 @@
|
|||||||
# 保留关注UP主分析与分组建议
|
|
||||||
|
|
||||||
- 生成时间: 2026-04-26 17:32:08
|
|
||||||
- 来源文件: aaa.md
|
|
||||||
- 条目数: 5
|
|
||||||
|
|
||||||
## 2. 温竣岩 (mid: 2026173074)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
温竣岩内容聚焦深度社会议题,涵盖存储芯片、职业教育、财税医改等硬核分析,兼具学术著作解读与产业观察,主题严肃专业,信息密度高,适合知识型观众。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: A_硬核知识保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 标题多涉及改革、产业、学术等深度话题,命中硬核知识规则词,内容优质但非每日必读级别,故保留而非取关。
|
|
||||||
|
|
||||||
## 3. 考研英语马天艺老师 (mid: 1357612844)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
马天艺老师专注于考研英语和四六级备考,内容涵盖词汇记忆、语法学习、常见误区解析等实用技巧,旨在帮助考生高效提升英语能力。视频标题多为具体学习方法指导,如单词记忆技巧、语法入门路径等,适合备考学生系统学习。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: B_技能学习保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 内容高度匹配技能学习主题,命中规则词如'单词背了'、'语法零基础'、'学习路径'等,提供实用备考技巧,对目标用户有价值,故建议保留。
|
|
||||||
|
|
||||||
## 4. 赛博食录 (mid: 1937308559)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
赛博食录专注于食品科技与饮食文化,涵盖3D打印食物、咖啡价格、帝王蟹市场等硬核分析,内容兼具专业性与趣味性,但缺乏编程或技能类干货,定位偏向知识科普。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: A_硬核知识保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 内容专业稳定,涉及食品科技与历史,符合硬核知识标准;但非每日必读或编程类,故归入A组。
|
|
||||||
|
|
||||||
## 5. 黑毛羊驼 (mid: 475443398)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
黑毛羊驼专注于古生物、人类演化、神秘生物等科普内容,标题专业性强,涵盖恐龙、哺乳动物、人类起源等硬核知识,内容稳定且深度较高。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: A_硬核知识保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 内容专业度高,涉及古生物和演化等硬核科普,稳定性强,符合A组对知识深度的要求,值得保留。
|
|
||||||
|
|
||||||
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@@ -1,65 +0,0 @@
|
|||||||
# 保留关注UP主分析与分组建议
|
|
||||||
|
|
||||||
- 生成时间: 2026-04-26 17:31:39
|
|
||||||
- 来源文件: aaa.md
|
|
||||||
- 条目数: 5
|
|
||||||
|
|
||||||
## 1. 小约翰可汗 (mid: 23947287)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
小约翰可汗近期视频以历史事件、国际组织与硬核人物为主,如伊朗革命系列、神奇组织与硬核狠人系列,内容偏向知识性与稳定性,但缺乏编程或技能类干货,整体属于中高专业度。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: A_硬核知识保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 内容聚焦历史与国际议题,专业度较高且更新稳定,符合硬核知识保留标准,但未达核心必读级别,故建议保留。
|
|
||||||
|
|
||||||
## 2. 温竣岩 (mid: 2026173074)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
温竣岩内容聚焦深度社会议题,涵盖存储芯片、职业教育、财税医改等硬核分析,兼具学术著作解读与产业观察,主题严肃专业,信息密度高,适合知识型观众。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: A_硬核知识保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 标题多涉及改革、产业、学术等深度话题,命中硬核知识规则词,内容优质但非每日必读级别,故保留而非取关。
|
|
||||||
|
|
||||||
## 3. 考研英语马天艺老师 (mid: 1357612844)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
马天艺老师专注于考研英语和四六级备考,内容涵盖词汇记忆、语法学习、常见误区解析等实用技巧,旨在帮助考生高效提升英语能力。视频标题多为具体学习方法指导,如单词记忆技巧、语法入门路径等,适合备考学生系统学习。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: B_技能学习保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 内容高度匹配技能学习主题,命中规则词如'单词背了'、'语法零基础'、'学习路径'等,提供实用备考技巧,对目标用户有价值,故建议保留。
|
|
||||||
|
|
||||||
## 4. 赛博食录 (mid: 1937308559)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
赛博食录专注于食品科技与饮食文化,涵盖3D打印食物、咖啡价格、帝王蟹市场等硬核分析,内容兼具专业性与趣味性,但缺乏编程或技能类干货,定位偏向知识科普。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: A_硬核知识保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 内容专业稳定,涉及食品科技与历史,符合硬核知识标准;但非每日必读或编程类,故归入A组。
|
|
||||||
|
|
||||||
## 5. 黑毛羊驼 (mid: 475443398)
|
|
||||||
|
|
||||||
### AI分析
|
|
||||||
|
|
||||||
黑毛羊驼专注于古生物、人类演化、神秘生物等科普内容,标题专业性强,涵盖恐龙、哺乳动物、人类起源等硬核知识,内容稳定且深度较高。
|
|
||||||
|
|
||||||
### 分组建议
|
|
||||||
|
|
||||||
- 预设分组: A_硬核知识保留
|
|
||||||
- 建议动作: 保留关注
|
|
||||||
- 判断依据: 内容专业度高,涉及古生物和演化等硬核科普,稳定性强,符合A组对知识深度的要求,值得保留。
|
|
||||||
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user