Commit a6e7dc5e99 (parent bd9a654c5f)
|
|
@ -0,0 +1,7 @@
|
||||||
|
{
|
||||||
|
"permissions": {
|
||||||
|
"allow": [
|
||||||
|
"Bash(python:*)"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
@ -1,5 +1,8 @@
|
||||||
# Wiki Sync Tool example configuration file
|
# Wiki Sync Tool example configuration file
|
||||||
# Copy this file to .env and adjust the values
|
# Copy this file to .env and adjust the values
|
||||||
|
|
||||||
# Your MediaWiki API URL
|
# API URL of the English Project Diablo 2 wiki
|
||||||
WIKI_API_URL=https://your-wiki-site.com/api.php
|
WIKI_API_URL_EN=https://wiki.projectdiablo2.com/w/api.php
|
||||||
|
|
||||||
|
# API URL of the Chinese Project Diablo 2 wiki
|
||||||
|
WIKI_API_URL_CN=https://wiki.projectdiablo2.cn/w/api.php
|
||||||
|
|
@ -0,0 +1,4 @@
|
||||||
|
{
|
||||||
|
"python-envs.defaultEnvManager": "ms-python.python:system",
|
||||||
|
"python-envs.pythonProjects": []
|
||||||
|
}
|
||||||
94
README.md
|
|
@ -1,6 +1,6 @@
|
||||||
# Wiki Sync Tool
|
# Wiki Sync Tool - Enhanced Version
|
||||||
|
|
||||||
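A Python tool for syncing and tracking changes on a MediaWiki site. It automatically fetches the latest changes to wiki pages, generates easy-to-read diff files, and saves each page's full content for offline reading.
|
A Python tool for syncing and tracking changes on a MediaWiki site, with bilingual comparison and precise line-number lookup.
|
||||||
|
|
||||||
## Features
|
## Features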
|
||||||
|
|
||||||
|
|
@ -10,6 +10,10 @@
|
||||||
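- ⏰ Incremental sync: fetches only the changes made since the last sync
|
- ⏰ Incremental sync: fetches only the changes made since the last sync
|
||||||
- 🔍 Sync from a given point in time or for a specific page
|
- 🔍 Sync from a given point in time or for a specific page
|
||||||
- 📁 Output files are organized into timestamped directories automatically
|
- 📁 Output files are organized into timestamped directories automatically
|
||||||
|
- 🌐 **New**: automatically syncs the Chinese translation
|
||||||
|
- 🎯 **New**: precise line-number mapping; clicking an English line jumps to the corresponding Chinese line
|
||||||
|
- 📊 **New**: generates a polished bilingual comparison page
|
||||||
|
- 🎨 **New**: modern UI with synchronized scrolling and highlighting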
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
|
|
@ -29,7 +33,11 @@
|
||||||
Create a `.env` file and configure your MediaWiki API URL:
|
Create a `.env` file and configure your MediaWiki API URL:
|
||||||
|
|
||||||
```env
|
```env
|
||||||
WIKI_API_URL=https://your-wiki-site.com/api.php
|
# API URL of the English Project Diablo 2 wiki
|
||||||
|
WIKI_API_URL_EN=https://wiki.projectdiablo2.com/w/api.php
|
||||||
|
|
||||||
|
# API URL of the Chinese Project Diablo 2 wiki
|
||||||
|
WIKI_API_URL_CN=https://wiki.projectdiablo2.cn/w/api.php
|
||||||
```
|
```
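As a rough illustration of how these two settings can be read, here is a minimal sketch that uses only the standard library. The fallback URLs are the ones shown in the example above; the helper name `load_wiki_urls` is hypothetical and not part of the tool, and the sketch skips the `.env` file loading that `python-dotenv` would normally handle.

```python
import os

# Fallback endpoints, copied from the .env example above.
DEFAULT_EN = "https://wiki.projectdiablo2.com/w/api.php"
DEFAULT_CN = "https://wiki.projectdiablo2.cn/w/api.php"


def load_wiki_urls(env=None):
    """Return (english_url, chinese_url), using the defaults when a key is unset.

    `env` is any mapping; it defaults to the process environment.
    """
    env = os.environ if env is None else env
    return (
        env.get("WIKI_API_URL_EN", DEFAULT_EN),
        env.get("WIKI_API_URL_CN", DEFAULT_CN),
    )
```

Passing a plain dictionary instead of the real environment makes the fallback behaviour easy to check in isolation.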
|
||||||
|
|
||||||
Alternatively, copy the provided example configuration file:
|
Alternatively, copy the provided example configuration file:
|
||||||
|
|
@ -38,8 +46,6 @@ WIKI_API_URL=https://your-wiki-site.com/api.php
|
||||||
cp .env.example .env
|
cp .env.example .env
|
||||||
```
|
```
|
||||||
|
|
||||||
Then edit the `WIKI_API_URL` value in the `.env` file.
|
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
### Basic full sync
|
### Basic full sync
|
||||||
|
|
@ -65,7 +71,7 @@ python sync.py --since 2025-11-28T00:00:00Z --run
|
||||||
Sync only the latest changes to a specific page:
|
Sync only the latest changes to a specific page:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python sync.py --title "Main Page" --run
|
python sync.py --title "Amazon Basin" --run
|
||||||
```
|
```
|
||||||
|
|
||||||
### Sync a specific page and update the timestamp
|
### Sync a specific page and update the timestamp
|
||||||
|
|
@ -73,7 +79,7 @@ python sync.py --title "Main Page" --run
|
||||||
Sync a specific page and update the global timestamp when done:
|
Sync a specific page and update the global timestamp when done:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python sync.py --title "Main Page" --update-timestamp --run
|
python sync.py --title "Amazon Basin" --update-timestamp --run
|
||||||
```
|
```
|
||||||
|
|
||||||
### Show help
|
### Show help
|
||||||
|
|
@ -86,41 +92,83 @@ python sync.py --help
|
||||||
|
|
||||||
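Each run creates a timestamp-named subdirectory under `wiki_sync_output` that contains the generated files:
|
Each run creates a timestamp-named subdirectory under `wiki_sync_output` that contains the generated files:
|
||||||
|
|
||||||
- `PageTitle-Timestamp-revid.diff.html` - HTML diff of the page changes
|
- `PageTitle-Timestamp-revid.diff.html` - native MediaWiki HTML diff
|
||||||
- `PageTitle-Timestamp-revid.full.txt` - full content of the page
|
- `PageTitle-Timestamp-revid.diff.txt` - text diff (similar to git diff)
|
||||||
|
- `PageTitle-Timestamp-revid.full.txt` - latest full content of the page
|
||||||
|
- `PageTitle-Timestamp-revid.old.txt` - content of the previous revision (if the page changed)
|
||||||
|
- `PageTitle-Timestamp-revid.cn.txt` - Chinese translation (if found)
|
||||||
|
- `PageTitle-Timestamp-revid.comparison.html` - **bilingual comparison page** (if a Chinese translation was found)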
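The shared prefix of these file names can be sketched as follows. The two expressions mirror the sanitizing and timestamp logic that appears in `save_files` in `sync.py` later in this commit; the wrapper function `build_base_filename` is a hypothetical name introduced only for this illustration.

```python
def build_base_filename(title, timestamp, revid=None):
    """Build the '<title>-<time>-<revid>' prefix shared by all output files."""
    # Keep letters, digits and " -_."; replace everything else with "_".
    safe_title = "".join(c if c.isalnum() or c in " -_." else "_" for c in title)
    # "2025-12-11T15:26:45Z" -> "20251211_152645"
    time_str = timestamp[:19].replace("-", "").replace(":", "").replace("T", "_")
    return f"{safe_title}-{time_str}-{revid}" if revid else f"{safe_title}-{time_str}"


name = build_base_filename("Amazon Basin", "2025-12-11T15:26:45Z", 12345)
```

Note that this sanitizer keeps spaces in the title, so a page called "Amazon Basin" would keep its space in the file name under this logic.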
|
||||||
|
|
||||||
|
### Bilingual comparison page features
|
||||||
|
|
||||||
|
The generated bilingual comparison page offers the following advanced features:
|
||||||
|
|
||||||
|
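1. **Precise line-number mapping**:
|
||||||
|
- Every line in the English diff is annotated with the corresponding Chinese line number
|
||||||
|
- Clicking any English line highlights and scrolls to the corresponding Chinese line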
|
||||||
|
|
||||||
|
2. **Interactive experience**:
|
||||||
|
- Hovering previews the corresponding Chinese line
|
||||||
|
- Clicking highlights the correspondence
|
||||||
|
- Smooth scrolling animation
|
||||||
|
|
||||||
|
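3. **Visual design**:
|
||||||
|
- Modern UI design
|
||||||
|
- Standard diff colors (green for additions, red for deletions, gray for unchanged lines)
|
||||||
|
- Responsive layout that works on mobile devices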
|
||||||
|
|
||||||
|
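4. **Synchronized scrolling**:
|
||||||
|
- The scroll positions of the left and right columns stay in sync
|
||||||
|
- Makes it easy to compare content at the same position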
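To make the line-number mapping concrete, the sketch below renders one English line and links it to the Chinese line with the same number when such a line exists. The class names and the `data-cn-line` attribute are the ones used by `create_diff_html` in this commit. The function name `render_en_line` is hypothetical, and the `html.escape` call is an addition that the committed code does not make, included here so that wiki markup containing `<` or `&` cannot break the page.

```python
from html import escape


def render_en_line(content, new_line, cn_line_count, css_class="diff-context"):
    """Render one English diff line as an HTML <div>.

    The line is linked to the Chinese line with the same number, which assumes
    both pages share the same line layout (the commit makes the same assumption).
    """
    linked = new_line is not None and 1 <= new_line <= cn_line_count
    attr = f' data-cn-line="{new_line}"' if linked else ""
    number = "" if new_line is None else new_line
    return (
        f'<div class="line {css_class}"{attr}>'
        f'<span class="line-number">{number}</span>'
        f'<span class="line-content">{escape(content)}</span></div>'
    )


linked_html = render_en_line("a < b", 3, 10)
unlinked_html = render_en_line("beyond the translation", 11, 10)
```

Lines whose number exceeds the length of the Chinese page get no `data-cn-line` attribute, so the click handler simply ignores them.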
|
||||||
|
|
||||||
### Diff file example
|
### Diff file example
|
||||||
|
|
||||||
A diff file shows the changes to a page and has the following characteristics:
|
Example of the text diff format:
|
||||||
|
|
||||||
|
```
|
||||||
|
--- old_version
|
||||||
|
+++ new_version
|
||||||
|
@@ -10,7 +10,7 @@
|
||||||
|
This is line 10
|
||||||
|
-This line will be removed
|
||||||
|
+This line will be added
|
||||||
|
This is line 12
|
||||||
|
```
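A diff in this format can be produced with Python's standard `difflib` module, which is what `generate_text_diff` in `sync.py` relies on. The sketch below is a simplified variant: the `old_version` and `new_version` labels are passed explicitly to match the example above, and the function name `text_diff` is only illustrative.

```python
import difflib


def text_diff(old_text, new_text):
    """Return a unified diff between two strings, similar to git diff output."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="old_version", tofile="new_version"
        )
    )


diff = text_diff("a\nb\nc\n", "a\nB\nc\n")
print(diff)
```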
|
||||||
|
|
||||||
|
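HTML diff features:
|
||||||
- Green background marks added content
|
- Green background marks added content
|
||||||
- Red background marks deleted content
|
- Red background marks deleted content
|
||||||
- A colored vertical bar on the left indicates the change type
|
- A colored vertical bar on the left indicates the change type
|
||||||
- +/- markers clearly show where changes occur
|
- +/- markers clearly show where changes occur
|
||||||
- Deleted content is shown with strikethrough
|
- Deleted content is shown with strikethrough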
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
## Technical details
|
## Technical details
|
||||||
|
|
||||||
### Diff marker notes
|
### Line-number parsing
|
||||||
|
|
||||||
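The tool post-processes MediaWiki's native diff output:
|
The tool uses a custom diff parser that extracts:
|
||||||
|
- The line-number ranges in each hunk header
|
||||||
|
- The old and new line numbers for every changed line
|
||||||
|
- The exact positions of added, removed, and context lines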
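The parsing idea can be sketched in a few lines. The regular expression is the one used by `parse_diff_with_line_numbers` in `sync.py`; everything else, including the function name `number_diff_lines` and the tuple-based result, is a simplified illustration rather than the committed implementation.

```python
import re

# Matches hunk headers such as "@@ -10,7 +10,7 @@"; the counts are optional.
HUNK_RE = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def number_diff_lines(diff_text):
    """Return (kind, old_line, new_line, text) for every line inside a hunk."""
    old_no = new_no = None
    result = []
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            # A hunk header resets both counters to its start lines.
            old_no, new_no = int(match.group(1)), int(match.group(3))
        elif old_no is None:
            continue  # file headers ("---", "+++") come before the first hunk
        elif line.startswith("-"):
            result.append(("removed", old_no, None, line[1:]))
            old_no += 1
        elif line.startswith("+"):
            result.append(("added", None, new_no, line[1:]))
            new_no += 1
        elif line.startswith(" "):
            result.append(("context", old_no, new_no, line[1:]))
            old_no += 1
            new_no += 1
    return result


parsed = number_diff_lines("--- a\n+++ b\n@@ -10,3 +10,3 @@\n x\n-y\n+z\n w\n")
```

Removed lines advance only the old counter, added lines only the new counter, and context lines advance both, which is what keeps the two numberings aligned across a hunk.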
|
||||||
|
|
||||||
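- Converts `<ins>` and `<del>` tags into standard `<span>` tags
|
### Chinese page search strategy
|
||||||
- Converts `data-marker` attributes into actual +/- symbols
|
|
||||||
- Applies custom CSS styles to improve readability
|
1. First try an exact match on the page title
|
||||||
|
2. If that fails, fall back to a fuzzy search
|
||||||
|
3. Handles spaces and special characters in titles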
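The steps above can be sketched as two MediaWiki `list=search` queries tried in order. The parameter values are copied from `search_chinese_page` in `sync.py`; the helper names `build_search_attempts` and `first_title` are hypothetical, and the sketch stops short of sending any request. The title is passed unencoded on the assumption that the HTTP client (for example `requests`, via its `params` argument) performs the URL encoding.

```python
def build_search_attempts(title, limit=5):
    """Return the query parameters for the exact and the fuzzy attempt, in order."""
    base = {
        "action": "query",
        "list": "search",
        "srwhat": "title",
        "srlimit": limit,
        "format": "json",
    }
    exact = dict(base, srsearch=f'"{title}"')  # quoted phrase: exact match
    fuzzy = dict(base, srsearch=title)         # plain terms: looser match
    return [exact, fuzzy]


def first_title(response):
    """Return the title of the first search hit, or None if there is none."""
    results = response.get("query", {}).get("search", [])
    return results[0]["title"] if results else None


attempts = build_search_attempts("Amazon Basin")
```

A caller would send each parameter set in turn and stop at the first response for which `first_title` returns a value.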
|
||||||
|
|
||||||
### Directory layout
|
### Directory layout
|
||||||
|
|
||||||
```
|
```
|
||||||
wiki_sync_output/
|
wiki_sync_output/
|
||||||
├── 20251203_152702/
|
├── 20251211_152702/
|
||||||
│ ├── Main_Page-20251203_152645-12345.diff.html
|
│ ├── Amazon_Basin-20251211_152645-12345.diff.html
|
||||||
│ ├── Main_Page-20251203_152645-12345.full.txt
|
│ ├── Amazon_Basin-20251211_152645-12345.diff.txt
|
||||||
│ ├── Another_Page-20251203_152650-12346.diff.html
|
│ ├── Amazon_Basin-20251211_152645-12345.full.txt
|
||||||
│ └── Another_Page-20251203_152650-12346.full.txt
|
│ ├── Amazon_Basin-20251211_152645-12345.old.txt
|
||||||
└── 20251203_153127/
|
│ ├── Amazon_Basin-20251211_152645-12345.cn.txt
|
||||||
|
│ └── Amazon_Basin-20251211_152645-12345.comparison.html
|
||||||
|
└── 20251211_153127/
|
||||||
└── ...
|
└── ...
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
|
||||||
875
sync.py
|
|
@ -1,12 +1,14 @@
|
||||||
# -*- coding: utf-8 -*-
|
# -*- coding: utf-8 -*-
|
||||||
"""
|
"""
|
||||||
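MediaWiki recent-changes sync tool - crimson final version
|
MediaWiki recent-changes sync tool - enhanced version
|
||||||
Supports:
|
Supports:
|
||||||
1. Regular full sync (no arguments)
|
1. Regular full sync (no arguments)
|
||||||
2. Manually specified start time: --since 2025-11-28T00:00:00Z
|
2. Manually specified start time: --since 2025-11-28T00:00:00Z
|
||||||
3. Sync a single page only: --title "page name"
|
3. Sync a single page only: --title "page name"
|
||||||
4. Optionally update the global timestamp when syncing a single page: --update-timestamp
|
4. Optionally update the global timestamp when syncing a single page: --update-timestamp
|
||||||
5. Uses the official action=compare throughout to generate the best possible diff
|
5. Fetches the previous revision and generates a diff
|
||||||
|
6. Syncs the Chinese translation
|
||||||
|
7. Generates a bilingual comparison page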
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import os
|
import os
|
||||||
|
|
@ -15,10 +17,15 @@ from pathlib import Path
|
||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
import requests
|
import requests
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
|
import difflib
|
||||||
|
import json
|
||||||
|
import re
|
||||||
|
from urllib.parse import quote
|
||||||
|
|
||||||
# ==================== Configuration ====================
|
# ==================== Configuration ====================
|
||||||
load_dotenv()
|
load_dotenv()
|
||||||
WIKI_API_URL = os.getenv("WIKI_API_URL") # loaded from the .env file
|
WIKI_API_URL_EN = os.getenv("WIKI_API_URL_EN", "https://wiki.projectdiablo2.com/w/api.php")
|
||||||
|
WIKI_API_URL_CN = os.getenv("WIKI_API_URL_CN", "https://wiki.projectdiablo2.cn/w/api.php")
|
||||||
OUTPUT_DIR = Path("wiki_sync_output")
|
OUTPUT_DIR = Path("wiki_sync_output")
|
||||||
OUTPUT_DIR.mkdir(exist_ok=True)
|
OUTPUT_DIR.mkdir(exist_ok=True)
|
||||||
|
|
||||||
|
|
@ -27,9 +34,14 @@ CURRENT_OUTPUT_DIR = None
|
||||||
|
|
||||||
LAST_TIMESTAMP_FILE = "last_sync_timestamp.txt"
|
LAST_TIMESTAMP_FILE = "last_sync_timestamp.txt"
|
||||||
|
|
||||||
SESSION = requests.Session()
|
SESSION_EN = requests.Session()
|
||||||
SESSION.headers.update({
|
SESSION_EN.headers.update({
|
||||||
"User-Agent": "WikiSyncTool/3.0 (your-email@example.com; MediaWiki Sync Bot)"
|
"User-Agent": "WikiSyncTool/4.0 (your-email@example.com; MediaWiki Sync Bot)"
|
||||||
|
})
|
||||||
|
|
||||||
|
SESSION_CN = requests.Session()
|
||||||
|
SESSION_CN.headers.update({
|
||||||
|
"User-Agent": "WikiSyncTool/4.0 (your-email@example.com; MediaWiki Sync Bot)"
|
||||||
})
|
})
|
||||||
# ================================================
|
# ================================================
|
||||||
|
|
||||||
|
|
@ -58,7 +70,7 @@ def get_recent_changes(since):
|
||||||
latest = {}
|
latest = {}
|
||||||
while True:
|
while True:
|
||||||
try:
|
try:
|
||||||
r = SESSION.get(WIKI_API_URL, params=params)
|
r = SESSION_EN.get(WIKI_API_URL_EN, params=params)
|
||||||
r.raise_for_status()
|
r.raise_for_status()
|
||||||
response_data = r.json()
|
response_data = r.json()
|
||||||
if "error" in response_data:
|
if "error" in response_data:
|
||||||
|
|
@ -80,21 +92,19 @@ def get_old_revid(title, end_time):
|
||||||
"prop": "revisions",
|
"prop": "revisions",
|
||||||
"titles": title,
|
"titles": title,
|
||||||
"rvprop": "ids|timestamp",
|
"rvprop": "ids|timestamp",
|
||||||
"rvlimit": 1, # 获取2个版本,确保能找到不同的版本
|
"rvlimit": 1,
|
||||||
"rvdir": "older",
|
"rvdir": "older",
|
||||||
"rvstart": end_time,
|
"rvstart": end_time,
|
||||||
"format": "json"
|
"format": "json"
|
||||||
}
|
}
|
||||||
try:
|
try:
|
||||||
r = SESSION.get(WIKI_API_URL, params=params).json()
|
r = SESSION_EN.get(WIKI_API_URL_EN, params=params).json()
|
||||||
url = WIKI_API_URL + "?" + "&".join([f"{k}={v}" for k, v in params.items()])
|
|
||||||
print(f" 请求URL: {url}")
|
|
||||||
pages = r["query"]["pages"]
|
pages = r["query"]["pages"]
|
||||||
page = next(iter(pages.values()))
|
page = next(iter(pages.values()))
|
||||||
if "revisions" not in page:
|
if "revisions" not in page:
|
||||||
print(f" 页面 '{title}' 在指定时间前没有找到修订版本")
|
print(f" 页面 '{title}' 在指定时间前没有找到修订版本")
|
||||||
return None
|
return None
|
||||||
|
|
||||||
revisions = page["revisions"]
|
revisions = page["revisions"]
|
||||||
if len(revisions) >= 1:
|
if len(revisions) >= 1:
|
||||||
return revisions[0]["revid"]
|
return revisions[0]["revid"]
|
||||||
|
|
@ -104,237 +114,694 @@ def get_old_revid(title, end_time):
|
||||||
print(f"获取旧版本ID时出错: {e}")
|
print(f"获取旧版本ID时出错: {e}")
|
||||||
return None
|
return None
|
||||||
|
|
||||||
def get_official_diff_and_content(title, from_revid, to_revid):
|
def get_page_content(wiki_url, session, title, revid=None):
|
||||||
# Fetch the official diff (HTML)
|
"""Fetch the full content of a page."""
|
||||||
diff_params = {
|
params = {
|
||||||
"action": "compare",
|
"action": "query",
|
||||||
"fromrev": from_revid or "",
|
"prop": "revisions",
|
||||||
"torev": to_revid,
|
"titles": title,
|
||||||
|
"rvprop": "content|timestamp|ids",
|
||||||
|
"rvslots": "main",
|
||||||
"format": "json"
|
"format": "json"
|
||||||
}
|
}
|
||||||
|
if revid:
|
||||||
print(f" 获取diff: fromrev={from_revid}, torev={to_revid}")
|
params["rvstartid"] = revid
|
||||||
|
params["rvendid"] = revid
|
||||||
try:
|
|
||||||
diff_resp = SESSION.get(WIKI_API_URL, params=diff_params).json()
|
try:
|
||||||
print(f" Diff响应: {list(diff_resp.keys())}")
|
r = session.get(wiki_url, params=params).json()
|
||||||
diff_html = diff_resp.get("compare", {}).get("*", "<p>无法获取 diff</p>")
|
pages = r["query"]["pages"]
|
||||||
print(f" Diff内容长度: {len(diff_html)} 字符")
|
page = next(iter(pages.values()))
|
||||||
|
|
||||||
# Fetch the latest full content
|
|
||||||
content_params = {
|
|
||||||
"action": "query",
|
|
||||||
"prop": "revisions",
|
|
||||||
"titles": title,
|
|
||||||
"rvprop": "content|timestamp",
|
|
||||||
"rvslots": "main",
|
|
||||||
"format": "json"
|
|
||||||
}
|
|
||||||
r = SESSION.get(WIKI_API_URL, params=content_params).json()
|
|
||||||
page = next(iter(r["query"]["pages"].values()))
|
|
||||||
if "revisions" not in page:
|
if "revisions" not in page:
|
||||||
return None, None, None
|
return None, None, None
|
||||||
|
|
||||||
rev = page["revisions"][0]
|
rev = page["revisions"][0]
|
||||||
full_text = rev["slots"]["main"]["*"]
|
content = rev["slots"]["main"]["*"]
|
||||||
ts = rev["timestamp"]
|
timestamp = rev["timestamp"]
|
||||||
return diff_html, full_text, ts
|
rev_id = rev["revid"]
|
||||||
|
|
||||||
|
return content, timestamp, rev_id
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
print(f"获取diff和内容时出错: {e}")
|
print(f"获取页面内容时出错: {e}")
|
||||||
return None, None, None
|
return None, None, None
|
||||||
|
|
||||||
def save_files(title, diff_html, full_text, timestamp, note="", revid=None):
|
def generate_text_diff(old_text, new_text):
|
||||||
|
"""Generate a git-diff-style text diff."""
|
||||||
|
if not old_text:
|
||||||
|
return "新创建页面"
|
||||||
|
|
||||||
|
old_lines = old_text.splitlines(keepends=True)
|
||||||
|
new_lines = new_text.splitlines(keepends=True)
|
||||||
|
|
||||||
|
differ = difflib.unified_diff(
|
||||||
|
old_lines,
|
||||||
|
new_lines,
|
||||||
|
lineterm='\n'
|
||||||
|
)
|
||||||
|
|
||||||
|
return ''.join(differ)
|
||||||
|
|
||||||
|
def parse_diff_with_line_numbers(diff_text):
|
||||||
|
"""Parse diff text and extract detailed line-number information."""
|
||||||
|
if not diff_text or diff_text.startswith("新创建页面"):
|
||||||
|
return []
|
||||||
|
|
||||||
|
parsed_lines = []
|
||||||
|
current_old_line = 0
|
||||||
|
current_new_line = 0
|
||||||
|
in_hunk = False
|
||||||
|
|
||||||
|
for line in diff_text.splitlines():
|
||||||
|
if line.startswith('@@'):
|
||||||
|
# Parse the hunk header, e.g. @@ -start,count +start,count @@ (re is imported at module level)
|
||||||
|
|
||||||
|
match = re.match(r'@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@', line)
|
||||||
|
if match:
|
||||||
|
old_start = int(match.group(1))
|
||||||
|
old_count = int(match.group(2)) if match.group(2) else 1
|
||||||
|
new_start = int(match.group(3))
|
||||||
|
new_count = int(match.group(4)) if match.group(4) else 1
|
||||||
|
|
||||||
|
current_old_line = old_start
|
||||||
|
current_new_line = new_start
|
||||||
|
in_hunk = True
|
||||||
|
|
||||||
|
parsed_lines.append({
|
||||||
|
'type': 'hunk',
|
||||||
|
'content': line,
|
||||||
|
'old_start': old_start,
|
||||||
|
'old_count': old_count,
|
||||||
|
'new_start': new_start,
|
||||||
|
'new_count': new_count,
|
||||||
|
'old_line': None,
|
||||||
|
'new_line': None
|
||||||
|
})
|
||||||
|
else:
|
||||||
|
parsed_lines.append({
|
||||||
|
'type': 'other',
|
||||||
|
'content': line,
|
||||||
|
'old_line': None,
|
||||||
|
'new_line': None
|
||||||
|
})
|
||||||
|
elif line.startswith('---') or line.startswith('+++'):
|
||||||
|
# File header lines
|
||||||
|
parsed_lines.append({
|
||||||
|
'type': 'header',
|
||||||
|
'content': line,
|
||||||
|
'old_line': None,
|
||||||
|
'new_line': None
|
||||||
|
})
|
||||||
|
elif in_hunk:
|
||||||
|
if line.startswith('-'):
|
||||||
|
# Removed line
|
||||||
|
parsed_lines.append({
|
||||||
|
'type': 'removed',
|
||||||
|
'content': line[1:], # strip the leading '-'
|
||||||
|
'old_line': current_old_line,
|
||||||
|
'new_line': None
|
||||||
|
})
|
||||||
|
current_old_line += 1
|
||||||
|
elif line.startswith('+'):
|
||||||
|
# Added line
|
||||||
|
parsed_lines.append({
|
||||||
|
'type': 'added',
|
||||||
|
'content': line[1:], # strip the leading '+'
|
||||||
|
'old_line': None,
|
||||||
|
'new_line': current_new_line
|
||||||
|
})
|
||||||
|
current_new_line += 1
|
||||||
|
elif line.startswith(' '):
|
||||||
|
# Unchanged (context) line
|
||||||
|
parsed_lines.append({
|
||||||
|
'type': 'context',
|
||||||
|
'content': line[1:], # strip the leading ' '
|
||||||
|
'old_line': current_old_line,
|
||||||
|
'new_line': current_new_line
|
||||||
|
})
|
||||||
|
current_old_line += 1
|
||||||
|
current_new_line += 1
|
||||||
|
else:
|
||||||
|
# Any other line (e.g. an empty line)
|
||||||
|
parsed_lines.append({
|
||||||
|
'type': 'other',
|
||||||
|
'content': line,
|
||||||
|
'old_line': None,
|
||||||
|
'new_line': None
|
||||||
|
})
|
||||||
|
else:
|
||||||
|
# Line outside of any hunk
|
||||||
|
parsed_lines.append({
|
||||||
|
'type': 'other',
|
||||||
|
'content': line,
|
||||||
|
'old_line': None,
|
||||||
|
'new_line': None
|
||||||
|
})
|
||||||
|
|
||||||
|
return parsed_lines
|
||||||
|
|
||||||
|
def search_chinese_page(title):
|
||||||
|
"""Search the Chinese wiki for the corresponding page."""
|
||||||
|
# Try an exact match first
|
||||||
|
params = {
|
||||||
|
"action": "query",
|
||||||
|
"list": "search",
|
||||||
|
"srsearch": f'"{title}"',
|
||||||
|
"srwhat": "title",
|
||||||
|
"srlimit": 5,
|
||||||
|
"format": "json"
|
||||||
|
}
|
||||||
|
|
||||||
|
try:
|
||||||
|
r = SESSION_CN.get(WIKI_API_URL_CN, params=params).json()
|
||||||
|
search_results = r.get("query", {}).get("search", [])
|
||||||
|
|
||||||
|
if search_results:
|
||||||
|
# Return the first matching result
|
||||||
|
return search_results[0]["title"]
|
||||||
|
|
||||||
|
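# If the exact match finds nothing, fall back to a fuzzy search.
# requests URL-encodes params itself, so pass the raw title (a manual "%20" would be double-encoded).
|
||||||
|
params["srsearch"] = title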
|
||||||
|
r = SESSION_CN.get(WIKI_API_URL_CN, params=params).json()
|
||||||
|
search_results = r.get("query", {}).get("search", [])
|
||||||
|
|
||||||
|
if search_results:
|
||||||
|
return search_results[0]["title"]
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"搜索中文页面时出错: {e}")
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
def create_diff_html(title, en_diff, en_old_lines, en_new_lines, cn_content=None):
|
||||||
|
"""Create the bilingual comparison HTML page using precise line-number mapping."""
|
||||||
|
# Prepare the Chinese content lines
|
||||||
|
cn_lines = []
|
||||||
|
if cn_content:
|
||||||
|
cn_lines = cn_content.splitlines()
|
||||||
|
|
||||||
|
# Parse the diff and collect line-number information
|
||||||
|
parsed_diff = parse_diff_with_line_numbers(en_diff) if en_diff else []
|
||||||
|
|
||||||
|
# Generate the HTML
|
||||||
|
html = f'''<!DOCTYPE html>
|
||||||
|
<html lang="zh-CN">
|
||||||
|
<head>
|
||||||
|
<meta charset="UTF-8">
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||||
|
<title>Wiki Diff: {title}</title>
|
||||||
|
<style>
|
||||||
|
* {{
|
||||||
|
margin: 0;
|
||||||
|
padding: 0;
|
||||||
|
box-sizing: border-box;
|
||||||
|
}}
|
||||||
|
|
||||||
|
body {{
|
||||||
|
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
|
||||||
|
background-color: #f5f5f5;
|
||||||
|
line-height: 1.6;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.header {{
|
||||||
|
background-color: #fff;
|
||||||
|
padding: 20px;
|
||||||
|
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
|
||||||
|
margin-bottom: 20px;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.header h1 {{
|
||||||
|
color: #333;
|
||||||
|
font-size: 24px;
|
||||||
|
margin-bottom: 10px;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.header .meta {{
|
||||||
|
color: #666;
|
||||||
|
font-size: 14px;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.container {{
|
||||||
|
display: flex;
|
||||||
|
max-width: 100%;
|
||||||
|
margin: 0 auto;
|
||||||
|
background-color: #fff;
|
||||||
|
min-height: calc(100vh - 100px);
|
||||||
|
}}
|
||||||
|
|
||||||
|
.column {{
|
||||||
|
flex: 1;
|
||||||
|
overflow: hidden;
|
||||||
|
display: flex;
|
||||||
|
flex-direction: column;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.column-header {{
|
||||||
|
background-color: #e9ecef;
|
||||||
|
padding: 12px 20px;
|
||||||
|
font-weight: bold;
|
||||||
|
color: #495057;
|
||||||
|
border-bottom: 1px solid #dee2e6;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.diff-content {{
|
||||||
|
flex: 1;
|
||||||
|
overflow-y: auto;
|
||||||
|
font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
|
||||||
|
font-size: 13px;
|
||||||
|
line-height: 1.4;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line {{
|
||||||
|
display: flex;
|
||||||
|
min-height: 20px;
|
||||||
|
position: relative;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line-number {{
|
||||||
|
width: 60px;
|
||||||
|
text-align: right;
|
||||||
|
padding: 0 10px;
|
||||||
|
background-color: #f8f9fa;
|
||||||
|
color: #6c757d;
|
||||||
|
border-right: 1px solid #dee2e6;
|
||||||
|
user-select: none;
|
||||||
|
flex-shrink: 0;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.highlight {{
|
||||||
|
background-color: rgba(255, 235, 59, 0.3) !important;
|
||||||
|
animation: highlight 2s ease-in-out;
|
||||||
|
}}
|
||||||
|
|
||||||
|
@keyframes highlight {{
|
||||||
|
0% {{ background-color: rgba(255, 235, 59, 0.8); }}
|
||||||
|
100% {{ background-color: rgba(255, 235, 59, 0.3); }}
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line-content {{
|
||||||
|
flex: 1;
|
||||||
|
padding: 0 10px;
|
||||||
|
white-space: pre-wrap;
|
||||||
|
word-break: break-word;
|
||||||
|
}}
|
||||||
|
|
||||||
|
/* Diff specific styles */
|
||||||
|
.line.diff-added {{
|
||||||
|
background-color: #e6ffec;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-added .line-content {{
|
||||||
|
background-color: #cdffd8;
|
||||||
|
border-left: 3px solid #28a745;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-removed {{
|
||||||
|
background-color: #ffeef0;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-removed .line-content {{
|
||||||
|
background-color: #fdb8c0;
|
||||||
|
border-left: 3px solid #dc3545;
|
||||||
|
text-decoration: line-through;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-context {{
|
||||||
|
background-color: #ffffff;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-context .line-content {{
|
||||||
|
background-color: #ffffff;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-hunk {{
|
||||||
|
background-color: #f8f9fa;
|
||||||
|
color: #6c757d;
|
||||||
|
font-style: italic;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-hunk .line-content {{
|
||||||
|
background-color: #f1f3f4;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-header {{
|
||||||
|
background-color: #e9ecef;
|
||||||
|
color: #495057;
|
||||||
|
font-style: italic;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line.diff-header .line-content {{
|
||||||
|
background-color: #e9ecef;
|
||||||
|
}}
|
||||||
|
|
||||||
|
/* Separator between columns */
|
||||||
|
.separator {{
|
||||||
|
width: 1px;
|
||||||
|
background-color: #dee2e6;
|
||||||
|
box-shadow: 0 0 5px rgba(0,0,0,0.1);
|
||||||
|
position: relative;
|
||||||
|
z-index: 10;
|
||||||
|
}}
|
||||||
|
|
||||||
|
/* Scrollbar styling */
|
||||||
|
.diff-content::-webkit-scrollbar {{
|
||||||
|
width: 8px;
|
||||||
|
height: 8px;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.diff-content::-webkit-scrollbar-track {{
|
||||||
|
background: #f1f1f1;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.diff-content::-webkit-scrollbar-thumb {{
|
||||||
|
background: #888;
|
||||||
|
border-radius: 4px;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.diff-content::-webkit-scrollbar-thumb:hover {{
|
||||||
|
background: #555;
|
||||||
|
}}
|
||||||
|
|
||||||
|
/* Responsive design */
|
||||||
|
@media (max-width: 768px) {{
|
||||||
|
.container {{
|
||||||
|
flex-direction: column;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.separator {{
|
||||||
|
width: 100%;
|
||||||
|
height: 1px;
|
||||||
|
}}
|
||||||
|
}}
|
||||||
|
|
||||||
|
/* Special styling for new page */
|
||||||
|
.new-page-notice {{
|
||||||
|
background-color: #d4edda;
|
||||||
|
color: #155724;
|
||||||
|
padding: 15px 20px;
|
||||||
|
margin-bottom: 20px;
|
||||||
|
border-left: 4px solid #28a745;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.no-translation {{
|
||||||
|
background-color: #fff3cd;
|
||||||
|
color: #856404;
|
||||||
|
padding: 15px 20px;
|
||||||
|
margin-bottom: 20px;
|
||||||
|
border-left: 4px solid #ffc107;
|
||||||
|
}}
|
||||||
|
|
||||||
|
/* Line linking styles */
|
||||||
|
.line[data-cn-line] {{
|
||||||
|
cursor: pointer;
|
||||||
|
}}
|
||||||
|
|
||||||
|
.line:hover {{
|
||||||
|
background-color: rgba(0, 123, 255, 0.05);
|
||||||
|
}}
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div class="header">
|
||||||
|
<h1>{title}</h1>
|
||||||
|
<div class="meta">
|
||||||
|
<span>英文Wiki: wiki.projectdiablo2.com</span>
|
||||||
|
{f' | 中文Wiki: wiki.projectdiablo2.cn' if cn_content else ''}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div class="container">
|
||||||
|
<div class="column">
|
||||||
|
<div class="column-header">English Diff</div>
|
||||||
|
<div class="diff-content" id="en-diff">
|
||||||
|
'''
|
||||||
|
|
||||||
|
# Render the English diff content
|
||||||
|
if parsed_diff:
|
||||||
|
for item in parsed_diff:
|
||||||
|
if item['type'] == 'hunk':
|
||||||
|
html += f'<div class="line diff-hunk"><span class="line-content">{item["content"]}</span></div>'
|
||||||
|
elif item['type'] == 'header':
|
||||||
|
html += f'<div class="line diff-header"><span class="line-content">{item["content"]}</span></div>'
|
||||||
|
elif item['type'] == 'added':
|
||||||
|
cn_line_attr = f'data-cn-line="{item["new_line"]}"' if item["new_line"] and cn_lines and item["new_line"] <= len(cn_lines) else ''
|
||||||
|
cn_title = f'中文第{item["new_line"]}行' if item["new_line"] and cn_lines and item["new_line"] <= len(cn_lines) else ''
|
||||||
|
html += f'<div class="line diff-added" {cn_line_attr} title="{cn_title}"><span class="line-number">{item["new_line"] or ""}</span><span class="line-content">{item["content"]}</span></div>'
|
||||||
|
elif item['type'] == 'removed':
|
||||||
|
html += f'<div class="line diff-removed" title="已删除"><span class="line-number">{item["old_line"] or ""}</span><span class="line-content">{item["content"]}</span></div>'
|
||||||
|
elif item['type'] == 'context':
|
||||||
|
cn_line_attr = f'data-cn-line="{item["new_line"]}"' if item["new_line"] and cn_lines and item["new_line"] <= len(cn_lines) else ''
|
||||||
|
cn_title = f'中文第{item["new_line"]}行' if item["new_line"] and cn_lines and item["new_line"] <= len(cn_lines) else ''
|
||||||
|
html += f'<div class="line diff-context" {cn_line_attr} title="{cn_title}"><span class="line-number">{item["new_line"]}</span><span class="line-content">{item["content"]}</span></div>'
|
||||||
|
else:
|
||||||
|
html += f'<div class="line"><span class="line-content">{item["content"]}</span></div>'
|
||||||
|
else:
|
||||||
|
# New page or no diff
|
||||||
|
if en_diff and en_diff.startswith("新创建页面"):
|
||||||
|
html += '<div class="new-page-notice">新创建页面</div>'
|
||||||
|
|
||||||
|
# Show the full content (for a new page or when there is no diff)
|
||||||
|
for i, line in enumerate(en_new_lines or [], 1):
|
||||||
|
cn_line_attr = f'data-cn-line="{i}"' if cn_lines and i <= len(cn_lines) else ''
|
||||||
|
cn_title = f'中文第{i}行' if cn_lines and i <= len(cn_lines) else ''
|
||||||
|
html += f'<div class="line diff-context" {cn_line_attr} title="{cn_title}"><span class="line-number">{i}</span><span class="line-content">{line}</span></div>'
|
||||||
|
|
||||||
|
html += '''
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div class="separator"></div>
|
||||||
|
|
||||||
|
<div class="column">
|
||||||
|
<div class="column-header">中文翻译</div>
|
||||||
|
<div class="diff-content" id="cn-content">
|
||||||
|
'''
|
||||||
|
|
||||||
|
# Add the Chinese content
|
||||||
|
if cn_content:
|
||||||
|
html += '<div id="cn-lines">'
|
||||||
|
for i, line in enumerate(cn_lines, 1):
|
||||||
|
html += f'<div class="line diff-context" id="cn-line-{i}"><span class="line-number">{i}</span><span class="line-content">{line}</span></div>'
|
||||||
|
html += '</div>'
|
||||||
|
else:
|
||||||
|
html += '<div class="no-translation">未找到对应的中文翻译页面</div>'
|
||||||
|
|
||||||
|
html += '''
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<script>
|
||||||
|
// Synchronized scrolling
|
||||||
|
const enDiff = document.querySelector('#en-diff');
|
||||||
|
const cnContent = document.querySelector('#cn-content');
|
||||||
|
const cnLines = {};
|
||||||
|
|
||||||
|
// Build a map from Chinese line numbers to their positions
|
||||||
|
if (document.getElementById('cn-lines')) {{
|
||||||
|
document.querySelectorAll('#cn-lines .line').forEach(line => {{
|
||||||
|
const lineNum = line.querySelector('.line-number').textContent;
|
||||||
|
if (lineNum) {{
|
||||||
|
cnLines[lineNum] = line.offsetTop;
|
||||||
|
}}
|
||||||
|
}});
|
||||||
|
}}
|
||||||
|
|
||||||
|
// Keep both columns scrolled to the same position
|
||||||
|
if (enDiff && cnContent) {{
|
||||||
|
enDiff.addEventListener('scroll', () => {{
|
||||||
|
cnContent.scrollTop = enDiff.scrollTop;
|
||||||
|
}});
|
||||||
|
|
||||||
|
cnContent.addEventListener('scroll', () => {{
|
||||||
|
enDiff.scrollTop = cnContent.scrollTop;
|
||||||
|
}});
|
||||||
|
}}
|
||||||
|
|
||||||
|
// Clicking an English line highlights the corresponding Chinese line
|
||||||
|
document.querySelectorAll('[data-cn-line]').forEach(enLine => {{
|
||||||
|
enLine.addEventListener('click', () => {{
|
||||||
|
const cnLineNum = enLine.getAttribute('data-cn-line');
|
||||||
|
if (cnLineNum) {{
|
||||||
|
const cnLine = document.getElementById(`cn-line-${cnLineNum}`);
|
||||||
|
if (cnLine) {{
|
||||||
|
// Clear all existing highlights
|
||||||
|
document.querySelectorAll('.line.highlight').forEach(line => {{
|
||||||
|
line.classList.remove('highlight');
|
||||||
|
}});
|
||||||
|
|
||||||
|
// Highlight the English line and the Chinese line
|
||||||
|
enLine.classList.add('highlight');
|
||||||
|
cnLine.classList.add('highlight');
|
||||||
|
|
||||||
|
// Scroll to the Chinese line. This string is not an f-string, so the options object needs single braces.
|
||||||
|
cnLine.scrollIntoView({ behavior: 'smooth', block: 'center' });
|
||||||
|
}}
|
||||||
|
}}
|
||||||
|
}});
|
||||||
|
|
||||||
|
// Preview the corresponding line on hover
|
||||||
|
enLine.addEventListener('mouseenter', () => {{
|
||||||
|
const cnLineNum = enLine.getAttribute('data-cn-line');
|
||||||
|
if (cnLineNum) {{
|
||||||
|
const cnLine = document.getElementById(`cn-line-${cnLineNum}`);
|
||||||
|
if (cnLine) {{
|
||||||
|
enLine.style.backgroundColor = 'rgba(0, 123, 255, 0.1)';
|
||||||
|
cnLine.style.backgroundColor = 'rgba(0, 123, 255, 0.1)';
|
||||||
|
}}
|
||||||
|
}}
|
||||||
|
}});
|
||||||
|
|
||||||
|
enLine.addEventListener('mouseleave', () => {{
|
||||||
|
if (!enLine.classList.contains('highlight')) {{
|
||||||
|
enLine.style.backgroundColor = '';
|
||||||
|
}}
|
||||||
|
const cnLineNum = enLine.getAttribute('data-cn-line');
|
||||||
|
if (cnLineNum) {{
|
||||||
|
const cnLine = document.getElementById(`cn-line-${cnLineNum}`);
|
||||||
|
if (cnLine && !cnLine.classList.contains('highlight')) {{
|
||||||
|
cnLine.style.backgroundColor = '';
|
||||||
|
}}
|
||||||
|
}}
|
||||||
|
}});
|
||||||
|
}});
|
||||||
|
</script>
|
||||||
|
</body>
|
||||||
|
</html>'''
|
||||||
|
|
||||||
|
return html
|
||||||
|
|
||||||
|
def save_files(title, diff_html, diff_text, full_text, timestamp, note="", revid=None, cn_content=None, old_full_text=None):
|
||||||
global CURRENT_OUTPUT_DIR
|
global CURRENT_OUTPUT_DIR
|
||||||
|
|
||||||
# Make sure the output directory for this run exists
|
# Make sure the output directory for this run exists
|
||||||
if CURRENT_OUTPUT_DIR is None:
|
if CURRENT_OUTPUT_DIR is None:
|
||||||
current_time_str = datetime.now().strftime("%Y%m%d_%H%M%S")
|
current_time_str = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||||
CURRENT_OUTPUT_DIR = OUTPUT_DIR / current_time_str
|
CURRENT_OUTPUT_DIR = OUTPUT_DIR / current_time_str
|
||||||
CURRENT_OUTPUT_DIR.mkdir(exist_ok=True)
|
CURRENT_OUTPUT_DIR.mkdir(exist_ok=True)
|
||||||
print(f"创建本次执行的输出目录: {CURRENT_OUTPUT_DIR}")
|
print(f"创建本次执行的输出目录: {CURRENT_OUTPUT_DIR}")
|
||||||
|
|
||||||
safe_title = "".join(c if c.isalnum() or c in " -_." else "_" for c in title)
|
safe_title = "".join(c if c.isalnum() or c in " -_." else "_" for c in title)
|
||||||
time_str = timestamp[:19].replace("-", "").replace(":", "").replace("T", "_")
|
time_str = timestamp[:19].replace("-", "").replace(":", "").replace("T", "_")
|
||||||
# Simplified file name format: title, time, and revid only
|
|
||||||
base_filename = f"{safe_title}-{time_str}-{revid}" if revid else f"{safe_title}-{time_str}"
|
base_filename = f"{safe_title}-{time_str}-{revid}" if revid else f"{safe_title}-{time_str}"
|
||||||
|
|
||||||
|
# Collect the files to save
|
||||||
|
files_to_save = []
|
||||||
|
|
||||||
|
# 1. Standard MediaWiki diff HTML
|
||||||
diff_file = CURRENT_OUTPUT_DIR / f"{base_filename}.diff.html"
|
diff_file = CURRENT_OUTPUT_DIR / f"{base_filename}.diff.html"
|
||||||
|
if diff_html:
|
||||||
|
files_to_save.append((diff_file, diff_html))
|
||||||
|
|
||||||
|
# 2. Text diff
|
||||||
|
text_diff_file = CURRENT_OUTPUT_DIR / f"{base_filename}.diff.txt"
|
||||||
|
if diff_text:
|
||||||
|
files_to_save.append((text_diff_file, diff_text))
|
||||||
|
|
||||||
|
# 3. Latest full content
|
||||||
full_file = CURRENT_OUTPUT_DIR / f"{base_filename}.full.txt"
|
full_file = CURRENT_OUTPUT_DIR / f"{base_filename}.full.txt"
|
||||||
|
if full_text:
|
||||||
|
files_to_save.append((full_file, full_text))
|
||||||
|
|
||||||
# 美化 HTML diff,使用类似git diff的配色方案
|
# 4. 历史版本内容(如果存在)
|
||||||
# 先处理diff_html,将ins/del标签替换为span标签
|
if old_full_text:
|
||||||
processed_diff_html = diff_html.replace('<ins class="diffchange', '<span class="diffchange added"').replace('</ins>', '</span>').replace('<del class="diffchange', '<span class="diffchange deleted"').replace('</del>', '</span>')
|
old_full_file = CURRENT_OUTPUT_DIR / f"{base_filename}.old.txt"
|
||||||
# 再处理diff标记,将data-marker属性替换为实际的span元素
|
files_to_save.append((old_full_file, old_full_text))
|
||||||
processed_diff_html = processed_diff_html.replace('<td class="diff-marker" data-marker="−"></td>', '<td class="diff-marker"><span class="minus-marker">−</span></td>').replace('<td class="diff-marker" data-marker="+"></td>', '<td class="diff-marker"><span class="plus-marker">+</span></td>')
|
|
||||||
|
|
||||||
html_wrapper = f'''<!DOCTYPE html>
|
|
||||||
<html><head><meta charset="utf-8"><title>Diff: {title}</title>
|
|
||||||
<style>
|
|
||||||
body {{
|
|
||||||
font-family: system-ui, sans-serif;
|
|
||||||
margin: 20px;
|
|
||||||
}}
|
|
||||||
table.diff {{
|
|
||||||
border-collapse: collapse;
|
|
||||||
font-family: monospace;
|
|
||||||
width: 100%;
|
|
||||||
table-layout: fixed;
|
|
||||||
}}
|
|
||||||
table.diff td {{
|
|
||||||
padding: 0 5px;
|
|
||||||
vertical-align: top;
|
|
||||||
white-space: pre-wrap;
|
|
||||||
word-break: break-all;
|
|
||||||
font-size: 14px;
|
|
||||||
line-height: 1.4;
|
|
||||||
}}
|
|
||||||
table.diff col.diff-marker {{
|
|
||||||
width: 20px;
|
|
||||||
text-align: right;
|
|
||||||
background-color: #fafafa;
|
|
||||||
}}
|
|
||||||
table.diff col.diff-content {{
|
|
||||||
width: auto;
|
|
||||||
}}
|
|
||||||
table.diff col.diff-addedline,
|
|
||||||
table.diff col.diff-deletedline {{
|
|
||||||
width: 50%;
|
|
||||||
}}
|
|
||||||
.diff-addedline {{
|
|
||||||
background-color: #dfd;
|
|
||||||
}}
|
|
||||||
.diff-addedline .diffchange {{
|
|
||||||
background-color: #9e9;
|
|
||||||
color: #000;
|
|
||||||
}}
|
|
||||||
.diff-deletedline {{
|
|
||||||
background-color: #fee8e8;
|
|
||||||
}}
|
|
||||||
.diff-deletedline .diffchange {{
|
|
||||||
background-color: #faa;
|
|
||||||
color: #000;
|
|
||||||
}}
|
|
||||||
.diff-context {{
|
|
||||||
background-color: #fafafa;
|
|
||||||
}}
|
|
||||||
.diff-context td {{
|
|
||||||
color: #777;
|
|
||||||
}}
|
|
||||||
.diff-marker {{
|
|
||||||
font-weight: bold;
|
|
||||||
text-align: right;
|
|
||||||
padding: 0 4px;
|
|
||||||
}}
|
|
||||||
.diff-lineno {{
|
|
||||||
background-color: #f0f0f0;
|
|
||||||
text-align: right;
|
|
||||||
padding: 0 4px;
|
|
||||||
}}
|
|
||||||
.diff-addedline .diff-marker {{
|
|
||||||
color: #080;
|
|
||||||
}}
|
|
||||||
.diff-deletedline .diff-marker {{
|
|
||||||
color: #800;
|
|
||||||
}}
|
|
||||||
|
|
||||||
/* 新增的diff标记样式 */
|
# 5. 中文翻译内容(如果存在)
|
||||||
.plus-marker {{
|
if cn_content:
|
||||||
color: #080;
|
cn_file = CURRENT_OUTPUT_DIR / f"{base_filename}.cn.txt"
|
||||||
font-weight: bold;
|
files_to_save.append((cn_file, cn_content))
|
||||||
}}
|
|
||||||
.minus-marker {{
|
|
||||||
color: #800;
|
|
||||||
font-weight: bold;
|
|
||||||
}}
|
|
||||||
|
|
||||||
/* 确保变更行有明显的视觉区分 */
|
# 6. 双语对比HTML页面
|
||||||
.diff-addedline div,
|
if cn_content:
|
||||||
.diff-deletedline div {{
|
# 为文本diff准备行
|
||||||
display: inline-block;
|
en_new_lines = full_text.splitlines() if full_text else []
|
||||||
width: 100%;
|
en_old_lines = old_full_text.splitlines() if old_full_text else []
|
||||||
}}
|
|
||||||
|
|
||||||
/* 增加一些额外的视觉提示 */
|
# 创建双语对比页面
|
||||||
.diff-addedline {{
|
comparison_html = create_diff_html(title, diff_text, en_old_lines, en_new_lines, cn_content)
|
||||||
border-left: 4px solid #080;
|
comparison_file = CURRENT_OUTPUT_DIR / f"{base_filename}.comparison.html"
|
||||||
}}
|
files_to_save.append((comparison_file, comparison_html))
|
||||||
.diff-deletedline {{
|
print(f" → 已保存: {comparison_file.relative_to(OUTPUT_DIR)} (双语对比页面)")
|
||||||
border-left: 4px solid #800;
|
|
||||||
}}
|
|
||||||
.diff-context {{
|
|
||||||
border-left: 4px solid #ccc;
|
|
||||||
}}
|
|
||||||
|
|
||||||
/* 替换ins/del标签为span标签的样式 */
|
# 写入所有文件
|
||||||
.diffchange.added {{
|
for file_path, content in files_to_save:
|
||||||
background-color: #9e9;
|
try:
|
||||||
color: #000;
|
with open(file_path, "w", encoding="utf-8") as f:
|
||||||
font-weight: bold;
|
f.write(content)
|
||||||
text-decoration: none;
|
print(f" → 已保存: {file_path.relative_to(OUTPUT_DIR)}")
|
||||||
}}
|
except Exception as e:
|
||||||
.diffchange.deleted {{
|
print(f" → 保存文件 {file_path} 时出错: {e}")
|
||||||
background-color: #faa;
|
|
||||||
color: #000;
|
|
||||||
font-weight: bold;
|
|
||||||
text-decoration: line-through;
|
|
||||||
}}
|
|
||||||
</style></head><body>
|
|
||||||
<h2>{title}</h2>
|
|
||||||
<p>修改时间: {timestamp}</p>
|
|
||||||
{processed_diff_html}
|
|
||||||
</body></html>'''
|
|
||||||
|
|
||||||
try:
|
|
||||||
with open(diff_file, "w", encoding="utf-8") as f:
|
|
||||||
f.write(html_wrapper)
|
|
||||||
with open(full_file, "w", encoding="utf-8") as f:
|
|
||||||
f.write(full_text)
|
|
||||||
|
|
||||||
print(f" → 已保存: {diff_file.relative_to(OUTPUT_DIR)}")
|
|
||||||
print(f" → 已保存: {full_file.relative_to(OUTPUT_DIR)}")
|
|
||||||
except Exception as e:
|
|
||||||
print(f" → 保存文件时出错: {e}")
|
|
||||||
|
|
||||||
print(f" → 完整路径: {diff_file}")
|
|
||||||
print(f" → 完整路径: {full_file}")
|
|
||||||
|
|
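The file naming used by `save_files` above can be exercised on its own. The sketch below lifts the `safe_title` and `time_str` expressions from the diff into a helper; the names `build_base_filename` and `planned_outputs` are hypothetical and do not appear in the commit, and the suffix list simply mirrors the files the new version queues in `files_to_save`.

```python
from pathlib import Path

# Suffixes queued by the new save_files, in the order they appear in the diff.
SUFFIXES = [".diff.html", ".diff.txt", ".full.txt", ".old.txt", ".cn.txt", ".comparison.html"]


def build_base_filename(title, timestamp, revid=None):
    """Hypothetical helper: same expressions as save_files, isolated for testing."""
    # Keep alphanumerics (str.isalnum is also True for CJK characters) and " -_.",
    # replace everything else, such as "/" or ":", with "_".
    safe_title = "".join(c if c.isalnum() or c in " -_." else "_" for c in title)
    # "2025-11-28T00:00:00Z" -> "20251128_000000"
    time_str = timestamp[:19].replace("-", "").replace(":", "").replace("T", "_")
    return f"{safe_title}-{time_str}-{revid}" if revid else f"{safe_title}-{time_str}"


def planned_outputs(output_dir, title, timestamp, revid=None):
    """Hypothetical helper: list every path save_files could write for one revision."""
    base = build_base_filename(title, timestamp, revid)
    return [Path(output_dir) / f"{base}{suffix}" for suffix in SUFFIXES]


if __name__ == "__main__":
    for path in planned_outputs("output/20251128_120000", "Maps/Act 1", "2025-11-28T00:00:00Z", 123):
        print(path)
```

Because the revision ID is part of the name, two revisions of the same page saved in the same second still get distinct files.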
||||||
def process_single_page(title, since_time, update_timestamp=False):
|
def process_single_page(title, since_time, update_timestamp=False):
|
||||||
"""只处理单个页面"""
|
"""只处理单个页面"""
|
||||||
print(f"正在单独处理页面:{title}")
|
print(f"正在单独处理页面:{title}")
|
||||||
|
|
||||||
# 获取当前最新 revid
|
# 获取当前最新 revid
|
||||||
params = {
|
|
||||||
"action": "query",
|
|
||||||
"prop": "revisions",
|
|
||||||
"titles": title,
|
|
||||||
"rvprop": "ids|timestamp",
|
|
||||||
"rvlimit": 1,
|
|
||||||
"format": "json"
|
|
||||||
}
|
|
||||||
try:
|
try:
|
||||||
r = SESSION.get(WIKI_API_URL, params=params).json()
|
latest_content, latest_ts, latest_revid = get_page_content(WIKI_API_URL_EN, SESSION_EN, title)
|
||||||
page = next(iter(r["query"]["pages"].values()))
|
if latest_content is None:
|
||||||
if "revisions" not in page:
|
|
||||||
print("页面不存在或被删除")
|
print("页面不存在或被删除")
|
||||||
return None
|
return None
|
||||||
latest_revid = page["revisions"][0]["revid"]
|
|
||||||
latest_ts = page["revisions"][0]["timestamp"]
|
|
||||||
|
|
||||||
# 获取旧 revid
|
# 获取旧 revid
|
||||||
old_revid = get_old_revid(title, since_time)
|
old_revid = get_old_revid(title, since_time)
|
||||||
|
|
||||||
diff_html, full_text, new_ts = get_official_diff_and_content(title, old_revid, latest_revid)
|
# 初始化变量
|
||||||
if diff_html is not None and full_text is not None:
|
diff_html = None
|
||||||
# 移除旧的note标记,使用更简洁的命名方式
|
diff_text = None
|
||||||
if not old_revid:
|
old_content = None
|
||||||
diff_html = "<p style='color:green;font-weight:bold'>新创建页面(无历史版本)</p>"
|
cn_content = None
|
||||||
save_files(title, diff_html, full_text, new_ts, "", latest_revid)
|
|
||||||
|
if old_revid:
|
||||||
|
# 获取历史版本内容
|
||||||
|
old_content, old_ts, _ = get_page_content(WIKI_API_URL_EN, SESSION_EN, title, old_revid)
|
||||||
|
|
||||||
|
if old_content is not None:
|
||||||
|
# 生成文本diff
|
||||||
|
diff_text = generate_text_diff(old_content, latest_content)
|
||||||
|
print(f" 生成了文本diff ({len(diff_text)} 字符)")
|
||||||
|
else:
|
||||||
|
print(f" 无法获取历史版本内容")
|
||||||
else:
|
else:
|
||||||
print(f" 警告: 未能获取完整的差异或内容数据")
|
# 新页面
|
||||||
|
print(" 这是新创建的页面")
|
||||||
|
|
||||||
|
# 搜索对应的中文页面
|
||||||
|
print(" 搜索中文翻译...")
|
||||||
|
cn_title = search_chinese_page(title)
|
||||||
|
if cn_title:
|
||||||
|
print(f" 找到中文页面: {cn_title}")
|
||||||
|
cn_content, cn_ts, cn_revid = get_page_content(WIKI_API_URL_CN, SESSION_CN, cn_title)
|
||||||
|
if cn_content:
|
||||||
|
print(f" 获取中文内容成功 ({len(cn_content)} 字符)")
|
||||||
|
else:
|
||||||
|
print(" 无法获取中文页面内容")
|
||||||
|
else:
|
||||||
|
print(" 未找到对应的中文翻译页面")
|
||||||
|
|
||||||
|
# 获取官方diff(可选)
|
||||||
|
if old_revid:
|
||||||
|
diff_params = {
|
||||||
|
"action": "compare",
|
||||||
|
"fromrev": old_revid,
|
||||||
|
"torev": latest_revid,
|
||||||
|
"format": "json"
|
||||||
|
}
|
||||||
|
try:
|
||||||
|
diff_resp = SESSION_EN.get(WIKI_API_URL_EN, params=diff_params).json()
|
||||||
|
diff_html = diff_resp.get("compare", {}).get("*", "")
|
||||||
|
except Exception as e:
|
||||||
|
print(f" 获取官方HTML diff时出错: {e}")
|
||||||
|
|
||||||
|
# 保存所有文件
|
||||||
|
save_files(title, diff_html, diff_text, latest_content, latest_ts, "", latest_revid, cn_content, old_content)
|
||||||
|
|
||||||
if update_timestamp:
|
if update_timestamp:
|
||||||
save_last_timestamp(latest_ts)
|
save_last_timestamp(latest_ts)
|
||||||
print(f"已更新全局时间戳 → {latest_ts}")
|
print(f"已更新全局时间戳 → {latest_ts}")
|
||||||
|
|
||||||
return latest_ts
|
return latest_ts
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
print(f"处理页面 '{title}' 时出错: {e}")
|
print(f"处理页面 '{title}' 时出错: {e}")
|
||||||
|
|
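`process_single_page` calls `generate_text_diff(old_content, latest_content)`, whose body is not shown in this part of the diff. As a rough illustration of what a git-style text diff could look like, here is a minimal sketch built on the standard library's `difflib.unified_diff`. The function name `sketch_text_diff` and its `title` parameter are assumptions made for this example, not the commit's actual implementation.

```python
import difflib


def sketch_text_diff(old_text, new_text, title="page"):
    """Return a unified diff of two page revisions as one string.

    Illustrative only: the real generate_text_diff may differ.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    diff_lines = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"a/{title}",   # mimic git's a/ and b/ prefixes
        tofile=f"b/{title}",
        lineterm="",             # input lines carry no newline, so add none
        n=3,                     # lines of context around each change
    )
    return "\n".join(diff_lines)


if __name__ == "__main__":
    old = "= Title =\nDamage: 10\nEnd"
    new = "= Title =\nDamage: 12\nEnd"
    print(sketch_text_diff(old, new, "Example"))
```

When both revisions are identical, `unified_diff` yields nothing and the sketch returns an empty string, which the `if diff_text:` check in `save_files` would treat as "no diff file to write".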
@ -353,7 +820,7 @@ def process_all_pages_since(since_time):
|
||||||
print(f"\n处理:{title}")
|
print(f"\n处理:{title}")
|
||||||
# 复用单页处理逻辑
|
# 复用单页处理逻辑
|
||||||
page_latest_ts = process_single_page(title, since_time)
|
page_latest_ts = process_single_page(title, since_time)
|
||||||
|
|
||||||
if page_latest_ts and page_latest_ts > latest_global_ts:
|
if page_latest_ts and page_latest_ts > latest_global_ts:
|
||||||
latest_global_ts = page_latest_ts
|
latest_global_ts = page_latest_ts
|
||||||
|
|
||||||
|
|
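The comparison `page_latest_ts > latest_global_ts` above works on plain strings. That is safe as long as every timestamp uses the same fixed-width UTC format shown in the `--since` help text (`2025-11-28T00:00:00Z`), because such strings sort in chronological order. A small sketch of the same running-maximum logic, with the hypothetical helper name `newest_timestamp`:

```python
from datetime import datetime


def newest_timestamp(timestamps, floor):
    """Return the latest timestamp string, ignoring None results from failed pages."""
    latest = floor
    for ts in timestamps:
        # Same guard as the loop in the diff: skip falsy values, compare as strings.
        if ts and ts > latest:
            latest = ts
    return latest


def parse(ts):
    """Parse the fixed-width UTC format, used here only to cross-check the ordering."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    results = ["2025-11-28T10:00:00Z", None, "2025-12-01T08:30:00Z", "2025-11-30T23:59:59Z"]
    print(newest_timestamp(results, "2025-11-28T00:00:00Z"))
```

If timestamps with differing formats or time zone offsets ever enter the mix, parsing them into `datetime` objects before comparing would be the more robust choice.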
@ -362,14 +829,14 @@ def process_all_pages_since(since_time):
|
||||||
print(f"文件保存在:{CURRENT_OUTPUT_DIR.resolve() if CURRENT_OUTPUT_DIR else OUTPUT_DIR.resolve()}")
|
print(f"文件保存在:{CURRENT_OUTPUT_DIR.resolve() if CURRENT_OUTPUT_DIR else OUTPUT_DIR.resolve()}")
|
||||||
|
|
||||||
def main():
|
def main():
|
||||||
parser = argparse.ArgumentParser(description="MediaWiki 同步工具 - 支持全量/单页/自定义时间")
|
parser = argparse.ArgumentParser(description="MediaWiki 同步工具 - 增强版支持双语对比")
|
||||||
parser.add_argument("--since", type=str, help="强制从指定时间开始同步,格式如 2025-11-28T00:00:00Z")
|
parser.add_argument("--since", type=str, help="强制从指定时间开始同步,格式如 2025-11-28T00:00:00Z")
|
||||||
parser.add_argument("--title", type=str, help="只同步指定的单个页面标题")
|
parser.add_argument("--title", type=str, help="只同步指定的单个页面标题")
|
||||||
parser.add_argument("--update-timestamp", action="store_true",
|
parser.add_argument("--update-timestamp", action="store_true",
|
||||||
help="在单页模式下,完成后仍然更新全局 last_sync_timestamp.txt")
|
help="在单页模式下,完成后仍然更新全局 last_sync_timestamp.txt")
|
||||||
parser.add_argument("--run", action="store_true",
|
parser.add_argument("--run", action="store_true",
|
||||||
help="执行同步操作(必须提供此参数才能真正执行同步)")
|
help="执行同步操作(必须提供此参数才能真正执行同步)")
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
# 如果没有提供 --run 参数,则显示帮助信息并退出
|
# 如果没有提供 --run 参数,则显示帮助信息并退出
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,8 @@
|
||||||
|
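According to the README, sync.py fetches changes from wiki.projectdiablo2.com and downloads the full source text of each changed page. The following features now need to be added: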
|
||||||
|
|
||||||
|
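1. Fetch the latest full text of the English wiki page (already implemented), and also fetch the full text of its previous version (using the old_revid obtained in the previous step).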
|
||||||
|
2. If the page is newly created (existing logic), save only the latest full file.
|
||||||
|
3. If the page was changed, diff the historical full file against the latest file to produce a diff file, using Python or a library that mimics git diff.
|
||||||
|
|
||||||
|
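4. For the page title, search the other site (wiki.projectdiablo2.cn) and download the source text; this is the synchronized, translated Chinese site. Note that page IDs differ between the two sites, but page titles are kept identical, and the vast majority of pages have been translated.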
|
||||||
|
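5. Save a web page that displays the diff, with an attractive, polished design using modern CSS/JS. Split the page vertically into two columns: the left shows the diff between the two versions of the English source, and the right shows the Chinese source at the same line numbers. Line numbers are kept consistent; for the vast majority of pages the Chinese line numbers match exactly, so they can be compared directly. The diff should also use the standard red and green coloring.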
|
||||||
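The two-column layout in the last requirement depends on pairing each English line with the Chinese line at the same line number. The sketch below shows one possible way to build those rows with `difflib.SequenceMatcher`. It is not the `create_diff_html` function from the commit; the name `pair_rows` and the row tuple layout are assumptions made for illustration.

```python
import difflib


def pair_rows(en_old_lines, en_new_lines, cn_lines):
    """Build rows of (new_lineno, status, english_line, chinese_line).

    status is "context", "added" or "deleted". Deleted lines have no line
    number in the new revision, so they get None and an empty Chinese cell.
    """
    rows = []
    matcher = difflib.SequenceMatcher(a=en_old_lines, b=en_new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            for line in en_old_lines[i1:i2]:
                rows.append((None, "deleted", line, ""))
        if tag in ("insert", "replace", "equal"):
            status = "context" if tag == "equal" else "added"
            for j in range(j1, j2):
                # Pair by identical line number; pad when the translation is shorter.
                cn_line = cn_lines[j] if j < len(cn_lines) else ""
                rows.append((j + 1, status, en_new_lines[j], cn_line))
    return rows


if __name__ == "__main__":
    old = ["= Title =", "Damage: 10", "End"]
    new = ["= Title =", "Damage: 12", "End", "New line"]
    cn = ["= 标题 =", "伤害: 10", "结束"]
    for row in pair_rows(old, new, cn):
        print(row)
```

A renderer could then map "added" to a green row and "deleted" to a red row on the left, and print the Chinese cell on the right. An "added" row whose Chinese cell is empty or still shows the old value flags a translation that has not caught up with the English change.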