This commit is contained in:
wdjwxh 2025-12-11 15:46:18 +08:00
parent bd9a654c5f
commit a6e7dc5e99
6 changed files with 766 additions and 229 deletions


@ -0,0 +1,7 @@
{
"permissions": {
"allow": [
"Bash(python:*)"
]
}
}


@ -1,5 +1,8 @@
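# Wiki Sync Tool example configuration file # Wiki Sync Tool example configuration file
# Copy this file to .env and edit the values # Copy this file to .env and edit the values
# Your MediaWiki API URL # API URL of the English Project Diablo 2 Wiki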
WIKI_API_URL=https://your-wiki-site.com/api.php WIKI_API_URL_EN=https://wiki.projectdiablo2.com/w/api.php
# API URL of the Chinese Project Diablo 2 Wiki
WIKI_API_URL_CN=https://wiki.projectdiablo2.cn/w/api.php

4
.vscode/settings.json vendored Normal file

@ -0,0 +1,4 @@
{
"python-envs.defaultEnvManager": "ms-python.python:system",
"python-envs.pythonProjects": []
}


@ -1,6 +1,6 @@
# Wiki Sync Tool # Wiki Sync Tool - Enhanced Version
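A Python tool for syncing and tracking changes on a MediaWiki site. It automatically fetches the latest changes to wiki pages, generates easy-to-read diff files, and saves each page's full content for offline reading. A Python tool for syncing and tracking changes on a MediaWiki site, with bilingual comparison and precise line-number navigation.
## Features ## Features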
@ -10,6 +10,10 @@
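- ⏰ Incremental sync: fetches only changes made since the last sync - ⏰ Incremental sync: fetches only changes made since the last sync
- 🔍 Sync from a given point in time or for a specific page - 🔍 Sync from a given point in time or for a specific page
- 📁 Output files are organized automatically into timestamped directories - 📁 Output files are organized automatically into timestamped directories
- 🌐 **New**: automatically syncs the Chinese translation
- 🎯 **New**: precise line-number mapping; clicking an English line jumps to the corresponding Chinese line
- 📊 **New**: generates a polished bilingual comparison web page
- 🎨 **New**: modern UI with synchronized scrolling and highlighting
## Installation ## Installation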
@ -29,7 +33,11 @@
Create a `.env` file and configure your MediaWiki API URL: Create a `.env` file and configure your MediaWiki API URL:
```env ```env
WIKI_API_URL=https://your-wiki-site.com/api.php # API URL of the English Project Diablo 2 Wiki
WIKI_API_URL_EN=https://wiki.projectdiablo2.com/w/api.php
# API URL of the Chinese Project Diablo 2 Wiki
WIKI_API_URL_CN=https://wiki.projectdiablo2.cn/w/api.php
``` ```
Alternatively, copy the provided example configuration file: Alternatively, copy the provided example configuration file:
@ -38,8 +46,6 @@ WIKI_API_URL=https://your-wiki-site.com/api.php
cp .env.example .env cp .env.example .env
``` ```
Then edit the `WIKI_API_URL` value in the `.env` file.
## Usage ## Usage
### Basic full sync ### Basic full sync
@ -65,7 +71,7 @@ python sync.py --since 2025-11-28T00:00:00Z --run
Sync only the latest changes to a specific page: Sync only the latest changes to a specific page:
```bash ```bash
python sync.py --title "Main Page" --run python sync.py --title "Amazon Basin" --run
``` ```
### Sync a specific page and update the timestamp ### Sync a specific page and update the timestamp
@ -73,7 +79,7 @@ python sync.py --title "Main Page" --run
Sync a specific page and update the global timestamp when done: Sync a specific page and update the global timestamp when done:
```bash ```bash
python sync.py --title "Main Page" --update-timestamp --run python sync.py --title "Amazon Basin" --update-timestamp --run
``` ```
### Show help ### Show help
@ -86,41 +92,83 @@ python sync.py --help
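Each run creates a timestamp-named subdirectory under `wiki_sync_output` containing the generated files: Each run creates a timestamp-named subdirectory under `wiki_sync_output` containing the generated files:
- `PageTitle-Timestamp-revid.diff.html` - HTML diff of the page changes - `PageTitle-Timestamp-revid.diff.html` - MediaWiki's native HTML diff
- `PageTitle-Timestamp-revid.full.txt` - full content of the page - `PageTitle-Timestamp-revid.diff.txt` - text diff, similar to git diff
- `PageTitle-Timestamp-revid.full.txt` - latest full content of the page
- `PageTitle-Timestamp-revid.old.txt` - content of the previous revision (if the page changed)
- `PageTitle-Timestamp-revid.cn.txt` - Chinese translation (if found)
- `PageTitle-Timestamp-revid.comparison.html` - **bilingual comparison page** (if a Chinese translation is found)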
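### Bilingual comparison page features
The generated bilingual comparison page offers the following:
1. **Precise line-number mapping**
- Every line in the English diff is annotated with its corresponding Chinese line number
- Clicking any English line highlights the corresponding Chinese line and scrolls to it
2. **Interactive experience**
- Hovering over an English line previews the corresponding Chinese line
- Clicking highlights the pairing
- Smooth scrolling animation
3. **Visual design**
- Modern UI design
- Standard diff colours: green for additions, red for deletions, grey for unchanged lines
- Responsive layout that works on mobile
4. **Synchronized scrolling**
- The scroll positions of the left and right columns stay in sync
- Makes it easy to compare content at the same position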
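The line mapping described above can be sketched as an identity mapping from the English new-revision line number to the Chinese line number. This is a minimal illustration, assuming (as the task notes in this commit state) that the translated page keeps the same line numbering; the function name `chinese_line_for` is hypothetical and not part of `sync.py`.

```python
from typing import Optional


def chinese_line_for(new_line: Optional[int], cn_line_count: int) -> Optional[int]:
    """Map an English new-revision line number to a Chinese line number.

    Assumption: the translation keeps the same line numbering as the source.
    """
    # Removed lines have no new-revision number, so they have no counterpart.
    if new_line is None or new_line < 1:
        return None
    # Identity mapping, valid only while the translation has that many lines.
    return new_line if new_line <= cn_line_count else None
```

When the translation is shorter than the English page, lines past its end simply get no link, which matches the bounds check the generated page applies before emitting a `data-cn-line` attribute.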
### Diff file example ### Diff file example
A diff file shows the changes to a page and has the following characteristics: Example of the text diff format:
```
--- old_version
+++ new_version
@@ -10,7 +10,7 @@
This is line 10
-This line will be removed
+This line will be added
This is line 12
```
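A diff in this format can be produced with Python's standard `difflib.unified_diff`. The sketch below is an illustration rather than the tool's exact code: the function name `make_text_diff` and the labels `old_version` / `new_version` are assumptions chosen to match the sample above.

```python
import difflib


def make_text_diff(old_text: str, new_text: str) -> str:
    """Return a unified diff of two page revisions as a single string."""
    # splitlines() drops line endings, so lineterm="" keeps every output line
    # free of a terminator and a final line without a trailing newline cannot
    # run into its neighbour when the pieces are joined.
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="old_version",  # label for the "---" header (assumed name)
        tofile="new_version",    # label for the "+++" header (assumed name)
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    old = "This is line 10\nThis line will be removed\nThis is line 12\n"
    new = "This is line 10\nThis line will be added\nThis is line 12\n"
    print(make_text_diff(old, new))
```

Identical inputs produce an empty string, which a caller can use to skip writing a `.diff.txt` file.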
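HTML diff characteristics:
- Green background marks added content - Green background marks added content
- Red background marks deleted content - Red background marks deleted content
- A coloured vertical bar on the left indicates the type of change - A coloured vertical bar on the left indicates the type of change
- +/- markers clearly show where changes occur - +/- markers clearly show where changes occur
- Deleted content is shown with strikethrough - Deleted content is shown with strikethrough
![Diff example screenshot](example-diff.png)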
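## Technical details ## Technical details
### Diff markers ### Line-number parsing
The tool post-processes MediaWiki's native diff output: The tool uses a custom diff parser that extracts:
- The line-number ranges in each hunk header
- The old-revision and new-revision line number of every changed line
- The exact positions of added, removed, and context lines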
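The parsing steps listed above can be sketched as follows. This is a simplified illustration, not the parser in `sync.py`: the names `HUNK_RE` and `number_diff_lines` are hypothetical, and it assumes a single-file unified diff such as the one `difflib.unified_diff` produces.

```python
import re

# Unified-diff hunk header: "@@ -old_start,old_count +new_start,new_count @@".
# A count is omitted when it equals 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def number_diff_lines(diff_text: str):
    """Yield (kind, old_line, new_line, text) for each line inside a hunk."""
    old_no = new_no = 0
    in_hunk = False
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            # Restart both counters from the positions given in the header.
            old_no, new_no = int(match.group(1)), int(match.group(3))
            in_hunk = True
        elif not in_hunk:
            continue  # "---" / "+++" file header before the first hunk
        elif line.startswith("-"):
            yield ("removed", old_no, None, line[1:])
            old_no += 1
        elif line.startswith("+"):
            yield ("added", None, new_no, line[1:])
            new_no += 1
        elif line.startswith(" "):
            yield ("context", old_no, new_no, line[1:])
            old_no += 1
            new_no += 1
```

Checking for hunk membership before looking at the `-` / `+` prefix matters for wiki text: a removed horizontal rule (`----`) appears in the diff as `-----`, which would otherwise look like a `---` file header.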
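- Converts `<ins>` and `<del>` tags into standard `<span>` tags ### Chinese page search strategy
- Converts `data-marker` attributes into literal +/- symbols
- Applies custom CSS to improve the visual presentation 1. First try an exact match on the page title
2. If that fails, fall back to a fuzzy search
3. Handles spaces and special characters in titles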
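The lookup order above can be sketched as follows. This is an illustration under stated assumptions, not the code in `sync.py`: `find_translated_title` and the `Fetcher` type are hypothetical names, and the HTTP call is injected as a function so the logic can be exercised without network access. The search parameters mirror those used by `search_chinese_page` in this commit.

```python
from typing import Callable, Optional

# A fetcher takes MediaWiki API query parameters and returns the decoded JSON
# response. In real use it could wrap requests.Session.get(api_url, params=...).
Fetcher = Callable[[dict], dict]


def find_translated_title(title: str, fetch: Fetcher) -> Optional[str]:
    """Look up `title` on the translated wiki: exact match first, then search."""
    # Step 1: exact title lookup. Pages that do not exist carry a "missing" key.
    exact = fetch({"action": "query", "titles": title, "format": "json"})
    for page in exact.get("query", {}).get("pages", {}).values():
        if "missing" not in page and "invalid" not in page:
            return page["title"]

    # Step 2: fall back to a title search. The raw title is passed as-is
    # because the HTTP client URL-encodes spaces and special characters.
    found = fetch({
        "action": "query",
        "list": "search",
        "srsearch": title,
        "srwhat": "title",
        "srlimit": 5,
        "format": "json",
    })
    results = found.get("query", {}).get("search", [])
    return results[0]["title"] if results else None
```

Since page titles are expected to be identical on both wikis, the exact lookup should succeed for most pages and the search only runs for the remainder.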
### Directory layout ### Directory layout
``` ```
wiki_sync_output/ wiki_sync_output/
├── 20251203_152702/ ├── 20251211_152702/
│ ├── Main_Page-20251203_152645-12345.diff.html │ ├── Amazon_Basin-20251211_152645-12345.diff.html
│ ├── Main_Page-20251203_152645-12345.full.txt │ ├── Amazon_Basin-20251211_152645-12345.diff.txt
│ ├── Another_Page-20251203_152650-12346.diff.html │ ├── Amazon_Basin-20251211_152645-12345.full.txt
│ └── Another_Page-20251203_152650-12346.full.txt │ ├── Amazon_Basin-20251211_152645-12345.old.txt
└── 20251203_153127/ │ ├── Amazon_Basin-20251211_152645-12345.cn.txt
│ └── Amazon_Basin-20251211_152645-12345.comparison.html
└── 20251211_153127/
└── ... └── ...
``` ```
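The file names in this layout follow the `title-timestamp-revid` pattern. The sketch below reproduces the naming rule from `save_files` in this commit as a standalone function; the name `build_base_filename` is hypothetical.

```python
from typing import Optional


def build_base_filename(title: str, timestamp: str, revid: Optional[int] = None) -> str:
    """Build the shared file-name stem for one page revision."""
    # Keep letters, digits, spaces and "-_."; replace everything else with "_".
    safe_title = "".join(c if c.isalnum() or c in " -_." else "_" for c in title)
    # "2025-12-11T15:26:45Z" -> "20251211_152645"
    time_str = timestamp[:19].replace("-", "").replace(":", "").replace("T", "_")
    if revid:
        return f"{safe_title}-{time_str}-{revid}"
    return f"{safe_title}-{time_str}"


if __name__ == "__main__":
    print(build_base_filename("Amazon Basin", "2025-12-11T15:26:45Z", 12345))
```

Note that this rule keeps spaces, so a title such as "Amazon Basin" would yield a stem beginning with `Amazon Basin-` rather than the underscored form shown in the tree above.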

847
sync.py

@ -1,12 +1,14 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
""" """
MediaWiki 最近变更同步工具 - 绯红终 MediaWiki 最近变更同步工具 - 增强
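Supports: Supports:
1. Normal full sync (no arguments) 1. Normal full sync (no arguments)
2. Manually specified start time (--since 2025-11-28T00:00:00Z) 2. Manually specified start time (--since 2025-11-28T00:00:00Z)
3. Sync a single page only (--title "page name") 3. Sync a single page only (--title "page name")
4. Optionally update the global timestamp when syncing a single page (--update-timestamp) 4. Optionally update the global timestamp when syncing a single page (--update-timestamp)
5. Always uses the official action=compare to generate the most accurate diff 5. Fetches the previous revision and generates a diff
6. Syncs the Chinese translation
7. Generates a bilingual comparison web page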
""" """
import os import os
@ -15,10 +17,15 @@ from pathlib import Path
from datetime import datetime from datetime import datetime
import requests import requests
from dotenv import load_dotenv from dotenv import load_dotenv
import difflib
import json
import re
from urllib.parse import quote
# ==================== 配置区 ==================== # ==================== 配置区 ====================
load_dotenv() load_dotenv()
WIKI_API_URL = os.getenv("WIKI_API_URL") # 从.env文件加载 WIKI_API_URL_EN = os.getenv("WIKI_API_URL_EN", "https://wiki.projectdiablo2.com/w/api.php")
WIKI_API_URL_CN = os.getenv("WIKI_API_URL_CN", "https://wiki.projectdiablo2.cn/w/api.php")
OUTPUT_DIR = Path("wiki_sync_output") OUTPUT_DIR = Path("wiki_sync_output")
OUTPUT_DIR.mkdir(exist_ok=True) OUTPUT_DIR.mkdir(exist_ok=True)
@ -27,9 +34,14 @@ CURRENT_OUTPUT_DIR = None
LAST_TIMESTAMP_FILE = "last_sync_timestamp.txt" LAST_TIMESTAMP_FILE = "last_sync_timestamp.txt"
SESSION = requests.Session() SESSION_EN = requests.Session()
SESSION.headers.update({ SESSION_EN.headers.update({
"User-Agent": "WikiSyncTool/3.0 (your-email@example.com; MediaWiki Sync Bot)" "User-Agent": "WikiSyncTool/4.0 (your-email@example.com; MediaWiki Sync Bot)"
})
SESSION_CN = requests.Session()
SESSION_CN.headers.update({
"User-Agent": "WikiSyncTool/4.0 (your-email@example.com; MediaWiki Sync Bot)"
}) })
# ================================================ # ================================================
@ -58,7 +70,7 @@ def get_recent_changes(since):
latest = {} latest = {}
while True: while True:
try: try:
r = SESSION.get(WIKI_API_URL, params=params) r = SESSION_EN.get(WIKI_API_URL_EN, params=params)
r.raise_for_status() r.raise_for_status()
response_data = r.json() response_data = r.json()
if "error" in response_data: if "error" in response_data:
@ -80,15 +92,13 @@ def get_old_revid(title, end_time):
"prop": "revisions", "prop": "revisions",
"titles": title, "titles": title,
"rvprop": "ids|timestamp", "rvprop": "ids|timestamp",
"rvlimit": 1, # 获取2个版本确保能找到不同的版本 "rvlimit": 1,
"rvdir": "older", "rvdir": "older",
"rvstart": end_time, "rvstart": end_time,
"format": "json" "format": "json"
} }
try: try:
r = SESSION.get(WIKI_API_URL, params=params).json() r = SESSION_EN.get(WIKI_API_URL_EN, params=params).json()
url = WIKI_API_URL + "?" + "&".join([f"{k}={v}" for k, v in params.items()])
print(f" 请求URL: {url}")
pages = r["query"]["pages"] pages = r["query"]["pages"]
page = next(iter(pages.values())) page = next(iter(pages.values()))
if "revisions" not in page: if "revisions" not in page:
@ -104,45 +114,564 @@ def get_old_revid(title, end_time):
print(f"获取旧版本ID时出错: {e}") print(f"获取旧版本ID时出错: {e}")
return None return None
def get_official_diff_and_content(title, from_revid, to_revid): def get_page_content(wiki_url, session, title, revid=None):
# 获取官方 diffHTML """获取页面完整内容"""
diff_params = { params = {
"action": "compare",
"fromrev": from_revid or "",
"torev": to_revid,
"format": "json"
}
print(f" 获取diff: fromrev={from_revid}, torev={to_revid}")
try:
diff_resp = SESSION.get(WIKI_API_URL, params=diff_params).json()
print(f" Diff响应: {list(diff_resp.keys())}")
diff_html = diff_resp.get("compare", {}).get("*", "<p>无法获取 diff</p>")
print(f" Diff内容长度: {len(diff_html)} 字符")
# 获取最新完整内容
content_params = {
"action": "query", "action": "query",
"prop": "revisions", "prop": "revisions",
"titles": title, "titles": title,
"rvprop": "content|timestamp", "rvprop": "content|timestamp|ids",
"rvslots": "main", "rvslots": "main",
"format": "json" "format": "json"
} }
r = SESSION.get(WIKI_API_URL, params=content_params).json() if revid:
page = next(iter(r["query"]["pages"].values())) params["rvstartid"] = revid
params["rvendid"] = revid
try:
r = session.get(wiki_url, params=params).json()
pages = r["query"]["pages"]
page = next(iter(pages.values()))
if "revisions" not in page: if "revisions" not in page:
return None, None, None return None, None, None
rev = page["revisions"][0] rev = page["revisions"][0]
full_text = rev["slots"]["main"]["*"] content = rev["slots"]["main"]["*"]
ts = rev["timestamp"] timestamp = rev["timestamp"]
return diff_html, full_text, ts rev_id = rev["revid"]
return content, timestamp, rev_id
except Exception as e: except Exception as e:
print(f"获取diff和内容时出错: {e}") print(f"获取页面内容时出错: {e}")
return None, None, None return None, None, None
def save_files(title, diff_html, full_text, timestamp, note="", revid=None): def generate_text_diff(old_text, new_text):
"""生成类似git diff的文本diff"""
if not old_text:
return "新创建页面"
old_lines = old_text.splitlines(keepends=True)
new_lines = new_text.splitlines(keepends=True)
differ = difflib.unified_diff(
old_lines,
new_lines,
lineterm='\n'
)
return ''.join(differ)
def parse_diff_with_line_numbers(diff_text):
"""解析diff文本提取详细的行号信息"""
if not diff_text or diff_text.startswith("新创建页面"):
return []
parsed_lines = []
current_old_line = 0
current_new_line = 0
in_hunk = False
for line in diff_text.splitlines():
if line.startswith('@@'):
# Parse the hunk header, e.g. @@ -start,count +start,count @@ (re is already imported at module level)
match = re.match(r'@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@', line)
if match:
old_start = int(match.group(1))
old_count = int(match.group(2)) if match.group(2) else 1
new_start = int(match.group(3))
new_count = int(match.group(4)) if match.group(4) else 1
current_old_line = old_start
current_new_line = new_start
in_hunk = True
parsed_lines.append({
'type': 'hunk',
'content': line,
'old_start': old_start,
'old_count': old_count,
'new_start': new_start,
'new_count': new_count,
'old_line': None,
'new_line': None
})
else:
parsed_lines.append({
'type': 'other',
'content': line,
'old_line': None,
'new_line': None
})
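elif not in_hunk and (line.startswith('---') or line.startswith('+++')):
# File header lines; only recognized before the first hunk, so a removed or added line whose own text starts with "--" or "++" (such as the wiki horizontal rule "----") is still counted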
parsed_lines.append({
'type': 'header',
'content': line,
'old_line': None,
'new_line': None
})
elif in_hunk:
if line.startswith('-'):
# 删除的行
parsed_lines.append({
'type': 'removed',
'content': line[1:], # 去掉开头的 '-'
'old_line': current_old_line,
'new_line': None
})
current_old_line += 1
elif line.startswith('+'):
# 新增的行
parsed_lines.append({
'type': 'added',
'content': line[1:], # 去掉开头的 '+'
'old_line': None,
'new_line': current_new_line
})
current_new_line += 1
elif line.startswith(' '):
# 未变更的行
parsed_lines.append({
'type': 'context',
'content': line[1:], # 去掉开头的 ' '
'old_line': current_old_line,
'new_line': current_new_line
})
current_old_line += 1
current_new_line += 1
else:
# 其他行(如空行)
parsed_lines.append({
'type': 'other',
'content': line,
'old_line': None,
'new_line': None
})
else:
# 不在任何hunk中的行
parsed_lines.append({
'type': 'other',
'content': line,
'old_line': None,
'new_line': None
})
return parsed_lines
def search_chinese_page(title):
"""在中文wiki中搜索对应的页面"""
# 首先尝试精确匹配
params = {
"action": "query",
"list": "search",
"srsearch": f'"{title}"',
"srwhat": "title",
"srlimit": 5,
"format": "json"
}
try:
r = SESSION_CN.get(WIKI_API_URL_CN, params=params).json()
search_results = r.get("query", {}).get("search", [])
if search_results:
# 返回第一个匹配的结果
return search_results[0]["title"]
# If the quoted search finds nothing, retry with the plain title (requests URL-encodes parameters itself)
params["srsearch"] = title
r = SESSION_CN.get(WIKI_API_URL_CN, params=params).json()
search_results = r.get("query", {}).get("search", [])
if search_results:
return search_results[0]["title"]
except Exception as e:
print(f"搜索中文页面时出错: {e}")
return None
def create_diff_html(title, en_diff, en_old_lines, en_new_lines, cn_content=None):
"""创建双语对比的HTML页面 - 使用精确的行号映射"""
# 准备中文内容行
cn_lines = []
if cn_content:
cn_lines = cn_content.splitlines()
# 解析diff并获取行号信息
parsed_diff = parse_diff_with_line_numbers(en_diff) if en_diff else []
# 生成HTML
html = f'''<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Wiki Diff: {title}</title>
<style>
* {{
margin: 0;
padding: 0;
box-sizing: border-box;
}}
body {{
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
background-color: #f5f5f5;
line-height: 1.6;
}}
.header {{
background-color: #fff;
padding: 20px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
margin-bottom: 20px;
}}
.header h1 {{
color: #333;
font-size: 24px;
margin-bottom: 10px;
}}
.header .meta {{
color: #666;
font-size: 14px;
}}
.container {{
display: flex;
max-width: 100%;
margin: 0 auto;
background-color: #fff;
min-height: calc(100vh - 100px);
}}
.column {{
flex: 1;
overflow: hidden;
display: flex;
flex-direction: column;
}}
.column-header {{
background-color: #e9ecef;
padding: 12px 20px;
font-weight: bold;
color: #495057;
border-bottom: 1px solid #dee2e6;
}}
.diff-content {{
flex: 1;
overflow-y: auto;
font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
font-size: 13px;
line-height: 1.4;
}}
.line {{
display: flex;
min-height: 20px;
position: relative;
}}
.line-number {{
width: 60px;
text-align: right;
padding: 0 10px;
background-color: #f8f9fa;
color: #6c757d;
border-right: 1px solid #dee2e6;
user-select: none;
flex-shrink: 0;
}}
.line.highlight {{
background-color: rgba(255, 235, 59, 0.3) !important;
animation: highlight 2s ease-in-out;
}}
@keyframes highlight {{
0% {{ background-color: rgba(255, 235, 59, 0.8); }}
100% {{ background-color: rgba(255, 235, 59, 0.3); }}
}}
.line-content {{
flex: 1;
padding: 0 10px;
white-space: pre-wrap;
word-break: break-word;
}}
/* Diff specific styles */
.line.diff-added {{
background-color: #e6ffec;
}}
.line.diff-added .line-content {{
background-color: #cdffd8;
border-left: 3px solid #28a745;
}}
.line.diff-removed {{
background-color: #ffeef0;
}}
.line.diff-removed .line-content {{
background-color: #fdb8c0;
border-left: 3px solid #dc3545;
text-decoration: line-through;
}}
.line.diff-context {{
background-color: #ffffff;
}}
.line.diff-context .line-content {{
background-color: #ffffff;
}}
.line.diff-hunk {{
background-color: #f8f9fa;
color: #6c757d;
font-style: italic;
}}
.line.diff-hunk .line-content {{
background-color: #f1f3f4;
}}
.line.diff-header {{
background-color: #e9ecef;
color: #495057;
font-style: italic;
}}
.line.diff-header .line-content {{
background-color: #e9ecef;
}}
/* Separator between columns */
.separator {{
width: 1px;
background-color: #dee2e6;
box-shadow: 0 0 5px rgba(0,0,0,0.1);
position: relative;
z-index: 10;
}}
/* Scrollbar styling */
.diff-content::-webkit-scrollbar {{
width: 8px;
height: 8px;
}}
.diff-content::-webkit-scrollbar-track {{
background: #f1f1f1;
}}
.diff-content::-webkit-scrollbar-thumb {{
background: #888;
border-radius: 4px;
}}
.diff-content::-webkit-scrollbar-thumb:hover {{
background: #555;
}}
/* Responsive design */
@media (max-width: 768px) {{
.container {{
flex-direction: column;
}}
.separator {{
width: 100%;
height: 1px;
}}
}}
/* Special styling for new page */
.new-page-notice {{
background-color: #d4edda;
color: #155724;
padding: 15px 20px;
margin-bottom: 20px;
border-left: 4px solid #28a745;
}}
.no-translation {{
background-color: #fff3cd;
color: #856404;
padding: 15px 20px;
margin-bottom: 20px;
border-left: 4px solid #ffc107;
}}
/* Line linking styles */
.line[data-cn-line] {{
cursor: pointer;
}}
.line:hover {{
background-color: rgba(0, 123, 255, 0.05);
}}
</style>
</head>
<body>
<div class="header">
<h1>{title}</h1>
<div class="meta">
<span>英文Wiki: wiki.projectdiablo2.com</span>
{f' | 中文Wiki: wiki.projectdiablo2.cn' if cn_content else ''}
</div>
</div>
<div class="container">
<div class="column">
<div class="column-header">English Diff</div>
<div class="diff-content" id="en-diff">
'''
# 生成英文diff内容
if parsed_diff:
for item in parsed_diff:
if item['type'] == 'hunk':
html += f'<div class="line diff-hunk"><span class="line-content">{item["content"]}</span></div>'
elif item['type'] == 'header':
html += f'<div class="line diff-header"><span class="line-content">{item["content"]}</span></div>'
elif item['type'] == 'added':
cn_line_attr = f'data-cn-line="{item["new_line"]}"' if item["new_line"] and cn_lines and item["new_line"] <= len(cn_lines) else ''
cn_title = f'中文第{item["new_line"]}' if item["new_line"] and cn_lines and item["new_line"] <= len(cn_lines) else ''
html += f'<div class="line diff-added" {cn_line_attr} title="{cn_title}"><span class="line-number">{item["new_line"] or ""}</span><span class="line-content">{item["content"]}</span></div>'
elif item['type'] == 'removed':
html += f'<div class="line diff-removed" title="已删除"><span class="line-number">{item["old_line"] or ""}</span><span class="line-content">{item["content"]}</span></div>'
elif item['type'] == 'context':
cn_line_attr = f'data-cn-line="{item["new_line"]}"' if item["new_line"] and cn_lines and item["new_line"] <= len(cn_lines) else ''
cn_title = f'中文第{item["new_line"]}' if item["new_line"] and cn_lines and item["new_line"] <= len(cn_lines) else ''
html += f'<div class="line diff-context" {cn_line_attr} title="{cn_title}"><span class="line-number">{item["new_line"]}</span><span class="line-content">{item["content"]}</span></div>'
else:
html += f'<div class="line"><span class="line-content">{item["content"]}</span></div>'
else:
# 新页面或无diff
if en_diff and en_diff.startswith("新创建页面"):
html += '<div class="new-page-notice">新创建页面</div>'
# 显示完整内容新页面或无diff时
for i, line in enumerate(en_new_lines or [], 1):
cn_line_attr = f'data-cn-line="{i}"' if cn_lines and i <= len(cn_lines) else ''
cn_title = f'中文第{i}' if cn_lines and i <= len(cn_lines) else ''
html += f'<div class="line diff-context" {cn_line_attr} title="{cn_title}"><span class="line-number">{i}</span><span class="line-content">{line}</span></div>'
html += '''
</div>
</div>
<div class="separator"></div>
<div class="column">
<div class="column-header">中文翻译</div>
<div class="diff-content" id="cn-content">
'''
# 添加中文内容
if cn_content:
html += '<div id="cn-lines">'
for i, line in enumerate(cn_lines, 1):
html += f'<div class="line diff-context" id="cn-line-{i}"><span class="line-number">{i}</span><span class="line-content">{line}</span></div>'
html += '</div>'
else:
html += '<div class="no-translation">未找到对应的中文翻译页面</div>'
html += '''
</div>
</div>
</div>
<script>
// 同步滚动功能
const enDiff = document.querySelector('#en-diff');
const cnContent = document.querySelector('#cn-content');
const cnLines = {};
// 构建中文行的位置映射
if (document.getElementById('cn-lines')) {{
document.querySelectorAll('#cn-lines .line').forEach(line => {{
const lineNum = line.querySelector('.line-number').textContent;
if (lineNum) {{
cnLines[lineNum] = line.offsetTop;
}}
}});
}}
// 同步滚动
if (enDiff && cnContent) {{
enDiff.addEventListener('scroll', () => {{
cnContent.scrollTop = enDiff.scrollTop;
}});
cnContent.addEventListener('scroll', () => {{
enDiff.scrollTop = cnContent.scrollTop;
}});
}}
// 点击英文行时高亮对应的中文行
document.querySelectorAll('[data-cn-line]').forEach(enLine => {{
enLine.addEventListener('click', () => {{
const cnLineNum = enLine.getAttribute('data-cn-line');
if (cnLineNum) {{
const cnLine = document.getElementById(`cn-line-${cnLineNum}`);
if (cnLine) {{
// 移除所有高亮
document.querySelectorAll('.line.highlight').forEach(line => {{
line.classList.remove('highlight');
}});
// 高亮英文行和中文行
enLine.classList.add('highlight');
cnLine.classList.add('highlight');
// Scroll to the Chinese line
cnLine.scrollIntoView({ behavior: 'smooth', block: 'center' });
}}
}}
}});
// 鼠标悬停时显示预览
enLine.addEventListener('mouseenter', () => {{
const cnLineNum = enLine.getAttribute('data-cn-line');
if (cnLineNum) {{
const cnLine = document.getElementById(`cn-line-${cnLineNum}`);
if (cnLine) {{
enLine.style.backgroundColor = 'rgba(0, 123, 255, 0.1)';
cnLine.style.backgroundColor = 'rgba(0, 123, 255, 0.1)';
}}
}}
}});
enLine.addEventListener('mouseleave', () => {{
if (!enLine.classList.contains('highlight')) {{
enLine.style.backgroundColor = '';
}}
const cnLineNum = enLine.getAttribute('data-cn-line');
if (cnLineNum) {{
const cnLine = document.getElementById(`cn-line-${cnLineNum}`);
if (cnLine && !cnLine.classList.contains('highlight')) {{
cnLine.style.backgroundColor = '';
}}
}}
}});
}});
</script>
</body>
</html>'''
return html
def save_files(title, diff_html, diff_text, full_text, timestamp, note="", revid=None, cn_content=None, old_full_text=None):
global CURRENT_OUTPUT_DIR global CURRENT_OUTPUT_DIR
# 确保本次执行的输出目录已经创建 # 确保本次执行的输出目录已经创建
@ -154,182 +683,120 @@ def save_files(title, diff_html, full_text, timestamp, note="", revid=None):
safe_title = "".join(c if c.isalnum() or c in " -_." else "_" for c in title) safe_title = "".join(c if c.isalnum() or c in " -_." else "_" for c in title)
time_str = timestamp[:19].replace("-", "").replace(":", "").replace("T", "_") time_str = timestamp[:19].replace("-", "").replace(":", "").replace("T", "_")
# 简化文件名格式只包含标题、时间和revid
base_filename = f"{safe_title}-{time_str}-{revid}" if revid else f"{safe_title}-{time_str}" base_filename = f"{safe_title}-{time_str}-{revid}" if revid else f"{safe_title}-{time_str}"
# 保存各种文件
files_to_save = []
# 1. 标准MediaWiki diff HTML
diff_file = CURRENT_OUTPUT_DIR / f"{base_filename}.diff.html" diff_file = CURRENT_OUTPUT_DIR / f"{base_filename}.diff.html"
if diff_html:
files_to_save.append((diff_file, diff_html))
# 2. 文本格式的diff
text_diff_file = CURRENT_OUTPUT_DIR / f"{base_filename}.diff.txt"
if diff_text:
files_to_save.append((text_diff_file, diff_text))
# 3. 最新完整内容
full_file = CURRENT_OUTPUT_DIR / f"{base_filename}.full.txt" full_file = CURRENT_OUTPUT_DIR / f"{base_filename}.full.txt"
if full_text:
files_to_save.append((full_file, full_text))
# 美化 HTML diff使用类似git diff的配色方案 # 4. 历史版本内容(如果存在)
# 先处理diff_html将ins/del标签替换为span标签 if old_full_text:
processed_diff_html = diff_html.replace('<ins class="diffchange', '<span class="diffchange added"').replace('</ins>', '</span>').replace('<del class="diffchange', '<span class="diffchange deleted"').replace('</del>', '</span>') old_full_file = CURRENT_OUTPUT_DIR / f"{base_filename}.old.txt"
# 再处理diff标记将data-marker属性替换为实际的span元素 files_to_save.append((old_full_file, old_full_text))
processed_diff_html = processed_diff_html.replace('<td class="diff-marker" data-marker=""></td>', '<td class="diff-marker"><span class="minus-marker"></span></td>').replace('<td class="diff-marker" data-marker="+"></td>', '<td class="diff-marker"><span class="plus-marker">+</span></td>')
html_wrapper = f'''<!DOCTYPE html> # 5. 中文翻译内容(如果存在)
<html><head><meta charset="utf-8"><title>Diff: {title}</title> if cn_content:
<style> cn_file = CURRENT_OUTPUT_DIR / f"{base_filename}.cn.txt"
body {{ files_to_save.append((cn_file, cn_content))
font-family: system-ui, sans-serif;
margin: 20px;
}}
table.diff {{
border-collapse: collapse;
font-family: monospace;
width: 100%;
table-layout: fixed;
}}
table.diff td {{
padding: 0 5px;
vertical-align: top;
white-space: pre-wrap;
word-break: break-all;
font-size: 14px;
line-height: 1.4;
}}
table.diff col.diff-marker {{
width: 20px;
text-align: right;
background-color: #fafafa;
}}
table.diff col.diff-content {{
width: auto;
}}
table.diff col.diff-addedline,
table.diff col.diff-deletedline {{
width: 50%;
}}
.diff-addedline {{
background-color: #dfd;
}}
.diff-addedline .diffchange {{
background-color: #9e9;
color: #000;
}}
.diff-deletedline {{
background-color: #fee8e8;
}}
.diff-deletedline .diffchange {{
background-color: #faa;
color: #000;
}}
.diff-context {{
background-color: #fafafa;
}}
.diff-context td {{
color: #777;
}}
.diff-marker {{
font-weight: bold;
text-align: right;
padding: 0 4px;
}}
.diff-lineno {{
background-color: #f0f0f0;
text-align: right;
padding: 0 4px;
}}
.diff-addedline .diff-marker {{
color: #080;
}}
.diff-deletedline .diff-marker {{
color: #800;
}}
/* 新增的diff标记样式 */ # 6. 双语对比HTML页面
.plus-marker {{ if cn_content:
color: #080; # 为文本diff准备行
font-weight: bold; en_new_lines = full_text.splitlines() if full_text else []
}} en_old_lines = old_full_text.splitlines() if old_full_text else []
.minus-marker {{
color: #800;
font-weight: bold;
}}
/* 确保变更行有明显的视觉区分 */ # 创建双语对比页面
.diff-addedline div, comparison_html = create_diff_html(title, diff_text, en_old_lines, en_new_lines, cn_content)
.diff-deletedline div {{ comparison_file = CURRENT_OUTPUT_DIR / f"{base_filename}.comparison.html"
display: inline-block; files_to_save.append((comparison_file, comparison_html))
width: 100%; print(f" → 已保存: {comparison_file.relative_to(OUTPUT_DIR)} (双语对比页面)")
}}
/* 增加一些额外的视觉提示 */
.diff-addedline {{
border-left: 4px solid #080;
}}
.diff-deletedline {{
border-left: 4px solid #800;
}}
.diff-context {{
border-left: 4px solid #ccc;
}}
/* 替换ins/del标签为span标签的样式 */
.diffchange.added {{
background-color: #9e9;
color: #000;
font-weight: bold;
text-decoration: none;
}}
.diffchange.deleted {{
background-color: #faa;
color: #000;
font-weight: bold;
text-decoration: line-through;
}}
</style></head><body>
<h2>{title}</h2>
<p>修改时间: {timestamp}</p>
{processed_diff_html}
</body></html>'''
# 写入所有文件
for file_path, content in files_to_save:
try: try:
with open(diff_file, "w", encoding="utf-8") as f: with open(file_path, "w", encoding="utf-8") as f:
f.write(html_wrapper) f.write(content)
with open(full_file, "w", encoding="utf-8") as f: print(f" → 已保存: {file_path.relative_to(OUTPUT_DIR)}")
f.write(full_text)
print(f" → 已保存: {diff_file.relative_to(OUTPUT_DIR)}")
print(f" → 已保存: {full_file.relative_to(OUTPUT_DIR)}")
except Exception as e: except Exception as e:
print(f" → 保存文件时出错: {e}") print(f" → 保存文件 {file_path} 时出错: {e}")
print(f" → 完整路径: {diff_file}")
print(f" → 完整路径: {full_file}")
def process_single_page(title, since_time, update_timestamp=False): def process_single_page(title, since_time, update_timestamp=False):
"""只处理单个页面""" """只处理单个页面"""
print(f"正在单独处理页面:{title}") print(f"正在单独处理页面:{title}")
# 获取当前最新 revid # 获取当前最新 revid
params = {
"action": "query",
"prop": "revisions",
"titles": title,
"rvprop": "ids|timestamp",
"rvlimit": 1,
"format": "json"
}
try: try:
r = SESSION.get(WIKI_API_URL, params=params).json() latest_content, latest_ts, latest_revid = get_page_content(WIKI_API_URL_EN, SESSION_EN, title)
page = next(iter(r["query"]["pages"].values())) if latest_content is None:
if "revisions" not in page:
print("页面不存在或被删除") print("页面不存在或被删除")
return None return None
latest_revid = page["revisions"][0]["revid"]
latest_ts = page["revisions"][0]["timestamp"]
# 获取旧 revid # 获取旧 revid
old_revid = get_old_revid(title, since_time) old_revid = get_old_revid(title, since_time)
diff_html, full_text, new_ts = get_official_diff_and_content(title, old_revid, latest_revid) # 初始化变量
if diff_html is not None and full_text is not None: diff_html = None
# 移除旧的note标记使用更简洁的命名方式 diff_text = None
if not old_revid: old_content = None
diff_html = "<p style='color:green;font-weight:bold'>新创建页面(无历史版本)</p>" cn_content = None
save_files(title, diff_html, full_text, new_ts, "", latest_revid)
if old_revid:
# 获取历史版本内容
old_content, old_ts, _ = get_page_content(WIKI_API_URL_EN, SESSION_EN, title, old_revid)
if old_content is not None:
# 生成文本diff
diff_text = generate_text_diff(old_content, latest_content)
print(f" 生成了文本diff ({len(diff_text)} 字符)")
else: else:
print(f" 警告: 未能获取完整的差异或内容数据") print(f" 无法获取历史版本内容")
else:
# 新页面
print(" 这是新创建的页面")
# 搜索对应的中文页面
print(" 搜索中文翻译...")
cn_title = search_chinese_page(title)
if cn_title:
print(f" 找到中文页面: {cn_title}")
cn_content, cn_ts, cn_revid = get_page_content(WIKI_API_URL_CN, SESSION_CN, cn_title)
if cn_content:
print(f" 获取中文内容成功 ({len(cn_content)} 字符)")
else:
print(" 无法获取中文页面内容")
else:
print(" 未找到对应的中文翻译页面")
# 获取官方diff可选
if old_revid:
diff_params = {
"action": "compare",
"fromrev": old_revid,
"torev": latest_revid,
"format": "json"
}
try:
diff_resp = SESSION_EN.get(WIKI_API_URL_EN, params=diff_params).json()
diff_html = diff_resp.get("compare", {}).get("*", "")
except Exception as e:
print(f" 获取官方HTML diff时出错: {e}")
# 保存所有文件
save_files(title, diff_html, diff_text, latest_content, latest_ts, "", latest_revid, cn_content, old_content)
if update_timestamp: if update_timestamp:
save_last_timestamp(latest_ts) save_last_timestamp(latest_ts)
@ -362,7 +829,7 @@ def process_all_pages_since(since_time):
print(f"文件保存在:{CURRENT_OUTPUT_DIR.resolve() if CURRENT_OUTPUT_DIR else OUTPUT_DIR.resolve()}") print(f"文件保存在:{CURRENT_OUTPUT_DIR.resolve() if CURRENT_OUTPUT_DIR else OUTPUT_DIR.resolve()}")
def main(): def main():
parser = argparse.ArgumentParser(description="MediaWiki 同步工具 - 支持全量/单页/自定义时间") parser = argparse.ArgumentParser(description="MediaWiki 同步工具 - 增强版支持双语对比")
parser.add_argument("--since", type=str, help="强制从指定时间开始同步,格式如 2025-11-28T00:00:00Z") parser.add_argument("--since", type=str, help="强制从指定时间开始同步,格式如 2025-11-28T00:00:00Z")
parser.add_argument("--title", type=str, help="只同步指定的单个页面标题") parser.add_argument("--title", type=str, help="只同步指定的单个页面标题")
parser.add_argument("--update-timestamp", action="store_true", parser.add_argument("--update-timestamp", action="store_true",

8
target.txt Normal file

@ -0,0 +1,8 @@
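According to the README, sync.py fetches changes from wiki.projectdiablo2.com and downloads the full source text of each page. The following features now need to be added:
1. Fetch the latest full text of the English wiki page (already implemented), and fetch the full text of its previous revision (using the old_revid from the previous step).
2. If the page is newly created, keep the existing behaviour and save only the latest full text.
3. If the page was changed, diff the previous revision's full text against the latest one to produce a diff file, using Python or a library that mimics git diff.
4. Search the other site, wiki.projectdiablo2.cn, for the page title and download its source. This is the synchronized Chinese translation of the wiki. Note that page IDs differ between the two sites, but page titles stay the same, and the vast majority of pages have been translated.
5. Save a web page that presents the diff. The design should be polished and use modern CSS/JS. Split the page vertically into two columns: the left shows the diff between the two revisions of the English source, and the right shows the Chinese source at the same line numbers. Note that line numbers are kept consistent: for the vast majority of pages the Chinese line numbers match exactly, so a direct comparison is safe. The diff should use the standard red and green colouring.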