No header row is needed when appending to an existing file (追加文件时不需要表头)

pull/108/head
NaiboWang-Alienware 1 year ago
commit 4f3c030d27
4 changed files with 13 additions and 11 deletions
  1. ElectronJS/README.md (+3, -3)
  2. ExecuteStage/Readme.md (+4, -4)
  3. ExecuteStage/easyspider_executestage.py (+4, -2)
  4. Extension/README.md (+2, -2)

ElectronJS/README.md (+3, -3)

@@ -34,11 +34,11 @@ Taking the example of Windows x64 version.
实在搞不定本节的情况下,下载一个直接能用的EasySpider,并把文件夹内的`EasySpider\resources\app\chrome_win64`文件夹拷贝到此`ElectronJS`文件夹下即可。
-If you're unable to handle the tasks in this section, you can download a ready-to-use EasySpider. Simply copy the EasySpider\resources\app\chrome_win64 folder from the downloaded files and paste it into the ElectronJS folder.
+If you're unable to handle the tasks in this section, you can download a ready-to-use EasySpider. Simply copy the `EasySpider\resources\app\chrome_win64` folder from the downloaded files and paste it into the ElectronJS folder.
------
-在自己的机器环境已经安装了Chrome的情况下,直接执行`python3 update_chrome.py`也可以完成本节下面写的一系列的操作,注意设置文件中的chrome大版本号为本机chrome的版本号。
+在自己的机器环境已经安装了Chrome的情况下,直接执行`python3 update_chrome.py`也可以完成本节下面写的一系列的操作,注意设置文件中的Chrome大版本号为本机Chrome的版本号。
If you already have Chrome installed on your local machine, you can directly execute python3 update_chrome.py to perform the operations mentioned in the following section. Make sure to set the Chrome major version in the configuration file to match the version of Chrome installed on your machine.
@@ -85,7 +85,7 @@ chromedriver_linux64 # for linux x64
chromedriver_mac64 # for mac x64
```
-For example, if you want to build this software on Windows x64 platform, then you should first download a chrome for Windows x64, then copy the whole `chrome` folder to this `ElectronJS` folder and rename the folder to `chrome_win64`, assume the chrome version you downloaded is 110; then, download a `chromedriver.exe` with version 110 for Windows x64, and put it into the `chrome_win64` folder, then rename it to `chromedriver_win64.exe`.
+For example, if you want to build this software on Windows x64 platform, then you should first download a Chrome for Windows x64, then copy the whole `chrome` folder to this `ElectronJS` folder and rename the folder to `chrome_win64`, assume the Chrome version you downloaded is 110; then, download a `chromedriver.exe` with version 110 for Windows x64, and put it into the `chrome_win64` folder, then rename it to `chromedriver_win64.exe`.
Finally, copy the `stealth.min.js` and `execute.bat` (for Windows x64) file in this folder to these `chrome` folders.
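The Windows x64 layout described in this README hunk can be checked with a few lines of Python. This is only a sketch, not part of the commit: the helper name `missing_win64_files` is hypothetical, and the file list is taken from the README text (`chromedriver_win64.exe`, `stealth.min.js`, `execute.bat` inside `chrome_win64`).

```python
from pathlib import Path

# File names taken from the README text (Windows x64 case only).
REQUIRED_WIN64_FILES = ("chromedriver_win64.exe", "stealth.min.js", "execute.bat")


def missing_win64_files(electron_dir):
    """Return the expected files that are absent from <electron_dir>/chrome_win64."""
    chrome_dir = Path(electron_dir) / "chrome_win64"
    return [name for name in REQUIRED_WIN64_FILES
            if not (chrome_dir / name).is_file()]


if __name__ == "__main__":
    # Run from inside the ElectronJS folder.
    missing = missing_win64_files(".")
    if missing:
        print("Missing from chrome_win64:", ", ".join(missing))
    else:
        print("chrome_win64 contains the expected files.")
```

The check only confirms that the files exist. It does not verify that the Chrome and chromedriver versions match, which the README also requires.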

ExecuteStage/Readme.md (+4, -4)

@@ -25,7 +25,7 @@ This section covers the compilation instructions for the `Execution stage progra
3. 安装执行阶段需要的依赖库:
```sh
-pip3 install -r requirements.txt
+pip3 install -r requirements.txt
```
-----
@@ -35,12 +35,12 @@ This section covers the compilation instructions for the `Execution stage progra
3. Install the required dependencies for the execution stage by running:
```sh
-pip3 install -r requirements.txt
+pip3 install -r requirements.txt
```
## 运行说明/Run Instruction
-运行程序前,确保已经完成了`ElectronJS`文件夹下的编译说明,保证`chrome`文件夹和`chromedriver`环境已经就绪,同时**EasySpider**主程序已在运行中。
+运行程序前,确保已经完成了`ElectronJS`文件夹下`主程序`的编译,保证`chrome`文件夹和`chromedriver`环境已经就绪,同时**EasySpider主程序已在运行中**
在当前文件夹下直接运行程序:
@@ -52,7 +52,7 @@ python3 easyspider_executestage.py --id [1]
-----
-Before running the program, make sure you have completed the compilation instructions in the `ElectronJS` folder and ensure that the `chrome` folder and `chromedriver` environment are ready. Also, ensure that the **EasySpider** main program is already running.
+Before running the program, make sure you have completed the compilation of the `main program` in the `ElectronJS` folder and ensure that the `chrome` folder and `chromedriver` environment are ready. Also, ensure that the **EasySpider main program is already running**.
To run the program directly in the current folder, use the following command:

ExecuteStage/easyspider_executestage.py (+4, -2)

@@ -162,7 +162,6 @@ class BrowserThread(Thread):
         self.links = list(
             filter(isnull, service["links"].split("\n"))) # list of links to execute
         self.OUTPUT = [] # collected data
-        self.OUTPUT.append([]) # add the header row
         self.containJudge = service["containJudge"] # whether the task contains conditional statements
         self.bodyText = "" # records bodyText
         tOut = service["outputParameters"] # build the output-parameter object
@@ -171,11 +170,14 @@ class BrowserThread(Thread):
         self.log = "" # records how many tabs are currently open in total
         self.history = {"index": 0, "handle": None} # records the page's current position in the history
         self.SAVED = False # records whether the data has already been saved
+        if not os.path.exists("Data/" + str(self.id) + "/" + self.saveName + '.csv'): # do not add a header when appending to an existing file
+            self.OUTPUT.append([]) # add the header row
         for para in tOut:
             if para["name"] not in self.outputParameters.keys():
                 self.outputParameters[para["name"]] = ""
                 self.dataNotFoundKeys[para["name"]] = False
-                self.OUTPUT[0].append(para["name"])
+                if not os.path.exists("Data/" + str(self.id) + "/" + self.saveName + '.csv'):
+                    self.OUTPUT[0].append(para["name"])
         self.urlId = 0 # global record variable
         self.preprocess() # preprocess to optimize the data-extraction flow
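The idea behind this hunk, writing the header row only when the CSV file does not exist yet, can be sketched in isolation. This is a minimal illustration using the standard `csv` module, not the code EasySpider uses to save data. The helper name `append_rows` and the demo path and field names are assumptions made for the example.

```python
import csv
import os
import tempfile


def append_rows(csv_path, field_names, rows):
    """Append rows to csv_path, writing the header only if the file is new."""
    # Check before opening: mode "a" creates the file if it is missing.
    file_is_new = not os.path.exists(csv_path)
    os.makedirs(os.path.dirname(csv_path) or ".", exist_ok=True)
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if file_is_new:
            writer.writerow(field_names)  # header is written exactly once
        writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Data", "1", "demo.csv")
        append_rows(path, ["title", "url"], [["a", "http://example.com/a"]])
        append_rows(path, ["title", "url"], [["b", "http://example.com/b"]])
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                print(row)
```

The sketch evaluates the existence check once and reuses the result, whereas the diff repeats `os.path.exists(...)` for each output parameter. Both give the same result as long as the file is not created in between.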

Extension/README.md (+2, -2)

@@ -6,7 +6,7 @@ EasySpider分三部分:
2. 浏览器扩展:在`Extension`文件夹下,为浏览器的“操作控制台”的代码,打包后的扩展在`ElectronJS`目录下的`EasySpider_zh.crx`文件。
3. 执行阶段程序:在`ExecuteStage`文件夹下。
-此部分为`浏览器扩展`的编译说明,本节的所有命令都在`manifest_v3`文件夹内执行。
+此部分为`浏览器扩展`的编译说明,**本节的所有命令都在`manifest_v3`文件夹内执行**
-----
@@ -16,7 +16,7 @@ EasySpider is divided into three parts:
2. Browser extension: Located in the Extension folder, i.e., the `EasySpider_en.crx` file in the `ElectronJS` folder.
3. Execution stage program: Located in the ExecuteStage folder.
-This section covers the compilation instructions for the `Browser extension`, all commands in this section are executed in the `manifest_v3` folder.
+This section covers the compilation instructions for the `Browser extension`, **all commands in this section are executed in the `manifest_v3` folder**.
## 环境构建/Environment Setup
