User: jame -- 2017-04-14 |
Hits: 3422 |
Type: Text file parser |
Description: |
How can I extract tables from many HTML files into one CSV file? I have some HTML files downloaded from a website, each containing tables. All the tables look like </thead> <tbody>xxxxxxxxxx</tbody> |
Input Sample: |
html file 1: ... </thead> <tbody><tr><td><a href="fjzt-x.html?uid=xxxxxx">data11</a></td> <td class="bzt">data12</td> <td>data13</td> <td>data14</td> <td>data15</td> <td>data16</td> <td>data17</td> <td class="tdb"><span id="sxxxxxxx"></span></td> <td class="tdb"><span id="zfxxxxxxx"></span></td> <td class="bzt">--</td><td></td> </tr> <script src="https://hq.sinajs.cn/list=data18" type="text/javascript" charset="gbk"></script> <script type="text/javascript">getprice1('xxxxxxxx',xxxxxxx,x |
Output Sample: |
data11,data12,data13,data14,data15,data16,data17,data18
data21,data22,data23,data24,data25,data26,data27,data28
data31,data32,data33,data34,data35,data36,data37,data38
data41,data42,data43,data44,data45,data46,data47,data48
Answer: |
Hint: You need to download and install "Replace Pioneer" on Windows to complete the following steps. |
1. Open the "Tools->Batch Runner" menu.
2. Drag multiple files from the Windows file browser into the "Batch Runner" window.
3. Click "add" to add a new rule:
   * set "search" to:
   * set "replace" to:
   * click "ok"
4. Make sure the following options are checked: "Reg exp", "Cross line" and "Extract".
5. Click "start".
6. Click "output to single file" and select the output file. Done.

Note:
1. The output is very close to your required format; one more replacement will finish the job.
2. If you just need to extract a standard table, set 'search' to: set 'replace' to:
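For readers who would rather script the same extraction, here is a minimal Python sketch that uses only the standard library. It is not the Replace Pioneer rule from the answer, and it rests on assumptions read off the input and output samples: the first seven <td> cells of each <tbody> row are the wanted fields, and the eighth field is whatever follows "list=" in the src of the <script> tag right after the row. The names TbodyRows and html_tables_to_csv, the keep parameter, and the default utf-8 encoding are hypothetical choices for this illustration.

```python
import csv
from html.parser import HTMLParser


class TbodyRows(HTMLParser):
    """Collect the <td> text of every <tr> that appears inside a <tbody>."""

    def __init__(self, keep=7):
        super().__init__()
        self.keep = keep        # assumption: only the first `keep` cells matter
        self.rows = []          # finished rows, each a list of strings
        self._in_tbody = False
        self._in_cell = False
        self._row = None        # cells of the row being read, or None
        self._cell = []         # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._in_tbody = True
        elif not self._in_tbody:
            return
        elif tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._cell = []
        elif tag == "script" and self.rows:
            # Assumption: the text after "list=" in the script URL is the
            # last field of the row that was just closed.
            src = dict(attrs).get("src") or ""
            code = src.partition("list=")[2]
            if code:
                self.rows[-1].append(code)

    def handle_endtag(self, tag):
        if tag == "tbody":
            self._in_tbody = False
        elif tag == "td" and self._in_cell and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row[:self.keep])
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def html_tables_to_csv(html_paths, csv_path, keep=7, encoding="utf-8"):
    """Append the rows of every HTML file in html_paths to one CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in html_paths:
            parser = TbodyRows(keep)
            with open(path, encoding=encoding, errors="replace") as f:
                parser.feed(f.read())
            parser.close()
            writer.writerows(parser.rows)
```

A call such as html_tables_to_csv(glob.glob("*.html"), "out.csv") (after import glob) would process every HTML file in the current folder. If the pages were saved in another encoding, for example gbk, pass it through the encoding argument.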
Screenshot 1: Fast_Replace_Window |