| User: editor -- 2012-01-05 |
| Type: Text file parser |
| Description: |
| How can you extract the distinct web pages from a weblog file? In other words, how do you extract the 7th whitespace-separated field of each line and remove the duplicates? |
| Input Sample: |
| 42.138.85.44 - - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200 ... |
| 42.138.85.44 - - [05/Dec/2011:09:33:09 -0800] "GET /services/file2.html HTTP/1.0" 200 ... |
| 42.138.85.44 - - [05/Dec/2011:09:33:11 -0800] "GET /services/file3.html HTTP/1.0" 200 ... |
| 33.56.122.46 - - [05/Dec/2011:09:33:12 -0800] "GET /services/file2.html HTTP/1.0" 200 ... |
| 33.56.122.46 - - [05/Dec/2011:09:33:18 -0800] "GET /services/file3.html HTTP/1.0" 200 ... |
| Output Sample: |
| /services/file1.html |
| /services/file2.html |
| /services/file3.html |
| Answer: |
| Hint: You need to download and install "Replace Pioneer" on Windows to complete the following steps. |
| 1. Press Ctrl-O and open the weblog file. |
| 2. Press Ctrl-H to open the 'replace' dialog. |
|    * Set 'replace unit' to 'Line'. |
|    * Set 'replace with pattern' to: |
|    * Click the 'advanced' page. |
|    * Set 'run following at the beginning of replace' to: |
|    * Set 'run following for each matched unit' to: |
| 3. Click 'replace'. Done. |
Screenshot 1: Replace_Window |
Screenshot 2: Replace_Advanced_Window |
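
For readers without Replace Pioneer, the same extraction can be sketched in Python. This is only a minimal sketch and not the Replace Pioneer method: it assumes fields are separated by whitespace, so the requested page is the 7th field (index 6). The names `distinct_pages`, `field_index`, and `sample` are hypothetical, chosen for illustration; the sample lines are copied from the input sample above.

```python
def distinct_pages(lines, field_index=6):
    """Return the distinct values of one whitespace-separated field,
    in the order they first appear. field_index=6 is the 7th field."""
    seen = set()   # pages already emitted
    pages = []     # distinct pages in first-seen order
    for line in lines:
        fields = line.split()
        if len(fields) <= field_index:
            continue  # skip blank or truncated lines
        page = fields[field_index]
        if page not in seen:
            seen.add(page)
            pages.append(page)
    return pages


# Sample lines taken from the input sample (trailing "..." kept as-is).
sample = [
    '42.138.85.44 - - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200 ...',
    '42.138.85.44 - - [05/Dec/2011:09:33:09 -0800] "GET /services/file2.html HTTP/1.0" 200 ...',
    '42.138.85.44 - - [05/Dec/2011:09:33:11 -0800] "GET /services/file3.html HTTP/1.0" 200 ...',
    '33.56.122.46 - - [05/Dec/2011:09:33:12 -0800] "GET /services/file2.html HTTP/1.0" 200 ...',
    '33.56.122.46 - - [05/Dec/2011:09:33:18 -0800] "GET /services/file3.html HTTP/1.0" 200 ...',
]

for page in distinct_pages(sample):
    print(page)
```

The set tracks which pages have already been seen, while the list preserves first-seen order, which matches the order of the output sample. To run it on a real log, pass an open file object instead of `sample`, since iterating over a file yields its lines.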