901. Text file parser -- How to extract distinct webpages from a weblog file?

User: editor -- 2012-01-05

Type: Text file parser

Description:

How to extract distinct webpages from a weblog file? In other words, how to extract the 7th whitespace-separated column (the requested page path) from each line and remove the duplicates.

Input Sample:

42.138.85.44 - - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200 ...
42.138.85.44 - - [05/Dec/2011:09:33:09 -0800] "GET /services/file2.html HTTP/1.0" 200 ...
42.138.85.44 - - [05/Dec/2011:09:33:11 -0800] "GET /services/file3.html HTTP/1.0" 200 ...
33.56.122.46 - - [05/Dec/2011:09:33:12 -0800] "GET /services/file2.html HTTP/1.0" 200 ...
33.56.122.46 - - [05/Dec/2011:09:33:18 -0800] "GET /services/file3.html HTTP/1.0" 200 ...

Output Sample:

/services/file1.html
/services/file2.html
/services/file3.html
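
To see why the page path counts as the 7th column, one sample line can be split on whitespace. This is a small Python illustration rather than anything done in Replace Pioneer; the variable names are made up for the example, and the trailing "..." is kept exactly as it appears in the input sample.

```python
# One line from the input sample, with the trailing "..." kept as given.
line = '42.138.85.44 - - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200 ...'

# split() with no arguments splits on runs of whitespace.
fields = line.split()

# Print each field with its 1-based column number.
for number, field in enumerate(fields, start=1):
    print(number, field)

# Column 7 is index 6 in Python's 0-based indexing.
page = fields[6]
print(page)
```

Note that the timestamp occupies two columns (4 and 5) because it contains a space, and the opening quote stays attached to "GET" in column 6.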

Answer:

Hint: You need to download and install "Replace Pioneer" on a Windows platform to complete the following steps.

1. Press Ctrl-O and open the weblog file
2. Press Ctrl-H to open the 'replace' dialog
* set 'replace unit' to 'Line'
* set 'replace with pattern' to:

click 'advanced' page
* set 'run following at the beginning of replace' to:

* set 'run following for each matched unit' to:

3. click 'replace', done.
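
The same result can also be sketched outside Replace Pioneer. The Python below is an alternative illustration, not the tool's own pattern syntax; the function name `distinct_pages` and its `column` parameter are hypothetical. It assumes columns are separated by whitespace and keeps each page in the order it first appears.

```python
def distinct_pages(lines, column=7):
    """Return distinct values of a whitespace-separated column, in first-seen order."""
    seen = set()
    pages = []
    for line in lines:
        fields = line.split()
        if len(fields) < column:
            continue  # skip blank or short lines that lack the column
        page = fields[column - 1]  # convert the 1-based column to a 0-based index
        if page not in seen:
            seen.add(page)
            pages.append(page)
    return pages


# The five lines from the input sample, trailing "..." kept as given.
sample = [
    '42.138.85.44 - - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200 ...',
    '42.138.85.44 - - [05/Dec/2011:09:33:09 -0800] "GET /services/file2.html HTTP/1.0" 200 ...',
    '42.138.85.44 - - [05/Dec/2011:09:33:11 -0800] "GET /services/file3.html HTTP/1.0" 200 ...',
    '33.56.122.46 - - [05/Dec/2011:09:33:12 -0800] "GET /services/file2.html HTTP/1.0" 200 ...',
    '33.56.122.46 - - [05/Dec/2011:09:33:18 -0800] "GET /services/file3.html HTTP/1.0" 200 ...',
]

for page in distinct_pages(sample):
    print(page)
```

On the sample input this prints the three paths listed in the output sample. Because the function accepts any iterable of lines, an open file object can be passed in place of the list.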

Screenshot 1: Replace_Window

Screenshot 2: Replace_Advanced_Window

Similar Examples:

How to extract all distinct parent folder names from a list of files? (65%)
How to extract multiple webpages and save all text into text files? (64%)
How to extract all the links and images from many web page files? (64%)
How to extract text from many webpage files and form a database file? (63%)
How to extract email from webpage and remove duplicate? (62%)
How to extract last 100 lines from multiple files? (61%)
How to extract all image links from a html file? (61%)
How to extract all links of a webpage periodically and save to file? (60%)
