901.Text file parser -- How to extract distinct webpages from weblog file?

User: editor -- 2012-01-05
Type: Text file parser   
Description:
How to extract the distinct webpages from a weblog file? In other words, how to extract the 7th whitespace-separated column of each line and remove the duplicates.
Input Sample:
42.138.85.44 - - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200 ...
42.138.85.44 - - [05/Dec/2011:09:33:09 -0800] "GET /services/file2.html HTTP/1.0" 200 ...
42.138.85.44 - - [05/Dec/2011:09:33:11 -0800] "GET /services/file3.html HTTP/1.0" 200 ...
33.56.122.46 - - [05/Dec/2011:09:33:12 -0800] "GET /services/file2.html HTTP/1.0" 200 ...
33.56.122.46 - - [05/Dec/2011:09:33:18 -0800] "GET /services/file3.html HTTP/1.0" 200 ...
Output Sample:
/services/file1.html
/services/file2.html
/services/file3.html
Answer:
Hint: You need to download and install "Replace Pioneer" on a Windows platform to complete the following steps.
1. press ctrl-o to open the weblog file
2. press ctrl-h to open the 'replace' dialog
* set 'replace unit' to 'Line'
* set 'replace with pattern' to:

click 'advanced' page
* set 'run following at the beginning of replace' to:

* set 'run following for each matched unit' to:

3. click 'replace', done.
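
For readers who want to check the expected result outside Replace Pioneer, the same task can be sketched in Python. This is only an illustration of the logic (take the 7th whitespace-separated field of each line and keep the first occurrence of each value); it is not the Replace Pioneer pattern or script used in the steps above. The function name `distinct_pages` and the `sample` list are made up for this example.

```python
def distinct_pages(lines):
    """Return the 7th whitespace-separated field of each line,
    keeping only the first occurrence of each value (hypothetical helper)."""
    seen = set()
    pages = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or short lines that have no 7th column
        page = fields[6]  # 7th column, since indexes start at 0
        if page not in seen:
            seen.add(page)
            pages.append(page)
    return pages


# The lines from the Input Sample, copied as-is (including the trailing "...").
sample = [
    '42.138.85.44 - - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200 ...',
    '42.138.85.44 - - [05/Dec/2011:09:33:09 -0800] "GET /services/file2.html HTTP/1.0" 200 ...',
    '42.138.85.44 - - [05/Dec/2011:09:33:11 -0800] "GET /services/file3.html HTTP/1.0" 200 ...',
    '33.56.122.46 - - [05/Dec/2011:09:33:12 -0800] "GET /services/file2.html HTTP/1.0" 200 ...',
    '33.56.122.46 - - [05/Dec/2011:09:33:18 -0800] "GET /services/file3.html HTTP/1.0" 200 ...',
]

for page in distinct_pages(sample):
    print(page)
```

Run on the Input Sample, this prints the three lines shown in the Output Sample. To process a real weblog, the open file object can be passed in place of `sample`, since iterating over a file yields its lines.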

Screenshot 1:  Replace_Window
Screenshot 2:  Replace_Advanced_Window


Similar Examples:
How to extract all distinct parent folder names from a list of files? (65%)
How to extract multiple webpages and save all text into text files? (64%)
How to extract all the links and images from many web page files? (64%)
How to extract text from many webpage files and form a database file? (63%)
How to extract email from webpage and remove duplicate? (62%)
How to extract all image links from a html file? (61%)
How to extract first line from multiple files and generate a new file? (57%)
How to extract specified strings started with www from text file? (57%)
