901.Text file parser -- How to extract distinct webpages from weblog file?

User: editor -- 2012-01-05
Type: Text file parser
How to extract distinct webpages from a weblog file? In other words, how to extract the 7th column (word) of each line and remove the duplicates.
Input Sample:
- - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200 ...
- - [05/Dec/2011:09:33:09 -0800] "GET /services/file2.html HTTP/1.0" 200 ...
- - [05/Dec/2011:09:33:11 -0800] "GET /services/file3.html HTTP/1.0" 200 ...
- - [05/Dec/2011:09:33:12 -0800] "GET /services/file2.html HTTP/1.0" 200 ...
- - [05/Dec/2011:09:33:18 -0800] "GET /services/file3.html HTTP/1.0" 200 ...
Output Sample:
/services/file1.html
/services/file2.html
/services/file3.html
Hint: You need to download and install "Replace Pioneer" on Windows to complete the following steps.
1. ctrl-o open weblog file 
2. ctrl-h open 'replace' dialog 
* set 'replace unit' to 'Line' 
* set 'replace with pattern' to: 
click 'advanced' page 
* set 'run following at the beginning of replace' to: 
* set 'run following for each matched unit' to: 
3. click 'replace', done.
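
For readers who want to check the result outside Replace Pioneer, the same idea (take the 7th word of each line, keep only the first occurrence of each value) can be sketched in Python. This is not Replace Pioneer syntax and not the pattern used in the dialog above; it is a minimal sketch that assumes fields are separated by whitespace and that each log line starts with a client host field, so the requested page is the 7th word. The function name distinct_column and the placeholder host "host1" are made up for illustration.

```python
def distinct_column(lines, column=7):
    """Return the distinct values found in the given 1-based,
    whitespace-separated column, in order of first appearance."""
    seen = set()
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) < column:
            # Skip lines that are too short to have the requested column.
            continue
        value = fields[column - 1]
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical log lines: "host1" is a placeholder for the client host field.
sample = [
    'host1 - - [05/Dec/2011:09:33:07 -0800] "GET /services/file1.html HTTP/1.0" 200',
    'host1 - - [05/Dec/2011:09:33:09 -0800] "GET /services/file2.html HTTP/1.0" 200',
    'host1 - - [05/Dec/2011:09:33:12 -0800] "GET /services/file2.html HTTP/1.0" 200',
]

for page in distinct_column(sample):
    print(page)
```

If the log lines have a different layout, pass a different column number, for example distinct_column(lines, column=6). A set is used for the duplicate check so that large log files stay fast, while the separate list preserves the order in which pages first appear.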
