November 9, 2011

Crawler


For my research, I have spent the last few days writing a web crawler to gather information from a fishing community site called fishing-tank. The small crawler is written in Perl. I find Perl a convenient programming language, even though Perl source code is hard to read. Today I was surprised to learn that Perl supports inline languages: through the Inline module, C code can be embedded directly in Perl code.

While using my web crawler to gather web pages, I found something interesting: wget, lynx and LWP can't interpret JavaScript, so the crawler can't get the content of a page that is written only in obfuscated JavaScript. Some of the JavaScript code is encoded with escape(), which is an encoding rather than real encryption.
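Since escape() output is only percent-encoding, a crawler can often recover such content without running any JavaScript. Here is a minimal sketch in Python (not the Perl of my crawler), assuming the page wraps its content in a call like document.write(unescape("...")). The function names and the sample page are made up for illustration, and real pages may obfuscate differently.

```python
import re

# One pattern for both forms that JavaScript's escape() produces:
# %uXXXX (a UTF-16 code unit) and %XX (a code point below 256).
_ESCAPE_SEQ = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")

# Assumption: the payload is a single quoted string literal passed to unescape().
_UNESCAPE_CALL = re.compile(r"""unescape\(\s*(['"])(.*?)\1\s*\)""", re.DOTALL)


def js_unescape(text):
    """Decode %uXXXX and %XX sequences in one pass, like JavaScript's unescape()."""
    return _ESCAPE_SEQ.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), text
    )


def decode_escaped_payloads(html):
    """Return the decoded argument of every unescape("...") call found in html."""
    return [js_unescape(m.group(2)) for m in _UNESCAPE_CALL.finditer(html)]


# Hypothetical page fragment; the link target is invented for this example.
sample = (
    '<script>document.write(unescape('
    '"%3Ca%20href%3D%22/post/1%22%3Efirst%20post%3C/a%3E"));</script>'
)
print(decode_escaped_payloads(sample))
# prints ['<a href="/post/1">first post</a>']
```

This only handles the simple case where the encoded string appears literally in the page. If the script builds the string piece by piece or uses another scheme, the crawler would need a real JavaScript engine.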

Now I feel tired because I slept only a little last night. But I can't go to bed yet if I want to sleep well tonight.