Google Searching

Prompted by a post on the geeks list, I modified some of my other scripts and came up with this.

The request was for a program that would continually make web requests in order to foil an ISP's (in this case the nefarious Comcast's) web request tracking. My script runs a Google search on the supplied string and issues a GET request for each page returned.

Example:

$ ./google-query francisco+roque
Parsing links found in http://www.google.com/search?num=100&q=francisco+roque
getting: http://www.vrml-art.org/node.php3?nid=41&lid=1
getting: http://www.vrml-art.org/page.php3?pid=66&lid=1
getting: http://www.wfp.org/newsroom/in_depth/central_america.html
getting: http://www.wfp.org/newsroom/in_depth/central_america_21_09_01.html
getting: http://www.halsema.org/Genealogy/Database/D0002/I617.html
getting: http://www.halsema.org/Genealogy/Database/IND0012.html
getting: http://www.reliefweb.int/w/rwb.nsf/6686f45896f15dbc852567ae00530132/b16db067ecbed07c1256b3600347101?OpenDocument
getting: http://www.reliefweb.int/w/rwb.nsf/6686f45896f15dbc852567ae00530132/e43766262fec79c85256a9b0067fa47?OpenDocument
getting: http://www.blackant.net/work/
getting: http://www.blackant.net/other/old/index-split/notes/
getting: http://www-personal.wccnet.org/~frisco/code/vrml.html
getting: http://www.fao.org/docrep/x5630e/x5630e01.htm
getting: http://www.fao.org/reliefoperations/spanish/nicaragua.htm
getting: http://www.cstbkk.org/ARHPartII.htm
and so on.
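
For readers who want to see the idea spelled out, here is a minimal sketch in Python of the same behavior: fetch the results page, pull out the links, and GET each one. It is not the original script. The function names (build_search_url, extract_links, fetch), the browser-like User-Agent header, and the rule of skipping any link whose host contains "google." are all assumptions made for illustration. Google's result markup may also differ from what this simple link filter expects, so treat it as a starting point only.

```python
#!/usr/bin/env python3
"""Sketch: fetch a Google results page and GET every external link on it."""
import sys
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlparse

# Same endpoint and num=100 parameter as in the example output above.
SEARCH_URL = "http://www.google.com/search?num=100&q="


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def build_search_url(query):
    """Append the query as given; spaces should already be written as '+'."""
    return SEARCH_URL + query


def extract_links(html, skip_host="google."):
    """Return absolute http(s) links, skipping hosts that contain skip_host (assumed filter)."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.links:
        parts = urlparse(href)
        if parts.scheme in ("http", "https") and skip_host not in parts.netloc:
            links.append(href)
    return links


def fetch(url, timeout=10):
    """GET url and return the body as text. The User-Agent value is an assumption."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def main(query):
    url = build_search_url(query)
    print("Parsing links found in", url)
    for link in extract_links(fetch(url)):
        print("getting:", link)
        try:
            fetch(link)
        except (OSError, ValueError) as err:
            # Unlike the original script, report a failed request and move on.
            print("  failed:", err)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as google_query.py, it would be invoked the same way as the example: python3 google_query.py francisco+roque.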

If you run this in a while loop and put it in the background, you'll be continually making GET requests, and any tracking done by your ISP will have a bunch of garbage to sort through.
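
One way to sketch that loop, again in Python rather than as a shell one-liner: the helper below reruns a command indefinitely with a pause between runs. The name run_repeatedly, the delay and max_runs parameters, and the 60-second default are assumptions for illustration. The pause is only there to soften the bandwidth cost.

```python
import subprocess
import sys
import time


def run_repeatedly(command, delay=60.0, max_runs=None):
    """Run command over and over, sleeping delay seconds between runs.

    max_runs=None loops until interrupted. Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # check=False: a failed run should not stop the loop.
        subprocess.run(command, check=False)
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(delay)
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    try:
        run_repeatedly(sys.argv[1:])
    except KeyboardInterrupt:
        pass
```

Saved as, say, loop.py (a hypothetical name), it could be started in the background with: python3 loop.py ./google-query francisco+roque &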

Note that this isn't very bandwidth friendly.

My script doesn't do any error checking and hasn't been thoroughly tested. From my understanding of Google's Terms of Service, I don't think personal use of this script violates those terms, but I'm no lawyer, so use it at your own risk.

The script, in case you missed the link above.