UPDATE: NEW BUY.COM SPIDERS (website change)

by **jlr2000** on Sat Nov 12, 2005 11:48 pm

I have used the spiders, and I also just logged on to Buy.com to grab their images, which are some of the best quality IMO. The website has changed: now if you click on the smaller image, a larger one is presented, but you cannot right-click it and "Save As".

You can get around this by doing a View Source, grabbing the large image URL and opening it separately, but this is a bit more complicated. Is it possible to change the spider to do this automagically?

by **dgemily** on Sun Nov 13, 2005 1:32 am

http://www.france.xlobby.com/spiders/dv ... uy.com.txt

Let me know if it works correctly.

by **jlr2000** on Sun Nov 13, 2005 1:54 am

I tried it, but it did not work. No covers were listed or displayed. Thanks for the attempt!

by **hjackson** on Sun Nov 13, 2005 7:56 am

Dgemily, the spider would have to change one word in the coverart URL to get the large image. The small image URL has the word "prod" in it (e.g. http://ak.buy.com/db_assets/prod_images ... 725040.jpg). Changing "prod" to "large" will bring up the large image (e.g. http://ak.buy.com/db_assets/large_image ... 725040.jpg).
I hope this helps.

hjackson

PS. Actually, saving the large image is not complicated at all if you use this shortcut: right-click the small image and click "View Image". That will bring up the URL of the small image; then change "prod" to "large" as I mentioned above.
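
A minimal Python sketch of the "prod" to "large" swap described above. It assumes the small image lives under `/db_assets/prod_images/` and the large one at the same path under `/db_assets/large_images/`; the example URL is hypothetical, modelled on the large-image path quoted later in this thread.

```python
# Sketch of the "prod" -> "large" URL trick.
# Assumption: small and large images share the same path below these directories.
SMALL_DIR = "/db_assets/prod_images/"
LARGE_DIR = "/db_assets/large_images/"


def large_image_url(small_url: str) -> str:
    """Turn a Buy.com small-image URL into the matching large-image URL."""
    if SMALL_DIR not in small_url:
        raise ValueError(f"not a small-image URL: {small_url}")
    # Swap only the directory name, so a "prod" elsewhere in the URL is left alone.
    return small_url.replace(SMALL_DIR, LARGE_DIR, 1)


if __name__ == "__main__":
    # Hypothetical small-image URL.
    small = "http://ak.buy.com/db_assets/prod_images/303/40722303.jpg"
    print(large_image_url(small))
    # prints http://ak.buy.com/db_assets/large_images/303/40722303.jpg
```

Replacing the whole `/db_assets/prod_images/` segment rather than every "prod" is a deliberate choice, in case that word also shows up in the file name.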

by **dgemily** on Sun Nov 13, 2005 4:21 pm

I did it quickly yesterday before going to sleep, but it was working in my tests... maybe when you saved the txt file, it ended up with the wrong structure....
hjackson, the spider was looking for this cover link, e.g.:

Code:

    <a href="javascript:largeIM('http://ak.buy.com/db_assets/large_images/303/40722303.jpg');hide

This is the line to find, so in the spider we can write:

Code:

    largeIM_hide\('(?<coverart>.*?)'\)">

or

Code:

    '(?<coverart>http://ak.buy.com/db_assets/large_images/.*?)'

etc...
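
The second pattern can be tried against the quoted HTML line with Python's `re` module. This is only a sketch: Python writes named groups as `(?P<name>...)` instead of the `(?<name>...)` form used in the spider file, so the pattern is transcribed accordingly. Because it keys on the quoted URL alone, it does not depend on how the surrounding `largeIM` call is written.

```python
import re
from typing import Optional

# The HTML fragment quoted above (it is cut off after "hide" in the post too).
LINE = ("<a href=\"javascript:largeIM('http://ak.buy.com/db_assets/"
        "large_images/303/40722303.jpg');hide")

# Second pattern from the post, using Python's (?P<name>...) group syntax.
COVERART = re.compile(r"'(?P<coverart>http://ak.buy.com/db_assets/large_images/.*?)'")


def find_coverart(html: str) -> Optional[str]:
    """Return the first large-image URL found in html, or None."""
    match = COVERART.search(html)
    return match.group("coverart") if match else None


if __name__ == "__main__":
    print(find_coverart(LINE))
    # prints http://ak.buy.com/db_assets/large_images/303/40722303.jpg
```

The unescaped dots in `ak.buy.com` match any character; escaping them (`ak\.buy\.com`) would be slightly stricter but makes no difference on this input.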

so the full spider will be:

Code:

    url=http://www.buy.com/retail/searchresults.asp?search_store=4&querytype=video%5Fdvd&qu=%searchstring%
    results=<a tabindex="." href="(?<url>.*?)" class="medBlueText">(?<display>.*?)</a>
    //find coverart <url> tag will open that url and use next regex
    largeIM_hide\('(?<coverart>.*?)'\)">

or

Code:

    url=http://www.buy.com/retail/searchresults.asp?search_store=4&querytype=video%5Fdvd&qu=%searchstring%
    results=<a tabindex="." href="(?<url>.*?)" class="medBlueText">(?<display>.*?)</a>
    //find coverart <url> tag will open that url and use next regex
    '(?<coverart>http://ak.buy.com/db_assets/large_images/.*?)'
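
Per the comment in the spider, it works in two stages: the `results=` regex picks each product link (`url`) and title (`display`) off the search page, then the page at `url` is opened and the coverart regex is run on it. Below is an offline Python sketch of that flow. How xlobby actually fills in `%searchstring%` (here URL-encoded with `quote_plus`) is an assumption, and the HTML fragments are hypothetical stand-ins for live Buy.com pages, so nothing is downloaded.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import quote_plus

# Search URL copied from the DVD spider; %searchstring% is the query placeholder.
SEARCH_URL = ("http://www.buy.com/retail/searchresults.asp?search_store=4"
              "&querytype=video%5Fdvd&qu=%searchstring%")

# Stage 1: results= pattern, transcribed to Python's (?P<name>...) syntax.
RESULTS = re.compile(
    r'<a tabindex="." href="(?P<url>.*?)" class="medBlueText">(?P<display>.*?)</a>'
)
# Stage 2: coverart pattern (the large_images variant) run on each product page.
COVERART = re.compile(r"'(?P<coverart>http://ak.buy.com/db_assets/large_images/.*?)'")


def search_url(query: str) -> str:
    """Fill in %searchstring% (assumption: the query is URL-encoded first)."""
    return SEARCH_URL.replace("%searchstring%", quote_plus(query))


def parse_results(page: str) -> List[Tuple[str, str]]:
    """Stage 1: (url, display) pairs from a search-results page."""
    return [(m.group("url"), m.group("display")) for m in RESULTS.finditer(page)]


def parse_coverart(product_page: str) -> Optional[str]:
    """Stage 2: large cover URL from the page that <url> points to, if any."""
    m = COVERART.search(product_page)
    return m.group("coverart") if m else None


if __name__ == "__main__":
    # Hypothetical HTML stand-ins; a real spider would download these pages.
    results_page = ('<a tabindex="1" href="/prod/hypothetical-dvd.html" '
                    'class="medBlueText">Hypothetical DVD</a>')
    product_pages = {
        "/prod/hypothetical-dvd.html":
            "<a href=\"javascript:largeIM('http://ak.buy.com/db_assets/"
            "large_images/303/40722303.jpg');hide",
    }
    print(search_url("star wars"))
    for url, display in parse_results(results_page):
        print(display, "->", parse_coverart(product_pages[url]))
```

Note that `tabindex="."` matches exactly one character, so a result link with a two-digit tabindex would not be picked up; whether Buy.com ever produced such links is not known from this thread.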

later,

by **jlr2000** on Sun Nov 13, 2005 6:42 pm

Fantastic! Works great......

is it simple enough to change the Music spider for Buy.com as well?

Thanks!

by **dgemily** on Mon Nov 14, 2005 12:06 am

for: music - buy.com.txt

Code:

    url=http://www.buy.com/retail/searchresults.asp?search_store=6&querytype=music&qu=%searchstring%
    results=<a tabindex="." href="(?<url>/prod/.*?)" class="medBlueText">(?<display>.*?)</a>
    //find coverart <url> tag will open that url and use next regex
    largeIM_hide\('(?<coverart>.*?)'\)">

or

Code:

    url=http://www.buy.com/retail/searchresults.asp?search_store=6&querytype=music&qu=%searchstring%
    results=<a tabindex="." href="(?<url>/prod/.*?)" class="medBlueText">(?<display>.*?)</a>
    //find coverart <url> tag will open that url and use next regex
    '(?<coverart>http://ak.buy.com/db_assets/large_images/.*?)'
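
The music spider differs from the DVD one in its search URL (`search_store=6&querytype=music`) and in requiring the result link to start with `/prod/`. A small Python comparison on a hypothetical results page shows the effect of that prefix. The reason for adding it (skipping non-product links that share the `medBlueText` class) is an inference, not something stated in the thread.

```python
import re

# DVD variant: any href is accepted.
DVD_RESULTS = re.compile(
    r'<a tabindex="." href="(?P<url>.*?)" class="medBlueText">(?P<display>.*?)</a>'
)
# Music variant: the href must start with /prod/.
MUSIC_RESULTS = re.compile(
    r'<a tabindex="." href="(?P<url>/prod/.*?)" class="medBlueText">(?P<display>.*?)</a>'
)

# Hypothetical results page: one product link and one non-product link
# that happens to use the same styling.
PAGE = (
    '<a tabindex="1" href="/prod/hypothetical-album.html" class="medBlueText">Album</a>\n'
    '<a tabindex="2" href="/retail/hypothetical-other.asp" class="medBlueText">Other</a>'
)

if __name__ == "__main__":
    print([m.group("display") for m in DVD_RESULTS.finditer(PAGE)])    # ['Album', 'Other']
    print([m.group("display") for m in MUSIC_RESULTS.finditer(PAGE)])  # ['Album']
```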

For both (DVD and music), these spiders will find a cover only if a large cover exists.... (and they find nothing, even if a small one exists...)
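
To illustrate that limitation: the posted patterns only look for `large_images` URLs, so a page that links only a small image yields no cover. The Python sketch below reproduces that and adds an optional fallback to a `prod_images` pattern. The fallback pattern and the `allow_small` flag are hypothetical and not part of dgemily's spiders; whether an xlobby spider file can express such a fallback is not covered in this thread.

```python
import re
from typing import Optional

# Large-image pattern, as in the posted spiders (Python group syntax).
LARGE = re.compile(r"'(?P<coverart>http://ak.buy.com/db_assets/large_images/.*?)'")
# Hypothetical fallback for the small image; accepts single or double quotes.
SMALL = re.compile(r"[\"'](?P<coverart>http://ak.buy.com/db_assets/prod_images/.*?)[\"']")


def find_cover(page: str, allow_small: bool = False) -> Optional[str]:
    """Return the large cover URL; optionally fall back to the small one."""
    m = LARGE.search(page)
    if m is None and allow_small:
        m = SMALL.search(page)
    return m.group("coverart") if m else None


if __name__ == "__main__":
    # Hypothetical product page that only links a small image.
    page = '<img src="http://ak.buy.com/db_assets/prod_images/303/40722303.jpg">'
    print(find_cover(page))                    # None, like the posted spiders
    print(find_cover(page, allow_small=True))  # the prod_images URL
```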

by **jlr2000** on Mon Nov 14, 2005 2:48 am

Great! Thanks so much for those.....

I've changed the subject for anyone else searching for these....

Thanks, dgemily!

by **dgemily** on Mon Nov 14, 2005 8:15 am

I started a topic about spiders: http://www.xlobby.com/forum/viewtopic.php?t=3640

I will upload those spiders and then update the post....