I wrote it quickly yesterday before going to sleep,
but it was working in my tests... maybe saving the txt file messed up its structure...
hjackson, the spider was looking for this cover link,
e.g.:
Code:
<a href="javascript:largeIM('http://ak.buy.com/db_assets/large_images/303/40722303.jpg');hide
This is the line to find, so in the spider we can write:
Code:
largeIM\('(?<coverart>.*?)'\)
or
Code:
'(?<coverart>http://ak.buy.com/db_assets/large_images/.*?)'
etc...
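If you want to check the URL-based pattern outside the spider, here is a rough Python sketch run against the sample line above. It is only an illustration, not how the spider itself evaluates the regex. Python writes named groups as (?P<name>...) instead of (?<name>...), and I escaped the dots in the host name, which the pattern above does not do.

```python
import re

# Sample anchor copied from the post; the original line is cut off after "hide".
html = """<a href="javascript:largeIM('http://ak.buy.com/db_assets/large_images/303/40722303.jpg');hide"""

# Same idea as the spider pattern, translated to Python's (?P<name>...) syntax.
# The escaped dots are an addition; the spider pattern leaves them unescaped.
pattern = re.compile(r"'(?P<coverart>http://ak\.buy\.com/db_assets/large_images/.*?)'")

match = pattern.search(html)
coverart = match.group("coverart") if match else None
print(coverart)
```

The lazy .*? stops at the first closing single quote, so only the image URL ends up in the coverart group.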
so the full spider will be:
Code:
url=http://www.buy.com/retail/searchresults.asp?search_store=4&querytype=video%5Fdvd&qu=%searchstring%
results=<a tabindex="." href="(?<url>.*?)" class="medBlueText"><b>(?<display>.*?)</b></a>
// find coverart: the <url> tag opens that URL, and the next regex is applied to that page
largeIM\('(?<coverart>.*?)'\)
or
Code:
url=http://www.buy.com/retail/searchresults.asp?search_store=4&querytype=video%5Fdvd&qu=%searchstring%
results=<a tabindex="." href="(?<url>.*?)" class="medBlueText"><b>(?<display>.*?)</b></a>
// find coverart: the <url> tag opens that URL, and the next regex is applied to that page
'(?<coverart>http://ak.buy.com/db_assets/large_images/.*?)'
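To show how the two steps fit together (the results regex on the search page, then the coverart regex on the page behind <url>), here is a rough Python sketch. Assumptions: the function names parse_results and find_coverart are made up for this example, the search-result markup and the product path are hypothetical, and the page download is skipped, so both pages are plain strings. The spider's real internals may work differently.

```python
import re

# Spider patterns translated to Python's (?P<name>...) named-group syntax.
RESULTS_RE = re.compile(
    r'<a tabindex="." href="(?P<url>.*?)" class="medBlueText"><b>(?P<display>.*?)</b></a>'
)
COVERART_RE = re.compile(
    r"'(?P<coverart>http://ak\.buy\.com/db_assets/large_images/.*?)'"
)

def parse_results(search_html):
    """Step 1: collect (url, display) pairs from the search results page."""
    return [(m.group("url"), m.group("display")) for m in RESULTS_RE.finditer(search_html)]

def find_coverart(detail_html):
    """Step 2: pull the large cover image URL out of the page behind <url>."""
    m = COVERART_RE.search(detail_html)
    return m.group("coverart") if m else None

# Hypothetical search-result markup, shaped to fit the results regex.
search_html = (
    '<a tabindex="1" href="/example-product.html" class="medBlueText">'
    "<b>Example Title</b></a>"
)
# Sample line from the post, standing in for the downloaded product page.
detail_html = """<a href="javascript:largeIM('http://ak.buy.com/db_assets/large_images/303/40722303.jpg');hide"""

results = parse_results(search_html)
cover = find_coverart(detail_html)
print(results, cover)
```

Note that tabindex="." only matches a single character, so a result with a two-digit tabindex would be skipped by the results regex.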
later,