hrefer engines.ini
The SEO Bay Xrumer Forum and SEO Community.

Thread: hrefer engines.ini

  1. #1
    Deckhand beerseo is on a distinguished road
    Reputation:
    12

    Join Date
    25th Sep 2011
    Posts
    37
    Thanks
    10
    Thanked 12 Times in 4 Posts

    hrefer engines.ini



    Yoooo...

    So I've been playing around with hrefer's engines.ini file to see how it works.

    I know some users offer their engines.ini file for purchase, so I apologize if this is cutting into your profits, but I'm a big fan of free information so here goes...

    I'm trying to add new search engines. I think I understand the syntax, but I can't get hrefer to scrape results for some of them.

    Here's what I have so far:

    Ask.com's entry (grabs the next page URL, but doesn't scrape results):
    [Ask]
    Hostname=http://www.ask.com
    Query=web?qsrc=1&o=0&l=dir&q=[QUERY]
    LinksMask=<div id="[...]"><a[...]target="_blank"[...]href="[LINK]">
    TotalPages=50
    NextPage=<div style="padding:1px 0 0 8px;" class="pgnav fl"><a class="txt3 title l_nu" href="[LINK]">Next&#160;&#187;
    NextPage2=<div style="padding:1px 0 0 8px;" class="pgnav fl"><a class="txt3 title l_nu" href="[LINK]">Next&#160;&#187;
    CaptchaURL=
    CaptchaImage=
    CaptchaField=

    Info.com's entry (works perfectly):
    [Info.com]
    Hostname=http://www.info.com
    Query=searchw?qkw=[QUERY]&qcat=web&q=[QUERY]&qhqn=[QUERY]&KW=[QUERY]
    LinksMask=<div class="t"><a href="[...]" target="_blank" title="[LINK]">
    TotalPages=50
    NextPage=&nbsp; <a href="[LINK]" title="Next Page" onMouseOver='window.status="Next Page";return true' onMouseOut='window.status="";return true'>Next &gt;
    NextPage2=&nbsp; <a href="[LINK]" title="Next Page" onMouseOver='window.status="Next Page";return true' onMouseOut='window.status="";return true'>Next &gt;
    CaptchaURL=
    CaptchaImage=
    CaptchaField=

    Hotbot.com's entry (gives me some script error and then bails):
    [Hotbot.com]
    Hostname=http://hotbot.com
    Query=search/web?q=[QUERY]
    LinksMask=<h3 class="web-url resultTitle"><a title="[...]" href="[LINK]">
    TotalPages=50
    NextPage=<a class="next" href="[LINK]">Next</a> </div>
    NextPage2=<a class="next" href="[LINK]">Next</a> </div>
    CaptchaURL=
    CaptchaImage=
    CaptchaField=

    Blekko.com's entry (scrapes 1st page of results, but doesn't find next button):
    [Blekko]
    Hostname=http://blekko.com
    Query=ws/[QUERY]
    LinksMask=<h2 class="title"><a class="UrlTitleLine ui-widget-content "[...]href="[LINK]">
    TotalPages=50
    NextPage=<span class="active">[...]</span><a class="page" href="[LINK]">
    NextPage2=</a><a href="[LINK]">Next &raquo;</a></div><div class="clear">&nbsp;</div>
    CaptchaURL=
    CaptchaImage=
    CaptchaField=

    Also, does anyone have an updated list of Google's datacenter IPs? Would very much appreciate it.

    Like I said, I apologize if this is cutting into anyone's profits but I think we should be sharing...

    -beerSEO
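A quick way to catch typos in entries like the ones above before loading them is to lint the file outside Hrefer. The sketch below is only that, a sketch: `check_engines`, `REQUIRED` and `PLACEHOLDERS` are made-up names, and the rules (which keys must be set, which placeholder each one needs) are assumptions inferred from the entries in this thread, not Hrefer's documented behaviour. It also expects the entries to start at column 0, because Python's `configparser` treats indented lines as continuations.

```python
import configparser

# Assumption: these keys are treated as required only because every working
# entry in this thread sets them; Hrefer's own rules may differ.
REQUIRED = ("Hostname", "Query", "LinksMask", "TotalPages", "NextPage")
# Assumption: the placeholder each key is expected to contain.
PLACEHOLDERS = {"Query": "[QUERY]", "LinksMask": "[LINK]", "NextPage": "[LINK]"}


def check_engines(ini_text: str) -> dict[str, list[str]]:
    """Return {section name: [issues]} for sections that look incomplete."""
    # Only "=" is a delimiter, so the ":" in http:// is never split on;
    # interpolation is off so "%" in a mask is taken literally.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case (Hostname, not hostname)
    parser.read_string(ini_text)

    report = {}
    for name in parser.sections():
        section = parser[name]
        issues = [f"{key} is missing or empty"
                  for key in REQUIRED if not section.get(key)]
        for key, token in PLACEHOLDERS.items():
            value = section.get(key)
            if value and token not in value:
                issues.append(f"{key} has no {token} placeholder")
        if issues:
            report[name] = issues
    return report


SAMPLE = """\
[Info.com]
Hostname=http://www.info.com
Query=searchw?qkw=[QUERY]&qcat=web&q=[QUERY]&qhqn=[QUERY]&KW=[QUERY]
LinksMask=<div class="t"><a href="[...]" target="_blank" title="[LINK]">
TotalPages=50
NextPage=&nbsp; <a href="[LINK]" title="Next Page"

[Broken]
Hostname=http://example.com
Query=search?q=[QUERY]
LinksMask=<a href="[...]">
TotalPages=50
"""

if __name__ == "__main__":
    for engine, issues in check_engines(SAMPLE).items():
        print(engine, issues)
```

The `[Broken]` section is invented for the demo; it gets flagged for the missing NextPage and for a LinksMask without `[LINK]`, while the Info.com entry passes. This only checks the shape of an entry, it can't tell you whether a mask still matches the engine's current HTML.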


  2. The Following 2 Users Say Thank You to beerseo For This Useful Post:

    more coffee (29-02-2012), Smoke (29-02-2012)

  3. #2
    Ancient One more coffee is just really nice
    Reputation:
    363

    Join Date
    18th Mar 2011
    Location
    Scotland
    Posts
    1,187
    Thanks
    106
    Thanked 363 Times in 150 Posts
    Great share dude, shame only one engine works properly.

    And wow, your Info.com code is crazy complex. You can use the [...] command, which helps shorten the code a lot. There are many different ways to code these mods; perfecting them is where the real work comes in. Hopefully other members of the forum will be happy to help fix the broken engines.

    Nice to see your code is unlike my own, well done.

    Here is a better way for Info.com:

    [Info.com]
    Hostname=http://info.com/
    Query=searchw?qkw=[QUERY]&qcat=web&q=[QUERY]&qhqn=[QUERY]&KW=[QUERY]
    LinksMask=<div class="t"><a href="[...]" target="_blank" title="[LINK]"
    TotalPages=50
    NextPage=<a href="[LINK]=[QUERY]" title="Next Page"


    Enjoy, I'm out.

    Last edited by more coffee; 29-02-2012 at 00:11.
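To see why a shorter mask like the one above can still match, it helps to test masks against a saved copy of the results page outside Hrefer. The sketch below rests on an assumption about the syntax, going only by how it is used in this thread: `[...]` skips any text and `[LINK]` is the part that gets captured. `mask_to_regex` is a made-up helper, not anything Hrefer provides, and it expects exactly one `[LINK]` per mask.

```python
import re


def mask_to_regex(mask: str) -> re.Pattern:
    """Translate a LinksMask-style string into a compiled regex.

    Assumed semantics (not taken from Hrefer documentation):
      [...]  matches any run of characters, as short as possible
      [LINK] captures the URL, up to the next quote, bracket or space
    """
    # Split on the placeholders but keep them in the list of parts.
    parts = re.split(r"(\[\.\.\.\]|\[LINK\])", mask)
    pieces = []
    for part in parts:
        if part == "[...]":
            pieces.append(r".*?")
        elif part == "[LINK]":
            pieces.append(r"([^\"'<>\s]+)")
        else:
            pieces.append(re.escape(part))  # literal HTML, matched as-is
    return re.compile("".join(pieces), re.DOTALL)


MASK = '<div class="t"><a href="[...]" target="_blank" title="[LINK]"'
# Invented snippet shaped like the markup the mask expects.
HTML = ('<div class="t"><a href="/r?x=1" target="_blank" '
        'title="http://example.com/page">Example</a></div>')

if __name__ == "__main__":
    print(mask_to_regex(MASK).findall(HTML))
```

Run against the invented snippet, this prints `['http://example.com/page']`. If a mask returns nothing here, the literal parts of the mask probably no longer match the page source character for character, which is the usual reason an engine stops scraping.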

  4. The Following 3 Users Say Thank You to more coffee For This Useful Post:

    beerseo (29-02-2012), Smoke (29-02-2012), thebond (29-04-2012)

  5. #3
    Deckhand beerseo is on a distinguished road
    @more coffee...thanks man! I know I can consolidate the code with [...], but every time I did that it wasn't finding what I wanted it to. That being said, your edits work just as well; thanks!

    ...my bad, I know you have your own [Only registered and activated users can see links. ] for $10 (good deal btw) you're pushing. I'm just cheap


  6. #4
    Deckhand beerseo is on a distinguished road
    Query for the experts....

    How do I trim the end of a URL in hrefer (and/or xrumer)? As in, if I scrape a URL from a search results page that has extra garbage appended to it, how would I remove the garbage from a certain point?

    For example, I get a URL like this:

    coachdriverforum.co.uk/welcome/search.php?do=getdaily&du=www.coachdriverforum.co.uk/%e2%80%a6getdaily&ld=20120229&ap=1&app=1&c=info.metac.t1.2&s=metacrawler&coi=239138&cop=main-title&euip=127.0.0.1&npp=1&p=0&pp=0&pvaid=071039f2fb344d9f9e4312479c69e86c&sid=1902547237.379157648611.1330544027&vid=1902547237.379157648611.1330544027.1&fcoi=417&fcop=topnav&fpid=27&ep=1&mid=9&hash=FA1893B6A5BDF5A77F94E64E8A7A08A8

    And I want to get rid of everything from "&du=" to the end of the URL...is that even possible? It must be... Am I going brain dead?

    Thanks in advance!

    -beerSEO

    Last edited by beerseo; 29-02-2012 at 22:24. Reason: edited out live URL
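If the trimming can happen outside hrefer/xrumer, for example on the exported link list, cutting a URL at a fixed marker is a one-liner in most languages. Here is a minimal Python sketch of that idea; it is not a built-in hrefer or xrumer feature, and `trim_url` and its `marker` parameter are names made up for illustration.

```python
def trim_url(url: str, marker: str = "&du=") -> str:
    """Return the part of url before the first occurrence of marker.

    If the marker is not present, the URL is returned unchanged.
    """
    head, _sep, _tail = url.partition(marker)
    return head


# Shortened version of the example URL from the post above.
scraped = ("coachdriverforum.co.uk/welcome/search.php?do=getdaily"
           "&du=www.coachdriverforum.co.uk/%e2%80%a6getdaily&ld=20120229&ap=1")

if __name__ == "__main__":
    print(trim_url(scraped))
```

For the shortened example this prints `coachdriverforum.co.uk/welcome/search.php?do=getdaily`. Applied line by line to a links file, it drops everything from "&du=" onward and leaves URLs without the marker untouched.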

  7. #5
    Big Boss kensai is a name known to all
    Reputation:
    635

    Join Date
    8th Dec 2010
    Posts
    2,221
    Thanks
    378
    Thanked 638 Times in 309 Posts
    @beerseo: This has already been discussed on the forum; use Xrumer tools. Also, here is a guide by llookk that should help you:
    [Only registered and activated users can see links. ]

    p.s.
    Please start threads like this in the right section. Hrefer Tutorials is not the right section for this thread. I will move it to Hrefer Files, as this is more of a working engines.ini thread where people can add their own engines should they wish.


  8. #6
    Recruit abuhle is an unknown quantity at this point
    Reputation:
    3

    Join Date
    26th Oct 2011
    Location
    Bat Country
    Posts
    6
    Thanks
    0
    Thanked 3 Times in 2 Posts
    Give this a try:

    Code:
    [Blekko.com]
    Hostname=http://blekko.com/ws/
    Query=[QUERY]
    LinksMask=<class="UrlTitleLine ui-widget-content " href="[LINK]"
    TotalPages=100
    NextPage=<a href="[LINK]">Next &raquo;</a>
    NextPage2=<a href="[LINK]">Next &raquo;</a
    Don't know why but after scraping around 5k links (50k filtered duplicated+sieve) it blocks me *LOL*


  9. The Following 2 Users Say Thank You to abuhle For This Useful Post:

    biturbo (15-07-2015), thebond (29-04-2012)


 
