Good Forum Footprints Mini List for Hrefer
The SEO Bay Xrumer Forum and SEO Community.


  1. #21
    tmssj2000 (Deckhand)
    Join Date
    28th Apr 2011
    Posts
    43
    Thanks
    4
    Thanked 1 Time in 1 Post


    Quote: Originally posted by kensai
    @tmssj2000: That is one of many strategies you can use to scrape Google for linklists and forums with hrefer and xrumer.
    The point is to use xrumer, analyse success, get URL footprints, put them in the sieve filter, put keywords in additive words and scrape.
    Rinse and repeat. But next time keep your sieve filters in place, as having none in there will produce massive lists with a lot of no-good URLs. So even for starters you might want to have a basic sieve filter. And by the way, always clean your lists before running them in xrumer, both via xblack and via dupes removal, especially if you don't use sieve filters...

    Well, let's say my sieve filter has just this one line: "index.php?showtopic". Doesn't that mean that I am filtering away a lot of potential forums?
    For example, I have scraped this URL from hrefer -> hxxp://ee.twitchfilm.net/site/forums/member/50816/. This is a valid expressionengine forum and it is good for use in xrumer. But with my one-line sieve filter, I am effectively removing this GOOD forum from my linkslist. Does this make sense? kensai, I would like to hear your opinion again...


  2. #22
    jubei (Big Boss)
    Reputation:
    229

    Join Date
    8th Dec 2010
    Posts
    652
    Thanks
    69
    Thanked 228 Times in 127 Posts
    If your hrefer sieve filter has only this line: "index.php?showtopic", then any results that DO NOT include this string in their URL will be dropped.

    That is why we should always try to have as many working footprints in the hrefer sieve filter as possible, including profile footprints, topic-view footprints that we know work, etc.

    So for example, if you had "index.php?showtopic" and "/forums/member/" in your sieve filter, the second rule would keep the forum you mentioned above IN your final hrefer lists.

    To sum things up: as long as one of the sieve filter rules matches a result from hrefer scraping, that result will be kept in the final linkslist.
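
The "keep it if any rule matches" behaviour described above can be sketched in a few lines of Python. This is only an illustration: it assumes the sieve filter does a plain, case-sensitive substring match, which the thread does not confirm, and the function name `sieve` and the example.* URLs are made up for the demo.

```python
def sieve(urls, rules):
    """Keep a URL if at least one rule occurs in it as a substring.

    Assumption: plain case-sensitive substring matching; the real
    hrefer behaviour may differ.
    """
    return [u for u in urls if any(r in u for r in rules)]


# Hypothetical sample URLs, not taken from any real harvest.
urls = [
    "http://example.com/index.php?showtopic=12",
    "http://example.org/site/forums/member/50816/",
    "http://example.net/blog/post-1",
]

# With a single rule, the member-profile URL is dropped.
one_rule = sieve(urls, ["index.php?showtopic"])

# Adding a second rule keeps it, because one matching rule is enough.
two_rules = sieve(urls, ["index.php?showtopic", "/forums/member/"])
```

With the one-line filter only the first URL survives; with both rules the first two survive and the unrelated blog URL is still dropped.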

    Last edited by kensai; 08-06-2011 at 16:27.

  3. #23
    tmssj2000 (Deckhand)
    Join Date
    28th Apr 2011
    Posts
    43
    Thanks
    4
    Thanked 1 Time in 1 Post
    Quote: Originally posted by jubei
    If your hrefer sieve filter has only this line: "index.php?showtopic", then any results that DO NOT include this string in their URL will be dropped.

    That is why we should always try to have as many working footprints in the hrefer sieve filter as possible, including profile footprints, topic-view footprints that we know work, etc.

    So for example, if you had "index.php?showtopic" and "/forums/member/" in your sieve filter, the second rule would keep the forum you mentioned above IN your final hrefer lists.

    To sum things up: as long as one of the sieve filter rules matches a result from hrefer scraping, that result will be kept in the final linkslist.

    Yes jubei, I totally agree with you. That's why kensai's explanation seemed quite weird to me: he proposed having some kind of basic sieve filter, which in my opinion filters away a lot of potential forum targets. I would rather scrape all the URLs, which could be quite slow at first, then post to them, then use the successful ones for a detailed URL analysis to get the footprints to put in the sieve filter. That will then make my sieve filter very comprehensive!


  4. #24
    kensai (Big Boss)
    Reputation:
    635

    Join Date
    8th Dec 2010
    Posts
    2,221
    Thanks
    378
    Thanked 638 Times in 309 Posts
    What kensai meant is that you should not have an empty sieve filter, as that would produce a lot of false results in the final list. Rather, have some basic set of good footprints: if not most of them, then at least the most used ones. That way you don't waste a lot of time scraping crap links and another day trying to post on them.

    The way you currently scrape will take a lot of time to produce results, whereas if you made better use of the sieve filter you could achieve the same results in half that time or even less.

    Jubei has his own way, I have mine, you have yours; it's a matter of taste and how you want to do things... I prefer to have sieve filters in place that I already know work, and I have as many as possible. It saves me a lot of time. What used to take me a week to produce can now take me a couple of days. Sieve filters can be a major time saver.

    Last edited by kensai; 09-06-2011 at 17:56.

  5. #25
    tmssj2000 (Deckhand)
    Join Date
    28th Apr 2011
    Posts
    43
    Thanks
    4
    Thanked 1 Time in 1 Post
    Do you mind sharing your sieve filter, kensai?


  6. #26
    balas (Veteran)
    Join Date
    18th Mar 2011
    Posts
    81
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I have been playing with sieve filters and additional keywords for a month, using my 50k wordlist (added from ebooks). Last week I scraped ~230k phpBB, vBulletin and PunBB URLs, and today I finished an xrumer (7,10) blast: the result is ~11k profiles after the first run. I will do a second run and then check for publicly viewable results. I do not find those numbers impressive.

    My sieve filter for phpbb:
    /forum/
    /forum2/
    /ucp.php?mode=register
    /ucp.php?mode=login
    /ucp.php?mode=register
    /memberlist.php?mode=viewprofile
    /forum/index.php
    /viewtopic.php?f=
    /search.php?search_id=active_topics
    /memberlist.php?mode=viewprofile&u=


    My addwords for phpbb:
    /forum/
    /forum2/
    ucp.php?mode=register
    ucp.php?mode=login
    inurl:"viewforum.php?f="
    inurl:"memberlist.php?mode=viewprofile"
    inurl:"/ucp.php?mode=register"
    inurl:phpbb
    inurl:phpbb2
    inurl:phpbb3
    "Powered by phpBB"
    "2000 - 2011 phpBB Group"
    "2000 - 2012 phpBB Group"
    "2009 phpBB Group"
    "2010 phpBB Group"
    "2011 phpBB Group"
    "2000, 2002, 2005, 2007 phpBB Group"


    What am I doing wrong? Any tips to increase the success rate of my harvests? Does changing the TIME option in hrefer from ANYTIME to LAST *** help? (I have not tried it yet to compare results.)

    Thx


  7. #27
    adhoc (Shaman)
    Reputation:
    172

    Join Date
    29th Mar 2011
    Posts
    429
    Thanks
    26
    Thanked 172 Times in 103 Posts
    I wouldn't really say anything is wrong but...

    Get rid of those inurl: search operators in your footprints... use all textual footprints. Actually, get rid of everything relating to the URL in your footprints, that is really what your Sieve filter is for.

    I personally don't use the sieve filter (unless targeting something very specific), a good set of footprints can produce quality harvests.

    Use the first blast, on register only, as your base, then copy the successes into a new "optimized" list for further use.

    Don't be discouraged by the first run: learn, tweak, and retry. If you're after profiles alone, 11k confirmed profiles means you need to get rid of about 215k URLs. That list will then only take about 10-15 minutes to run. Once you have reused it heavily, you'll see the time spent gathering and breaking it down was worth it.

    And so you smell what I'm cookin with the all textual footprints...
    phpbb
    Public Profiles
    806m - "avatar" "find all posts by" "All about" "joined:" "posts per day]"
    110m - "of total / " "posts per day]" "private message:" "interests:"
    28.5m - "Member" "Viewing profile ::" "Occupation:"

    Is it the 11k confirmed forum accounts that you find unimpressive, or the fact that only 11k out of 230k could be confirmed?
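
For reference, the rounded figures quoted in this exchange (~230k scraped, ~11k confirmed) work out as below. The variable names are just for illustration, and the inputs are the approximate numbers from the posts, not exact counts.

```python
# Approximate figures quoted in the thread (rounded by the posters).
scraped = 230_000
confirmed = 11_000

success_rate = confirmed / scraped   # share of URLs that yielded a profile
discarded = scraped - confirmed      # URLs that can be dropped from the list

print(f"{success_rate:.1%} confirmed, {discarded} URLs to drop")
```

So roughly one URL in twenty produced a profile, and the list to discard is a little over the "about 215k" mentioned above.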


  8. #28
    kensai (Big Boss)
    Reputation:
    635

    Join Date
    8th Dec 2010
    Posts
    2,221
    Thanks
    378
    Thanked 638 Times in 309 Posts
    Also, make sure that you don't repeat your footprints, as you do now. For example:

    /ucp.php?mode=register
    /ucp.php?mode=login
    /ucp.php?mode=register


    There is no need for all 3 of those footprints for filtering purposes. If you want to filter the list, just include the "root" of that footprint, that is:

    /ucp.php?mode=

    And yes, ditch the inurl, that is a MUST. Only Google understands it, and it will just ban your IP after you scrape a few URLs, as Google knows that 99% of the people using that operator are *cough* spammers *cough*.

    You will save yourself a lot of queries and time if you keep things as simple and as "lean" as possible, without including similar footprints.
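
As a rough sketch of the "root" idea above, the following Python collapses footprints that share a common root and drops repeats. The helper name `collapse` and the choice of roots are assumptions made for the example; the roots have to be picked by hand, as in the post.

```python
def collapse(footprints, roots):
    """Replace each footprint that starts with a known root by that root,
    then drop repeats while keeping the original order."""
    seen = set()
    result = []
    for fp in footprints:
        # Use the first matching root, or the footprint itself if none match.
        key = next((r for r in roots if fp.startswith(r)), fp)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


footprints = [
    "/ucp.php?mode=register",
    "/ucp.php?mode=login",
    "/ucp.php?mode=register",
    "/viewtopic.php?f=",
]

lean = collapse(footprints, ["/ucp.php?mode="])
```

Here the three `ucp.php` entries reduce to the single root, and the unrelated `/viewtopic.php?f=` entry is left as it is.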

    Last edited by kensai; 01-02-2012 at 17:32.

  9. #29
    balas (Veteran)
    Join Date
    18th Mar 2011
    Posts
    81
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Will test it, many thanks.


  10. #30
    llookk (Donor)
    Reputation:
    9

    Join Date
    19th Jan 2012
    Posts
    80
    Thanks
    4
    Thanked 9 Times in 6 Posts
    I read the whole thread three times a couple of days ago and tried all the suggestions in it on my scrape project.
    With the inurl: operator in additive words I got 570 of the results I was after, which is good for my purposes, but with the suggestions from this thread (such as CMS identification in additive words and the inurl pattern in the sieve filter) I got only 177.
    So I will try scraping forums with a sieve filter later, but I'll never use the sieve filter again for my inurl scraping, because additive words work better for it. Maybe using a sieve filter is right in theory, but not in practice. I also don't care about proxy bans, as I have fresh proxies every hour.



 
