Forum Moderators: mack
Personally, bandwidth is cheap for me and I'm willing to feed MSNbot as long as it doesn't impact performance for users.
So how much bandwidth are you allowing?
On one new site, which was only really finished a couple of weeks ago, the stats are:
Users transfer: 182.38Mb
Other bots: 162.97Mb
MSNbot transfer: 1783.70Mb
It looks like it's pulled the whole of that site twice now, once by msnbot64057 and once by msnbot64058. It's hitting other sites a bit less hard. Hopefully they won't keep it up for too long before launching.
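To put those figures in proportion, here is a quick back-of-the-envelope sketch. It uses only the three transfer numbers quoted above, in the same "Mb" units as the post; the variable names are just for illustration.

```python
# Transfer figures quoted in the post above, in the post's own "Mb" units.
transfer = {
    "users": 182.38,
    "other_bots": 162.97,
    "msnbot": 1783.70,
}

total = sum(transfer.values())

# Each source's share of the total transfer, as a percentage.
shares = {name: amount / total * 100 for name, amount in transfer.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of total transfer")

# How many times the human traffic MSNbot consumed.
msnbot_vs_users = transfer["msnbot"] / transfer["users"]
print(f"msnbot pulled {msnbot_vs_users:.1f}x what users did")
```

On these numbers MSNbot accounts for roughly 84% of everything the site served, close to ten times what real users pulled.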
So I wrote to MSN and they said they'd fix it. I haven't had them banned since, so it looks like they're quite receptive to resolving issues.
User-agent: msnbot
Disallow: /
(Since "msnbot" is in the user agent string, this keeps the crawler out.)
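If you want to check a rule like this before relying on it, a minimal sketch with Python's standard urllib.robotparser is below. The URL and the two user-agent strings are made up for illustration; this only shows how a parser that follows the robots.txt convention reads the rule, not how MSNbot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from the post above.
rules = """\
User-agent: msnbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URL and user-agent strings, for illustration only.
url = "http://www.example.com/page.html"
msnbot_allowed = parser.can_fetch("msnbot/0.11", url)
other_allowed = parser.can_fetch("SomeOtherBot/1.0", url)

print("msnbot allowed:", msnbot_allowed)
print("other bot allowed:", other_allowed)
```

The rule only names msnbot, so any other crawler falls through to the default, which is to allow.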
I figure they have enough from me.
BTW, my theory is that MSN will push its speedy delivery of cached multimedia files to searchers as its edge over Yahoo/Google when the engine finally rolls out. They are probably collecting and categorizing/keywording as much of that kind of content as they can. We only have 3 multimedia files on our server, but if you hit it 500 times per day, it adds up to a lot of bandwidth.
If MSN acts fast, they can grab significant market share from Google simply by providing bookmark-quality results like Google used to. Remember when you could more or less count on getting consistent results from Google? I didn't even use bookmarks for almost a year; now I always bookmark. It's not like Google has really improved its quality with the last tweaks, as far as I can see.
Regards...jmcc
With regard to the aggressiveness of the crawl: we are definitely learning and improving. We take politeness very seriously and we work hard to make sure that we are fixing issues as they come up. For specific politeness issues we recommend you e-mail us at msnbot@microsoft.com. Either I or someone from the team will take the time to investigate what is happening and send a reply. We will be posting to this forum to answer general questions about politeness, best practices, and the search engine itself. We look forward to chatting with everyone here.
-msn search dudes (msd)
I just put up a three page website 4 days ago and have had 14 msnbot page requests in that time.
There is no way I'm going to reprogram all my stuff to send proper Last-Modified headers; that's way too much work for zero payoff on my end. Is this something that can be fixed on MSN's end?
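For anyone wondering what "proper last modified headers" would involve, here is a rough sketch of the conditional GET logic in Python. It assumes you can get a modification time for each page; the function name conditional_response and the example date are hypothetical, and a real site would wire this into whatever generates its pages.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET, given the page's modification time.

    last_modified must be a timezone-aware datetime in UTC.
    if_modified_since is the raw If-Modified-Since request header, if any.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without a zone as UTC so the comparison works.
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers  # unchanged: no body needs to be sent
    return 200, headers


# Hypothetical page timestamp, for illustration only.
page_time = datetime(2004, 9, 1, 12, 0, 0, tzinfo=timezone.utc)

first_status, first_headers = conditional_response(page_time)
# A crawler that sends back the date it was given gets a 304 and no body.
repeat_status, _ = conditional_response(page_time, first_headers["Last-Modified"])
print(first_status, repeat_status)
```

The bandwidth saving only shows up if the crawler actually sends If-Modified-Since on repeat visits, which is the part that sits on the search engine's side.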
I definitely welcome the aggressive crawl, although if MSN is able to crush Google we'll be singing a different song soon enough....
If MSN's search DOES end up crushing Google I'll be glad I was indexed. And even if it doesn't, there will certainly be enough hype about it in the beginning that it will be worthwhile.
SEOs are going to have a field day making sites rank high in both Google AND MSN though.
It would be a good thing if Microsoft would work with webmasters if it wants to make a superior search engine. However most search engines are inherently algorithmic so that claim does not impress people here. :) (Though I still think that the emphasis on personalisation is wrong. The more important thing for most people I talk to about searching is the localisation aspect and the way Google et al are going about it is wrong - people do not search by postcodes.)
Regards...jmcc
Until then I consider your intervention PR and damage control and that's not what I require or expect from you.
I don't apologize for being rude because I had to pay for the privilege of feeding your experiments last month.
"Until then I consider your intervention PR and damage control"
That's how I regard all these search engine guys who post here; why would you consider it anything else? You have to take everything an employee says about their own company's product with large grains of salt, so I'm going to have to go to the store to buy more. I think I'll go with rock salt this time.
Especially MS, who don't have a very good track record in the PR area. But I still welcome a new search engine: I'd rather deal with 3 fairly equal ones than 1 main one, which is what I have to do now. Google is getting boring with their endless tweaks [ hint: MSN, make a reliable, stable product with reliable, stable results, and don't do huge algo tweaks if at all possible... nobody cares about search, they just care about getting reasonably decent results ].
Fortunately, bandwidth is cheap for my US-based sites, so I just let MSNbot get her fill. Maybe that saved some of you a few bytes. ;)
There is a robots.txt now. So don't assume that I was lazy. Don't make up scenarios when you don't know the full extent of the story.
As for the "rude" part, had I not added that word, you guys wouldn't have perceived my message as rude, just very direct. It's all in the packaging.
However, calling someone "lazy" is rude and disrespectful.
Remember guys, TOS 4 and 19 apply to ALL of our members, regardless of who employs them.
What about items 18 and 20? The search company representatives are not standard posters; they occupy a special position on these forums as far as I can tell. They represent clear commercial interests, but do provide a certain insight into those commercial entities, as long as you bring your salt with you.
It would be slow as crap, but I can't think of any other reason why it would do what it's doing. Think about this... they have a cached snapshot of your page which is used for text queries just like Google but maybe they're going a step further and actually comparing the results to the "live" page to make sure that it hasn't changed since they cached it, and that it's not a dead link.
I don't think this model would be very efficient in a production environment - in fact I KNOW it wouldn't - but I can see some advantages for them to run this way while testing.
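To make the guess above concrete, a check like that could be sketched as below. This is purely an illustration of the speculation in this post, not anything MSN has said it does; the function names and the fingerprint approach (a SHA-256 hash of the page body) are my own assumptions.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Hash a page body so two copies can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()


def check_cached_page(cached_fingerprint: str, live_status: int, live_body: bytes) -> str:
    """Compare a stored snapshot against a fresh fetch of the live page."""
    if live_status in (404, 410):
        return "dead"      # the link no longer resolves to a page
    if live_status != 200:
        return "unknown"   # redirects, server errors, etc. need a retry
    if fingerprint(live_body) != cached_fingerprint:
        return "changed"   # live page differs from the cached snapshot
    return "fresh"


# Hypothetical page contents, for illustration only.
snapshot = fingerprint(b"<html>old copy</html>")
print(check_cached_page(snapshot, 200, b"<html>old copy</html>"))
print(check_cached_page(snapshot, 200, b"<html>new copy</html>"))
print(check_cached_page(snapshot, 404, b""))
```

Note that this needs a full fetch of the live page on every check, which is exactly why it would be so expensive at production scale.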