DNN Community Blog

Rough to get Slashdotted? Try getting “SharePoint’ed”…

There was a time when DotNetNuke didn't gather a lot of site statistics. Very early on, we used our own internal site stats features to keep some relevant information, but as most hosts know, that architecture is not terribly efficient on a massive scale. As we have grown and continued to add interactive features to the site (heavily used forums, active blogs, information repositories, benefactor & vendor management, etc.), our need to understand our site usage at a more detailed level has increased dramatically. So when we migrated to a new web server before the holidays, we finally got around to installing some log analysis software.

Like many organizations, we weren't too concerned about getting indexed.  In fact, we’d normally consider it a pretty good thing to get found & indexed when you are still a growing organization in search of broader uptake.  But then we took a look at our first week of recorded site traffic…

Oh, my.

So let’s take a look at what we were facing in December. The two screenshots below represent (1) spider traffic and (2) IP traffic. Look at these and let’s see what we can notice.


So who’s that “MS Search Robot” that soaked up 495GB of data transfer in December?  Not to mention 7 ¾ million page views on 393 visits?


Hmm… and who are those IP addresses with just a few visits and millions of page views… and ( by the way ) responsible for more than half of our bandwidth consumption?
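For anyone without log analysis software on hand, the same kind of per-crawler tally can be roughed out straight from the raw logs. The sketch below is only an illustration, not the tool we used: it assumes W3C extended format logs with a `#Fields:` header that includes `cs(User-Agent)` and `sc-bytes` (byte logging may need to be switched on), and the sample lines, IP addresses, and byte counts are made up for the example.

```python
from collections import defaultdict


def tally_by_agent(lines):
    """Return {user_agent: (hits, bytes_sent)} from W3C extended log lines.

    Assumes a '#Fields:' directive names the columns and that the log
    records 'cs(User-Agent)' and 'sc-bytes' (an assumption about the setup).
    """
    fields = []
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Column names for the data rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue  # other directives, or data before any header
        parts = line.split()
        if len(parts) != len(fields):
            continue  # skip malformed rows rather than guess
        row = dict(zip(fields, parts))
        # W3C logs write spaces inside a field as '+'.
        agent = row.get("cs(User-Agent)", "-").replace("+", " ")
        sent = row.get("sc-bytes", "0")
        totals[agent][0] += 1
        totals[agent][1] += int(sent) if sent.isdigit() else 0
    return {agent: tuple(counts) for agent, counts in totals.items()}


# Made-up sample data, for illustration only.
SAMPLE = """#Fields: date time c-ip cs-uri-stem sc-status sc-bytes cs(User-Agent)
2000-01-01 00:00:01 192.0.2.1 /Default.aspx 200 65000 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+NT;+MS+Search+4.0+Robot)
2000-01-01 00:00:02 192.0.2.1 /Forums.aspx 200 64000 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+NT;+MS+Search+4.0+Robot)
2000-01-01 00:00:03 192.0.2.2 /Default.aspx 200 30000 Mozilla/5.0+(Windows)
""".splitlines()

if __name__ == "__main__":
    report = tally_by_agent(SAMPLE)
    # Heaviest consumers of bandwidth first.
    for agent, (hits, sent) in sorted(report.items(), key=lambda kv: -kv[1][1]):
        print(f"{sent:>10} bytes  {hits:>6} hits  {agent}")
```

Sorting by bytes rather than hits is deliberate: a crawler that makes a handful of "visits" can still be the one eating the bandwidth.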

If you go google’ing for “MS Search Robot”… you’re not going to find much except for a bunch of other people asking, “hey, what’s this MS Search Robot”?  But if you check your raw server logs it looks a little bit different… “MS Search 4.0 Robot”.  Keep digging and you’ll find some obscure references ( mostly circa 2003 ).  But the one you’re really looking for is here:;en-us;284022#XSLTH3163121123120121120120

You might notice the fine print there…

IMPORTANT: Limit the number of site hops to the absolute minimum number necessary. When you perform an Internet crawl, you might index millions of documents in just a few site hops.

Yep.  We can validate that.

Turns out… once we had the opportunity to follow up with some of these IP address owners (some representing large companies, professional organizations, etc.), we quickly discovered what was happening. All of them were quite happy to work cooperatively with us, and often they were not even aware of the load being generated on their own systems.

Local SharePoint installations (some development, some production) were crawling their internal networks. Within those networks they had one or more default installations of DotNetNuke, each of which contains front page links back to our site (i.e. in the informational Links and Sponsors modules). SharePoint (without specific inclusions defined) was just following links… So what else did that fine print say?

The site path rule strategy that is recommended when you are crawling Internet sites is to create an exclusion rule for the entire HTTP URL space (http://*), and then create inclusion rules for only those sites that you want to index.

Oh yeah.  Basically that the default settings are a little impolite to other sites and that you should change them.  Please make a note.  *grin*
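To make the "exclude everything, then include only what you want" idea concrete, here is a minimal sketch of how such a rule check might behave. It is not SharePoint's actual rule engine: the function name `should_crawl`, the precedence (inclusions win over the blanket exclusion), the wildcard matching via Python's `fnmatch`, and the example host names are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def should_crawl(url, include_rules, exclude_rules):
    """Decide whether a crawler may follow `url` (hypothetical rule model).

    Inclusion rules are checked first, so a specific inclusion can punch a
    hole in a blanket exclusion such as 'http://*'.
    """
    url = url.lower()
    if any(fnmatchcase(url, pattern.lower()) for pattern in include_rules):
        return True
    if any(fnmatchcase(url, pattern.lower()) for pattern in exclude_rules):
        return False
    # No rule matched: follow the link. This open default is the behaviour
    # that lets a crawl wander off the internal network.
    return True


# Recommended strategy: exclude the whole HTTP space, include only your own sites.
INCLUDE = ["http://intranet.example.com/*"]  # hypothetical internal site
EXCLUDE = ["http://*"]

if __name__ == "__main__":
    for link in ("http://intranet.example.com/docs/page.aspx",
                 "http://www.example.org/Default.aspx"):
        print(link, should_crawl(link, INCLUDE, EXCLUDE))
    # With no rules at all, the external link would be followed too.
    print(should_crawl("http://www.example.org/Default.aspx", [], []))
```

With the blanket exclusion in place, a front page link pointing off the intranet simply stops there instead of becoming the first hop of an Internet crawl.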

Now that we have the proper exclusions in our robots.txt file ( see below )… we’re no longer being hammered by this particular bot.  Are you?

User-agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft
Disallow: /

