Development and Test DotNetNuke Installations and Search Engines

When working on a new version of a site, you will often have a development, test, or upgrade copy of the site that might be around for a while.  If you work for a third party, you might also stage client sites on your server for a period of time before go-live.  At first glance this all seems commonplace and not something to be concerned about.  However, that is not the case.  Search engines have become very aggressive in indexing sites, including those that have no direct backlinks but whose URLs might have been e-mailed or distributed via similar means.  Before I get too far into the specific ins and outs of this topic, I want to start with why this is so important.

Why Is "Not Indexing" So Important?

There are actually a number of reasons why we want to be confident that our dev/staging sites are not indexed by search engines.

  • These sites are typically unstable and could show errors or broken pages
  • An existing site typically has the desired content, and that is where users should be getting it
  • We don't want to be penalized for duplicate content
  • We don't want to publicly expose our development URLs or systems

The above are just a few of the reasons why we want to keep this content private.

Why Not Just Enable Basic Authentication on the Site?

One of the most common recommendations on this topic is to require Windows authentication for all test/development domains.  Although effective, this solution is not always the best, for a number of reasons.

  • It does not properly reflect the production environment, so issues could surface later
  • It is not possible in most shared hosting environments
  • It requires creating and managing additional accounts outside of DotNetNuke
  • It can cause issues for mobile browsers

In some situations you might be able to get by this way, but not in all.

Blocking the Robots for Good!

So how exactly do we block the robots for good from our test/development installations?  Well, a two-pronged approach is really the best way to go about it.

Add a Restrictive Robots.txt

The first thing we should do for all of these environments is create a restrictive Robots.txt file that tells the search crawlers that we don't want them to index our site.  To do so, simply create a text file named "robots.txt" at the root of the website and place the following content in it.

User-agent: *
Disallow: /

This tells all crawlers that they are not allowed to index any content on the site.  However, this is only a partial solution: the file is only a "suggestion", and some crawlers will ignore it.  Additionally, if content on the site has already been indexed, this will not remove it from the index.
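As a quick sanity check, the rules above can be run through a robots.txt parser to confirm that they really do deny every path to every crawler. The following is a minimal sketch using Python's standard `urllib.robotparser`; the function name, the sample paths, and the crawler names are illustrative assumptions, not anything DotNetNuke provides.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the article, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical sample paths and crawler names, used only for illustration.
SAMPLE_PATHS = ("/", "/Default.aspx", "/Home.aspx")
SAMPLE_AGENTS = ("Googlebot", "bingbot", "*")


def blocks_everything(robots_txt):
    """Return True if no sample crawler may fetch any sample path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(
        parser.can_fetch(agent, path)
        for agent in SAMPLE_AGENTS
        for path in SAMPLE_PATHS
    )


if __name__ == "__main__":
    print(blocks_everything(ROBOTS_TXT))
```

This only confirms what a well-behaved crawler would conclude from the file; it says nothing about crawlers that ignore robots.txt altogether.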

Modify the ROBOTS meta Tag

The other key driver for search crawlers is the ROBOTS meta tag.  By default, all pages in DotNetNuke use a value of "INDEX, FOLLOW", which tells the crawlers that they should index the content on the site AND follow links to other destinations.  On production sites this is exactly what we want; it is horrible, however, if a bot finds a link to a test/development site.

To get around this, the only really solid solution I've found so far is to make a small core modification.  This isn't ideal, but there is currently no configuration point for this within DNN.  To set a default value for all pages, we first need to stop the existing DNN value from being rendered, then add our own custom value.

First find the following line of code within Default.aspx at the root of the site.

<meta name="ROBOTS" runat="server" id="MetaRobots" />

Once you have found it, modify it to look like the following.  This stops the tag from being rendered in the page output.

<meta name="ROBOTS" runat="server" id="MetaRobots" visible="false" />

Once this has been done, add your custom value on the next line, telling the crawlers that the content is off limits, similar to the following.

<meta content="NOINDEX, NOFOLLOW" name="ROBOTS" />

With this completed, you are set to go!
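To confirm that the modified Default.aspx produces the intended output, you can inspect the rendered HTML of any page and look at its ROBOTS meta tag. The sketch below uses Python's standard `html.parser` for this; the class and function names are hypothetical, and fetching the page from your test site is left out so that the example stays self-contained.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content value of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.values.append(attr.get("content", ""))


def robots_directives(html):
    """Return the set of upper-cased directives found in the page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return {
        part.strip().upper()
        for value in finder.values
        for part in value.split(",")
        if part.strip()
    }


def is_hidden_from_crawlers(html):
    """True when the page asks crawlers to neither index nor follow."""
    return {"NOINDEX", "NOFOLLOW"} <= robots_directives(html)


if __name__ == "__main__":
    page = '<html><head><meta content="NOINDEX, NOFOLLOW" name="ROBOTS" /></head></html>'
    print(is_hidden_from_crawlers(page))
```

Running a check like this against the rendered page, rather than the .aspx source, also verifies that the original MetaRobots tag no longer appears in the output.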

WARNING!!!

It is VERY, VERY, VERY important to note that if you ever transition one of these test/development sites to production, you must remember to reverse the change described above BEFORE you deploy.  NOINDEX, NOFOLLOW will cause search engines to remove the content from their indexes when they find it.  If this happens to your production URLs, you will lose page rank!
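Because this step is so easy to forget, it may be worth automating a check before each production deployment. The following is a rough sketch, not a DNN feature: it scans the text of Default.aspx for the two development-only changes described above. The function name and messages are hypothetical, and the simple string matching assumes the tags are written with double-quoted attributes, as in the examples in this article.

```python
import re

# Matches any <meta ...> tag; a sketch that does not handle every ASP.NET construct.
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)


def dev_only_changes(aspx_source):
    """Return a list of development-only robots changes found in the source."""
    problems = []
    for tag in META_TAG.findall(aspx_source):
        lowered = tag.lower()
        # The default DNN tag was switched off with visible="false".
        if 'id="metarobots"' in lowered and 'visible="false"' in lowered:
            problems.append("MetaRobots is hidden")
        # A custom tag tells crawlers not to index the site.
        if 'name="robots"' in lowered and "noindex" in lowered:
            problems.append("NOINDEX override present")
    return problems


if __name__ == "__main__":
    # Path is an assumption: run from the root of the site being deployed.
    with open("Default.aspx", encoding="utf-8") as handle:
        found = dev_only_changes(handle.read())
    if found:
        raise SystemExit("Do not deploy: " + "; ".join(found))
```

A non-empty result means the file still carries the test/development settings and should not go to production as is.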

Closing Thoughts

With these two simple steps you can safely hide your in-progress work from the search engines while still allowing your team easy access to the sites.  Just don't forget the warning above!  Feel free to share your comments/experiences below.

This article has been cross-posted from my Personal Blog.
