Development and Test DotNetNuke Installations and Search Engines

May 06

When working on a new version of a site, you will often have a development, test, or upgrade copy of the site that stays around for a while. If you work for a third party, you might also stage client sites on your own server for a period of time before go-live. At first glance this all seems commonplace and not something to be concerned about. However, that is not the case. Search engines have become very aggressive in indexing sites, including those that have no direct backlinks but whose URLs might have been e-mailed or distributed by similar means. Before I get too far into the specific ins and outs of this topic, I want to start with why this is so important.

Why Is "Not Indexing" So Important?

There are actually a number of reasons why we want to be confident that our dev/staging sites are not indexed by search engines.

Typically these sites are not stable and could suffer from errors or similar problems.
Typically an existing site has the desired content, and that is where users should be getting their content.
We don't want to be penalized for duplicate content.
We don't want to publicly expose our development URLs or systems.

The above are just a few of the reasons why we want to keep this content private.

Why Not Just Enable Basic Authentication on the Site?

One of the most common recommendations on this topic is to require Windows authentication for all test/development domains. Although effective, this solution is not always the best, for a number of reasons.

It does not properly reflect the production environment, so issues could arise in the future
Not possible in shared hosting environments (most cases)
Requires the creation/management of additional accounts outside that of DotNetNuke
Can cause issues for mobile browsers

In some situations you might be able to get by this way, but not in all of them.

Blocking the Robots for Good!

So how exactly do we block the robots for good from our test/development installations? A two-pronged approach is really the best way to go about it.

Add a Restrictive Robots.txt

The first thing we should do for all of these environments is create a restrictive robots.txt file that tells the search crawlers we don't want them to index our site. To do so, simply create a text file named "robots.txt" at the root of the website and place the following content in it.

User-agent: * 
Disallow: /

This tells all crawlers that they are not allowed to index any content on the site. However, this is only a partial solution: robots.txt is only a "suggestion", and some crawlers will ignore it. Additionally, if content on the site has already been indexed, this will not remove it from the index.
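As a quick sanity check, the rules above can be tested locally before relying on them. The following is a minimal sketch using Python's standard `urllib.robotparser` module; the helper name `blocks_everything` and the sample paths are illustrative assumptions, not anything provided by DotNetNuke.

```python
from urllib.robotparser import RobotFileParser

# The restrictive rules from the robots.txt file shown above.
ROBOTS_TXT = """User-agent: *
Disallow: /
"""

# Hypothetical paths, used only to probe the rules.
SAMPLE_PATHS = ("/", "/Home.aspx", "/Default.aspx?tabid=1")


def blocks_everything(robots_txt, paths=SAMPLE_PATHS, user_agent="Googlebot"):
    """Return True if none of the sample paths may be fetched by the given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(user_agent, path) for path in paths)


if __name__ == "__main__":
    print("All sample paths blocked:", blocks_everything(ROBOTS_TXT))
```

This only confirms what a well-behaved crawler would conclude from the file. It says nothing about crawlers that ignore robots.txt, which is why the second step below is still needed.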

Modify the ROBOTS meta Tag

The other key driver for search crawlers is the ROBOTS meta tag. By default, all pages in DotNetNuke use a value of "INDEX, FOLLOW", which tells crawlers that they should index the content on the site AND follow links to other destinations. On production sites this is exactly what we want; it is horrible, however, if a bot finds a link to a test/development site.

As of right now, the only solid solution I've found for this is a small core modification. This isn't ideal, but there is currently no configuration point for this within DNN. To set a default value for all pages, we first need to stop the existing DNN value from being rendered, then add our own custom value.

First, find the line of code in Default.aspx, at the root of the site, that outputs the ROBOTS meta tag.

Once you have found it, modify it so that it is no longer rendered to users.

Once this has been done, add your custom value on the next line, telling the crawlers that the content is off limits: a ROBOTS meta tag with a value of "NOINDEX, NOFOLLOW".

With this completed, you are set to go!
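To confirm the change took effect, it can help to look at the HTML a page actually renders. The sketch below uses Python's standard `html.parser` to collect every ROBOTS meta tag from a page's markup; the class and function names are illustrative assumptions, and fetching the page itself is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []  # one set of directives per ROBOTS meta tag

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        content = attributes.get("content", "")
        parts = {part.strip().upper() for part in content.split(",")}
        self.directives.append({part for part in parts if part})


def robots_meta(html):
    """Return a list with one set of directives per ROBOTS meta tag found."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_hidden_from_crawlers(html):
    """True only if at least one ROBOTS tag exists and every one says NOINDEX, NOFOLLOW."""
    tags = robots_meta(html)
    return bool(tags) and all({"NOINDEX", "NOFOLLOW"} <= tag for tag in tags)
```

Requiring every tag to carry both directives is a deliberate choice in this sketch: if the default tag is still being rendered alongside the custom one, the check fails instead of passing by accident.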

WARNING!!!

It is VERY, VERY, VERY important to note that if you ever transition one of these test/development sites to production, you must remember to reverse the change listed above BEFORE you deploy to production. NOINDEX, NOFOLLOW will cause search engines to remove the content from their indexes if it is found. If this happens to your production URLs, you will lose page rank!
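One way to make this harder to forget is a small check that runs before a production deployment. The following is a rough sketch, assuming the text of Default.aspx has already been read into a string; the function name and the regular expression are illustrative assumptions, and the pattern is deliberately simple, so it will not cope with every possible way of writing the tag.

```python
import re

# Matches a <meta ... name="robots" ...> tag, in any letter case.
META_ROBOTS = re.compile(
    r"<meta\b[^>]*\bname\s*=\s*[\"']robots[\"'][^>]*>",
    re.IGNORECASE,
)


def find_blocking_robots_tags(markup):
    """Return the 1-based line numbers of ROBOTS meta tags that block crawlers."""
    blocked_lines = []
    for match in META_ROBOTS.finditer(markup):
        tag_text = match.group(0).lower()
        if "noindex" in tag_text or "nofollow" in tag_text:
            # Line number = newlines before the tag starts, plus one.
            blocked_lines.append(markup.count("\n", 0, match.start()) + 1)
    return blocked_lines


if __name__ == "__main__":
    sample = '<head>\n<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />\n</head>'
    lines = find_blocking_robots_tags(sample)
    if lines:
        print("Do not deploy: blocking ROBOTS tag on line(s)", lines)
```

A deployment script could refuse to continue whenever the returned list is not empty.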

Closing Thoughts

With these two simple steps you can safely hide your in-progress works from the search engines while still allowing your team easy access to the sites. Just don't forget the warning above! Feel free to share your comments/experiences below.
