This is one of those posts that is always difficult for me to write. Nobody likes to admit failure, but the only way to fix a problem is to admit that it exists. Once again we have had to remove a release because a severe bug made it through the release process. We probably could have left the 5.4.3 release as it was and rushed out a new 5.4.4 release to address the bug. However, we felt it would be prudent not to put our users in the position of installing and trying to use a version that we know will break many modules from third-party developers.
The bug in question is a breaking change that affects any module that uses the core ExecuteSQL data access layer function. It is hard to gauge just how many modules use this function, but we know that the bug affected modules written by at least four core team developers, including many administrative modules that we use to run DotNetNuke.com. The fix is relatively simple; we have already checked it in and are working to test it. We will have an updated package ready tomorrow for testing by a select group of commercial module developers, with a final release of 5.4.4 coming later this week. Given the security issues that have already been made public, we are limiting the scope of this release to ensure we don’t leave the community overexposed.
This issue highlights one of the challenges of rapid growth in any software development organization. How do you bring new developers up to speed and get them productive as quickly as possible? How do you ensure that your software continues to meet the quality standards that you and your customers expect, even as your engineering team expands and your processes undergo change?
So how do we plan to address this challenge? The solution must be applied at every level of the engineering process: from the developer who first touches the code, to the build manager who is responsible for building the packages, to the QA team who tests the code, to the community beta testers who perform ad-hoc testing, to the release manager who signs off on the final release. At each phase there are people involved who have some responsibility to ensure we are putting out a product that we can all be proud of. We are currently working with people from each of these levels to improve our processes and our tooling, and to do everything we can to eliminate these problems.
Our engineering team is working hard to add more unit tests to help catch these types of issues earlier. Part of our challenge here is to make the architectural changes that will allow us to unit test more of the framework. We also recognize that there is quite a bit of cultural and historical knowledge about the platform that we need to pass on to new members of the engineering team. We had already planned an engineering retreat this summer for the entire Engineering and Product teams, and engineering quality will certainly be a big topic that week.
For the last nine months we have been building out and enhancing our automated build processes. With each release we have improved the process to automate more tasks, and we know that many processes still remain to be automated. Based on the results of the 5.4.3 release, we will be stepping up our efforts to incorporate automated API testing to ensure we are not breaking binary compatibility between releases. This had been on our list of things to do but will now move to the top of the pile so we can get it in place in time for the upcoming 5.5 release.
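To give a feel for what an automated API compatibility check does, here is a minimal sketch in Python. It is an illustration only, not our actual tooling: the snapshot dictionaries, the `Sample.*` member names and the `find_breaking_changes` helper are all hypothetical, and a real check for a .NET platform would presumably read the public surface from the compiled assemblies of each release rather than from hand-written data.

```python
def find_breaking_changes(old_api, new_api):
    """Compare two API snapshots (member name -> signature string).

    A member that disappears or whose signature differs is reported as
    breaking. Members that only exist in the new snapshot are additions
    and are not reported.
    """
    problems = []
    for member, signature in sorted(old_api.items()):
        if member not in new_api:
            problems.append(("removed", member))
        elif new_api[member] != signature:
            problems.append(("changed", member))
    return problems


# Hypothetical snapshots of the public API for two releases.
previous_release = {
    "Sample.Run": "(query: str) -> Reader",
    "Sample.Close": "() -> None",
}
candidate_release = {
    "Sample.Run": "(query: str, timeout: int) -> Reader",  # signature changed
    "Sample.Close": "() -> None",                          # unchanged
    "Sample.Open": "() -> None",                           # new, not breaking
}

breaking = find_breaking_changes(previous_release, candidate_release)
for kind, member in breaking:
    print(f"{kind}: {member}")
```

Run as part of the automated build, a check along these lines could fail the build whenever the list of breaking changes is not empty, so that a changed signature is caught before a package ever reaches QA.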
Our internal QA team has been working hard to keep up with the pace of our releases. Over the last several months they have produced and executed well over two thousand feature and regression test cases. Given the current size of our team, we must prioritize which tests are run with each release. We continue to expand this team and will be re-evaluating key testing steps to ensure that we cover a wide enough set of baseline tests with each release. Based on some of the release issues this spring, we will be adding test cases that exercise installation and upgrade of key third-party modules.
One area that we still struggle with is how best to engage the community and make them an integral part of our release process. We have tried closed betas, open betas, an open repository and even special advisory committees, all in an effort to get code and packaged releases out to the community and collect feedback prior to a release. Nothing we have tried has resulted in adequate feedback. Last November we started posting source code on CodePlex. By mid-February we had automated this process so that code was made available almost every night as it was checked into our own repository. While we continue to get a substantial number of downloads, we do not seem to be getting much feedback on code quality.
Likewise, following the release of 5.3 and 5.4 we formed a Module Developer Advisory committee made up of a group of commercial module developers. This group receives access to build packages around the same time that we deliver packages to our internal QA team. We will continue working with this team to find out how we can better identify these types of issues before a release.
Finally, the last person who touches each package before it is released is me. For the last 18 months or so we have been using a release checklist to ensure that we follow the same steps with each release. A large part of this checklist is focused on making sure that the engineering and marketing efforts are complete, that we have performed the necessary QA steps, and that we have posted files in the proper locations and updated the web pages that need updating. Up to this point we have not had a product “owner” who would perform any sort of acceptance testing. With the recent formation of a formal product team, we will begin incorporating this step into our release processes. Given the size of this team, this will not be an all-encompassing set of tests like that performed by QA, but it will include some basic sanity checks to ensure that every part of the quality chain is doing its part to help us put out a solid product.
We think that by taking a step back and re-evaluating how we perform every step of the engineering and release process, we can make significant improvements in the quality of our releases. You deserve better from us and we demand better from ourselves.
06/24/2010 UPDATE: During testing of 5.4.4 we found an additional issue that required another round of testing, which pushed us outside of our release window for the week. We will be releasing 5.4.4 on Monday, June 28.