Eleanor Roosevelt said, “Learn from the mistakes of others. You can’t live long enough to make them all yourself.” However, when it comes to making changes to your company’s network, mistakes can have devastating consequences. They can lead not only to a loss in sales and productivity, but can also erode customers’ brand confidence and your market credibility. When it comes to network management, the quote should really be, “Learn from the mistakes of others. You can’t keep your business long enough to make another one.”
This article identifies the six most common mistakes that network managers make. It explains why these mistakes occur, the risks associated with these mistakes and what you can do about them. It will also equip you with the information you need to avoid them and define a change plan that can produce successful results.
The IT industry is going through a period of major change. Today’s enterprise networks are under pressure to support the growing number of smartphones and Internet-enabled devices that connect to them. These devices generate huge amounts of rich media traffic. Sensors, web usage analysis, and network monitoring tools are also generating unprecedented amounts of data. Collectively, this is referred to in the industry as “big data”. Big data is putting tremendous strain on the capacity, throughput and storage I/O of enterprise networks.
In addition, technologies such as virtualization and utility-based cloud computing are already radically changing the way that IT organizations run their business applications. Cloud computing is essentially a group of servers and storage resources, connected by a network, that is shared between users. Network performance is a critical consideration for cloud computing applications.
IT leaders are making changes to their network infrastructure to manage big data and transition to the cloud computing model. However, many leaders do not recognize that the network change procedures and tools also need to be modified. They assume that the change procedures that exist today will work tomorrow. Unfortunately, this is not the case.
The risks of assuming that a change methodology that has worked for years will continue to work for your organization in the coming years are immense.
Think of change management procedures as a living document that evolves and is refined as your network evolves. Every aspect of a change management procedure needs to be reviewed every time new technology is implemented, a new facility is added or a new service provider joins your network. Change request forms may also need to be updated to reflect previously nonexistent situations, such as how to request a change to a third-party cloud provider’s network. Your planning and approval process may need to be expanded to include where and how the data is accessed. How you schedule a network change may need to encompass third-party cloud providers.
A leading cause of downtime in today’s networks is failure to follow change management procedures. You will find this problem in businesses of all sizes, from major corporations that have some of the most rigorous change control methodologies and train all their staff to follow them, to small businesses that have a bare-bones methodology and perhaps only one or two technical people on staff.
A network change procedure enables network changes to be made in a systematic and controlled manner. History is littered with examples of people who have brought entire networks down by failing to follow an already defined process.
There can be many reasons why someone does not follow the defined network change procedure. They may believe they know a better way to accomplish the change, they may be short of time and skip certain steps, or they may not understand the process. Whatever the reason, there are clear steps that you can take to minimize this risk.
First, make sure your network team is committed to the process and understands the consequences if the procedure is not followed. Second, make sure you are monitoring your network so that you detect early when things are not going as planned. Network management tools can display alarms on the screen, or send them via email or text message, when critical network errors occur, such as the one shown in Figure 1. Lastly, make sure that your change plan includes back-out and recovery steps in case the change fails to have the expected results. Your team needs to know that backing out the changes is essential. Some of the most devastating network failures are caused by workers who improvise their own fixes when the changes do not go as planned.
Figure 1: Windows Event Viewer displaying TCP/IP error
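The back-out requirement described above can be made concrete with a small sketch. This is an illustration only, not a tool the article prescribes: the names `ChangeStep` and `run_change_plan` are hypothetical, and a plain dictionary stands in for a real device configuration. It assumes that every step in a plan carries its own undo action and that a step which raises an error has left nothing behind.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChangeStep:
    """One step of a change plan: how to apply it and how to undo it."""
    name: str
    apply: Callable[[], None]
    back_out: Callable[[], None]


def run_change_plan(steps: List[ChangeStep], verify: Callable[[], bool]) -> bool:
    """Apply steps in order; back them out in reverse order on failure.

    Returns True if every step was applied and verification passed,
    False if the plan was backed out.
    """
    applied: List[ChangeStep] = []
    try:
        for step in steps:
            step.apply()
            # Assumption: a step that raised left no partial state behind,
            # so only steps that completed are recorded for back-out.
            applied.append(step)
        if verify():
            return True
    except Exception as exc:
        print(f"change step failed: {exc}")
    # A step raised or verification failed: undo completed steps, newest first.
    for step in reversed(applied):
        step.back_out()
    return False


# Stand-in for a device configuration; a real plan would act on real equipment.
config = {"mtu": 1500}
steps = [
    ChangeStep(
        name="raise MTU",
        apply=lambda: config.update(mtu=9000),
        back_out=lambda: config.update(mtu=1500),
    ),
]
# Verification fails here on purpose, so the plan is backed out.
succeeded = run_change_plan(steps, verify=lambda: False)
```

The point of the design is that backing out is part of the plan itself rather than something decided under pressure: when verification fails, the only available path is the predefined undo sequence.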
The people who run day-to-day network operations are typically the same people who develop and implement your network change plan. The work of changing the network is therefore an additional burden on top of their regular operational work. Despite management’s wishes for change to happen quickly, it is important to estimate not just how many hours of work the change will take, but also how much elapsed time it will realistically take.
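The difference between hours of work and elapsed time can be shown with simple arithmetic. The sketch below is a rough illustration under stated assumptions: the function name `elapsed_weeks` and the example figures are hypothetical, and it assumes that a fixed fraction of each working week is available for change work.

```python
import math


def elapsed_weeks(effort_hours: float, hours_per_week: float, change_fraction: float) -> int:
    """Whole weeks needed when only part of each week goes to change work."""
    if effort_hours < 0 or hours_per_week <= 0:
        raise ValueError("effort must be non-negative and hours_per_week positive")
    if not 0 < change_fraction <= 1:
        raise ValueError("change_fraction must be in (0, 1]")
    weekly_capacity = hours_per_week * change_fraction
    return math.ceil(effort_hours / weekly_capacity)


# 80 hours of work looks like two weeks on paper...
full_time = elapsed_weeks(80, hours_per_week=40, change_fraction=1.0)
# ...but takes eight weeks if only a quarter of each week is free for it.
part_time = elapsed_weeks(80, hours_per_week=40, change_fraction=0.25)
```

With these illustrative numbers, the same 80 hours of effort spans two weeks for a dedicated administrator and eight weeks for one who can spare only ten hours a week.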
IT organizations are being asked to do more with less. IT departments that load network changes on top of already excessive day-to-day operational tasks add to employee stress, which results in lower performance. The challenge is to free resources from ongoing network operations so that network administrators can carry out the necessary network changes.
There are a number of strategies you can pursue to free up a network administrator’s time. You could stop some projects or delay them until after the change is completed. You could outsource part of the day-to-day operations to external companies or contractors. You could move non-critical operational parts of your business to a cloud service provider, removing your team from the day-to-day management of non-critical computer resources. You could add tools to automate or simplify some of the network management tasks. Regardless of the strategy you decide on, it is essential that you allocate the appropriate amount of resources to both the planning and the implementation of your desired network changes.
Network operations teams can find themselves inundated with numerous unconnected and diverse network change requests. Typically, organizations prioritize the individual change requests through an associated planning and escalation process. A combination of IT staff reductions and increasing user expectations means that the list of change requests is getting longer and longer. The ever-growing list of network changes can seem overwhelming, and as a result significant time is wasted discussing which request should be done first and how to handle escalations and complaints.
Resources can be used more effectively if you organize your major network change requests around one of the organization’s business goals. Aligning your major change requests around a common business goal means that you will have a relevant and measurable outcome. This approach can also help you determine the best sequence in which to implement changes, and result in better utilization of your resources. It also enables you to stop working on network changes that do not support your business strategy.
In other words, taking a holistic, total-network perspective that supports the business vision and strategy enables you to make better decisions, use resources more efficiently and set realistic timelines. It will reduce redundant work, accelerate your ability to respond and demonstrate your impact on the success of your business.
A key part of network change management is the definition of roles and responsibilities. Who is responsible for approving the plan? Who will carry out the necessary network changes? Who is responsible for making decisions? Who will communicate the changes? Who will interface with network users? All these roles and responsibilities need to be clear from the outset.
Today’s fast-paced IT organizations often assume that if they hire people with the right skills, those people will know what to do when it comes to network upgrades and changes. This assumption slows progress. People are hindered from moving forward by team politics, uncertainty about decision making, and lack of clarity about who is authorized to do what.
You must define clear roles and responsibilities for everyone on the team, and how these roles relate to each other. There needs to be clarity on decision levels and authority. It is also advisable to define how those in a leadership and decision-making capacity will interface with the operational staff.
Remember that if your day-to-day operational staff is also responsible for making the network changes, then they essentially have two jobs. In this situation you need to define roles and responsibilities for two very different organizational positions. Clarity between these two positions is essential.
Some of the most costly examples of failed network changes occur when IT experts do not engage network users. If you are implementing network enhancements without a clear understanding of how users plan to use the new capabilities, there’s a high probability you will get it wrong. For major network changes it is important to ask end users and stakeholders what their expectations are. Allowing end users and stakeholders input into decisions will result in better solutions, greater clarity on what success looks like, and it will improve their support for the subsequent implementation.
It is possible to engage end users and stakeholders without using a considerable amount of resources. Numerous online channels such as blogs, webcasts, Facebook and Twitter can be used to share your plans and to solicit input. These tools make it so easy to ask users what they want that every company should take advantage of them. Regardless of the types of media that you plan to use to communicate with your stakeholders, remember that your communications need to be informative and accurate.
So you have followed this advice. You have updated change management procedures for your network, and your team is committed to using these procedures. You feel confident that you are producing good work estimates and timelines, you are aligning the changes to the overall business goals, the team’s roles and responsibilities are clear, and you are actively talking to the end users. You wonder what is next. The answer is metrics, metrics and more metrics.
There are three primary areas you should be measuring. First, you need to know how well the network is working. To do this you will look at availability, latency, packet loss, retransmission and throughput. Figure 2 shows how you can filter for retransmitted TCP packets using network monitoring tools. Did the changes you made to the network improve performance in these key areas?
Figure 2: Microsoft Network Monitor capturing retransmitted TCP packets
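One simple way to answer that question is to summarize the same measurements before and after the change and compare them. The sketch below is illustrative only: the `Sample` record, the function names and the sample figures are hypothetical, and it assumes you can export latency, packets sent, packets lost and retransmission counts from whatever monitoring tool you use. Availability and throughput could be added in the same way.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Sample:
    """One measurement interval exported from a monitoring tool (hypothetical format)."""
    latency_ms: float
    packets_sent: int
    packets_lost: int
    retransmits: int


def summarize(samples: List[Sample]) -> Dict[str, float]:
    """Average latency plus loss and retransmission rates over all samples."""
    sent = sum(s.packets_sent for s in samples)
    if sent == 0:
        raise ValueError("no packets were sent in the sampled intervals")
    return {
        "latency_ms": mean([s.latency_ms for s in samples]),
        "loss_pct": 100 * sum(s.packets_lost for s in samples) / sent,
        "retransmit_pct": 100 * sum(s.retransmits for s in samples) / sent,
    }


def compare(before: List[Sample], after: List[Sample]) -> Dict[str, float]:
    """After minus before; for these metrics a negative value is an improvement."""
    b, a = summarize(before), summarize(after)
    return {key: a[key] - b[key] for key in b}


# Made-up figures purely to exercise the functions.
before = [Sample(20.0, 1000, 10, 20), Sample(30.0, 1000, 30, 40)]
after = [Sample(10.0, 1000, 5, 10), Sample(20.0, 1000, 5, 10)]
delta = compare(before, after)
```

Collecting the baseline before the change is the part that is easiest to forget; without it there is nothing to compare the post-change numbers against.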
Secondly, you will need to know whether the implemented network changes had the expected impact on the business goals. These metrics are goal specific. For example, if your business goal was to improve customer satisfaction, you would look at end-user metrics such as web logs or real user monitoring, or simply run an end-user survey. Lastly, you need to assess the effectiveness of your planning and change management processes. You should run post-mortems and identify areas for improvement.
This article started with a quote from Eleanor Roosevelt and ends with a quote by Guy Almes, a former chief engineer for Internet2. This quote encapsulates how important network connectivity has become and the risk associated with network changes. “There are three kinds of death in this world. There's heart death, there's brain death, and there's being off the network.”