Is your network utilization different from year to year? How about day to day or even minute to minute? Virtually no network caretakers believe that their network remains consistent over time.
Network traffic is unpredictable, and a number of factors contribute to that unpredictability. The net result is that an effective network administrator needs to understand the impact of changing network traffic. This article shows you the common pitfalls that network administrators encounter as they analyze existing and new network capacity. You’ll also see what the impact of changing traffic patterns can be, and the tools that you can use to both understand and plan your network effectively.
In particular you will learn that network impact is not measured in traffic volume alone. This common misconception leads to far too many network utilization issues. Network latency and dropped packets are often caused by the types of traffic and how they are handled rather than by sheer volume. But there are straightforward methods to identify this situation and enable you to choose a resolution that fits your requirements.
All Traffic is Not Created Equal
You’ve watched video over the Internet. But have you ever wondered how the video renders seamlessly in your browser? We take it for granted that once the video starts it plays smoothly and without major interruptions. Delays and “rebuffering” errors have decreased dramatically over the last few years as a direct result of three factors: increased bandwidth, lower latency, and better codecs.
Under the covers, Internet-based video can consume a ton of bandwidth. It is easy for a network administrator to blame video content for poor network performance and block it from her infrastructure. But video isn’t really to blame. Most codecs and applications treat video transfer as low-priority traffic.
Now consider the use of PC-based video and audio conferencing systems like Go To Meeting or Microsoft Live Meeting. How is it that the performance for these applications remains acceptable even when the network is very busy? It is a result of the previously mentioned factors and also of prioritization: network traffic that has a higher business value or is less tolerant of network interruption is prioritized above other traffic. This type of behavior is usually referred to broadly as Quality of Service (QoS) and is implemented in most network components and hosts.
For example, a network switch that encounters both Microsoft Live Meeting and YouTube traffic at the same time typically gives preference to the Live Meeting traffic because this traffic carries time-sensitive information such as interactive audio and video conversations. The switch may delay or drop the YouTube traffic if necessary to preserve the QoS of the Live Meeting stream. While this is probably a good thing, failing to account for this behavior during planning and operations causes confusion in capacity management and network performance analysis. A network manager might believe that a highly utilized network is impacting critical business functions when it actually is not. The best way to find out, as you’ll see, is to test it out.
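The prioritization described above can be illustrated with a minimal sketch of strict-priority forwarding. This is not how any particular switch implements QoS. The class names, the priority numbers, and the `forward` function are all hypothetical, and real devices use more sophisticated queuing than a simple sort.

```python
# Hypothetical traffic classes; a lower number means higher priority.
PRIORITY = {"conference": 0, "streaming": 1}


def forward(packets, capacity):
    """Split one interval's packets into (sent, held_back).

    packets:  list of (traffic_class, packet_id) in arrival order
    capacity: how many packets the link can carry in this interval
    Packets that are held back would be delayed or dropped by a real device.
    """
    # sorted() is stable, so arrival order is preserved within a class.
    by_priority = sorted(packets, key=lambda packet: PRIORITY[packet[0]])
    return by_priority[:capacity], by_priority[capacity:]


arrivals = [("streaming", 1), ("conference", 2), ("streaming", 3), ("conference", 4)]
sent, held_back = forward(arrivals, capacity=3)
print(sent)       # [('conference', 2), ('conference', 4), ('streaming', 1)]
print(held_back)  # [('streaming', 3)]
```

Even though the link is over capacity in this interval, every conferencing packet gets through. That is why a busy link does not automatically mean that critical traffic is suffering.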
Understanding Traffic Impact on Infrastructure
Most networks have a variety of traffic including video, audio, large data transfers, small data transfers, etc. Each of these uses, and indeed each application, may have a different expectation of network performance. Voice over IP (VoIP) applications expect very low network latency, while streaming video applications, which buffer ahead, can tolerate higher latency and some dropped packets. While this seems obvious, the specifics differ for each network, and the generalization does not apply to every implementation.
Your business places an importance on different traffic types that is unique to your business model, users, management, etc. The use of different traffic types and the requirements for the performance of each type are the two components that should guide your network analysis and planning.
Notice that I didn’t mention traffic capacity. Actually, raw network traffic capacity is becoming less of a concern in today’s infrastructure. Surplus capacity is prevalent. But that doesn’t mean everything works perfectly.
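One way to make the two components concrete is to write down, per traffic type, the performance you require and then compare measurements against it. The sketch below assumes a hypothetical `Requirement` record and `unmet` helper. The threshold values are placeholders for illustration only, not recommendations, and should be replaced with the numbers your business actually needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    max_latency_ms: float  # highest acceptable latency
    max_loss: float        # highest acceptable packet loss, as a fraction


# Placeholder thresholds for illustration only.
REQUIREMENTS = {
    "voip": Requirement(max_latency_ms=150.0, max_loss=0.01),
    "streaming_video": Requirement(max_latency_ms=1000.0, max_loss=0.05),
}


def unmet(measured):
    """Return the traffic types whose measurements miss their requirement.

    measured: dict mapping traffic type to (latency_ms, loss_fraction)
    """
    failures = []
    for traffic_type, (latency_ms, loss) in measured.items():
        requirement = REQUIREMENTS[traffic_type]
        if latency_ms > requirement.max_latency_ms or loss > requirement.max_loss:
            failures.append(traffic_type)
    return sorted(failures)


print(unmet({"voip": (200.0, 0.0), "streaming_video": (400.0, 0.02)}))  # ['voip']
```

Note that nothing in this structure mentions link capacity. The check is driven entirely by traffic type and the performance each type requires.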
Testing Networks with Accurate Traffic Simulations
To actually find out whether networks are handling your unique requirements as expected, you need to test them with the kind of traffic that is in use on your network today. Throwing random packets onto the wire is fun, sure, but the test results are valid only when all users throw random packets onto the wire. As you can guess, very few networks are like that.
For example, a common network traffic analysis might break down to look something like this:
Your network testing needs to account for this pattern. Luckily there are two ways you can make this happen: replaying recorded network traffic and simulating the traffic types. This is the only way to accurately determine how far you can push your infrastructure before it fails to meet expectations. Accurate testing is also the only way to really determine whether a vendor’s product will meet your needs, regardless of stated product specifications.
Before we get into the specifics of network testing procedures, we need to define the network you will test on. The test network must accurately represent the production infrastructure. A common misconception is that the test network must be a 1:1 representation. It does not need to be. The same components must exist if the test requires them, for example a switch, a bridge, a stateful inspection firewall, and so on. But a test network is nearly always built at a smaller scale. In my experience I’ve only seen two test networks that were 1:1 mirrors of their production counterpart, and these were created for very specific reasons.
Network Replay Testing
This option is the most widely used because it has been available for years and is largely unchanged. To test, the network administrator records a sample of network traffic using a network monitoring device or software package such as Wireshark or Network Monitor. The sample needs to be large enough to accurately represent use of the network over time so it should be at least a day’s worth and recorded from more than one point in the network.
The recorded traffic sample is then taken to an isolated test network and replayed onto the wire. Performance is measured at the appropriate points such as the client, the switch, the Internet connection point, and so on. This test validates the network performance under as accurate a load as possible—it is really what happened on the network.
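The core of a replay is preserving the original spacing between packets. The sketch below shows that idea only. It assumes the capture has already been reduced to a list of `(timestamp, payload)` pairs, and the `replay_offsets` and `replay` names, the `speed` parameter, and the `send` callback are hypothetical. A dedicated replay tool would also handle reading the capture file and writing to the network interface.

```python
import time


def replay_offsets(capture_times, speed=1.0):
    """Convert capture timestamps into delays from the start of the replay."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not capture_times:
        return []
    start = capture_times[0]
    return [(stamp - start) / speed for stamp in capture_times]


def replay(packets, send, speed=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call send(payload) for each packet at its original relative time.

    packets: list of (timestamp_seconds, payload) in capture order
    send:    hypothetical callback that puts one payload on the wire
    """
    offsets = replay_offsets([stamp for stamp, _ in packets], speed)
    began = clock()
    for offset, (_, payload) in zip(offsets, packets):
        wait = offset - (clock() - began)
        if wait > 0:
            sleep(wait)
        send(payload)


print(replay_offsets([10.0, 10.5, 12.0], speed=2.0))  # [0.0, 0.25, 1.0]
```

A `speed` greater than 1.0 compresses the recording, which is one simple way to push more load through the test network than the original sample contained.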
Most network administrators find value in further testing the network by performing a stress test. This test is actually very simple to implement with the recorded traffic:
- Use the same infrastructure and recorded traffic as the performance test
- Replay the recorded traffic from multiple systems simultaneously
- Continue increasing the systems replaying traffic until something breaks
- Analyze the break to learn the maximum capacity threshold, point of failure, and then predict possible future breaks
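The ramp-up in the steps above can be sketched as a simple loop. This assumes a hypothetical `measure_loss` callable that starts the given number of replaying systems and returns the packet loss observed at the measurement point. The 1% loss limit is only an illustrative definition of "something breaks", and you would substitute whatever failure criterion matters in your environment.

```python
def find_breaking_point(measure_loss, max_replayers, loss_limit=0.01):
    """Add replaying systems one at a time until loss exceeds the limit.

    measure_loss:  hypothetical callable(replayers) -> loss fraction
    Returns (last_good, first_bad); first_bad is None if nothing broke.
    """
    last_good = 0
    for replayers in range(1, max_replayers + 1):
        if measure_loss(replayers) > loss_limit:
            return last_good, replayers
        last_good = replayers
    return last_good, None


# Stand-in measurement for demonstration: loss appears at five replayers.
def fake_measurement(replayers):
    return 0.0 if replayers <= 4 else 0.05


print(find_breaking_point(fake_measurement, max_replayers=10))  # (4, 5)
```

The pair that comes back brackets the capacity threshold: the network held at the first number and failed at the second.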
The chief drawback to using recorded traffic to test network performance is that some traffic is time-bound and dynamic, so a replay will not always represent the network accurately. One example is a momentary network condition that corrupts some traffic and causes applications to receive out-of-sequence packets. The resulting error recovery is very difficult to recreate using replay. Another challenge is time-stamped data. Many applications time-stamp data to ensure that it has not been compromised by an attacker, and that traffic will likely be treated as invalid when it is replayed later, just as it would be in a replay attack. This could cause applications either to drop the suspect traffic, decreasing network load, or to go into a retry state, increasing network load.
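To see why time-stamped traffic fails during replay, consider a minimal sketch of a freshness check. The message layout, the `sign` and `accept` functions, and the 30-second window are assumptions made for illustration. Real applications each use their own scheme.

```python
import hashlib
import hmac
import struct


def sign(key, payload, sent_at):
    """Build a message: 8-byte timestamp + 32-byte HMAC tag + payload."""
    stamp = struct.pack("!d", sent_at)
    tag = hmac.new(key, stamp + payload, hashlib.sha256).digest()
    return stamp + tag + payload


def accept(key, message, now, max_age=30.0):
    """Accept only untampered messages whose timestamp is recent."""
    stamp, tag, payload = message[:8], message[8:40], message[40:]
    expected = hmac.new(key, stamp + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False  # altered in transit, or not signed with this key
    (sent_at,) = struct.unpack("!d", stamp)
    return abs(now - sent_at) <= max_age


message = sign(b"shared-key", b"hello", sent_at=1000.0)
print(accept(b"shared-key", message, now=1005.0))            # True
print(accept(b"shared-key", message, now=1000.0 + 86400.0))  # False
```

The second call represents the same bytes replayed a day later on the test network. The message is intact, but the receiver rejects it because it is stale, so the application behaves differently than it did in production.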
Network Traffic Simulation Testing
Simulating network traffic using a hardware or software traffic generator is both an old and new approach. Applications have been around for some time that randomly create valid network packets and put them on the wire. The packets can be somewhat customized to carry specific payload sizes, require routing, and reflect other basic choices.
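As a rough illustration of the basic idea, the sketch below builds pseudo-random payloads of a chosen size and sends them as UDP datagrams using only the Python standard library. The `make_payloads` and `send_udp` names are hypothetical, the destination host and port are up to you, and this is far simpler than a real traffic generator. Run something like this only on an isolated test network.

```python
import random
import socket


def make_payloads(count, size, seed=0):
    """Create `count` pseudo-random payloads of `size` bytes each."""
    rng = random.Random(seed)  # seeded so that a test run is repeatable
    return [bytes(rng.randrange(256) for _ in range(size)) for _ in range(count)]


def send_udp(payloads, host, port):
    """Send each payload as one UDP datagram; return the bytes handed off."""
    total = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in payloads:
            total += sock.sendto(payload, (host, port))
    return total


payloads = make_payloads(count=5, size=200)
print(len(payloads), len(payloads[0]))  # 5 200
```

Varying `size` and `count` changes the shape of the load. Many small datagrams stress a device differently than a few large ones, even at the same total volume.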
Previous network traffic generators were limited in what traffic they could generate and transmit. This was often due to limited computer or device computing power or network interface restrictions. As a result the older traffic generators were unable to truly flood a network segment or link. While this could still be useful for testing normal load, it could fall short of truly stress-testing a network or infrastructure. Another limit was the interface and usability of the actual appliance, which frequently required in-depth training to deploy correctly.
Modern network traffic generators are far more useful than their predecessors. They are streamlined to take advantage of more computing and network interface resources and are more efficient than ever before. Commonplace operating systems like Windows have also opened up resources to applications that require this level of performance while maintaining a common look and feel. The result of these advancements is application-style software that runs on Windows and is simple to use, yet can generate any style and volume of traffic, from small quantities of pseudo-random data to enough QoS-specific data to make even the most robust network break down and expose the weak points.
The network traffic generator is particularly useful when you need to conduct realistic testing of unique network applications. It is often used in conjunction with functional testing. For example, to validate that VoIP performance is not impacted by other network traffic, you can perform this test:
- Set up a test network with two VoIP clients, a VoIP server, and the intermediate network infrastructure and configuration that represents your environment
- Establish a VoIP conversation using the same software or devices that are used in the production environment
- Introduce a random traffic generator that creates common traffic on the same segment at the same volume as your typical network load
- Turn up the traffic volume and packet size until the network is nearly saturated
In this test the VoIP traffic is typically prioritized higher than the traffic coming from the generator. The expected result is that the VoIP conversation remains clear while the network infrastructure avoids jitter or packet loss for the VoIP packets. Other packets will likely drop or become delayed, but that’s fine. It is the higher priority traffic that we are testing.
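Listening to the call is one check. Putting numbers on jitter and loss is another. The sketch below assumes you can collect send and receive times, in milliseconds, for the VoIP packets that arrived. It uses the running interarrival jitter estimate defined for RTP in RFC 3550, in which each new sample moves the estimate by one sixteenth of the difference. The function names are hypothetical.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Running jitter estimate over paired send/receive times (milliseconds)."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # Change in transit time between consecutive packets.
        delta = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(delta) - jitter) / 16
    return jitter


def loss_fraction(sent_count, received_count):
    """Fraction of sent packets that never arrived."""
    if sent_count == 0:
        return 0.0
    return (sent_count - received_count) / sent_count


# Constant 50 ms transit time: no jitter.
print(interarrival_jitter([0, 20, 40], [50, 70, 90]))  # 0.0
# Second packet arrives 16 ms later than its spacing predicts.
print(interarrival_jitter([0, 20], [50, 86]))          # 1.0
print(loss_fraction(100, 98))                          # 0.02
```

Comparing these values before and after the generator is turned up shows whether the VoIP stream is actually being protected, rather than relying on the call merely sounding acceptable.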
A Word of Caution
You surely already understand that a test network must be completely isolated from the production environment. This is not unique to network bandwidth testing—pretty much any kind of stress test must be isolated to ensure it doesn’t impact the rest of the IT infrastructure. But network playback and traffic generation testing evoke a somewhat unique response.
Almost any firewall will slam shut and set off every alert it has when you begin these tests. It doesn’t really matter whether the firewall runs on Windows, a router, a switch, a dedicated filter, or anything else. Firewalls are designed to detect and prevent exactly what you’re intentionally doing during these tests. From a security view these tests often look identical to a denial of service (DoS) attack, many of which have profoundly impacted businesses.
For that reason you will probably need to disable all firewalls in the test environment. And if, somehow, the test network is connected to the production network, you can expect a rapid and wholly unpleasant response to your test.