Guide: A Complete Review of Network Monitoring Use Cases
Park Place: Professional Services
Table of contents
Preventing network downtime: the root of all network monitoring • All organizations need broad coverage across the IT infrastructure.
Network Monitoring Use Cases
Inventory cataloguing • Resource monitoring • Spare port monitoring • Capacity planning • Anomaly and fault detection • Root cause analysis (or, at a minimum, noise reduction)
Types of Monitoring Technologies
Preventing Network Downtime: The Root of All Network Monitoring
“The best networks are ones that are completely invisible.”
-Matthew Latham, Senior Product Manager, Park Place Technologies
Mastering individual network monitoring and network management use cases is extremely valuable, but it is important to first think about network performance monitoring at a higher level.
Ultimately, preventing network downtime is the most fundamental use case for network monitoring. According to Gartner, the average cost of network downtime is $5,600 per minute.[1] Ideally, the network should never go down at all; when it does, you need to restore it as quickly as possible to minimize the cost to the business. Consider the following costly examples:
- In August 2016, a five-hour power outage at an operations center caused about 2,000 flight cancellations and an estimated loss of $150 million for Delta Air Lines.
- In March 2019, a 14-hour outage cost Facebook an estimated $90 million.[2]
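To put the Gartner average in perspective, a one-hour outage works out as follows. This is simple multiplication and assumes the per-minute average applies uniformly for the whole outage; real costs vary widely between organizations.

```latex
C = t \times r = 60\ \text{min} \times \$5{,}600\ \text{per min} = \$336{,}000
```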
A network monitoring system is not necessarily designed to fix problems. Rather, it is designed to give operators relevant information. Network problems are typically localized, so the first challenge is finding them: if network operators can use network topology mapping to pinpoint where a problem is, have enough corroborating evidence to evaluate its cause, and can decide how to address it, they are significantly further ahead than they would be without a quality network monitoring solution.
Network performance and availability problems in the data center tend to have widespread impact, but unreliable office or campus networks can also significantly reduce an organization’s productivity. Remote work became essential during the COVID-19 pandemic and remains important beyond it, so services like VPNs should never be overlooked when planning for or deploying network monitoring solutions.
As we like to say here at Park Place Technologies, “The best networks are ones that are completely invisible.” In other words, end users and employees alike should have no reason to know the network is there because it is working as it should. These levels of user experience, uptime, and performance can only be achieved with a tool that is constantly monitoring the health of all elements of the network.
Finally, although fully open-source network monitoring systems are tempting given their low purchase price, they often require significant investments of time and effort to install, configure, and maintain. Balancing the feature set against the realistic cost of ownership is essential when making investment decisions.
[1] Gartner Blog Network, “The Cost of Downtime,” July 16, 2014. https://blogs.gartner.com/andrew-lerner/2014/07/16/the-cost-of-downtime/
Network Monitoring Use Cases
Network Device Discovery
Network device discovery (including the resources within that equipment) is the most fundamental network monitoring use case. Unless you understand what you have on the network with a high level of detail, monitoring is either impractical or unreliable.
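The very first pass of discovery is often a sweep of an address range to see what answers. The sketch below illustrates that idea only; it is not how any particular product performs discovery. The function name `discover_hosts` is hypothetical, and the `probe` callable is an assumed stand-in for a real reachability test such as an ICMP echo or an SNMP query.

```python
import ipaddress
from typing import Callable, List


def discover_hosts(cidr: str, probe: Callable[[str], bool]) -> List[str]:
    """Return the addresses in `cidr` for which `probe` reports a live device.

    `probe` is a hypothetical stand-in for a real reachability test
    (for example an ICMP echo, an SNMP query, or a TCP connection attempt).
    """
    network = ipaddress.ip_network(cidr, strict=False)
    # For IPv4, hosts() skips the network and broadcast addresses
    return [str(addr) for addr in network.hosts() if probe(str(addr))]


# Stand-in probe: pretend only these two addresses answer
responding = {"192.0.2.1", "192.0.2.10"}
found = discover_hosts("192.0.2.0/28", lambda ip: ip in responding)
print(found)  # ['192.0.2.1', '192.0.2.10']
```

Passing the probe in as a function keeps the sweep logic separate from the transport, so the same loop works whichever reachability test you trust.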
After the initial discovery, the next step is to determine what the devices are, what resources they contain, and how those resources are configured. It is critically important to uncover this information before you start monitoring those resources, which include things like:
Network equipment discovery is a natural precursor to inventory cataloguing, which lets you maintain an accurate picture of what the network actually comprises (not necessarily what it is intended to comprise). That catalogue, in turn, is the basis for real-time, live monitoring of the equipment on the network.
Although discovery and inventory cataloguing may seem like mere setup steps, they are important. The simple steps of listing, reporting, and exporting details of the network equipment and the resources within it allow you to move on to additional forms of monitoring.
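Listing and exporting the catalogue can be as simple as serializing discovered records. This minimal sketch assumes a hypothetical `InventoryRecord` with just four fields; a real catalogue would hold whatever attributes discovery actually returns.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class InventoryRecord:
    # Hypothetical fields chosen for illustration
    hostname: str
    ip_address: str
    model: str
    port_count: int


def export_inventory_csv(records: List[InventoryRecord]) -> str:
    """Serialize discovered equipment to CSV text for reporting or export."""
    buffer = io.StringIO()
    header = [f.name for f in fields(InventoryRecord)]
    writer = csv.DictWriter(buffer, fieldnames=header)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


inventory = [
    InventoryRecord("access-sw-01", "192.0.2.1", "48-port switch", 48),
    InventoryRecord("core-rtr-01", "192.0.2.254", "router", 8),
]
print(export_inventory_csv(inventory))
```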
Resource Monitoring
Resource monitoring keeps track of various resources such as ports and processors.
The bandwidth available for transmitting and receiving data is finite. Both the actual traffic flowing in and out of each port and that traffic as a percentage of the port's maximum are relevant, since the percentage gives you a clear idea of how close you are to the edge of available capacity.
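The percentage-of-maximum figure is commonly derived from interface octet counters polled at a fixed interval. This is a minimal sketch assuming 64-bit SNMP IF-MIB counters (ifHCInOctets or ifHCOutOctets), at most one counter wrap between samples, and an interface speed in bits per second (ifSpeed, or ifHighSpeed multiplied by 1,000,000). The function name is hypothetical. Run it separately for the inbound and outbound counters.

```python
COUNTER64_MAX = 2 ** 64


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, if_speed_bps: int) -> float:
    """Percentage of link capacity used in one direction between two samples.

    Assumes 64-bit octet counters and at most one wrap between samples.
    """
    delta = curr_octets - prev_octets
    if delta < 0:  # the counter wrapped past its maximum
        delta += COUNTER64_MAX
    bits_per_second = delta * 8 / interval_s
    return 100.0 * bits_per_second / if_speed_bps


# 750 MB received in 60 s on a 1 Gbit/s port
print(utilization_percent(1_000_000_000, 1_750_000_000, 60, 1_000_000_000))  # 10.0
```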
It is very important to examine inbound and outbound channels separately on a full duplex interface (most interfaces are full duplex). On a half-duplex circuit, both directions share the same capacity, so traffic in one direction reduces the bandwidth available in the other. On a full duplex circuit, each direction has its own capacity, so the amount of inbound traffic tells you nothing about outbound traffic or outbound headroom. Both directions must therefore be monitored independently, and you should be extremely cautious about merging the two.
- For example: Imagine combining inbound and outbound traffic on a full duplex circuit and reporting average utilization. While tempting, this is ill advised. If there is a significant difference between inbound and outbound traffic (which is likely), the merged figure can easily mislead you about how much headroom remains on the circuit.
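The example above can be made concrete with a few numbers. In this sketch (the function name and figures are hypothetical), a circuit runs at 95% inbound and 10% outbound; averaging the two hides the fact that one direction is nearly saturated.

```python
def headroom_report(in_pct: float, out_pct: float) -> dict:
    """Compare per-direction utilization with a merged average on a full duplex link."""
    return {
        "inbound_headroom": 100.0 - in_pct,
        "outbound_headroom": 100.0 - out_pct,
        # Misleading on full duplex: each direction has its own capacity
        "merged_average": (in_pct + out_pct) / 2,
        # The busier direction is what actually limits the circuit
        "worst_direction": max(in_pct, out_pct),
    }


report = headroom_report(in_pct=95.0, out_pct=10.0)
print(report)
```

The merged average of 52.5% suggests plenty of room, while the inbound channel has only 5% headroom left; tracking the worst direction, or both directions, avoids that trap.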
Other resources worth monitoring include processors, whose CPU utilization is normally represented as a percentage of total processing capacity, and memory, represented as a percentage of the maximum available on the device (how much is being used, and therefore how much headroom remains before memory runs out).
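Both figures reduce to used-versus-total arithmetic. This minimal sketch uses hypothetical function names and assumes the raw used and total values come from whatever source the device exposes, such as an SNMP MIB or a local agent.

```python
def percent_used(used: float, total: float) -> float:
    """Usage as a percentage of the maximum available (CPU capacity or memory)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def memory_headroom(used_bytes: int, total_bytes: int) -> int:
    """Bytes remaining before memory runs out (never negative)."""
    return max(total_bytes - used_bytes, 0)


# A device with 4 GiB of memory, 3 GiB in use
GIB = 1024 ** 3
print(percent_used(3 * GIB, 4 * GIB))     # 75.0
print(memory_headroom(3 * GIB, 4 * GIB))  # 1073741824
```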
Spare Port Monitoring
Other aspects of network devices also affect the wider network ecosystem, and one of the most important and most often overlooked is spare port capacity, which is where spare port monitoring comes in.
- For example: Just because you have an access layer switch with 48 ports on it does not mean that all 48 ports are in use.
Reliably monitoring how many ports are in use and how many are not adds another helpful dimension to capacity planning, particularly in determining where and when additional hardware is required. This applies especially at the edge of a network, where it lets you base decisions to acquire new equipment on actual usage and spare capacity rather than on estimates of how many connections are needed.
It is also extremely important to recognize that just because a port is not in use does not mean that it is not reserved for a particular purpose.
- For example: A port could be connected to someone’s office. If that employee is out for the afternoon, the port may be operationally down, but that does not mean it is available for someone else to use.
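The distinction above means a spare port is one that is both operationally down and not reserved. In this sketch the `Port` type and its `reserved` flag are hypothetical; reservation data usually has to come from a source outside the switch, such as a port assignment record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    name: str
    oper_up: bool   # operational status as reported by the switch
    reserved: bool  # hypothetical flag, e.g. the port is patched to an office


def spare_ports(ports: List[Port]) -> List[str]:
    """Ports that are down and not reserved, i.e. genuinely available."""
    return [p.name for p in ports if not p.oper_up and not p.reserved]


switch = [
    Port("Gi1/0/1", oper_up=True, reserved=False),
    Port("Gi1/0/2", oper_up=False, reserved=True),   # office port, user away
    Port("Gi1/0/3", oper_up=False, reserved=False),
]
print(spare_ports(switch))  # ['Gi1/0/3']
```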
Ultimately, a network is a very large and varied collection of resources; we have mentioned some of those found on a typical network device such as an Ethernet switch. Beyond these, there is a broad array of resources, including devices, servers, storage, disk volumes, and more, all of which are part of resource management.
Capacity Planning
Network monitoring and capacity planning are closely related: the information gathered as part of network monitoring can flow directly into the capacity planning data feed, and this should not be overlooked. Network monitoring is the constant collection of very valuable information, and when that information is handled appropriately, other disciplines can benefit from it.
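One way monitoring data feeds capacity planning is trend extrapolation. This minimal sketch fits an ordinary least-squares line to daily peak utilization and estimates when it will cross a threshold; the linear model and the function name are assumptions made for illustration, and real planning may call for models that account for seasonality.

```python
from typing import List, Optional


def days_until_threshold(daily_peaks: List[float], threshold: float) -> Optional[float]:
    """Fit a least-squares line to daily peak utilization (%) and estimate
    how many days after the last sample the trend crosses `threshold`.

    Returns None when the trend is flat or falling, and 0.0 if the trend
    line has already crossed the threshold.
    """
    n = len(daily_peaks)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_peaks) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_peaks))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(crossing_day - (n - 1), 0.0)


# Peaks rising 2 points per day from 50%: the trend reaches 80% 11 days later
print(days_until_threshold([50, 52, 54, 56, 58], threshold=80))  # 11.0
```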
Anomaly and Fault Detection
When monitoring a live production network, the ideal is for most equipment to operate normally and adequately when left to its own devices (no pun intended). Anomaly and fault detection focuses on the exceptions: where anomalies are occurring. Typical examples include:
- Absolute faults, where something has gone completely offline, as well as degradation of service
- A proportion of data packets on a circuit being corrupted or lost, diminishing performance on that part of the network
- A disk volume running out of room
- A database server reaching its limit on simultaneous connections, preventing new connections and causing application failures
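The conditions listed above can each be expressed as a simple check against polled metrics. This sketch is illustrative only: the `Sample` fields and the 1% packet-loss threshold are hypothetical, and in practice each check would run against a different device rather than one combined sample.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Hypothetical metrics gathered in one polling cycle
    reachable: bool
    packet_loss_pct: float
    disk_free_bytes: int
    db_connections: int
    db_max_connections: int


LOSS_THRESHOLD_PCT = 1.0  # assumed threshold; tune per circuit


def detect_anomalies(s: Sample) -> List[str]:
    """Return the fault conditions present in one sample."""
    findings = []
    if not s.reachable:
        findings.append("device offline")
    if s.packet_loss_pct > LOSS_THRESHOLD_PCT:
        findings.append("packet loss or corruption on circuit")
    if s.disk_free_bytes <= 0:
        findings.append("disk volume full")
    if s.db_connections >= s.db_max_connections:
        findings.append("database connection limit reached")
    return findings


sample = Sample(reachable=True, packet_loss_pct=3.5, disk_free_bytes=0,
                db_connections=100, db_max_connections=100)
print(detect_anomalies(sample))
```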
Each anomaly should be identified individually, classified, assigned an appropriate level of severity, and passed on to the network operators who can respond. Not every problem can be handled by the same team.
If your organization is heavily siloed and divided by function (for instance, servers, virtualization, storage, LAN networking, WAN networking, etc.), many teams with their own specialties will be involved. Consequently, it is critical to ensure the monitoring system can route these anomalies to the properly equipped teams. Feeding disk space warnings to the network team, for example, is unlikely to be productive; more likely, team members will spend time investigating only to conclude that the problem has nothing to do with them.
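Routing by function can be sketched as a lookup from anomaly type to owning team, with each team's queue ordered by severity. The `Anomaly` type, the `ROUTING` table, the team names, and the severity levels here are all hypothetical examples, not a prescribed scheme.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Anomaly:
    source: str
    kind: str      # e.g. "disk_full", "interface_down"
    severity: str  # e.g. "critical", "major", "warning"


# Hypothetical mapping from anomaly kind to the team that owns it
ROUTING: Dict[str, str] = {
    "disk_full": "storage",
    "interface_down": "lan",
    "wan_link_loss": "wan",
    "vm_host_down": "virtualization",
}
SEVERITY_ORDER = {"critical": 0, "major": 1, "warning": 2}


def route(anomalies: List[Anomaly], default_team: str = "noc") -> Dict[str, List[Anomaly]]:
    """Group anomalies by owning team, most severe first within each queue."""
    queues: Dict[str, List[Anomaly]] = {}
    for a in anomalies:
        team = ROUTING.get(a.kind, default_team)
        queues.setdefault(team, []).append(a)
    for queue in queues.values():
        # Unknown severities sort last
        queue.sort(key=lambda a: SEVERITY_ORDER.get(a.severity, len(SEVERITY_ORDER)))
    return queues


events = [
    Anomaly("fs01:/data", "disk_full", "warning"),
    Anomaly("sw12 Gi1/0/5", "interface_down", "critical"),
    Anomaly("fs02:/logs", "disk_full", "critical"),
]
queues = route(events)
print({team: [a.source for a in q] for team, q in queues.items()})
```

Disk space warnings land only in the storage queue, so the LAN team never sees them.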
Root Cause Analysis (or, at a Minimum, Noise Reduction)
One of the biggest problems in the monitoring industry is that anomaly detection can produce an inordinate amount of noise, labeling every anomaly as an event even though many are irrelevant and unlikely to help a network operator decide on a course of action. Simply put, if the information coming from the monitoring system cannot help anyone make decisions, it is of questionable value.
More specifically, the majority of alerts must be relevant; otherwise, it is difficult for network operators to focus on the information that matters. The event stream must therefore be conditioned so that irrelevant events are removed. This is a key aspect of how events from anomaly detection should be processed.