Network Telemetry Guide – Applications, How it’s Deployed and Challenges
Entuity Software
Computer networks can get so complex that problematic behaviors may be difficult to diagnose, much less predict or prevent. For some network managers, slowdowns and outages seem to come out of nowhere. There’s always a reason for network trouble. The challenge is to identify the problem and remediate it, ideally before it balloons into a crisis that draws C-level attention.
Network telemetry, which involves the collection and analysis of network data, offers a solution. It’s a growing area of technology investment, now representing $704 million in annual spending and growing at a rate of 39% per year.
This article explores “What is telemetry in networking?” It explains how networking telemetry works and how it’s deployed, along with challenges that arise with its implementation and what can be done to mitigate them.
Fact Snippet
- Network telemetry definition – Network telemetry concerns the processes involved in collecting, analyzing and monitoring network data, providing real-time insights into network health and performance.
What is Network Telemetry?
Network telemetry is a collection of technologies and practices that enable insight into network performance and behaviors. It comprises the remote generation, collection, correlation, and analysis of network data, to facilitate network efficiency, reliable performance, and automated management of networks.
Network telemetry has a specific use case, but it is not an isolated activity. It typically occurs alongside other systems for monitoring IT infrastructures, such as application performance monitoring (APM) tools. It may integrate with these systems because it is often necessary to examine factors beyond the network to determine why a network is not functioning properly. For instance, a compromised endpoint might generate irregular network traffic even though the root of the problem is not in the network itself.
How Network Telemetry Differs from Network Monitoring and Observability
Network telemetry is similar to network observability and monitoring but with some distinct differences. The processes overlap and interact with one another, but they should not be conflated.
Telemetry systems collect data remotely from across the network. Monitoring leverages this data to detect network performance issues in real time. Network observability also uses telemetry data, in its case to interpret and troubleshoot network problems. This can be a confusing subject to investigate because a “network telemetry solution” might in fact perform monitoring and observability functions as well.
Why Network Telemetry is Growing in Popularity
Telemetry in networking is growing because modern networks are complex and difficult to manage without real-time, data-driven insights into how they are functioning.
Network managers need the kind of enhanced visibility provided by network telemetry. It gives them a deep understanding of network traffic and health and allows them to optimize network performance and solve problems proactively. It helps them remove bottlenecks and ensure high performance, perhaps in keeping with service level agreements (SLAs).
New network models, such as software-defined networks (SDN), also create a need for in-depth analysis of network data in real time. At the same time, increasing network vulnerability to cyber threats makes it important to have accurate data about network anomalies that could signal the presence of a threat.
Typical Applications of Telemetry in Networking
How do enterprises put network telemetry to work? Below are some of the most common examples:
- Diagnostics and troubleshooting – Streaming network telemetry gives network managers a mix of historical and real-time data about network performance. This enables them to run diagnostics and troubleshoot problems by identifying the root causes of network issues.
The data offers a way to isolate faults by analyzing traffic and logs. With this knowledge, network managers can proactively triage network segments that are affected by threats and other disruptive forces.
- Monitoring performance — With network telemetry in place, network managers can stay on top of bandwidth utilization and discover bottlenecks that are affecting network performance. They can analyze bandwidth usage patterns at the packet level and spot the causes of latency and packet loss.
- Monitoring security – Network anomalies are often the earliest indicator of a cyberattack. For example, a usage spike might indicate a Denial of Service (DoS) attack. Similarly, traffic patterns can reveal malicious activity, provided that telemetry data gathering and analysis are configured for this purpose. Telemetry data can also inform proactive triage that defends the network against malicious actors who, if not stopped, could cause data loss.
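As a rough illustration of the performance-monitoring case, bandwidth utilization can be derived from two readings of an interface octet counter. The sketch below is a minimal example under stated assumptions: the function name, its parameters, and the sample figures are hypothetical and not taken from any particular product.

```python
def utilization_percent(octets_start, octets_end, interval_seconds,
                        link_speed_bps, counter_bits=64):
    """Return average link utilization (%) between two octet-counter samples."""
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Counters wrap to zero at 2**counter_bits; the modulo handles a single wrap.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_seconds
    return 100.0 * bits_per_second / link_speed_bps


# Hypothetical readings: 3.75 GB transferred in 60 s on a 1 Gbit/s link.
print(utilization_percent(0, 3_750_000_000, 60, 1_000_000_000))  # 50.0
```

A sustained value near 100% on a link is one simple signal of the kind of bottleneck described above.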
How Is Network Telemetry Deployed?
Network telemetry has evolved into a set of patterns and practices that vary in their details but generally follow the same path. Whatever the framework, at its core, it is a process of data collection and analysis.
Over the years, experts have developed methods to pull network data from multiple sources, integrate them where necessary, and apply specialized analytical processes to gain the maximum benefit from network telemetry. Below are the key aspects of deployment.
1. Data Sources
Network telemetry tools rely on vast data feeds that come from multiple sources. Most frameworks comprise a few top-level modules, each serving a distinct purpose.
- Management plane—Where network elements interact with the network management system (NMS) using protocols like syslog and simple network management protocol (SNMP).
- Control plane—Where telemetry monitors the network’s control elements.
- Forwarding plane—Where the telemetry solution forwards data from the network to places where it can be stored and analyzed when it meets the appropriate quality and timing standards, e.g., data that’s new enough to be “real-time.” This plane often uses specialized hardware, such as forwarding chips, to ensure fast data delivery.
- External events module—Which integrates external data sources into the telemetry analytics solution, provided it meets quality and timing standards.
Each module includes functions and configurations to query, store, and analyze telemetry data. This may involve data filtering, normalizing, formatting, and enrichment.
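The filtering, normalizing, formatting, and enrichment steps can be pictured as a small pipeline. The following is only a sketch: the `Sample` type, the unit names, and the `SITE_BY_DEVICE` inventory are hypothetical stand-ins for whatever a real telemetry module would use.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    device: str
    metric: str
    value: float
    unit: str


# Hypothetical inventory used to enrich records with a site name.
SITE_BY_DEVICE = {"core-sw1": "london", "edge-rtr2": "paris"}

# Conversion factors to a common unit (bits per second).
UNIT_FACTORS = {"bps": 1, "kbps": 1_000, "mbps": 1_000_000}


def normalize(sample):
    """Convert a rate sample to bits per second."""
    factor = UNIT_FACTORS[sample.unit]
    return Sample(sample.device, sample.metric, sample.value * factor, "bps")


def process(samples, wanted_metrics):
    """Filter, normalize, format, and enrich raw samples."""
    records = []
    for sample in samples:
        if sample.metric not in wanted_metrics:      # filter
            continue
        norm = normalize(sample)                     # normalize
        records.append({                             # format and enrich
            "device": norm.device,
            "metric": norm.metric,
            "value": norm.value,
            "unit": norm.unit,
            "site": SITE_BY_DEVICE.get(norm.device, "unknown"),
        })
    return records
```

For example, `process([Sample("core-sw1", "in_rate", 2.5, "mbps")], {"in_rate"})` yields one record with a value of 2,500,000 bps and the site "london".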
Some telemetry solutions utilize in-band network telemetry (INT), a framework that collects data on the forwarding plane without any involvement of the control plane.
The telemetry solution usually acquires data through one of two basic methods: push and pull. In push mode, the telemetry modules “subscribe” to data sources and receive whatever data those sources “publish” to them. This is known as the publish/subscribe (pub/sub) model. In contrast, in “pull” mode the modules query the data sources and “pull” only the data they need.
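The difference between the two modes can be shown with a toy simulation. This is a sketch only: the `Device` class, the tick-based clock, and the link-state events are invented for illustration and do not model any real protocol.

```python
class Device:
    """Toy device whose link state changes over time."""

    def __init__(self):
        self.link_up = True
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set_link(self, tick, up):
        self.link_up = up
        for callback in self._subscribers:   # push: publish every change
            callback(tick, up)


def run(events, total_ticks, poll_interval):
    """Replay link events and record what push and pull collectors observe."""
    device = Device()
    pushed, pulled = [], []
    device.subscribe(lambda tick, up: pushed.append((tick, up)))
    for tick in range(total_ticks):
        if tick in events:
            device.set_link(tick, events[tick])
        if tick % poll_interval == 0:        # pull: query on a fixed schedule
            pulled.append((tick, device.link_up))
    return pushed, pulled


# The link drops at tick 3 and recovers at tick 4; polling happens every 5 ticks.
pushed, pulled = run({3: False, 4: True}, total_ticks=10, poll_interval=5)
print(pushed)  # [(3, False), (4, True)]
print(pulled)  # [(0, True), (5, True)]
```

In this run the subscriber sees both state changes, while the poller never observes the brief outage because it falls between two polls.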
2. Transport and Measurement Protocols
Most of the data that network telemetry solutions handle arrives in the format of one of several popular network protocols. SNMP is one of the most common. It’s an Internet Standard protocol used for collecting and organizing data about managed devices on an IP network, such as switches, routers, and load balancers.
Another is Internet Control Message Protocol (ICMP). Devices such as routers use ICMP to send error messages and other operational data indicating failure or success of communication between devices with IP addresses.
Other protocols used for data collection include the Two-Way Active Measurement Protocol (TWAMP), which measures round-trip performance between network devices. NetFlow, developed for Cisco routers, enables the collection of data about IP network traffic as it passes through an interface. With NetFlow, a network admin can track the sources and destinations of network traffic.
Internet Protocol Flow Information Export (IPFIX) serves the need for a universal standard for exporting IP flow data from routers and other devices on the network. Sampled Flow (sFlow) samples packets and exports their truncated headers, which can signal the existence of a network problem.
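To make the idea of tracking traffic sources and destinations concrete, the sketch below totals bytes per source and destination pair from a list of flow records. The record layout (`src`, `dst`, `bytes`) is a simplified assumption for illustration, not the actual NetFlow or IPFIX field naming.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Sum bytes per (source, destination) pair and return the n largest."""
    totals = Counter()
    for flow in flows:
        totals[(flow["src"], flow["dst"])] += flow["bytes"]
    return totals.most_common(n)


# Hypothetical, already-decoded flow records.
flows = [
    {"src": "10.0.0.5", "dst": "10.0.1.9", "bytes": 1200},
    {"src": "10.0.0.7", "dst": "10.0.1.9", "bytes": 300},
    {"src": "10.0.0.5", "dst": "10.0.1.9", "bytes": 800},
]
print(top_talkers(flows, n=1))  # [(('10.0.0.5', '10.0.1.9'), 2000)]
```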
3. Receiving Network Telemetry
Network telemetry solutions receive data through a variety of mechanisms. A common approach is to use a data bus such as Apache Kafka, which acts as a centralized hub for pub/sub activities. A data bus like Kafka enables admins to manage subscriptions to streams of network data, facilitating real-time processing of telemetry data and integration with the other components of the telemetry solution.
4. Storing, Analyzing, and Presenting Network Data
Network telemetry is an ongoing, ever-changing workload. As network telemetry systems store, analyze, and present network data, they support several key applications. These include network performance visualization, diagnostics, and alerting, along with forecasting and trend reporting.
Stages of Telemetry
Network telemetry can function at varying degrees of sophistication and maturity. Some stakeholders categorize these applications according to four levels:
- Level 1: Static telemetry — Data sources and types are determined at the point of design and remain static.
- Level 2: Dynamic telemetry — Programming or configuration of telemetry data is possible on the fly. Changes can be made without disrupting the network. This approach is more flexible and adaptable to changing business requirements, modifications to the network topology, or application landscape. Some tasks may be automated.
- Level 3: Interactive telemetry – Tasks are almost completely automated and dynamic. Telemetry data can be adjusted in real time in response to real-time feedback.
- Level 4: Closed-loop telemetry — People are not involved in the telemetry process, except for reporting. Intelligent network operations software is able to modify network configurations automatically based on telemetry.
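The feedback idea behind the higher levels can be sketched as a rule that tightens or relaxes the sampling interval based on what the data shows. This is a simplified illustration: the function name, the error-rate input, and the threshold and interval limits are all assumed values, not part of any standard.

```python
def next_interval(current_s, error_rate, threshold=0.01, min_s=1, max_s=60):
    """Halve the sampling interval when errors exceed the threshold, else double it."""
    if error_rate > threshold:
        return max(min_s, current_s // 2)   # look more closely at a troubled link
    return min(max_s, current_s * 2)        # back off when the link is healthy


print(next_interval(30, 0.05))  # 15
print(next_interval(40, 0.0))   # 60
```

A closed-loop system would go one step further and apply a configuration change to the network itself, rather than only adjusting what is measured.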
3 Challenges of Network Telemetry
To provide a comprehensive analysis of network telemetry, it’s important to consider the difficulties that can arise. Not all organizations are prepared for telemetry tools, and those that are not could leave themselves open to the challenges below.
1. Scalability
A network telemetry solution amasses a significant amount of data—filling repositories that never stop growing. It’s essential to have the ability to scale the data storage and analysis capabilities as data volumes grow.
At the same time, it’s wise to balance scaling requirements with cost. Not every bit of data has to be retained forever. Data retention policies are advisable to avoid the complexity and expense of storing large volumes of data that serve no purpose for telemetry after a given period of time.
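A retention policy can be as simple as discarding samples older than a cutoff. The sketch below assumes samples are stored as `(timestamp, value)` tuples and uses a seven-day window purely as an example. Real deployments often downsample older data rather than delete it outright.

```python
from datetime import datetime, timedelta


def apply_retention(samples, now, max_age):
    """Keep only (timestamp, value) samples no older than max_age."""
    cutoff = now - max_age
    return [sample for sample in samples if sample[0] >= cutoff]


# Hypothetical samples: one from a month ago, one from yesterday.
now = datetime(2024, 1, 31)
samples = [(datetime(2024, 1, 1), 1.0), (datetime(2024, 1, 30), 2.0)]
kept = apply_retention(samples, now, timedelta(days=7))
print(len(kept))  # 1
```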
2. Measurement and Analysis
Scoping the parameters of measurement and analysis can be a challenge to success with network telemetry monitoring. The process can generate so much data in such varied forms, as well as a wide variety of measurement scenarios, that it can quickly become messy. Keeping things focused, at least in the early stages of deploying network telemetry, is a good practice.
The analysis itself can be challenging, even if it is sufficiently narrow in scope. For instance, to keep noise from drowning out the meaningful signals, it is necessary to tune the analytics processes and to continue retuning them as time goes on.
On a related front, it’s worth remembering the security analyst’s go-to statement, “Not all anomalies represent threats, and not all threats create anomalies.” Sometimes, a network anomaly is just an anomaly. The better the tools can get at dismissing false positives, the better the whole process will go.
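The tuning problem can be illustrated with a basic spike detector that compares each value with the mean and standard deviation of a trailing window. This is a sketch of one common statistical approach (a z-score test), chosen here for illustration. The window size, thresholds, and traffic figures are assumptions.

```python
from statistics import mean, stdev


def find_spikes(values, window=5, z_threshold=3.0):
    """Return indexes whose value exceeds the trailing mean by z_threshold deviations."""
    spikes = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (values[i] - mu) / sigma > z_threshold:
            spikes.append(i)
    return spikes


# Hypothetical traffic readings with one obvious spike at index 6.
traffic = [100, 102, 98, 101, 99, 103, 500, 101]
print(find_spikes(traffic, z_threshold=3.0))  # [6]
print(find_spikes(traffic, z_threshold=1.5))  # [5, 6]
```

With the looser threshold, the ordinary reading at index 5 is also flagged, which is the kind of false positive that tuning aims to reduce.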
3. Balancing Proactive with Reactive Monitoring
When should network admins and security teams be proactive with network telemetry versus reactive? Figuring this out can be a challenge. There is no right answer, but the best approach is to determine what network performance issues are most important to the business and work back from there.
For instance, if seasonal traffic spikes are a problem, then being proactive about network streaming telemetry at those times will make sense.
Analyze your Network with Entuity Software™
For comprehensive measurement and analysis of network performance, look no further than our network telemetry tool, Entuity Software™.
Entuity empowers ITOps teams to streamline the monitoring, visualization, and management of their infrastructure with greater efficiency and effectiveness. Our tool supports thousands of devices, from hundreds of vendors, so you never miss an alert. Our event management system also ensures that you receive only the most relevant and actionable information, so you are not distracted by alert noise, which can often occur with complex networks.
Contact us today to learn more about Entuity, and level up your network management performance.
Frequently Asked Questions:
- Where in a network should network telemetry be deployed and why?
While the long-term goal may be to have completely pervasive collection of network telemetry data, it’s wise to limit deployment early in the implementation of a network telemetry solution. Focus should be on network segments and systems that are critical to the business.
- What is the difference between network telemetry and network alert logs?
When considering network telemetry vs network alert logs, know that network telemetry refers to the overall collection and analysis of network data, including network alerts. Network alert logs, in contrast, are more limited in scope and only cover devices that are set up to issue alerts based on specific parameters.
- What's the difference between network monitoring and telemetry?
Network monitoring and network telemetry are similar processes, but monitoring simply tracks network performance and flags problems. Telemetry, in contrast, involves collecting many different types of data about the network’s functioning, analyzing that data, often in customized ways, and alerting stakeholders based on parameters that can be dynamic in nature.
- What is an advantage of network telemetry over SNMP pulls?
The main advantage of network telemetry over SNMP pulls has to do with speed. Telemetry can detect and report on issues in microseconds, while an SNMP pull occurs at preset intervals. As a result, polling can miss events that occur between pulls, which network telemetry would detect.