Virtualization Technology News and Information
VMblog's Expert Interviews: Kentik Talks Network Performance Management and More


According to Gartner, migration to the cloud creates a fundamental shift in network traffic that traditional network performance monitoring tools fail to capture.  To fill the resulting visibility gaps, I&O leaders must consider cloud-centric monitoring technologies.

Those in the digital stream of commerce must understand network performance management (NPM) for today's cloud-centric, distributed applications.  To get a deeper understanding, I spoke with Avi Freedman, co-founder and CEO of Kentik.

VMblog:  To jump right in, can you tell us what's happening in the world of network performance management (NPM) today?

Avi Freedman:  Traditional network performance management tools have been challenged by the rapid migration to the cloud, which has created a fundamental shift in network traffic. A recent Gartner report noted that packet-analysis appliances, whether physical or virtual, have no place to instrument within most public cloud environments. That's because typical NPM solutions were built for traditional datacenter and branch-office architectures. Because they assume centralized hosting of applications, they can't manage many of the new challenges the cloud presents.

VMblog:  And what types of new challenges does the cloud present for NPM solutions?

Freedman:  The impact of the cloud comes from the move to distributed application architectures: software components no longer reside on the same server. The old model - monolithic applications running in a central datacenter, or connecting to branch offices through a strictly private WAN - is winding down.

Today, cloud applications are spread out across various networks and the Internet, so they have to be accessed through API calls. In addition, end users aren't just internal - they may be revenue-producing end-customers of digital lines of business, reached across the Internet.

With such a distributed architecture and end users, NPM solutions must be able to combine and analyze performance metrics, traffic flows, Internet routing, and geolocation data to get a better handle on application performance.

VMblog:  Can you give some examples of how all this plays out in the real world?

Freedman:  One recent example involves Pandora, which is a customer of Kentik. Pandora is planning to offer lower pricing tiers that will give users more flexibility in how they consume streaming music. To offset cuts to its subscription revenue, Pandora will need to increase its advertising revenue. Both goals need to be supported by data-driven network visibility to deliver a stellar end-user experience for consumers and maximum productivity for advertisers and marketers, all on the most efficient cost basis.

Pandora is not alone. Lots of digital businesses make their money by streaming media, serving ads, delivering gaming experiences, or supporting e-commerce transactions. Cloud-based NPM solutions use host-based agents that can integrate into any hybrid datacenter and cloud environment, either on application servers or on popular load balancers like HAProxy or NGINX. These agents monitor network performance factors such as latency, retransmits, out-of-order packets, and packet fragments.

The agents send information to the back-end NPM solution, where data is ingested and combined with billions of other traffic flow records that come from network infrastructure devices such as routers and switches.

By getting a clear view into performance metrics from actual application traffic flows, network managers can make better decisions to identify and troubleshoot network traffic anomalies. The visualization of traffic over actual Internet paths and geographies through a one-stop network intelligence hub allows engineers to ensure the best possible application performance and plan cost-effective network expansion that will continue to meet user experience requirements.

VMblog:  How can network managers apply this one-stop network intelligence hub to solve user experience problems?

Freedman:  When you have full details and can query them in an ad-hoc manner and get answers quickly, working with data is no longer a matter of clicking into a few canned reports. Instead, you can traverse, pivot, and zoom into the data with the same flexibility as scrolling and clicking through Google Maps. Fast, ad-hoc analysis of details at scale makes it possible to quickly sort through performance metrics, volumetric flows, Internet paths, and more. In this way, network managers can determine whether a problem is due to the network and localize the source, whether that's a particular transit ISP hop on the outbound Internet path or internal congestion caused by a misconfiguration. The faster that operators and engineers can find the root cause of a problem, the faster they can implement a solution. The point here is that by using rich data sets, cloud-based NPM solutions give users enormous flexibility to recursively ask challenging questions and get answers back in seconds, so they can take action to restore the end-user experience.
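To make the "pivot" idea concrete, here is a minimal, hypothetical sketch in Python: a handful of toy flow records grouped by destination ASN. The field names and values are invented for illustration; a real NPM backend runs this kind of query over billions of records.

```python
from collections import defaultdict

# Toy flow records; field names and values are invented for illustration.
flows = [
    {"dst_asn": 174,  "bytes": 5_000, "latency_ms": 80},
    {"dst_asn": 174,  "bytes": 9_000, "latency_ms": 95},
    {"dst_asn": 3356, "bytes": 7_000, "latency_ms": 20},
    {"dst_asn": 3356, "bytes": 4_000, "latency_ms": 25},
]

def pivot(records, key, metric):
    """Group records by `key`, totaling bytes and tracking the worst `metric`."""
    groups = defaultdict(lambda: {"bytes": 0, "worst": 0})
    for r in records:
        g = groups[r[key]]
        g["bytes"] += r["bytes"]
        g["worst"] = max(g["worst"], r[metric])
    return dict(groups)

by_asn = pivot(flows, "dst_asn", "latency_ms")
print(by_asn)  # ASN 174 stands out with a 95 ms worst-case latency
```

The same pivot could then be re-run on any other field - source site, Internet path, geography - which is exactly the traverse-and-zoom flexibility being described.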

VMblog:  Can you explain the difference between network performance management (NPM) solutions vs. application performance management (APM) solutions, or are they just two sides of the same thing?

Freedman:  NPM and APM come at the performance problem from different but complementary monitoring perspectives to optimize end-user experience. APM focuses on profiling transactions between applications based on metrics such as average response time under peak load. In essence, the APM side monitors the health and activity of application components and services.

The NPM side is more focused on the effects of the network and Internet on the end-user experience. NPM monitors network latency, out-of-order packets, and TCP retransmits. Latency - or the amount of time it takes to get a response to a packet - is measured bi-directionally. One direction looks at when a local host such as an app server sends a packet to a remote host. The other direction looks at when a packet is received from a remote host and how long it takes to get a response.
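As a rough illustration of one direction of that measurement, the sketch below times a TCP three-way handshake from Python. This is my own simplification, not how Kentik's agent works - real host agents measure latency passively from actual application packets rather than opening probe connections.

```python
import socket
import time

def tcp_connect_rtt_ms(host, port, timeout=3.0):
    """Estimate round-trip latency by timing a TCP three-way handshake."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; we only wanted the handshake timing
    return (time.perf_counter() - start) * 1000.0
```

Timing the handshake captures one network round trip without sending any application data, which makes it a cheap, if coarse, latency probe.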

Gauging out-of-order packets is a key measure because TCP cannot pass data up to an application until all the bytes are organized in the right order. Similarly, if a network path is overloaded, some packets may get dropped. TCP ensures delivery of data through the use of ACKs to signal that the data has been received. If a sender fails to get a timely ACK from the receiver, it will retransmit a packet. When TCP retransmits grow beyond even low single-digit percentages, trouble follows.
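On Linux, the kernel already tracks the counters needed to compute that retransmit percentage. Here is a hypothetical sketch of reading them in the `/proc/net/snmp` format, shown with a canned sample so the arithmetic is visible; a real agent would read the live file and sample the counters over time.

```python
def parse_tcp_counters(snmp_text):
    """Parse the two `Tcp:` lines of /proc/net/snmp into a counter dict."""
    tcp_lines = [l for l in snmp_text.splitlines() if l.startswith("Tcp:")]
    names, values = tcp_lines[0].split()[1:], tcp_lines[1].split()[1:]
    return dict(zip(names, (int(v) for v in values)))

# Canned sample in /proc/net/snmp format; the numbers are invented.
sample = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
    "InErrs OutRsts\n"
    "Tcp: 1 200 120000 -1 1000 500 10 5 42 800000 750000 1500 0 20\n"
)

counters = parse_tcp_counters(sample)
retrans_pct = 100.0 * counters["RetransSegs"] / counters["OutSegs"]
print(f"retransmit rate: {retrans_pct:.2f}%")  # 1500 / 750000 -> 0.20%
```

A rate like 0.20% is comfortably healthy; the same arithmetic flagging several percent would be the "trouble follows" territory described above.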

VMblog:  How should developers approach this issue of network performance management?

Freedman:  Developers should think about which metrics are the right ones for understanding network performance. I would suggest that developers start by instrumenting their code to get a good baseline. Then they should start experimenting, such as trading CPU for bandwidth with compression. Try using persistent connections to avoid initialization overhead. Parallelize tx/rx operations to take advantage of high-throughput links. Trade non-deterministic behavior for better performance, such as using UDP if its drawbacks can be tolerated. For local inter-process communication, try a Unix domain socket. For communications across a WAN or the Internet, use a tool such as Kentik Detect and the Kentik nProbe Host Agent to track performance and determine the best routing paths to help avoid problematic intermediary network hops.
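The first suggestion - trading CPU for bandwidth with compression - is easy to see in a quick sketch. The payload here is an invented, repetitive telemetry record, which is the favorable case for compression:

```python
import zlib

# Invented, repetitive telemetry payload - the favorable case for compression.
payload = b'{"user": 123, "event": "play", "track_id": "abc123"}\n' * 2000

compressed = zlib.compress(payload, level=6)
savings = 1 - len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes on the wire "
      f"({savings:.0%} saved)")
```

The CPU spent in `zlib.compress` buys a large reduction in bytes sent; whether that trade pays off depends on link bandwidth versus available CPU, which is exactly why baselining first matters.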

VMblog:  To wrap up, what do you see coming next for the future of NPM?

Freedman:  I think NPM presents one of the most exciting opportunities to advance cloud-based digital operations in the years ahead. But first, NPM and network monitoring tools will need to catch up with cloud-scale computing and big data technologies. Most current NPM tools were built on appliance-based architectures from the 1990s, with limited compute and storage capacity, which forces them to take a reductive approach to data. In other words, they dispose of many details and retain only the summaries.

Giving up and assuming there are too many details about network data to retain and analyze is absolutely the wrong approach. We need to take the exact opposite approach: compile and analyze as much network data as possible in order to quickly identify the root causes of network problems and bottlenecks. The economies of scale derived from big data and cloud computing give network operators a far richer data set to analyze, across what were previously considered inviolable data silos.

One last point is that cloud application performance has become increasingly business-critical. This means many more line-of-business owners are getting actively involved in the sponsorship and funding of new cloud infrastructure initiatives. It's clear that network performance management is already much more than a purely technical concern - it is now a crucial element for business success.  In a sense, in digital business, the network is the business.


Once again, a special thank you to Avi Freedman, co-founder and CEO of Kentik, for taking time out to speak with us and answer a few questions about network performance management.

Published Wednesday, September 21, 2016 7:02 AM by David Marshall