When you hear someone say that the connection to Microsoft 365 is "slow" today, what is the first thing that comes to mind? The phrasing of a statement like that leads you down a certain path. Even though I specifically said the connection was slow, you can't help feeling that Microsoft is somehow at fault. Non-technical end users are far less likely to say their "connection" is slow. Most of the time they lead with a less focused (and generally less accurate) statement such as "Microsoft is slow today".

Knowing whether Microsoft's network is "slow" today is a tall order. And yet, if you are involved with supporting Microsoft 365 users in any fashion, you are expected to deliver on it. First off, "slow" is highly subjective when it comes to humans. We all have different expectations of how quickly things should happen. Even if we assume the network is performing poorly by a valid statistical measure, how can you pinpoint the source of the issue? Can you accurately determine whether it is Microsoft's issue?

While I was discussing this exact topic with a potential customer last week, the prospect suggested that Microsoft's own press releases show that the Microsoft network feeding Microsoft 365 is "clearly" under stress. Intrigued, I asked for clarification. I was told that Microsoft 365 has experienced significant growth in the last 12 months, and growth of that level would stress anyone's capacity. It is an interesting argument. But it is only minimally rooted in fact, and it uses a lot of assumptions to fill in the remaining gaps.

Is Microsoft's Network "Slow"?

I did not want to respond to assumptions with more assumptions, so I ran a query on our anonymized crowd dataset to show who is contributing the most to network "slowness". When the results came up on the call, I said, "oh, wow!" When you aggregate a large number of varied data points over the same period, the visual markers in the data can be blunted. Typically we might see a subtle trend in a diverse aggregated data set. That was not the case here. It was immediately obvious who was responsible for most of the network time. See for yourself:

[Figure: network time by segment - internal network (green), ISP (sunflower yellow), Microsoft (blue)]

Even on a mobile phone, you can clearly see that the sunflower-yellow band is consistently responsible for the lion's share of the network time. Green is the internal network, sunflower yellow is the ISP, and blue is Microsoft. There is no guessing who owns responsibility for most of the network latency when there is a recurring performance issue.

Is Microsoft's network "holding up" under the additional load of new daily users of Microsoft 365? Or is it the source of the "slow" complaints? The data shows Microsoft is clearly performing well, even with the additional daily active users. Honestly, it's far more than "holding up". At times it is faster than internal networks, and that's impressive.

For reference: the data in the graph is a time-series aggregation of several million results over the last 60 days, averaged into 5-minute buckets. I did not filter on any ISP or apply any other criteria. I know the data contains results from at least 37 ISPs across North America, South America, Europe, South Africa, and Asia.
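If you are curious what that kind of bucketing looks like in practice, here is a minimal sketch using pandas. The DataFrame layout, column names, and segment labels are all hypothetical; this is not our actual pipeline or schema, just the general technique of resampling raw measurements into 5-minute averages per segment.

```python
# A minimal sketch of the bucketing described above, assuming a hypothetical
# DataFrame with one row per measurement: a timestamp, the segment that owns
# the time ("internal", "isp", "microsoft"), and the measured milliseconds.
# Column and segment names are illustrative, not the actual schema.
import pandas as pd

def bucket_by_segment(df: pd.DataFrame) -> pd.DataFrame:
    """Average raw measurements into 5-minute buckets, one column per segment."""
    df = df.set_index("timestamp")
    # Group by owning segment, then resample each group into 5-minute means.
    bucketed = (
        df.groupby("segment")["latency_ms"]
          .resample("5min")
          .mean()
          .unstack(level="segment")  # columns: internal, isp, microsoft
    )
    return bucketed
```

At several million rows this kind of resample is still cheap, which is why a 60-day window can be pulled and graphed live on a call.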

Areas of Responsibility

How did we get this data, you ask? That's classified. I could tell you, but then I'd have to.... Joking aside, it would take far more time than we have, and it would give away some of our best technical assets. Our solution separates the performance data into what we loosely call areas of responsibility. (You should read about Perfraction®.) Every test in our software is designed to further illuminate and help isolate one or more areas of responsibility. For a topic such as network performance when using Microsoft 365 Apps and Services, we separate performance data into the internal network, the internet service provider (ISP), and Microsoft's network. We do this for every user, from their device (even at home) to the current location of their data.
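To make the concept concrete without giving anything away, here is a toy sketch of how one *could* attribute path time to areas of responsibility. To be clear: this is not how our solution does it; the address ranges and the hop-based attribution below are purely illustrative.

```python
# A toy illustration of "areas of responsibility": attribute each hop of a
# path measurement to internal, ISP, or Microsoft by address, then sum the
# time per segment. Purely illustrative - not how Perfraction works.
import ipaddress

INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]      # assumption: RFC 1918 space
MICROSOFT_NETS = [ipaddress.ip_network("52.96.0.0/14")]   # one published Microsoft 365 range

def area_of_responsibility(hop_ip: str) -> str:
    addr = ipaddress.ip_address(hop_ip)
    if any(addr in net for net in INTERNAL_NETS):
        return "internal"
    if any(addr in net for net in MICROSOFT_NETS):
        return "microsoft"
    return "isp"  # everything between the internal edge and Microsoft ingress

def split_path_time(hops: list[tuple[str, float]]) -> dict[str, float]:
    """hops: (ip, milliseconds added between the previous hop and this one)."""
    totals = {"internal": 0.0, "isp": 0.0, "microsoft": 0.0}
    for ip, ms in hops:
        totals[area_of_responsibility(ip)] += ms
    return totals
```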

A Different Angle

I never like to assume everything is perfect in our software, and that means I never look at data just one way. In this case I pulled a smaller data set that is "close to home" to double-check the larger aggregated data.

Instead of taking anonymized data across all sources, the data in the graph below is from a segment of our lab network.  It consists of roughly 75 VMs and physical systems connected to a 940 Mbps fiber internet connection. I know the networks, VLANs and firewall well because I built them all.  We do run some tests against the systems, so you will notice abnormally high spikes on the internal sections. But in general it is a very performant network.

[Figure: 7 days of lab-network data, stacked by lowest average per AOR - Microsoft (red), internal network (yellow), ISP]

This is not aggregated data, so I quickly graphed it in an alternate tool to display it in this article. In this graph the data is stacked not by the sequence of the journey, but by the lowest average for the time window. Over the last 7 days, the Microsoft network (in red) actually had a lower average than an internal network (yellow) that I know well because I built it. And the ISP? Even on a 940 Mbps fiber connection - one where we average 700-800 Mb/s download and 600-700 Mb/s upload speeds - the ISP is still the single largest AOR (area of responsibility) impacting network performance. I am honestly a bit surprised.
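For anyone who wants to reproduce that alternate stacking, here is a short sketch that orders the stack by each AOR's average over the window, lowest first. It assumes the bucketed DataFrame shape from the earlier sketch; the plotting choices are mine, not those of the tool I actually used.

```python
# Stack the areas of responsibility by their average over the window
# (lowest first) instead of by their order along the network path.
# Assumes the "bucketed" DataFrame from the earlier sketch: a datetime
# index with one column of average milliseconds per AOR.
import matplotlib.pyplot as plt

def plot_stacked_by_average(bucketed):
    order = bucketed.mean().sort_values().index   # lowest average at the bottom
    ax = bucketed[order].plot.area(stacked=True)
    ax.set_xlabel("time")
    ax.set_ylabel("network time (ms)")
    plt.show()
```

Stacking this way makes it obvious at a glance which AOR dominates: whichever band sits on top is the one carrying the most time.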

In the next post we will dig into the data a bit deeper. There is just too much good data for a single post! But I'll give you one teaser: the data in the graph directly above comes from an ISP that hands off traffic to Microsoft's network just 103 miles away, on average. It is remarkable to look closely at the impact that the distance from a user to Microsoft's ingress point has on performance. Keep an eye out for my next technical post.
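If you want a head start on that analysis, a first pass could be as simple as correlating ingress distance with measured network time. The table layout and column names below are hypothetical; this is just one way to take a first look, not a preview of the next post's method.

```python
# A first-pass look at the distance effect teased above: given a per-user
# table with miles to the Microsoft ingress point and average network time,
# measure how strongly the two move together. Column names are illustrative.
import pandas as pd

def distance_vs_latency(users: pd.DataFrame) -> float:
    """Pearson correlation between ingress distance and network time."""
    return users["miles_to_ingress"].corr(users["avg_network_ms"])
```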

Need to Solve User Performance Issues?

As companies know, overall support for users on any SaaS platform is still the company's responsibility. When a support team can identify the third-party vendor at fault, they are happy to chase the fix. But most of the time support teams struggle to identify who is responsible for a problem and, as a direct result, struggle to know how to resolve the performance issue. You cannot do this without the right tools to monitor user experience on Microsoft 365 - or any other SaaS platform, for that matter.

Make sure the vendor you choose can accurately isolate the impact by who has responsibility. And networking is just the tip of the iceberg; far more than the network is involved in user experience. If you look at a vendor's solution and the technology or third party responsible doesn't jump out at you in a "Sunflower Yellow" way... you're probably not looking at the right vendor.

Article By: Gary Steere