How Azure Route Server Became the Unexpected Hero

In today’s article, I’ll walk you through a recent real-life situation where Azure Route Server’s branch-to-branch routing quickly resolved a critical client issue.

The Initial Network Architecture

Our client is a global enterprise, operating from many locations worldwide. Their existing WAN was managed outside the scope of our work, but it centred around Meraki SD-WAN connecting all their branches. I’m a specialist in Azure networking rather than traditional on-premises networks, but I’ve found Meraki to be straightforward and reliable.

The client began to adopt another cloud provider (not Microsoft). Following the provider’s advice, they established a leased line between this new cloud region, their HQ, and their central data centre. This direct connection ensured low-latency access between cloud-based and on-premises applications and data.

Introducing Azure

The client then decided to use Azure for a range of computing and data workloads. Our team was brought in to establish the Azure environment and guide their transition.

As project lead, I delegated most of the technical setup but focused on the design. After evaluating the options, we found ExpressRoute was the best way to link Azure with their other cloud provider. We selected an Azure region in close proximity to the rival cloud’s data centre for optimal performance.

We rolled out ExpressRoute from a virtual network-based hub within Azure. This is my preferred method because it’s scalable, easier to govern, and offers more troubleshooting flexibility compared to Azure Virtual WAN. We deployed the Meraki solution from the Azure Marketplace in the hub, connecting Azure to their SD-WAN, and enabled BGP propagation using Azure Route Server. Setup was refreshingly uncomplicated.

At this point, their network spanned:

  • On-premises to the alternative cloud via the leased line
  • On-premises to Azure via SD-WAN
  • Azure to the other cloud via ExpressRoute

Disaster Strikes!

Months after our main involvement, I got an urgent call. A colleague reported that a digger had damaged the physical line connecting the main data centre to the non-Microsoft cloud, taking a vital link offline.

This outage caused major disruptions for critical services:

  • Application integration between on-premises and the other cloud was lost
  • App access for remote users across global offices was interrupted

After a quick review, I remembered that Azure Route Server supports branch-to-branch routing: it can exchange BGP routes between its peers (such as an SD-WAN appliance) and the ExpressRoute or VPN gateways in the same hub.

By thinking of the non-Microsoft cloud’s region as just another “on-premises” site joined via ExpressRoute, the Microsoft documentation became applicable. Using Azure Route Server’s BGP capabilities, we could bridge traffic between:

  • The “on-premises” site through ExpressRoute
  • The SD-WAN network, already peered with Azure Route Server via Meraki

I shared the proposal with the client and, once approved, we put it into action straight away. It required only a single configuration change on Azure Route Server!

The Fix in Action

To restore connectivity quickly, we directed all network routes for the alternative cloud through Azure. Their offices and data centre already reached Azure via SD-WAN over Meraki, with BGP routes seamlessly exchanged through peering with Azure Route Server in the hub.

Likewise, ExpressRoute kept BGP routing up between Azure and the other cloud.

The loss of the leased line broke BGP route sharing between SD-WAN and the second cloud, with no quick fix for the physical damage.

We aimed to let routes from both SD-WAN and the other cloud travel across Azure. By doing so, remote sites and the alternative cloud could communicate via the Meraki virtual appliance and ExpressRoute, with Azure acting as a temporary bridge.

Implementation couldn’t have been easier—just enable branch-to-branch routing on Azure Route Server, wait briefly, then validate the advertised BGP routes for your peers (in this scenario, the Meraki appliance).
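
To make the effect of that one setting concrete, here is a minimal sketch in Python. It is a toy model of the behaviour described above, not Azure's implementation; the class name, peer names and prefixes are all hypothetical. In the real service this is a single flag (at the time of writing, the Azure CLI exposes it as `--allow-b2b-traffic` on `az network routeserver update`).

```python
from dataclasses import dataclass, field


@dataclass
class ToyRouteServer:
    """Toy model of a route server with a branch-to-branch switch."""
    vnet_prefixes: list[str]
    branch_to_branch: bool = False
    # peer name -> prefixes that peer has advertised to the route server
    learned: dict[str, list[str]] = field(default_factory=dict)

    def learn(self, peer: str, prefixes: list[str]) -> None:
        """Record the prefixes learned from one peer."""
        self.learned[peer] = list(prefixes)

    def advertised_to(self, peer: str) -> list[str]:
        """Prefixes this model would advertise to `peer`."""
        routes = list(self.vnet_prefixes)
        if self.branch_to_branch:
            # With the switch on, routes learned from the other peers
            # are passed along as well.
            for other, prefixes in self.learned.items():
                if other != peer:
                    routes.extend(prefixes)
        return sorted(routes)


rs = ToyRouteServer(vnet_prefixes=["10.0.0.0/16"])
rs.learn("meraki-nva", ["192.168.0.0/16"])       # SD-WAN sites (hypothetical)
rs.learn("expressroute-gw", ["172.16.0.0/12"])   # other cloud (hypothetical)

before = rs.advertised_to("meraki-nva")
rs.branch_to_branch = True
after = rs.advertised_to("meraki-nva")
print(before)  # only the virtual network prefix
print(after)   # virtual network prefix plus the other cloud's prefix
```

The "validate the advertised routes" step corresponds to comparing `before` and `after`: once the switch is on, the Meraki peer should start seeing the other cloud's prefixes.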

The fix took effect almost immediately. The new routes appeared, and Azure Monitor showed a noticeable increase in ExpressRoute traffic at the same moment—a sure sign it was working.

Outcome for the Business

Afterwards I heard the client’s feedback: not only was the workaround successful, but application performance had improved.

Originally, users worldwide relied on the following path to access the alternative cloud:

  1. Branch office connects via SD-WAN to the head office
  2. Head office sends traffic via the (now broken) leased line to the other cloud, potentially across continents

With Azure stepping in, a much shorter path was established:

  1. Branch office connects through SD-WAN to Azure
  2. Azure relays data over ExpressRoute—a rapid connection to the other cloud

This dramatically reduced latency and noticeably enhanced the user experience. Conversely, the route from the on-site data centre to the alternative cloud became slightly longer, but at least connectivity was restored. Even weeks later, the leased line was still being repaired—completely out of our client’s hands.

Things to Bear in Mind

Ideally, redundancy would come from having two physical leased lines, but budget and feasibility didn’t allow for it. Azure ExpressRoute Metro could be an alternative in future, although it remains in preview and was unavailable for this region.

Our workaround effectively created a “triangular” network. Once the leased line is operational again, I’ll advise keeping this triangle as a failover: if any one leg fails, the other two together still connect every site. This principle echoes the automatic resilience on which networks like the original ARPANET were built.
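
The resilience claim for the triangle can be checked with a small sketch. This is only an illustration of the topology in this article; the site and link labels are my own shorthand, and it models reachability only, not BGP convergence or latency.

```python
from collections import deque

# The three legs of the triangle (labels are illustrative).
LINKS = {
    "leased-line": ("on-prem", "other-cloud"),
    "sd-wan": ("on-prem", "azure"),
    "expressroute": ("azure", "other-cloud"),
}


def reachable(links, start, goal):
    """Breadth-first search over undirected links; True if goal is reachable."""
    neighbours = {}
    for a, b in links.values():
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def survives_single_failure(links):
    """True if every pair of sites stays connected when any one link is removed."""
    sites = sorted({site for pair in links.values() for site in pair})
    for failed in links:
        remaining = {name: pair for name, pair in links.items() if name != failed}
        for i, a in enumerate(sites):
            for b in sites[i + 1:]:
                if not reachable(remaining, a, b):
                    return False
    return True


full_triangle = survives_single_failure(LINKS)
without_expressroute = survives_single_failure(
    {name: pair for name, pair in LINKS.items() if name != "expressroute"}
)
print(full_triangle)         # True: any single leg can fail
print(without_expressroute)  # False: two legs alone have no spare path
```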

It’s also worth considering adding the non-Microsoft cloud as a site within Meraki SD-WAN long-term to further boost end-user performance.

If we maintain branch-to-branch routing, the quality of routing decisions becomes crucial. When the leased line is working, direct paths should be preferred for reduced latency. However, without careful control, Azure could sometimes be used even when not optimal.

Azure Route Server offers a few ways to influence route selection:

  • (Default) Prefer ExpressRoute: Routes learned via ExpressRoute take precedence, which can send traffic on an unnecessary detour if on-premises prefixes are also advertised from the other cloud
  • Prefer VPN: VPN-learned routes take precedence, which might send traffic bound for the cloud back through on-premises
  • Use AS path: The route with the shortest AS path wins, so the admin can steer the choice with AS path prepending, making sure the direct leased line is used unless it becomes unavailable
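
The difference between the three options can be sketched as a toy selection function. This is a simplified model under my own assumptions, not Azure's actual best-path algorithm (which also considers things such as prefix length); the source labels, prefixes and AS numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    source: str                 # "expressroute" or "vpn_nva" (illustrative labels)
    as_path: tuple[int, ...]


def pick_route(candidates, preference):
    """Pick one route for a prefix under a simplified routing preference.

    preference is "expressroute", "vpn_nva" or "as_path".
    Ties on source fall back to the shortest AS path.
    """
    if not candidates:
        return None
    if preference == "as_path":
        return min(candidates, key=lambda r: len(r.as_path))
    return min(candidates, key=lambda r: (r.source != preference, len(r.as_path)))


def prepend(route, asn, times):
    """Return a copy of the route with `asn` prepended `times` times."""
    return Route(route.prefix, route.source, (asn,) * times + route.as_path)


# The same on-premises prefix seen two ways (private AS numbers, hypothetical):
direct = Route("192.168.0.0/16", "vpn_nva", (65001,))              # via SD-WAN
detour = Route("192.168.0.0/16", "expressroute", (65010, 65001))   # via other cloud

print(pick_route([direct, detour], "expressroute").source)  # expressroute (the detour)
print(pick_route([direct, detour], "vpn_nva").source)       # vpn_nva
print(pick_route([direct, detour], "as_path").source)       # vpn_nva (shorter path)
print(pick_route([detour], "as_path").source)               # expressroute (fallback)

# Prepending lengthens a path so it becomes the less attractive choice.
padded = prepend(direct, 65001, 3)
print(pick_route([padded, detour], "as_path").source)       # expressroute
```

With the AS path option, the shorter direct route wins while it exists and the longer one takes over only when the direct route is withdrawn, which is the failover behaviour wanted for the leased line.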
