Migrating from MSEE Hairpin Routing to AVNM Mesh for Large-Scale VNet-to-VNet Connectivity

In many large Azure estates, a familiar pattern is routing VNet-to-VNet traffic through Microsoft Enterprise Edge (MSEE) routers. This happens when spoke VNets in a hub-and-spoke topology need to communicate and the traffic hairpins through the ExpressRoute circuit: it leaves the Azure data centre, transits the MSEE, and then re-enters the data centre to reach the target VNet.

While this arrangement works, it was never intended as a permanent solution for east-west traffic. With Azure Virtual Network Manager (AVNM) mesh connectivity and its new scaling capabilities, businesses can now move to a more direct, in-datacentre routing model that removes the dependency on MSEE for VNet-to-VNet traffic. High-scale mesh accommodates as many as 5,000 VNets, and High-Scale Private Endpoints (HSPE) support up to 20,000 Private Endpoints across connected VNets.

In this article, we’ll explore why making this shift is beneficial, the new scaling limits, how to enable essential features, and ways to carry out the migration smoothly.

This migration is particularly pertinent if your setup features 50 or more spoke VNets communicating east-west through ExpressRoute hairpin routing. It’s also beneficial if you’re nearing the VNet peering limits, aiming to decrease ExpressRoute use for internal traffic, or seeking a more straightforward, centrally managed connectivity model. Even if much of the east-west traffic still needs to pass through a hub firewall for inspection, you can simplify the connectivity management for direct flows.

Currently, when spoke VNets communicate via MSEE, the data takes a less efficient route:

Spoke A → Hub VNet → ExpressRoute Gateway → MSEE → ExpressRoute Gateway → Hub VNet → Spoke B

  • Single point of failure. Relying on MSEE creates a shared risk for east-west traffic. If MSEE encounters an outage or underperformance, it impacts all VNet pairs in the topology.
  • Added latency. Hairpinned traffic leaves the data centre and returns through MSEE. Direct mesh connectivity keeps spoke-to-spoke traffic inside the data centre, which usually results in lower latency.
  • Bandwidth limits. ExpressRoute circuits have fixed bandwidth. When east-west traffic uses them, it competes with north-south on-premises traffic and can saturate the circuit.
  • Operational risks at scale. Large configurations place heavy demands on MSEE infrastructure, raising concerns over scalability, reliability, and operational efficiency in environments with thousands of VNets.

The most obvious alternative — establishing direct peering connections between every spoke pair — eliminates the MSEE dependency but brings its own challenges:

  • Combinatorial growth. Connecting N spokes requires N × (N – 1) / 2 peering relationships. So, with 100 spokes, you’d need 4,950 peerings; and for 1,000 spokes, nearly half a million.
  • Management headache. Each connection has to be individually set up, observed, and maintained, leading to increased complexity and operational overhead.
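The combinatorial growth is easy to check with a few lines of Python. This is only a sketch of the arithmetic above; the function name `full_mesh_peerings` is illustrative and not part of any Azure tooling.

```python
def full_mesh_peerings(spokes: int) -> int:
    """Peering relationships needed to connect every pair of spokes: N * (N - 1) / 2."""
    if spokes < 0:
        raise ValueError("spokes must be non-negative")
    return spokes * (spokes - 1) // 2


for n in (10, 100, 1000):
    # 10 -> 45, 100 -> 4,950, 1000 -> 499,500
    print(f"{n} spokes -> {full_mesh_peerings(n):,} peerings")
```

Each tenfold increase in spokes multiplies the peering count by roughly one hundred, which is why per-pair peering stops being manageable long before 1,000 spokes.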

AVNM mesh provides group-based connectivity. You define a set of VNets as a network group, apply a mesh connectivity configuration, and AVNM automatically establishes bidirectional connectivity among all members.

Traffic between connected VNets stays within Azure’s data centre: no MSEE involvement, no hub detour, and no tedious manual peering management.

  • Group connection. One mesh configuration links every VNet in the group to all others.
  • Central oversight. You can easily add or remove VNets from the group; AVNM automatically adjusts the connectivity.
  • Direct routes. Data flows directly between VNets without involving the hub or MSEE.
  • Dynamic membership. Use Azure Policy to automatically include new VNets based on tags or resource group specifications.
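As a rough illustration of dynamic membership, the sketch below builds an Azure Policy definition body as a Python dictionary. It assumes the policy shape used for AVNM network groups (mode `Microsoft.Network.Data`, effect `addToNetworkGroup`); the tag name, tag value, and network group ID are placeholders, and the result should be checked against the current Azure Policy reference before use.

```python
import json


def network_group_policy(network_group_id: str, tag_name: str, tag_value: str) -> dict:
    """Build a policy definition body that adds tagged VNets to an AVNM network group."""
    return {
        "properties": {
            "displayName": f"Add VNets tagged {tag_name}={tag_value} to network group",
            # Assumption: AVNM dynamic membership uses this policy mode and effect.
            "mode": "Microsoft.Network.Data",
            "policyRule": {
                "if": {
                    "allOf": [
                        {"field": "type", "equals": "Microsoft.Network/virtualNetworks"},
                        {"field": f"tags['{tag_name}']", "equals": tag_value},
                    ]
                },
                "then": {
                    "effect": "addToNetworkGroup",
                    "details": {"networkGroupId": network_group_id},
                },
            },
        }
    }


# Placeholder resource ID and tag; substitute real values for your environment.
policy = network_group_policy(
    "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/"
    "Microsoft.Network/networkManagers/<manager>/networkGroups/<group>",
    tag_name="mesh",
    tag_value="eastus",
)
print(json.dumps(policy, indent=2))
```

Generating the definition from code keeps the tag convention in one place, so the same values can drive both the policy and the tagging of VNets during migration.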

High-scale mesh supports much larger topologies: up to 5,000 VNets and 20,000 Private Endpoints.

| Dimension | Standard Mesh | High-Scale Mesh |
|---|---|---|
| VNets per mesh | Up to 250 (soft limit, requests for increase possible) | Up to 5,000 |
| Private Endpoints per mesh | Up to 2,000 | Up to 20,000 (with HSPE enabled) |
| Private Endpoints per VNet | Up to 1,000 | Up to 5,000 (with HSPE enabled) |

As the mesh scales, the quantity of Private Endpoints across linked VNets increases. In large environments, the default limits — 1,000 Private Endpoints per VNet and 2,000 across the connected VNets — can quickly become a concern. Activating HSPE raises these limits to 5,000 and 20,000, respectively.
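A quick capacity check can show which features a planned mesh needs. The sketch below hard-codes the limits quoted in this article, which may change over time; `plan_mesh` and `MeshPlan` are illustrative names, and the input is simply a list of Private Endpoint counts, one per VNet.

```python
from dataclasses import dataclass

# Limits as quoted in this article; confirm current values before relying on them.
STANDARD = {"vnets": 250, "pe_per_vnet": 1_000, "pe_per_mesh": 2_000}
HIGH_SCALE = {"vnets": 5_000, "pe_per_vnet": 5_000, "pe_per_mesh": 20_000}


@dataclass
class MeshPlan:
    needs_high_scale_mesh: bool  # more VNets than the standard mesh limit
    needs_hspe: bool             # more Private Endpoints than the standard limits
    fits: bool                   # within the high-scale limits


def plan_mesh(pe_counts: list[int]) -> MeshPlan:
    """Classify a planned mesh from the Private Endpoint count of each member VNet."""
    vnets = len(pe_counts)
    total = sum(pe_counts)
    largest = max(pe_counts, default=0)
    return MeshPlan(
        needs_high_scale_mesh=vnets > STANDARD["vnets"],
        needs_hspe=largest > STANDARD["pe_per_vnet"] or total > STANDARD["pe_per_mesh"],
        fits=(
            vnets <= HIGH_SCALE["vnets"]
            and largest <= HIGH_SCALE["pe_per_vnet"]
            and total <= HIGH_SCALE["pe_per_mesh"]
        ),
    )


# Hypothetical mesh: 300 VNets with 10 Private Endpoints each (3,000 in total).
print(plan_mesh([10] * 300))
```

In this hypothetical case the mesh exceeds both the 250-VNet and the 2,000-endpoint standard limits, so it would need high-scale mesh and HSPE, while still fitting comfortably inside the high-scale ceilings.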

To ensure a smooth transition in large-scale mesh migrations, it’s wise to enable HSPE ahead of time if you anticipate reaching the standard limits for Private Endpoints. Follow these steps:

  1. Check that Private Endpoint Network Policies are set to Enabled or RouteTableEnabled on all relevant subnets. This is essential.
  2. Change the VNet-level property PrivateEndpointVNetPolicies to Basic to activate HSPE.
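Before deploying, the two prerequisites can be checked against exported VNet definitions. The sketch below assumes each VNet is available as a dictionary shaped like its ARM JSON (`properties.subnets[].properties.privateEndpointNetworkPolicies` and `properties.privateEndpointVNetPolicies`), treats every subnet as relevant, and uses the hypothetical helper name `hspe_blockers`.

```python
ALLOWED_SUBNET_POLICIES = {"Enabled", "RouteTableEnabled"}


def hspe_blockers(vnet: dict) -> list[str]:
    """Return human-readable reasons why a VNet is not ready for HSPE."""
    problems = []
    props = vnet.get("properties", {})
    # Step 1: every subnet must have Private Endpoint network policies enabled.
    for subnet in props.get("subnets", []):
        policy = subnet.get("properties", {}).get("privateEndpointNetworkPolicies")
        if policy not in ALLOWED_SUBNET_POLICIES:
            problems.append(
                f"subnet {subnet.get('name')}: privateEndpointNetworkPolicies is {policy!r}"
            )
    # Step 2: the VNet-level property must be set to Basic.
    if props.get("privateEndpointVNetPolicies") != "Basic":
        problems.append("VNet: privateEndpointVNetPolicies is not 'Basic'")
    return problems


# Hypothetical VNet: one compliant subnet, one not, HSPE not yet enabled.
example_vnet = {
    "name": "spoke-01",
    "properties": {
        "privateEndpointVNetPolicies": "Disabled",
        "subnets": [
            {"name": "app", "properties": {"privateEndpointNetworkPolicies": "Enabled"}},
            {"name": "data", "properties": {"privateEndpointNetworkPolicies": "Disabled"}},
        ],
    },
}
for problem in hspe_blockers(example_vnet):
    print(problem)
```

Running a check like this across all member VNets before deployment surfaces the same gaps that would otherwise block the deployment.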

Within the AVNM mesh connectivity settings, make sure to enable high-scale private endpoints. AVNM will confirm that all VNets in the mesh are HSPE-enabled. Any missing configurations will lead to a clear error, blocking the deployment.

Once set up, deploy the connectivity configuration. AVNM manages HSPE across the mesh.

Enabling HSPE has a few side effects to plan for:

  • Transient connection reset. When enabling or disabling HSPE, existing Private Endpoint connections in the VNet experience a brief disconnection (about 1 second). It’s best to schedule this during a maintenance window.
  • Changes to PE traffic monitoring. HSPE treats each Private Endpoint IP like any other IP in the VNet, which removes the per-endpoint traffic counters. If you rely on per-PE metrics, arrange alternatives before enabling it.
  • Billing changes for on-premises PE traffic. On-premises PE traffic appears as a collective charge on the gateway VNet rather than on the individual Private Endpoint, although the overall cost remains unchanged.

The mesh deployment itself leaves the existing environment in place:

  • Existing peerings remain intact. AVNM won’t delete manual peerings unless directed to do so.
  • Automatic traffic adjustment. Once mesh connectivity is deployed, east-west traffic routes directly. The MSEE path remains available if you decide to remove the mesh.
  • No need to reconfigure hub components. Firewalls, gateways, and NVAs within the hub keep functioning as normal. North-south on-premises traffic still flows through the hub gateway.
  • Easy rollback while the legacy path is still available. Keep the MSEE routing in place during the validation period, so affected VNets can revert if necessary.

A common question is whether direct mesh connectivity bypasses hub firewalls or network appliances. The answer hinges on current inspection practices.

  • Mesh focuses on connectivity, not routing policy. If the subnets of spokes have User-Defined Routes (UDRs) directing traffic through a hub NVA or firewall, those UDRs will still apply and can keep inspected traffic on that path.
  • Security Admin Rules ensure central segmentation. For flows that don’t require firewall inspections, AVNM Security Admin Rules can enforce network-wide allow or deny policies across network groups.
  • Use both approaches appropriately. Mesh can facilitate direct connections for approved flows, while Security Admin Rules manage the required segmentation where necessary.

Recommendation: Prior to migration, take stock of which spoke-to-spoke connections currently pass through the firewall. Make a decision for each flow whether to maintain inspections with UDRs or enable direct mesh paths by removing UDRs for those traffic pairs.
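One way to keep that inventory honest is to record each flow with two facts (whether it passes through the firewall today, and whether inspection is required) and derive the decision from them. This is only a sketch; the `Flow` fields, the decision labels, and the example flows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    destination: str
    via_firewall_today: bool   # a UDR currently steers this flow through the hub firewall
    inspection_required: bool  # security policy says the flow must stay inspected


def decide(flow: Flow) -> str:
    """Map a flow to the migration decision described in the recommendation."""
    if flow.inspection_required and flow.via_firewall_today:
        return "keep UDR (inspected path)"
    if flow.inspection_required:
        return "review (inspection required but no firewall path today)"
    if flow.via_firewall_today:
        return "remove UDR (direct mesh path)"
    return "direct mesh path (no UDR change)"


# Hypothetical inventory entries.
flows = [
    Flow("spoke-web", "spoke-db", via_firewall_today=True, inspection_required=True),
    Flow("spoke-app", "spoke-cache", via_firewall_today=True, inspection_required=False),
    Flow("spoke-batch", "spoke-logs", via_firewall_today=False, inspection_required=False),
]
for flow in flows:
    print(f"{flow.source} -> {flow.destination}: {decide(flow)}")
```

The "review" outcome is deliberate: a flow that should be inspected but is not steered to the firewall today is a finding to resolve before migration, not something the mesh should silently inherit.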

The shift from MSEE hairpin routing to AVNM mesh is designed to be non-disruptive. Mesh connectivity is layered on top of existing peering configurations and gives east-west traffic a direct route. There’s no need to dismantle your current hub-and-spoke structure first.

  1. Outline your mesh structure. Organise VNets by region into mesh groups. If you’re expecting over 250 VNets per mesh, you’ll need to register for the AllowHighScaleConnectedGroup feature in advance.
  2. Create a Network Manager and form your Network Groups. Ensure the AVNM scope covers all relevant subscriptions. Use static membership for initial migration or dynamic membership via Azure Policy for continued enrollment.
  3. Activate HSPE on all VNets within the mesh. Follow the HSPE enablement steps if you anticipate needing more than 2,000 Private Endpoints in a mesh. Schedule this during a maintenance window to prepare for a brief connection reset.
  4. Create the mesh connectivity settings in AVNM. Select your network groups, enable mesh topology, activate high-scale private endpoints, and enable global mesh if cross-region connections are necessary.
  5. Deploy incrementally. Start with a test region or a non-critical environment. Confirm effective routes, connectivity between spokes, and between spokes and hubs, along with Private Endpoint availability and expected VNet flow logs prior to rolling out to production regions.

Throughout the migration, mesh and MSEE can work alongside each other. Mesh-connected VNets will get direct routes for their destinations, while existing ExpressRoute routes will still cater to on-premises destinations. UDRs will still take precedence over system routes, ensuring that forced tunnelling and inspection patterns are retained when UDRs are present.

  • Mesh destinations. Traffic between meshed VNets travels directly rather than through MSEE when no UDR is overriding the route.
  • On-premises destinations. ExpressRoute continues to provide the necessary uplink for on-premises networks.
  • Gateway transit. Spokes can still connect to on-premises through the hub gateway if the design includes that feature.
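The coexistence rules can be approximated with a small model: pick the most specific matching prefix, and let a UDR win when it ties with a mesh route. The sketch below is deliberately simplified and is not Azure's full route-selection logic; it assumes any destination without a UDR or mesh match is an on-premises prefix learned over ExpressRoute, and `expected_next_hop` is an illustrative name.

```python
import ipaddress


def expected_next_hop(dst_ip: str, mesh_prefixes: list[str], udrs: list[tuple[str, str]]) -> str:
    """Simplified model: longest prefix match, with UDRs winning ties over mesh routes."""
    dst = ipaddress.ip_address(dst_ip)
    candidates = []  # (prefix length, 1 if UDR else 0, description)
    for prefix, next_hop in udrs:
        net = ipaddress.ip_network(prefix)
        if dst in net:
            candidates.append((net.prefixlen, 1, f"UDR {net} -> {next_hop}"))
    for prefix in mesh_prefixes:
        net = ipaddress.ip_network(prefix)
        if dst in net:
            candidates.append((net.prefixlen, 0, f"mesh {net} -> ConnectedGroup"))
    if not candidates:
        # Assumption: everything else is reached through the hub gateway.
        return "ExpressRoute via hub gateway"
    return max(candidates)[2]


# Hypothetical address plan: two meshed spokes, one of them behind an inspection UDR.
mesh = ["10.1.0.0/16", "10.2.0.0/16"]
udrs = [("10.2.0.0/16", "VirtualAppliance")]
for ip in ("10.1.4.4", "10.2.4.4", "192.168.10.5"):
    print(ip, "=>", expected_next_hop(ip, mesh, udrs))
```

With this address plan, the first destination takes the direct mesh route, the second stays on the inspected UDR path, and the third falls through to ExpressRoute, matching the three cases listed above.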

If you’re managing VNet peerings via Terraform, Bicep, or ARM templates, treat AVNM mesh as the new source of truth for connectivity only after you have validated it.

  1. Deploy mesh first. AVNM mesh can be established alongside existing peerings, so don’t remove peering resources from infrastructure-as-code until the mesh path is validated.
  2. Confirm the traffic path. Utilize effective routes, Connection Monitor, and flow logs to ensure traffic uses the mesh as expected.
  3. Prevent drift. Check pipeline states and lifecycle settings before stripping out old peerings, especially in environments managed by multiple teams.
  4. Codify AVNM. Use infrastructure-as-code to manage the Network Manager, groups, configurations, and deployments so that the mesh becomes the governed connectivity approach.

Bear in mind that mesh connectivity won’t alter DNS resolution behaviour on its own. If spoke VNets are already linked to Private DNS Zones in the hub, those links continue to drive name resolution. If spokes use custom DNS servers in the hub, ensure that any changes to UDRs during migration don’t inadvertently disrupt DNS traffic paths.

Here’s how an enterprise can transition two regional hub-and-spoke structures into one centrally managed AVNM mesh, while keeping the old MSEE routing functioning during validation.

Contoso Corp runs a large Azure environment in the East US region with two hub-and-spoke models:

|  | Topology A | Topology B |
|---|---|---|
| Hub VNet | Hub-A | Hub-B |
| Spoke VNets | 500 | 500 |
| ExpressRoute Gateway | ER-GW-A in Hub-A | ER-GW-B in Hub-B |
| ExpressRoute Circuit | Shared circuit, connecting both gateways | Same shared circuit |
| Avg. Private Endpoints per spoke | ~8 (4,000 total) | ~12 (6,000 total) |

Total Private Endpoints: 10,000 across both topologies

Current traffic flow:

  • Spoke-to-spoke within Topology A: Spoke-A-01 → Hub-A → ER-GW-A → MSEE → ER-GW-A → Hub-A → Spoke-A-02
  • Spoke-to-spoke across topologies: Spoke-A-01 → Hub-A → ER-GW-A → MSEE → ER-GW-B → Hub-B → Spoke-B-01

Every spoke-to-spoke packet, whether within the same topology or across both, must exit the data centre, pass through MSEE, and re-enter. With 1,000 spokes generating east-west traffic, MSEE becomes a shared dependency that adds latency to each flow.

Contoso aims to consolidate all 1,000 spoke VNets into a single AVNM mesh, taking east-west traffic off the MSEE path.

  • Spoke-to-spoke communication, any pair: Spoke-A-01 → directly → Spoke-B-01. Traffic remains within the data centre and uses the direct mesh route.
  • MSEE’s role: MSEE now handles north-south traffic only. East-west load is removed from the ExpressRoute hairpin path.

  • Feature registration. Because the mesh will contain 1,000 VNets, exceeding the standard limit of 250, Contoso registers the AllowHighScaleConnectedGroup feature for the subscription. This unlocks high-scale mesh support for up to 5,000 VNets.
  • Private Endpoint analysis. With 10,000 Private Endpoints spread across 1,000 VNets, Contoso exceeds the standard mesh limit of 2,000, so HSPE must be enabled.
  • Batch 1: Enable HSPE on all 500 spoke VNets in Topology A during a maintenance window, following the HSPE enablement steps described earlier.
  • Batch 2: Apply identical settings to all 500 spokes in Topology B.
  • Expected impact: Existing Private Endpoint connections in each VNet may reset briefly when HSPE is enabled, so schedule the change during a maintenance window.

  1. Create a Network Manager scoped to the management group that contains all 1,000 spoke VNets.
  2. Define a single network group named eastus-mesh-all-spokes, using dynamic membership based on an Azure Policy and the relevant tags.
  3. Create a mesh connectivity configuration. Set the topology to Mesh, select eastus-mesh-all-spokes as the network group, set high-scale private endpoints to Enabled, and leave global mesh disabled, since all VNets are in one region.
  4. Save the configuration as a draft, but do not deploy yet.

Wave 1 (pilot): Contoso deploys the mesh configuration to 50 dev/test VNets, either by using a temporary network group or by tagging only those VNets at first. Validation covers five checks: effective routes show ConnectedGroup as the next-hop type for meshed spoke prefixes; spoke-to-spoke connectivity works over the direct mesh; Private Endpoints are reachable between meshed spokes; on-premises connectivity via the hub and MSEE is unaffected; and VNet flow logs show the expected direct flows.

Wave 2 (light production traffic VNets): Following a successful pilot, Contoso tags the VNets handling lighter production workloads, and the dynamic network group automatically includes them. Contoso redeploys the connectivity configuration, runs the same validation checklist, and monitors traffic patterns to confirm that east-west traffic now bypasses the ExpressRoute path.

Wave 3 (all remaining production VNets): Once all remaining spokes are tagged, Contoso redeploys the configuration. At this point, all 1,000 spokes are in the mesh.

No downtime during migration: Even during Waves 2 and 3, the existing MSEE hairpin routing keeps working. VNets not yet part of the mesh continue to communicate through MSEE, while those already in the mesh reach each other directly. This ensures there’s no planned disconnection during the migration.
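The wave mechanics can be simulated in a few lines: tagging a VNet is what moves it into the dynamic network group, and everything untagged stays on the legacy path. The sketch below uses a six-VNet stand-in for Contoso's 1,000 spokes; the tag name `mesh`, its value, and the helper names are all hypothetical.

```python
def network_group_members(vnet_tags: dict[str, dict[str, str]], tag: str, value: str) -> set[str]:
    """Return the VNets a tag-based dynamic group would currently contain."""
    return {name for name, tags in vnet_tags.items() if tags.get(tag) == value}


def apply_wave(vnet_tags: dict[str, dict[str, str]], wave: list[str], tag: str, value: str):
    """Return a new inventory in which every VNet in `wave` carries the tag."""
    updated = {name: dict(tags) for name, tags in vnet_tags.items()}
    for name in wave:
        updated[name][tag] = value
    return updated


# Hypothetical inventory: six untagged spokes standing in for the full estate.
inventory = {f"spoke-{i:02d}": {} for i in range(1, 7)}
waves = [
    ["spoke-01"],                          # Wave 1: pilot
    ["spoke-02", "spoke-03"],              # Wave 2: light production
    ["spoke-04", "spoke-05", "spoke-06"],  # Wave 3: remaining production
]

for number, wave in enumerate(waves, start=1):
    inventory = apply_wave(inventory, wave, "mesh", "eastus")
    meshed = network_group_members(inventory, "mesh", "eastus")
    legacy = set(inventory) - meshed
    print(f"after wave {number}: {len(meshed)} on mesh, {len(legacy)} still on MSEE path")
```

Because membership only ever grows between waves, a VNet is always on exactly one of the two paths, which is the property that makes the migration free of planned downtime.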

After deployment, check that the mesh is functioning correctly and that traffic is following the anticipated route before removing the old route.

  1. Effective routes check. Ensure spoke subnets show direct routes to peer VNet prefixes rather than routes through the gateway or MSEE.
  2. Connection Monitor. Track significant spoke-to-spoke traffic and compare latency and connection quality before and after the migration.
  3. VNet flow logs. Double-check that east-west traffic reflects the expected mesh route and isn’t still travelling through the ExpressRoute gateway.
  4. Network Watcher topology. Visualize the final connectivity model and identify any VNets not included in the target network group.
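The effective routes check lends itself to automation. The sketch below assumes the effective routes have been exported as a list of dictionaries with `addressPrefix` (a list), `nextHopType`, and `state` keys, loosely modelled on the effective route table output; the helper name `non_mesh_prefixes` and the sample data are hypothetical.

```python
def non_mesh_prefixes(effective_routes: list[dict], peer_prefixes: list[str]) -> dict[str, list[str]]:
    """Return peer prefixes whose active next hop is anything other than ConnectedGroup."""
    next_hops: dict[str, set[str]] = {}
    for route in effective_routes:
        if route.get("state") != "Active":
            continue  # ignore routes that are not in effect
        for prefix in route.get("addressPrefix", []):
            next_hops.setdefault(prefix, set()).add(str(route.get("nextHopType")))
    findings = {}
    for prefix in peer_prefixes:
        hops = next_hops.get(prefix, {"<no route>"})
        if hops != {"ConnectedGroup"}:
            findings[prefix] = sorted(hops)
    return findings


# Hypothetical export for one spoke NIC.
routes = [
    {"addressPrefix": ["10.1.0.0/16"], "nextHopType": "ConnectedGroup", "state": "Active"},
    {"addressPrefix": ["10.2.0.0/16"], "nextHopType": "VirtualNetworkGateway", "state": "Active"},
    {"addressPrefix": ["10.3.0.0/16"], "nextHopType": "ConnectedGroup", "state": "Invalid"},
]
print(non_mesh_prefixes(routes, ["10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16"]))
```

An empty result means every expected peer prefix resolves to the mesh. Anything else is a prefix to investigate, for example a spoke that never joined the network group or a route still learned through the gateway.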

If traffic continues to follow the MSEE path after the mesh is deployed, investigate UDRs overriding system routes, any missing spokes from the network group, incomplete deployment within the designated region, or infrastructure code recreating legacy peerings.

  • Quick rollback while MSEE is still active. If necessary, remove affected VNets from the network group or undeploy the mesh configuration. AVNM removes only the connectivity it established, allowing traffic to revert to the existing MSEE route.
  • Rollback after decommissioning the legacy path. If earlier peerings or route dependencies have already been removed, rollback might involve reprovisioning those resources and may require a longer change window.
  • Recommendation: Keep the MSEE routing option available for at least two weeks after mesh deployment, monitor traffic, and only then remove the old path.

| Metric | Before (MSEE Hairpin) | After (AVNM HSPE Mesh) |
|---|---|---|
| Spoke-to-spoke latency within the same region | Higher due to MSEE hairpin routing | Lower-latency direct path within the data centre; actual latency depends on workload, region, and network conditions |
| Traffic path for east-west | Spoke → Hub → MSEE → Hub → Spoke | Spoke → directly to Spoke through the Mesh |
| MSEE dependency for east-west | Yes, shared dependency | No MSEE dependency for east-west traffic |
| Manual peerings required | 0 with hairpin routing, but 499,500 if built manually for 1,000 spokes | 0 manual spoke-to-spoke peerings; AVNM handles connectivity |
| Private Endpoints supported | 2,000 per mesh based on standard limits | 20,000 per mesh with HSPE enabled |
| Rollback complexity | Not applicable to current hairpin model | Remove VNets from the network group or undeploy the connectivity configuration |
| Migration downtime | Not applicable | Designed for no planned downtime when deployed gradually and carefully validated |

Shifting to AVNM mesh does not require you to tear down your existing networking setup. Hub gateways, firewalls, and NVAs will continue working as before. The key difference is that east-west traffic between spokes will no longer need to leave the data centre unnecessarily.

  • MSEE isn’t suited to act as the east-west fabric. Shifting internal traffic off the ExpressRoute circuit improves both reliability and capacity.
  • AVNM mesh replaces combinatorial complexity with group-based configuration. The operational effort scales with the number of groups rather than the number of VNets.
  • High-scale mesh and HSPE raise the ceilings, supporting as many as 5,000 VNets and 20,000 Private Endpoints per mesh.
  • The migration is gradual and reversible. Mesh can coexist with existing routes, allowing for validation phase by phase before removing the legacy route.

Start with a pilot mesh in a non-critical environment, validate that traffic shifts as expected, and then expand from there.

 
