Troubleshooting Azure Image Builder: Fixing Common TCP Port Errors

Azure Image Builder Job Fails With TCP 60000, 5986 or 22 Errors

In this guide, I’ll walk you through how to resolve Azure Image Builder job failures when you encounter error messages similar to the following:

[ERROR] Connection error: unknown error Post "https://10.1.10.9:5986/wsman": proxyconnect tcp: dial tcp 10.0.1.4:60000: i/o timeout
[ERROR] WinRM connection err: unknown error Post "https://10.1.10.9:5986/wsman": proxyconnect tcp: dial tcp 10.0.1.4:60000: i/o timeout

[ERROR] Connection error: unknown error Post "https://10.1.10.8:5986/wsman": context deadline exceeded
[ERROR] WinRM connection err: unknown error Post "https://10.1.10.8:5986/wsman": context deadline exceeded

If you are building a Linux image, the errors will relate to SSH instead:

[ERROR] Connection error: unknown error Post "https://10.1.10.8:22/ssh": context deadline exceeded
[ERROR] SSH connection err: unknown error Post "https://10.1.10.8:22/ssh": context deadline exceeded

Background

My goal was to use Azure Image Builder to create a base image for Azure Virtual Desktop, pre-installed with required legacy software from third-party vendors. In many cases, repackaging these applications (like converting to MSIX) simply isn’t supported, so they must be included in the image build process itself.

To ensure the image building process is secure:

  • The build VM must not have any direct access from the internet.
  • Application installers are hosted on a storage account using a private endpoint for blob access.

In such scenarios, you need to set up a virtual network and adapt your Image Template to reference an existing subnet. The Microsoft documentation instructs you to allow inbound TCP traffic on ports 60000-60001 from the Azure Load Balancer to your VNet.
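As a sketch, that documented rule can be created with the Azure CLI. The resource group and NSG names below (rg-aib, nsg-aib) are placeholders for your own environment, and the priority is just an example:

```shell
# Allow the Azure Load Balancer to reach the build subnet on the
# Image Builder proxy ports, per the Microsoft documentation.
az network nsg rule create \
  --resource-group rg-aib \
  --nsg-name nsg-aib \
  --name AllowAzureLoadBalancerInbound \
  --priority 400 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes AzureLoadBalancer \
  --destination-address-prefixes VirtualNetwork \
  --destination-port-ranges 60000-60001
```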

To lock down traffic further, I also placed a custom DenyAll rule at priority 4000. The Azure default deny rule, in my experience, can be too lenient!
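For reference, a custom DenyAll rule of the kind described above might look like this (again with placeholder resource names):

```shell
# Explicit catch-all deny at priority 4000, overriding the
# permissive built-in AllowVnetInBound default rule.
az network nsg rule create \
  --resource-group rg-aib \
  --nsg-name nsg-aib \
  --name DenyAllInbound \
  --priority 4000 \
  --direction Inbound \
  --access Deny \
  --protocol '*' \
  --source-address-prefixes '*' \
  --destination-address-prefixes '*' \
  --destination-port-ranges '*'
```

Because this rule sits below your Allow rules in priority order, anything you have not explicitly permitted is dropped, which is exactly what triggers the build failures discussed next.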

Despite following these steps, the build job failed, initially with errors related to TCP port 60000. Puzzling, isn’t it?

Diagnosing the Issue

Over the years, I’ve helped migrate several legacy applications with limited documentation into secure, micro-segmented Azure environments. Here’s my tried and tested troubleshooting procedure:

  1. Set up Log Analytics and a Storage Account to collect diagnostic logs
  2. Enable Network Security Group (NSG) flow logs and Traffic Analytics on the subnet used for building images
  3. Run the build again to reproduce the problem
  4. Examine the NTANetAnalytics table in Log Analytics for blocked flows
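Step 4 can be run from the Azure CLI as well as the portal. The query below is a sketch: the workspace GUID is a placeholder, and the exact column names may vary slightly with the Traffic Analytics schema version in your workspace:

```shell
# Query Traffic Analytics for denied flows in the last hour.
# Replace <workspace-guid> with your Log Analytics workspace ID.
az monitor log-analytics query \
  --workspace <workspace-guid> \
  --analytics-query "NTANetAnalytics
    | where TimeGenerated > ago(1h)
    | where FlowStatus == 'Denied'
    | project TimeGenerated, SrcIp, DestIp, DestPort, AclRule"
```

Filtering on denied flows and projecting the rule name makes it immediately obvious which NSG rule is dropping the traffic.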

This approach quickly uncovered the root of the issue: network traffic from the Azure Container Instance (the image builder) to the proxy VM was being blocked on TCP port 60000. The problem? The source wasn't the Azure Load Balancer, so the NSG rule recommended by the documentation didn't permit this connection.
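A rule of roughly this shape permits the blocked ACI-to-proxy flow. Resource names are placeholders, and you may prefer to scope the source to your build subnet's address prefix rather than the broader VirtualNetwork service tag:

```shell
# Allow traffic originating inside the VNet (the ACI image builder)
# to reach the proxy VM on the Image Builder proxy ports.
az network nsg rule create \
  --resource-group rg-aib \
  --nsg-name nsg-aib \
  --name AllowImageBuilderProxyInbound \
  --priority 410 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes VirtualNetwork \
  --destination-address-prefixes VirtualNetwork \
  --destination-port-ranges 60000-60001
```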

After running the image build again (yes, testing can be a slow process!), the deployment failed once more. This time, logs indicated another connectivity issue: the proxy VM was unable to reach the build VM on port 5986 (for Windows) or port 22 (for Linux).

Adding a further NSG rule, permitting traffic from the proxy VM to the build VM on those ports, resolved this.
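A sketch of that second rule is below. It allows both the WinRM (5986) and SSH (22) ports within the VNet; in practice you would only need the port matching your image's OS, and you could tighten the source to the proxy VM's address. Names and priority are placeholders:

```shell
# Allow the proxy VM to reach the build VM on WinRM (Windows)
# or SSH (Linux) within the build subnet.
az network nsg rule create \
  --resource-group rg-aib \
  --nsg-name nsg-aib \
  --name AllowProxyToBuildVmInbound \
  --priority 420 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes VirtualNetwork \
  --destination-address-prefixes VirtualNetwork \
  --destination-port-ranges 5986 22
```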

Once these rules were added, my build jobs progressed from Running to Distributing and finally to Succeeded.

Underlying Cause Explained

The issue stemmed from adding a restrictive DenyAll rule, which deviated from Microsoft's suggested network rules. The built-in AllowVnetInBound default rule is rather broad (allowing traffic from any routed network, including hub & spoke topologies), so it's often necessary to introduce micro-segmentation with custom DenyAll rules for greater security.

However, by overriding AllowVnetInBound with a higher-priority DenyAll rule, critical communication (such as ACI to proxy VM, and proxy VM to build VM) is blocked unless you explicitly add NSG rules permitting that traffic. These additional rules are essential to ensure that only the required connections are allowed during the image build process.