Multi-Tenant Architecture: Real Challenges and an Azure Design Walkthrough

Let’s dive into a commonly used reference design for Azure-based systems.

A typical setup may look like this:

- Microsoft Entra External ID (the successor to Azure AD B2C) for authentication
- Azure API Management as the entry layer
- App Service or Functions for compute
- Cosmos DB or SQL for data storage
- Redis for caching
- Service Bus for asynchronous processing
- Application Insights for monitoring

If you’ve worked with Azure before, none of this should come as a surprise.
On paper, this architecture is tidy, scalable, and designed for multiple tenants.

However, as soon as traffic starts to increase and tenant behaviours vary, issues can arise unexpectedly.

Here’s what I frequently observe:

- The tenant ID is present in the API but missing in asynchronous processes
- Background jobs process data without knowing which tenant it belongs to
- Logs become ineffective as you struggle to link actions back to a tenant

The solution seems straightforward but is often overlooked during implementation:

Every message should carry the tenant context, without exception.

If you think “it’ll be available somewhere,” chances are it won’t be, especially in distributed systems.

Explicitly include tenant context everywhere:

public class TenantMessage
{
    public string TenantId { get; set; }
    public string Payload { get; set; }
}

Every message, event, and asynchronous task must contain the tenant scope.
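
One way to enforce this on the consuming side is to reject any message that arrives without a tenant ID before the handler runs. The sketch below is a minimal illustration using System.Text.Json on .NET 6 or later. `TenantEnvelope` and `TenantMessageReader` are hypothetical names, not part of any Azure SDK, and a real Service Bus consumer would probably dead-letter the message rather than throw.

```csharp
using System;
using System.Text.Json;

// Hypothetical envelope: mirrors TenantMessage, but as an immutable record.
public sealed record TenantEnvelope(string TenantId, string Payload);

public static class TenantMessageReader
{
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    // Deserializes a message body and fails fast if the tenant scope is missing.
    public static TenantEnvelope Read(string json)
    {
        var message = JsonSerializer.Deserialize<TenantEnvelope>(json, Options);

        if (message is null || string.IsNullOrWhiteSpace(message.TenantId))
        {
            throw new InvalidOperationException(
                "Message has no tenant ID; refusing to process it.");
        }

        return message;
    }
}
```

Calling `TenantMessageReader.Read(body)` at the top of every handler means a message without a tenant never reaches business logic, which is easier to spot than a job that silently processes the wrong data.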

Many teams start with a shared database model that features tenant-based partitioning.
This approach works well initially.

However, over time, problems can start to emerge:

- A tenant filter is forgotten in a query
- A query unexpectedly scans across multiple partitions
- A large tenant begins to hinder performance for others

A simple query like this becomes essential:

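// Filter on tenantId and pin the partition key so the query stays in one partition.
// Assumes the container is partitioned by tenantId; replace dynamic with your item type.
var query = container.GetItemQueryIterator<dynamic>(
    new QueryDefinition("SELECT * FROM c WHERE c.tenantId = @tenantId")
        .WithParameter("@tenantId", tenantId),
    requestOptions: new QueryRequestOptions { PartitionKey = new PartitionKey(tenantId) }
);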

The challenge lies not just in writing it once, but in ensuring it’s applied everywhere, every time.
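
One way to make the filter hard to forget is to put it in a single place that every read goes through. The sketch below uses an in-memory collection in place of a real container so that it stays self-contained. `ITenantOwned`, `Order`, and `TenantScopedStore<T>` are hypothetical names for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker for any item that belongs to a tenant.
public interface ITenantOwned
{
    string TenantId { get; }
}

public sealed record Order(string TenantId, string Id, decimal Total) : ITenantOwned;

// Wraps the underlying data source and is bound to one tenant at construction,
// so callers cannot issue an unfiltered query by accident.
public sealed class TenantScopedStore<T> where T : ITenantOwned
{
    private readonly IEnumerable<T> _source;
    private readonly string _tenantId;

    public TenantScopedStore(IEnumerable<T> source, string tenantId)
    {
        if (string.IsNullOrWhiteSpace(tenantId))
            throw new ArgumentException("A tenant ID is required.", nameof(tenantId));

        _source = source ?? throw new ArgumentNullException(nameof(source));
        _tenantId = tenantId;
    }

    // The tenant filter always runs first; the caller's predicate only narrows it.
    public IReadOnlyList<T> Query(Func<T, bool> predicate) =>
        _source.Where(item => item.TenantId == _tenantId)
               .Where(predicate)
               .ToList();
}
```

With this shape, application code receives a `TenantScopedStore<Order>` that is already bound to the current tenant and never writes the `tenantId` condition itself.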

At the start, access control seems easy:

“Users can access data tied to their own tenant.”

But as requirements expand:

- Admin access becomes necessary
- Cross-tenant visibility is requested
- Reporting across various firms or regions is needed

This is when things often become complicated.

Different services may begin to implement their own rules, leading to inconsistent behaviours over time.

A simple check like this:

public bool CanAccess(string userTenant, string resourceTenant, bool isGlobalAdmin)
{
    if (isGlobalAdmin) return true;
    return userTenant == resourceTenant;
}

…becomes much harder to manage when it’s duplicated across several services.

One effective strategy is to centralise your authorization logic from the start.
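
As a rough sketch of what centralising could look like, the rules can live in one policy class that every service references, for example through a shared package. `AccessRequest`, `TenantAccessPolicy`, and the delegated-tenant rule are assumptions made for illustration, and the code targets .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of who is asking and which tenant's data they want.
public sealed record AccessRequest(
    string UserTenantId,
    string ResourceTenantId,
    bool IsGlobalAdmin,
    IReadOnlySet<string> DelegatedTenantIds);

// One policy class shared by every service, so a rule change happens in one place.
public static class TenantAccessPolicy
{
    public static bool CanAccess(AccessRequest request)
    {
        ArgumentNullException.ThrowIfNull(request);

        // Rule 1: global admins can reach any tenant.
        if (request.IsGlobalAdmin) return true;

        // Rule 2: users can reach their own tenant.
        if (string.Equals(request.UserTenantId, request.ResourceTenantId, StringComparison.Ordinal))
            return true;

        // Rule 3 (assumed): explicit cross-tenant grants, e.g. for group-level reporting.
        return request.DelegatedTenantIds.Contains(request.ResourceTenantId);
    }
}
```

When a new requirement such as regional reporting arrives, it becomes one more rule in `CanAccess` instead of a change that has to be repeated in every service.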

Caching often gets integrated later to boost performance.

This is when risks can arise.

I’ve seen cached data from one tenant served to another because the cache key didn’t include tenant information.

Addressing this is simple:

public string BuildCacheKey(string tenantId, string key)
{
    return $"{tenantId}:{key}";
}

Always ensure cache keys include tenant identifiers.
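
To avoid relying on every caller to remember the prefix, the key builder can be hidden behind a cache wrapper that is bound to a tenant. The sketch below uses a `ConcurrentDictionary` as a stand-in for Redis. `TenantCache` is a hypothetical name, and the code assumes tenant IDs never contain the `:` separator.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for a Redis-backed cache. The tenant ID is
// fixed at construction, so every key is prefixed without the caller's help.
public sealed class TenantCache
{
    private readonly ConcurrentDictionary<string, string> _store;
    private readonly string _tenantId;

    public TenantCache(ConcurrentDictionary<string, string> store, string tenantId)
    {
        if (string.IsNullOrWhiteSpace(tenantId))
            throw new ArgumentException("A tenant ID is required.", nameof(tenantId));

        _store = store ?? throw new ArgumentNullException(nameof(store));
        _tenantId = tenantId;
    }

    // Same idea as BuildCacheKey, but private so it cannot be bypassed.
    private string BuildCacheKey(string key) => $"{_tenantId}:{key}";

    public void Set(string key, string value) => _store[BuildCacheKey(key)] = value;

    public bool TryGet(string key, out string? value) =>
        _store.TryGetValue(BuildCacheKey(key), out value);
}
```

Two `TenantCache` instances created over the same store with different tenant IDs can use the same logical key, such as `"profile"`, without seeing each other's values.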

All tenants share various resources:

- Computational power
- Database capacity
- Messaging services

In practice, this leads to:

- One heavily loaded tenant affecting the performance of others
- Unpredictable latency
- Behaviour divergence across tenants

You might start implementing controls like this:

if (RequestsPerTenant[tenantId] > 100)
{
    return StatusCode(429);
}
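
The check above leaves open where `RequestsPerTenant` comes from and when its counts reset. One possible shape is a fixed-window counter per tenant, sketched below. `TenantRateLimiter` is a hypothetical name, and the state lives in memory, so it only limits traffic on a single instance; a scaled-out deployment would need a shared store or a gateway-level policy such as API Management's `rate-limit-by-key`.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical fixed-window limiter: each tenant gets `limit` requests per window.
public sealed class TenantRateLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly ConcurrentDictionary<string, (DateTimeOffset WindowStart, int Count)> _counters = new();

    public TenantRateLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Returns true if the request is allowed, false if the caller should respond with 429.
    // The current time is passed in so the behaviour is easy to test.
    public bool TryAcquire(string tenantId, DateTimeOffset now)
    {
        var updated = _counters.AddOrUpdate(
            tenantId,
            _ => (now, 1),                                   // first request from this tenant
            (_, current) => now - current.WindowStart >= _window
                ? (now, 1)                                   // window expired: start a new one
                : (current.WindowStart, current.Count + 1)); // same window: count the request

        return updated.Count <= _limit;
    }
}
```

In a controller this would read as `if (!limiter.TryAcquire(tenantId, DateTimeOffset.UtcNow)) return StatusCode(429);`, with the limit and window chosen per tenant tier.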

Gradually, you may develop:

- Throttling mechanisms
- Workload isolation strategies
- Resource prioritisation

This challenge is less about design and more about operational realities.

Logging generally functions well until you scale.

Then, you might find:

- Logs from all tenants become jumbled
- Debugging slows down significantly
- Answering basic questions like “which tenant encountered issues?” becomes difficult

A minor adjustment can make a significant difference:

_logger.LogInformation(
    "Tenant={TenantId} Action=ProcessOrder OrderId={OrderId}",
    tenantId,
    orderId
);

While this approach seems obvious, it’s often applied inconsistently across services.
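
One way to reduce that inconsistency is to attach the tenant once, as a logging scope, so each log call inside it does not have to repeat the field. The sketch below uses `ILogger.BeginScope` and assumes the Microsoft.Extensions.Logging.Abstractions package is referenced. `OrderProcessor` is a hypothetical class, and whether scope values appear in the output depends on how the logging provider is configured.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// Hypothetical handler that tags every log line it writes with the tenant.
public sealed class OrderProcessor
{
    private readonly ILogger<OrderProcessor> _logger;

    public OrderProcessor(ILogger<OrderProcessor> logger) => _logger = logger;

    public void Process(string tenantId, string orderId)
    {
        // Everything logged inside this scope carries TenantId as a structured property,
        // provided the configured provider captures scopes.
        using (_logger.BeginScope(new Dictionary<string, object> { ["TenantId"] = tenantId }))
        {
            _logger.LogInformation("Action=ProcessOrder OrderId={OrderId}", orderId);
        }
    }
}
```

Opening the scope in middleware or at the start of a message handler keeps the tenant field uniform across services without touching individual log statements.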

Taking backups is straightforward.

However, restoring a single tenant can be challenging.

In many shared database setups:

- Restores occur at the database level
- This impacts all tenants

If one tenant experiences a problem, recovery isn’t simple.

This highlights how decisions made early on can have lasting impacts.

Designing a multi-tenant system isn’t solely about selecting Azure services.

The real challenges can be boiled down to:

- How tenant context is managed
- How isolation is enforced
- How systems operate under uneven loads

Most issues won’t surface immediately.
They typically emerge as tenants grow and exhibit different behaviours.

