ENGINEERING · April 14, 2026 · 13 min read
How MigrationFox cut SharePoint migration time by ~50%
Migration performance is not an abstract benchmark. It is the difference between your cutover finishing on Sunday at 2am versus Monday at 9am with users waiting. This month we shipped an 18-fix performance pass across the SharePoint site migration path, the governance scanner, and the worker HTTP stack. On the tenants where we have measured end-to-end, typical SharePoint migrations are landing in about half the wall-clock time they used to take, and governance scans are 60–70% faster.
This post is the engineering-honest version of what changed. We will walk through the four big fixes and the long tail of smaller ones, show where the wins actually came from, and say out loud which optimisations were theatre and which ones moved the needle.
What “throughput” actually means in migration
Before the fixes, a question: what is a migration tool’s throughput a measure of? The intuitive answer is megabytes per second — byte streaming speed. For file-heavy migrations (SMB-to-Blob, big document libraries) that is basically correct. For SharePoint site migrations it is only half the story.
A typical SP site migration includes:
- Enumerating site columns, content types, lists, views, pages — pure read-side metadata work
- Creating schema on the destination — dozens of small writes per list
- Writing list items — thousands of small POSTs, each with a full metadata payload
- Uploading files — a mix of large PUTs and small PUTs
- Replaying permissions — another burst of small writes
Most of the wall-clock time is spent on small requests, not on file bytes. A 5 GB document library can finish the byte transfer in five minutes but still spend twenty-five minutes waiting on 12,000 sequential Graph calls to tag each file with its content type and custom columns. Optimising throughput means optimising the small-request path — the HTTP round trips, the serialisation, the ordering — not just the bytes-per-second number that looks impressive on a marketing slide.
ShareGate and BitTitan both have “Insane Mode” / “High Speed” marketing around file transfer. That is real and it applies to file bytes. It is not the same thing as site-migration throughput, where the bottleneck is the Graph call count, not the file bytes.
Fix 1: Graph $batch for the metadata path
The Microsoft Graph /$batch endpoint accepts up to 20 sub-requests in a single HTTP round trip. For metadata-heavy phases — like reading the content types on a site, or applying column metadata to a list of items — that one feature alone cuts network round trips by up to 20x.
Before the audit, our content-type, site-column, and list-view enumeration phases were each issuing one GET per list per attribute type, sequentially. A site with 40 lists was costing us 40 × 3 round trips just to read schema. Post-fix, those enumerations are batched in groups of 20, which on a site with 40 lists drops 120 round trips to 6. Each round trip on Graph is typically 120–400ms, so the wall-clock difference on a mid-size site is roughly 30 to 45 seconds of dead air the user does not have to sit through.
We did not use $batch for item writes. Item creation in Graph has enough quirks (per-item field-stripping for invalid columns, per-item retry on 429, progressive fallback when a field fails) that the sub-request semantics of $batch (partial success, shared throttling bucket) are more hindrance than help. For writes, we went a different direction.
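To make the batching concrete, here is a minimal sketch of chunking metadata reads into $batch payloads of 20. `batchGet`, the URL shapes, and the error handling are illustrative, not the MigrationFox client:

```typescript
// Sketch: chunk metadata reads into Graph $batch payloads of up to 20
// sub-requests. batchGet and the URL shapes are illustrative only.
type SubRequest = { id: string; method: "GET"; url: string };

function toBatches(urls: string[], size = 20): SubRequest[][] {
  const batches: SubRequest[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(
      urls.slice(i, i + size).map((url, j) => ({
        id: String(i + j), // ids map sub-responses back to their requests
        method: "GET" as const,
        url,
      }))
    );
  }
  return batches;
}

async function batchGet(urls: string[], token: string) {
  const results: Record<string, unknown> = {};
  for (const batch of toBatches(urls)) {
    const res = await fetch("https://graph.microsoft.com/v1.0/$batch", {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify({ requests: batch }),
    });
    const data = (await res.json()) as {
      responses: Array<{ id: string; status: number; body: unknown }>;
    };
    // each sub-response carries its own status; callers must check per id
    for (const r of data.responses) results[r.id] = r.body;
  }
  return results;
}
```

Each sub-response comes back with its own status code, so partial failures have to be handled per id — which is exactly why we kept writes out of $batch.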
Fix 2: p-limit parallelism with back-pressure
The old item-write path was either fully sequential (safe, slow, used for anything sensitive) or fully parallel with Promise.all (fast, noisy, prone to 429 storms). Neither is what you want for a production migration. What you want is bounded concurrency with back-pressure.
We moved every parallel phase onto p-limit with per-phase concurrency caps. The caps we ended up with after measurement:
- Site enumeration: 6 lists walked in parallel, 10 pages in parallel per list
- List-item writes: 8 items in parallel per list, 4 lists in parallel per site
- File uploads: 4 blocks in parallel per file, 6 files in parallel per library
- Permissions replay: 4 in parallel (permission writes are throttle-sensitive)
On top of the cap, the worker watches the Retry-After header from every 429. If throttling kicks in, the affected phase pulls its effective concurrency down and waits for the retry window before resuming. If the tenant is healthy, the cap stays at the configured value. No manual tuning.
The subtle win here is not the top-end speed — it is the consistency. Fully-parallel implementations hit 429 storms that take 30–90 seconds to drain; bounded-parallel implementations barely see 429s at all. The p-limit version finishes faster on average because it does not blow itself up.
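The bounded-concurrency pattern is small enough to show in full. This is a from-scratch sketch of a p-limit-style limiter; the production worker uses the p-limit package plus Retry-After-driven back-off, which is omitted here:

```typescript
// A from-scratch sketch of bounded concurrency in the style of p-limit.
// Shows only the core queueing mechanism; 429/Retry-After handling omitted.
function createLimiter(max: number) {
  let active = 0;
  const queue: Array<() => void> = [];
  return async function run<T>(task: () => Promise<T>): Promise<T> {
    // wait for a free slot; re-check after waking in case another caller
    // claimed the slot first
    while (active >= max) {
      await new Promise<void>((resolve) => queue.push(resolve));
    }
    active++;
    try {
      return await task();
    } finally {
      active--;
      queue.shift()?.(); // hand the freed slot to the next waiter
    }
  };
}
```

Every phase gets its own limiter instance, so a throttled permissions replay does not slow down an unrelated file-upload phase.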
Fix 3: Undici keep-alive and HTTP/2
Every HTTP library has a default connection pool. Node’s built-in https agent historically opened a new TCP + TLS connection per request unless you explicitly enabled keep-alive (recent Node versions enable it by default on the global agent), and even with keep-alive its pool defaults are modest. For a worker making thousands of requests to graph.microsoft.com in quick succession, that TLS handshake overhead is real.
We moved the Graph client onto undici with:
- Persistent connection pools scoped per destination host (one pool for Graph, one for each SharePoint REST endpoint, one for each Azure Blob account)
- Keep-alive on by default with a 60-second idle timeout
- HTTP/2 multiplexing where the origin supports it — Graph does, so dozens of concurrent requests share a single connection
- Pipelining capped at 10 per connection to limit head-of-line blocking
The measured win on a 5,000-item list was about 25% of the phase wall-clock time. The first couple of requests still pay TCP+TLS, but everything after that reuses the established connection. Over a 40-minute site migration, that is meaningful.
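The pool setup amounts to a few lines of undici configuration. The option names below are real undici options as of recent versions; the values are illustrative, not our exact production tuning:

```typescript
// Illustrative undici pool setup; values are not our production tuning.
import { Pool } from "undici";

const graphPool = new Pool("https://graph.microsoft.com", {
  connections: 16,          // persistent sockets kept open to this origin
  pipelining: 10,           // cap pipelined requests per connection
  keepAliveTimeout: 60_000, // drop sockets idle for more than 60s
  allowH2: true,            // negotiate HTTP/2 where the origin supports it
});

// Usage sketch: requests through the pool reuse established connections.
// const { body } = await graphPool.request({
//   path: "/v1.0/sites/root/lists",
//   method: "GET",
//   headers: { authorization: `Bearer ${token}` },
// });
```

One pool per destination host keeps the connection reuse scoped correctly: Graph, each SharePoint REST endpoint, and each Blob account get their own.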
Fix 4: Governance scan dedup
This one lives in the Copilot Readiness scanner, not the migration worker, but it matters for the same reason: too many redundant Graph calls.
The governance scan runs six modules (Purview, Identity, SharePoint, Teams, OneDrive, Power Platform) and each one used to hit /users, /sites, or /groups independently. The SharePoint module needed the sites list. The Teams module needed the sites list. The Identity module and OneDrive module each needed the users list.
We added a request cache scoped to a single scan run, keyed on the normalised Graph URL. First module to ask for /users?$select=id,userPrincipalName pays the API cost; every subsequent module gets the cached response. No Graph calls were removed from scan logic; they just stopped happening two or three times.
On a 1,200-user tenant, full six-module scan time dropped from about 5m30s to about 1m50s. The cache is discarded at the end of each scan so the next run still reflects live tenant state.
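A per-scan cache of this shape fits in a dozen lines. `fetchJson` stands in for the real Graph client, and the normalisation shown is deliberately simplified:

```typescript
// Sketch of the per-scan request cache. fetchJson stands in for the real
// Graph client; the URL normalisation here is deliberately simplified.
type Fetcher = (url: string) => Promise<unknown>;

function createScanCache(fetchJson: Fetcher) {
  const cache = new Map<string, Promise<unknown>>();
  return function cachedGet(url: string): Promise<unknown> {
    // normalise so "/v1.0/Users" and "/v1.0/users" share one cache entry
    const key = new URL(url, "https://graph.microsoft.com").toString().toLowerCase();
    let hit = cache.get(key);
    if (!hit) {
      hit = fetchJson(url); // first module to ask pays the API cost
      cache.set(key, hit);  // caching the promise also dedupes in-flight calls
    }
    return hit;
  };
}
```

Because the promise (not the resolved body) is what gets cached, two modules that ask for the same URL at the same moment still trigger only one Graph call.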
Fix 5 through 18: the long tail
Not every fix is worth a section heading. A few worth mentioning because they are generally useful patterns:
- Progressive field stripping on SP item writes. If a POST fails with `invalidRequest` on a read-only or computed field, we retry once without that field instead of failing the whole item. Ugly, but the only reliable way to handle Graph’s inconsistent treatment of system columns.
- Avoid re-enumeration on retry. The old retry path would re-enumerate a list from scratch if any item in it failed. We now cache the enumeration result per job and retry only the failed items.
- Streaming `drive/items` traversal. Folder walks used to load the full tree into memory before starting downloads. Now they stream, so a million-item library does not OOM the worker.
- Batch permission reads per list. Previously one read per item for permissions; now one `$batch` per 20 items.
- Defer index-column creation. Indexed columns block list creation on large schemas. We create the list with all columns first, then index the ones that need it asynchronously.
- Tighten the default-view payload. The `fields` selector used to ask for 40+ properties we never looked at. Now we request only what the migration phase needs.
- HEAD-after-PUT verification on the Azure Blob path. Shaved a per-block round trip by doing it only at commit time, not per block.
- Skip unchanged items in delta runs. The delta comparator now uses the Graph `cTag` where available (fast) and falls back to `modifiedDateTime` only when `cTag` is missing.
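As one worked example from that list, progressive field stripping looks roughly like the sketch below. The error shape (`code`, `field`) and `postItem` are assumptions for illustration; a real implementation has to extract the offending column name from Graph's error payload, and this version allows a small strip budget rather than the single retry described above:

```typescript
// Sketch of progressive field stripping. The error shape (code/field) and
// postItem are illustrative assumptions, not the Graph error format.
type Fields = Record<string, unknown>;

async function writeItem(
  postItem: (fields: Fields) => Promise<void>,
  fields: Fields,
  maxStrips = 3
): Promise<void> {
  let attempt: Fields = { ...fields };
  for (let i = 0; i <= maxStrips; i++) {
    try {
      return await postItem(attempt);
    } catch (e: any) {
      const bad: string | undefined =
        e?.code === "invalidRequest" ? e?.field : undefined;
      // only retry when we know which field was rejected and strips remain
      if (!bad || !(bad in attempt) || i === maxStrips) throw e;
      const { [bad]: _dropped, ...rest } = attempt;
      attempt = rest; // drop the rejected field and retry the item
    }
  }
}
```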
None of these by themselves is dramatic. Together they are the difference between “pretty fast” and “done by the time you get back from lunch”.
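The delta-run comparator from the last bullet can likewise be sketched as a pure function. Field names follow the post; in Graph itself the timestamp property is `lastModifiedDateTime`:

```typescript
// Sketch of the delta comparison. Field names follow the post's wording;
// assume both sides expose the item's cTag and a modified timestamp.
interface ItemVersion {
  cTag?: string;
  modifiedDateTime?: string;
}

function isUnchanged(src: ItemVersion, dst: ItemVersion): boolean {
  // cTag changes when an item's content changes: cheap, exact check
  if (src.cTag && dst.cTag) return src.cTag === dst.cTag;
  // fall back to timestamps only when a cTag is missing on either side
  return !!src.modifiedDateTime && src.modifiedDateTime === dst.modifiedDateTime;
}
```

Note the conservative default: an item with no comparable version data is treated as changed and re-migrated.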
What we measured
On the three tenants where we have reliable before-and-after numbers:
| Workload | Before | After | Delta |
|---|---|---|---|
| Mid-size SP site (40 lists, 12k items, 8 GB) | 1h 48m | 52m | ~52% faster |
| Small SP site (6 lists, 800 items, 400 MB) | 9m 20s | 4m 10s | ~55% faster |
| Governance full scan (1,200 users) | 5m 30s | 1m 50s | ~66% faster |
| Azure Blob ingest (SMB, 1 TB) | 4h 40m | 3h 15m | ~30% faster |
The SMB-to-Blob number is smaller because that path was already bottlenecked on the bytes-per-second of the network link, not on per-request overhead. Speed fixes help less when the wall clock is already set by hardware.
We do not claim “10x faster than ShareGate” or similar marketing numbers. We have not run head-to-head benchmarks, and on file-bytes-only workloads ShareGate’s Insane Mode is very probably competitive with or faster than what we do today, because Insane Mode uses the SPMT stream format that bypasses Graph’s per-item write cost. What we are saying is narrower: MigrationFox got substantially faster relative to itself, and on Graph-based site migrations (as opposed to file-only transfers) the remaining gap to Insane Mode is now small.
What did not help
Worth calling out because we tried them and they were not worth it:
- Global retry fan-out. We tried letting each failing request retry aggressively across multiple connections. Result: worse 429 storms. Reverted.
- Larger upload blocks. Going from 8 MB to 32 MB per block did not measurably help; on a single-connection path it just increased memory pressure. Stayed at 8 MB.
- Compressing request bodies. Graph does not meaningfully benefit: JSON compresses poorly on small payloads, and Graph does not accept `Content-Encoding: gzip` on most write endpoints.
- Switching HTTP clients to axios / got / fetch. None were meaningfully faster than undici once pool tuning was applied, and each had its own quirks. Stayed with undici.
What is next
The roadmap item that could move the needle further on site migrations is switching the largest-list write path to the SharePoint REST _api/web/lists/... endpoint where it supports batched item creation in a single POST. Graph does not expose this shape of batching for list-item writes. It is more work to support two write paths and we have to maintain parity on field stripping, error shapes, and permission replay. We will ship it when the engineering cost is justified — currently that is somewhere on the margin of another 15–25% improvement on write-heavy phases.
Related reading
- SharePoint Site Migration platform page
- Migrating SharePoint Site Pages: why canvas layout matters
- Cross-tenant SharePoint permissions without user-mapping hell
- What’s new in Copilot Readiness (April 2026) — the governance-side deduplication work
Get started
The speed fixes are live on every workspace. Start a free SharePoint migration at app.migrationfox.com/register and watch your first pre-flight report come back in seconds rather than minutes.