Three things MV3 taught me about the service-worker lifecycle

The feature worked. I tested it, it passed, I shipped it. Twenty minutes into a user session, it stopped responding. There was no error; it just went quiet. The content script sent a message to the background, the background never replied, and from the outside it looked as if the whole feature had been removed. Restarting the browser fixed it. Reloading the extension fixed it. I could not reproduce the problem with DevTools open, because inspecting the background service worker keeps it alive. I had been testing under the one condition that made the bug invisible.

This is the canonical Manifest V3 debugging trap. The background is no longer a persistent page that lives as long as the browser does. It is a service worker that the browser terminates after roughly 30 seconds of idle time. When it wakes back up on the next event, it starts fresh - no in-memory state, no timers, no open connections, nothing from the previous invocation. The code I had written assumed a persistent process model. It was wrong about its environment.

I have shipped MV3 extensions over the past two years, and the lifecycle is the part I underestimated most. Each of the three lessons below cost me real debugging time. All three follow from the same root cause, that the service worker is ephemeral, but they surface in different ways.

Lesson 1: Termination is not a bug - it is the contract

The Chrome extension service worker has an idle timeout of approximately 30 seconds: once it has gone that long with no events to handle, the browser shuts it down. This is intentional. It is the whole point of the service worker model: no persistent background processes, no memory leaks that accumulate over days of browser uptime, no extension that silently consumes CPU. The browser makes the lifecycle decision, not the extension.

The failure mode this creates is straightforward: any state you store in module-scope variables is gone on the next wake. A map you built up over ten message events. A timer you set to fire in 60 seconds. A WebSocket connection you opened. An in-flight promise whose resolution you were waiting on. None of these survive termination.

The fix is to treat the worker as stateless by default and rehydrate from chrome.storage on each cold start. Here is the wrong approach and the right approach side by side:

// Wrong: module-scope state that vanishes on termination
const cache = new Map();

chrome.runtime.onMessage.addListener((msg, sender, reply) => {
  if (msg.type === "get") {
    reply({ value: cache.get(msg.key) }); // always undefined after cold start
  }
  if (msg.type === "set") {
    cache.set(msg.key, msg.value);
  }
});

// Right: treat storage as the source of truth
chrome.runtime.onMessage.addListener((msg, sender, reply) => {
  if (msg.type === "get") {
    chrome.storage.session.get(msg.key, (result) => {
      reply({ value: result[msg.key] });
    });
    return true; // keep the message channel open for async reply
  }
  if (msg.type === "set") {
    chrome.storage.session.set({ [msg.key]: msg.value });
  }
});

chrome.storage.session was added in MV3 specifically to address this pattern. It persists across worker restarts, but it is held in memory only: it is cleared when the browser closes, and also when the extension is reloaded, updated, or disabled. Those are the right semantics for ephemeral extension state. chrome.storage.local persists across browser restarts. Which one you use depends on whether the state needs to survive a browser restart. Both require the "storage" permission in the manifest.

The rehydration pattern means that on every cold start, the first message the worker handles reads from storage rather than from an in-memory cache that does not exist yet. The latency cost is a single storage read: around 1 ms on fast hardware, under 10 ms on slow hardware. That is not a performance problem. Data that silently disappears because you assumed a persistent process is a correctness problem.
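If you still want an in-memory cache for hot paths, one way to keep it honest is a read-through, write-through store: it fills itself from storage on the first read after each cold start, and it writes to storage before updating memory. This is a minimal sketch, not the code from the extension described here; `createStateStore` is a hypothetical name, and it assumes the promise-returning forms of `chrome.storage` `get`/`set` that MV3 provides.

```javascript
// Hypothetical helper: a module-scope cache that is only ever an optimization.
// `area` is any object with promise-returning get(keys) / set(items) methods,
// such as chrome.storage.session or chrome.storage.local in MV3.
function createStateStore(area) {
  const cache = new Map(); // empty again after every cold start
  let hydrated = null; // shared promise for the one-time read per worker lifetime

  function hydrate() {
    if (!hydrated) {
      hydrated = area.get(null).then(
        (items) => {
          // Do not overwrite values written while the read was in flight.
          for (const [key, value] of Object.entries(items)) {
            if (!cache.has(key)) cache.set(key, value);
          }
        },
        (err) => {
          hydrated = null; // allow a retry on the next call
          throw err;
        },
      );
    }
    return hydrated;
  }

  return {
    async get(key) {
      await hydrate();
      return cache.get(key);
    },
    async set(key, value) {
      // Storage is the source of truth: write it before touching the cache.
      await area.set({ [key]: value });
      cache.set(key, value);
    },
  };
}

// Wiring inside the service worker (skipped when chrome is not defined).
if (typeof chrome !== "undefined" && chrome.storage?.session) {
  const store = createStateStore(chrome.storage.session);
  chrome.runtime.onMessage.addListener((msg, sender, reply) => {
    if (msg.type === "get") {
      store.get(msg.key).then(
        (value) => reply({ value }),
        (err) => reply({ error: String(err) }),
      );
      return true; // keep the message channel open for the async reply
    }
    if (msg.type === "set") {
      store.set(msg.key, msg.value);
    }
  });
}
```

Because the cache is rebuilt from storage on every cold start, losing it to termination costs one read rather than correctness.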

Lesson 2: Event listeners must be registered at the top level, synchronously, on every load

The service worker registers its event listeners by running its script. Chrome evaluates the top level of that script synchronously on each wake, before dispatching any events, and that first pass is your chance to reconnect to the event system. If a listener sits behind an async boundary (inside a promise callback, after an await, inside a timeout), it may not be registered yet when the event that woke the worker is dispatched.

The wrong shape looks like this:

// Wrong: listener registered inside an async callback
async function setup() {
  const { config } = await chrome.storage.local.get("config"); // resolves to { config: ... }
  // By the time this resolves, the event that woke the worker
  // may have already been dispatched with no listener registered.
  chrome.runtime.onMessage.addListener((msg) => {
    handleMessage(msg, config);
  });
}

setup();

The right shape is synchronous listener registration at the module top level:

// Right: listener registered synchronously at top level
let config = null;

chrome.runtime.onMessage.addListener((msg) => {
  // config may be null on first message after cold start
  // handle that case explicitly inside the listener
  handleMessage(msg, config);
});

// Load config asynchronously, separately from listener registration
// (handleMessage and defaultConfig are assumed to be defined elsewhere)
chrome.storage.local.get("config", (result) => {
  config = result.config ?? defaultConfig;
});

This pattern means the listener is registered before any async work begins. If a message arrives while config is still loading, the listener handles it and deals with the null config explicitly (either with a fallback or by queuing the message). The alternative - missing the event entirely because the listener was not yet registered - is harder to debug because it produces silence rather than an error.
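If queuing is the behavior you want, one way to get it without a hand-written queue is to keep the in-flight load as a promise and have each handler await it, so messages that arrive early simply wait. A minimal sketch, assuming the promise form of `chrome.storage.local.get`; `createConfigGate` and `DEFAULT_CONFIG` are hypothetical names, and `handleMessage` stands in for your own handler as in the example above.

```javascript
// Hypothetical default used when nothing has been saved yet.
const DEFAULT_CONFIG = { pollMinutes: 2 };

// Starts loading right away and returns a function that resolves to the config.
// `loadConfig` is any function returning a promise of the stored config (or undefined).
function createConfigGate(loadConfig, fallback) {
  const ready = Promise.resolve()
    .then(loadConfig)
    .then((stored) => stored ?? fallback)
    .catch(() => fallback); // a failed read degrades to defaults instead of hanging
  return () => ready;
}

if (typeof chrome !== "undefined" && chrome.storage?.local) {
  const getConfig = createConfigGate(
    () => chrome.storage.local.get("config").then((result) => result.config),
    DEFAULT_CONFIG,
  );

  // Registered synchronously at top level; only the handler body waits.
  chrome.runtime.onMessage.addListener((msg, sender, reply) => {
    getConfig()
      .then((config) => handleMessage(msg, config)) // handleMessage: your own function
      .then(reply, (err) => reply({ error: String(err) }));
    return true; // keep the channel open until reply is called
  });
}
```

Every message, early or late, sees the same resolved config, so the handler no longer needs a separate null-config branch.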

The same constraint applies to chrome.alarms.onAlarm, chrome.tabs.onUpdated, chrome.action.onClicked, and every other event source. Register all listeners synchronously at the top level, always, even if the handler itself needs async data to do its work. The handler can await; the registration cannot.
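For chrome.runtime.onMessage, a small wrapper can make "the handler can await; the registration cannot" mechanical. The listener replies through `sendResponse` and returns `true` to keep the channel open while async work runs, the same `return true` used in the storage example earlier in this post. `asyncListener` is a hypothetical helper, not part of the extension API, and the `{ ok, result }` / `{ ok, error }` reply shape is a choice made for this sketch.

```javascript
// Hypothetical helper: adapts an async handler to the onMessage listener shape.
// The returned listener is synchronous, so it can be registered at top level.
function asyncListener(handler) {
  return (msg, sender, sendResponse) => {
    Promise.resolve()
      .then(() => handler(msg, sender)) // also catches synchronous throws
      .then(
        (result) => sendResponse({ ok: true, result }),
        (err) => sendResponse({ ok: false, error: String(err) }),
      );
    return true; // tells Chrome that sendResponse will be called asynchronously
  };
}

if (typeof chrome !== "undefined" && chrome.runtime?.onMessage) {
  chrome.runtime.onMessage.addListener(
    asyncListener(async (msg) => {
      const stored = await chrome.storage.session.get(msg.key);
      return stored[msg.key];
    }),
  );
}
```

Errors are turned into a reply instead of being dropped, so a failing handler shows up on the content-script side rather than as silence.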

The canonical documentation for this pattern is in the Chrome MV3 service worker lifecycle guide, which explains the registration window in more detail than any blog post will. Read it once. The relevant section is only a few paragraphs long, and it will save you a full debugging session.

Lesson 3: State that used to live in setInterval now needs the alarms API

The setInterval and setTimeout APIs work inside a service worker. They do not work across the service worker's lifetime boundary. A pending timer does not count as activity, so if you set a 5-minute interval inside the worker, it fires only while something else keeps the worker alive. Once the worker has been idle for 30 seconds it is terminated, and the interval goes with it. The next wake starts a fresh module execution with no interval.

I hit this when I had a background polling loop that was supposed to check an external status endpoint every two minutes. It worked for the first two checks, then stopped. The interval was set at module load, ran twice, and the third firing never happened because the worker was terminated before the two-minute mark.

The fix is chrome.alarms, the platform-level timer that survives service worker restarts (it requires the "alarms" permission in the manifest). The browser keeps alarms in its own scheduler and wakes the service worker when they fire.

// Create the alarm only if it is missing: create() with an existing name
// replaces that alarm and would restart its schedule on every wake
chrome.alarms.get("status-poll", (existing) => {
  if (!existing) {
    chrome.alarms.create("status-poll", { periodInMinutes: 2 });
  }
});

// Handle it with a listener registered at top level
chrome.alarms.onAlarm.addListener((alarm) => {
  if (alarm.name === "status-poll") {
    checkStatus();
  }
});

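Alarms generally persist until you clear them with chrome.alarms.clear("status-poll") or the extension is updated, but Chrome does not guarantee that they survive a browser restart. That is why the existence check belongs at the top level, where it runs every time the worker starts. The handler runs on each wake triggered by the alarm. The worker does its work, goes idle, gets terminated, and the next alarm fires on schedule.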
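A promise-based variant of the same top-level check, sketched under the assumption of Chrome's promise-returning `chrome.alarms.get` and `chrome.alarms.create` in MV3. It also recreates the alarm when its period differs from the one in code, which covers shipping a changed interval. `ensureAlarm` is a hypothetical helper; `checkStatus` remains the post's own placeholder and is not repeated here.

```javascript
// Hypothetical helper: make sure a periodic alarm exists with the expected period.
// `alarms` is chrome.alarms, or any object with the same promise-returning get/create.
// Resolves to true if the alarm was (re)created, false if it was already correct.
async function ensureAlarm(alarms, name, periodInMinutes) {
  const existing = await alarms.get(name); // undefined when no such alarm exists
  if (existing && existing.periodInMinutes === periodInMinutes) {
    return false; // already scheduled correctly; keep its next firing time
  }
  // create() replaces any alarm with the same name, so a stale period is fixed too.
  await alarms.create(name, { periodInMinutes });
  return true;
}

if (typeof chrome !== "undefined" && chrome.alarms) {
  // Top-level call: runs on every worker start, so a missing alarm is recreated.
  ensureAlarm(chrome.alarms, "status-poll", 2).catch(console.error);
}
```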

One gotcha: chrome.alarms enforces a minimum period for packed extensions. Since Chrome 120 it is 30 seconds (before that it was 1 minute); unpacked extensions loaded during development are not limited, so a too-short period can look fine locally. If you need more frequent background work, the architecture is different: you need to keep the worker alive via another mechanism, or redesign the feature so it does not require that much background activity.

The general rule for anything that used setInterval or setTimeout with a long duration in an MV2 background page: replace it with chrome.alarms. For a short timeout inside a single event handler that completes well within the 30-second window, plain setTimeout is fine, because the timer fires while the worker is still busy with that event. Just do not count on a pending timer by itself to keep the worker alive.
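For that short, in-handler case, a sketch of bounding an async step with plain setTimeout so the handler finishes well inside the idle window. `withTimeout`, `STATUS_URL`, and the 10-second budget are illustrative assumptions, not values from Chrome's documentation.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
// The timer only lives as long as the handler that awaits it.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so nothing is left pending after the handler ends.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Example use inside an alarm handler (STATUS_URL is a placeholder):
// const res = await withTimeout(fetch(STATUS_URL), 10000, "status check");
```

withTimeout only stops waiting; the underlying work keeps running. For fetch specifically, passing `{ signal: AbortSignal.timeout(ms) }` also cancels the request itself.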

The one-line rule

After two years of MV3 development and several extensions shipped, the mental model I use is this: the background service worker is a stateless event handler that can die at any instant between events.

Write code for that model. Register listeners synchronously. Read state from storage, not from module scope. Use chrome.alarms for anything that needs to happen on a schedule. Avoid patterns that assume the worker is alive between the event that set up state and the event that needs to read it.

The MV2 background page model was easy to reason about because it matched the mental model of a normal long-running program. The MV3 model matches the mental model of a serverless function - it wakes, it handles one event, it may die. The debugging difficulty is that MV3 code running under dev tools often does not die because the inspector keeps the worker alive. Your tests need to account for that. Close the extension inspector before testing lifecycle behavior. Add artificial idle time. Test against the environment the worker will actually run in, not the environment that is convenient to inspect.
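Artificial idle time is easiest to add in a real browser, but the stateless-handler rule can also be checked in plain unit tests. A sketch, assuming the worker's logic is built by a factory over an injected storage area: "terminate" the worker by discarding the instance, build a new one over the same storage, and confirm the data is still there. `createMemoryArea`, `createWorker`, and `survivesColdStart` are hypothetical names; this complements testing in the browser with the inspector closed rather than replacing it.

```javascript
// Hypothetical in-memory stand-in for a chrome.storage area (promise API subset).
function createMemoryArea() {
  const data = {};
  return {
    async get(key) {
      return key in data ? { [key]: data[key] } : {}; // missing keys are omitted
    },
    async set(items) {
      Object.assign(data, items);
    },
  };
}

// Hypothetical factory: everything the worker keeps lives in `area`, never in
// module scope, so building a new instance is equivalent to a cold start.
function createWorker(area) {
  return {
    async handle(msg) {
      if (msg.type === "set") {
        await area.set({ [msg.key]: msg.value });
        return;
      }
      if (msg.type === "get") {
        const result = await area.get(msg.key);
        return { value: result[msg.key] };
      }
      throw new Error(`unknown message type: ${msg.type}`);
    },
  };
}

// Test shape: write, "terminate" by dropping the instance, read from a new one.
async function survivesColdStart() {
  const area = createMemoryArea(); // outlives the worker, like browser storage
  await createWorker(area).handle({ type: "set", key: "user", value: "ada" });
  const afterRestart = createWorker(area); // fresh instance, fresh "module scope"
  const { value } = await afterRestart.handle({ type: "get", key: "user" });
  return value === "ada";
}
```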

The Chrome service workers documentation is thorough and accurate. The three lessons above are the ones that were not obvious from reading the docs the first time - they became obvious only after debugging sessions that would have been shorter if I had internalized the stateless-handler model from the start.