Typed content with Zod: schema as a build-time contract

11 min read

The worst version of a self-hosted blog is a folder of Markdown files with no schema enforcing the shape of the frontmatter. You write a post. You forget the description field. You misspell pubDate as publishDate. You paste a draft date like 2026/13/02 that is not a real date. None of these errors announce themselves during authoring. They surface in production - a broken RSS item, a blank meta description in a Google result, a page that crashes because the date formatter received undefined. I spent two years shipping exactly this kind of blog before I understood that the right fix was not more careful authoring discipline. It was a contract enforced at build time.

Astro's content collections solve this with a Zod schema that runs during astro build. The schema is the contract. A malformed frontmatter field fails the build loudly, with a path to the offending file, before a single HTML file is written. This post is about how that contract works, what the fields actually mean, and where the contract ends - the honest limits Zod cannot protect you from.

The schema is the contract

Here is the full schema for this site, from src/content.config.ts:

import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";

const posts = defineCollection({
  loader: glob({
    pattern: "**/*.{md,mdx}",
    base: "./src/content/posts",
    // Derive the entry id from the filename, not the `slug` frontmatter.
    // Bilingual EN/RU counterparts intentionally share one `slug` (that is how
    // findPair links them), so slug-as-id would collide and drop one locale.
    generateId: ({ entry }) => entry.replace(/\.[^.]+$/, ""),
  }),
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    updatedDate: z.coerce.date().optional(),
    tags: z.array(z.string()).default([]),
    draft: z.boolean().default(false),
    lang: z.enum(["en", "ru"]).default("en"),
    slug: z.string(),
  }),
});

export const collections = { posts };

Eight fields. Five of them are worth looking at closely because each one does something that a plain type annotation cannot.

z.coerce.date() on pubDate takes the YAML string "2026-05-12" and coerces it into a JavaScript Date object at build time. Without coercion, frontmatter date strings are raw strings and you have to call new Date(post.data.pubDate) everywhere. With coercion, post.data.pubDate is already a Date. Every component that formats a date - the post header, the blog index list, the RSS feed, the sitemap - receives a typed Date, not a string that might or might not parse. An invalid value (pubDate: 2026/13/02) fails the build before it can reach any formatter.
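To make the coerce-then-validate step concrete, here is a minimal dependency-free sketch. It is not Zod's implementation: coerceDate is a hypothetical helper, and it assumes the frontmatter value reaches it as a string.

```typescript
// Hypothetical stand-in for the idea behind z.coerce.date():
// build a Date from the raw value, then reject it if the result is invalid.
function coerceDate(raw: string, field: string): Date {
  const parsed = new Date(raw);
  // An unparseable input produces a Date whose timestamp is NaN ("Invalid Date").
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`${field}: "${raw}" is not a valid date`);
  }
  return parsed;
}

// A well-formed ISO date becomes a real Date object.
const pubDate = coerceDate("2026-05-12", "pubDate");
console.log(pubDate.toISOString());

// Month 13 does not exist, so the helper throws instead of passing it along.
try {
  coerceDate("2026-13-02", "pubDate");
} catch (err) {
  console.log((err as Error).message);
}
```

The point of the sketch is the ordering: the validity check happens once, at the boundary, so nothing downstream ever has to ask whether the Date is usable.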

z.coerce.date().optional() on updatedDate makes the contract explicit about absence. If the field is missing, post.data.updatedDate is undefined. If it is present, it is a Date. TypeScript knows which case you are in and will not let you call .toISOString() on a possibly undefined value without a check. The schema makes the optionality type-safe rather than leaving it as an "I should probably check if this exists" comment in the component.
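A small sketch of what that looks like at a callsite. The PostDates interface and lastModifiedIso helper are hypothetical names that only mirror the two date fields in the schema; they are not code from this site.

```typescript
// Hypothetical shape mirroring pubDate and the optional updatedDate.
interface PostDates {
  pubDate: Date;
  updatedDate?: Date;
}

// Returns the timestamp to show as "last modified".
// Under strict mode, data.updatedDate.toISOString() on its own would not
// type-check, so the fallback to pubDate is forced by the compiler.
function lastModifiedIso(data: PostDates): string {
  return (data.updatedDate ?? data.pubDate).toISOString();
}

const neverEdited: PostDates = { pubDate: new Date("2026-05-12") };
const edited: PostDates = {
  pubDate: new Date("2026-05-12"),
  updatedDate: new Date("2026-06-01"),
};

console.log(lastModifiedIso(neverEdited));
console.log(lastModifiedIso(edited));
```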

z.array(z.string()).default([]) on tags handles the case where an author omits the field. The default means a post with no tags: line in frontmatter gets an empty array rather than undefined. Every downstream consumer - the tag index pages, the tag chips on the post header, the RSS item categories - can call .map() on post.data.tags without a guard. The schema absorbs the optional-field handling once, globally, rather than at every callsite.
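For comparison, this is roughly the guard you would otherwise have to write by hand. It is a sketch under assumptions: parseTags is a hypothetical helper that approximates the behaviour of z.array(z.string()).default([]) without using Zod.

```typescript
// Hypothetical hand-written equivalent of the tags rule:
// a missing field becomes [], an array of strings passes through,
// and anything else is rejected.
function parseTags(raw: unknown): string[] {
  if (raw === undefined) return [];
  if (Array.isArray(raw) && raw.every((tag) => typeof tag === "string")) {
    return raw as string[];
  }
  throw new Error(`tags: expected an array of strings, received ${typeof raw}`);
}

console.log(parseTags(undefined));
console.log(parseTags(["astro", "typescript"]));

// The YAML line `tags: astro, typescript` arrives as one string, not an array.
try {
  parseTags("astro, typescript");
} catch (err) {
  console.log((err as Error).message);
}
```

Writing this once per field is tolerable. Writing it at every callsite is where the drift starts, which is the argument for letting the schema own it.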

z.enum(["en", "ru"]).default("en") on lang is the smallest field and the one with the highest semantic load on this site. The bilingual routing model depends on every post declaring its locale. If lang were a free string field, a typo like lang: enn would produce a post that matched neither the English filter nor the Russian filter - it would not appear on either blog index. The enum eliminates that category of mistake. The .default("en") means the five original posts, written before bilingual routing existed, do not require a frontmatter change to work in the English-only view.

slug: z.string() is required with no default. This is intentional. The slug drives the URL - /blog/<slug>/ in English and /ru/blog/<slug>/ in Russian - and it is the key that pairs the two language versions of the same post. A post without a slug does not get a URL; the build fails. A post with a wrong slug would generate the wrong URL or break the locale switcher. Making it required and visible in the frontmatter means the author has to make a deliberate choice, which is the right behavior for something that ends up in every canonical, hreflang, RSS link, and internal cross-reference.

Failure modes the schema catches at build, not in production

The contract's most useful property is that violations fail loudly and early. Here are the four failure modes I would have shipped before I had the schema:

Invalid value in pubDate. If you write pubDate: 2026-13-02 (month 13), z.coerce.date() produces Invalid Date and Zod throws a validation error during the build. The error message includes the file path and field name. The build exits non-zero. Cloudflare Pages does not deploy. The broken post never ships. Without the schema, new Date("2026-13-02") silently returns Invalid Date and every formatter that calls .toLocaleDateString() on it renders "Invalid Date" on the live site.

Missing description. The schema has description: z.string() with no default. Omitting the field fails the build. This is the right behavior because description is used in the <meta name="description"> tag, the OG description, the post list excerpt, and the RSS item summary. None of those have a useful fallback. An empty string is worse than a build failure because it silently produces a site with blank meta descriptions in search engine previews.

Unknown lang value. A typo like lang: en-US fails the enum check. Without the enum, a post with lang: en-US would simply not match the data.lang === "en" filter on the blog index, and the post would silently disappear from both locales. The build failure is strictly better - it surfaces the problem before the deploy.

A tag array that is accidentally a string. If you write tags: astro, typescript instead of tags: [astro, typescript], YAML parses it as a string, not an array. Zod's z.array(z.string()) rejects the string type with a clear message. Without schema validation, post.data.tags.map(...) would throw a runtime error during SSG as soon as the tag index pages tried to render it, because strings have no .map method. A component that guards loosely and skips anything that is not an array would fail the other way, silently rendering no tag chips.

In each case, the schema converts a silent production failure into a loud build failure. That shift matters for a static site where you do not have a staging environment that matches production exactly. Your build is your test.

Typed reads downstream

The schema propagates to every read. getCollection("posts") resolves to CollectionEntry<"posts">[] where the data field is the TypeScript type inferred from the Zod schema. Add a new required field to the schema, and every component that reads posts immediately has it available as a typed property. Rename a field, remove it, or make it optional, and TypeScript flags every component that still reads it the old way.

This site's RSS feed at src/pages/rss.xml.js calls getCollection("posts") and maps over the results. It accesses post.data.pubDate, post.data.title, post.data.description, post.data.slug - all of these are typed. If the schema changes (say, pubDate becomes optional), TypeScript surfaces every callsite that assumes it is always defined.

The tag index pages - /blog/tags/[tag]/ - collect all posts and group them by post.data.tags. The z.array(z.string()).default([]) guarantee means the grouping code never has to handle undefined. The collectTags helper in src/lib/posts.ts is seven lines with no null checks because the schema made null checks unnecessary.
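To show why the guarantee keeps grouping code short, here is a minimal sketch of a tag grouping helper. It is not the site's collectTags; groupByTag and the TaggedPost shape are hypothetical names used for illustration.

```typescript
// Minimal hypothetical post shape; real entries would come from getCollection.
interface TaggedPost {
  id: string;
  data: { tags: string[] };
}

// Groups post ids by tag. There is no undefined guard because the schema
// default guarantees `tags` is always an array, possibly empty.
function groupByTag(posts: TaggedPost[]): Map<string, string[]> {
  const byTag = new Map<string, string[]>();
  for (const post of posts) {
    for (const tag of post.data.tags) {
      byTag.set(tag, [...(byTag.get(tag) ?? []), post.id]);
    }
  }
  return byTag;
}

const grouped = groupByTag([
  { id: "zod-contract", data: { tags: ["astro", "typescript"] } },
  { id: "nuxt-to-astro", data: { tags: ["astro"] } },
  { id: "untagged-note", data: { tags: [] } },
]);

console.log(grouped.get("astro"));
console.log(grouped.get("typescript"));
```

A post with an empty tags array simply contributes nothing to the map, which is the behaviour you want from a tag index.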

The locale filter data.lang === "en" on the blog index works only because the schema guarantees lang is always one of the two valid string literals. If it were an unconstrained z.string(), the filter would still run but TypeScript would not know that it was comparing against a union type, and a typo in a frontmatter file would produce a silently empty list rather than a build error.
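The difference between the constrained and unconstrained cases can be sketched in a few lines. Lang, LocalizedPost, and postsFor are hypothetical names; the second half simulates a free-string field with plain strings.

```typescript
// The literal union the enum produces.
type Lang = "en" | "ru";

interface LocalizedPost {
  id: string;
  data: { lang: Lang };
}

// With a union type, postsFor("enn", posts) is a compile-time error.
function postsFor(lang: Lang, posts: LocalizedPost[]): LocalizedPost[] {
  return posts.filter((post) => post.data.lang === lang);
}

const localized: LocalizedPost[] = [
  { id: "hello", data: { lang: "en" } },
  { id: "privet", data: { lang: "ru" } },
];
console.log(postsFor("en", localized).map((post) => post.id));

// Simulating a free-string field: the typo matches neither locale filter,
// so the post would vanish from both indexes without any error.
const looseLangs: string[] = ["en", "ru", "enn"];
const listedSomewhere = looseLangs.filter((lang) => lang === "en" || lang === "ru");
console.log(listedSomewhere.length);
```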

Where the contract ends

Zod validates shape, not semantics. The schema cannot catch every class of content error, and it is worth being honest about where the guarantee breaks down.

Duplicate slugs across posts in the same locale. If two English posts share the same slug value, the Zod schema validates both as individually correct. The getStaticPaths route generator will emit both posts at the same URL path. In Astro's static build, the second one silently overwrites the first. The schema does not check uniqueness across the collection. Slug uniqueness is a convention enforced by the author, not a guarantee enforced by the build.
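If you wanted the build to enforce uniqueness, a collection-level check is small. This is a sketch, not something this site runs: findDuplicateSlugs and the SluggedPost shape are hypothetical, and where you call it (a prebuild script, or the top of a route's getStaticPaths) is left as an assumption.

```typescript
interface SluggedPost {
  id: string;
  data: { lang: "en" | "ru"; slug: string };
}

// Returns every "lang/slug" key claimed by more than one entry,
// together with the ids of the entries that claim it.
function findDuplicateSlugs(posts: SluggedPost[]): Map<string, string[]> {
  const idsByKey = new Map<string, string[]>();
  for (const post of posts) {
    const key = `${post.data.lang}/${post.data.slug}`;
    idsByKey.set(key, [...(idsByKey.get(key) ?? []), post.id]);
  }
  const duplicates = Array.from(idsByKey.entries()).filter(([, ids]) => ids.length > 1);
  return new Map(duplicates);
}

const duplicateReport = findDuplicateSlugs([
  { id: "hello", data: { lang: "en", slug: "hello" } },
  { id: "hello-copy", data: { lang: "en", slug: "hello" } },
  { id: "ru-hello", data: { lang: "ru", slug: "hello" } },
]);

console.log(Array.from(duplicateReport.entries()));
```

Keying on lang plus slug matters here: the EN and RU versions share a slug by design, so only a repeat within one locale counts as a collision.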

Slug format conventions. The schema requires slug: z.string() but does not constrain the format. A slug with spaces, uppercase letters, or special characters will produce a URL that either encodes oddly or breaks the locale switcher's string-equality pairing. z.string().regex(/^[a-z0-9-]+$/) would enforce the convention, but this site does not add that constraint. The convention is documented and trusted.

MDX body quality. Zod runs on frontmatter only. A post with an empty body, a broken MDX component reference, or a heading structure that does not match the table of contents - none of these fail the Zod check. The MDX parser catches broken component syntax, but semantic quality checks (is the body long enough, does the post have a clear thesis) are left to the author.

Cross-locale pair integrity. The bilingual routing model relies on the English and Russian versions of a post sharing the same slug value. The schema validates each post independently. If the RU version has a different slug than the EN version, both posts validate, both build successfully, and the locale switcher silently links to a post that does not exist in the other language. This is the category of semantic correctness the schema explicitly cannot provide.
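A check for pair integrity has to live outside the schema too. The sketch below lists slugs that exist in only one locale. findUnpairedSlugs is a hypothetical helper, and its output is a review list rather than an error list, since an untranslated post is a legitimate state.

```typescript
interface PairablePost {
  data: { lang: "en" | "ru"; slug: string };
}

// Lists slugs that appear in exactly one locale, sorted for stable output.
function findUnpairedSlugs(posts: PairablePost[]): string[] {
  const langsBySlug = new Map<string, Set<string>>();
  for (const post of posts) {
    const langs = langsBySlug.get(post.data.slug) ?? new Set<string>();
    langs.add(post.data.lang);
    langsBySlug.set(post.data.slug, langs);
  }
  return Array.from(langsBySlug.entries())
    .filter(([, langs]) => langs.size === 1)
    .map(([slug]) => slug)
    .sort();
}

const unpaired = findUnpairedSlugs([
  { data: { lang: "en", slug: "zod-contract" } },
  { data: { lang: "ru", slug: "zod-contract" } },
  { data: { lang: "en", slug: "nuxt-to-astro" } },
  { data: { lang: "ru", slug: "nuxt-to-astr" } }, // typo in the RU slug
]);

console.log(unpaired);
```

A mistyped RU slug shows up as two unpaired entries with nearly identical names, which is usually enough to spot the mistake by eye.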

Adding a field: what propagates automatically

One of the things I underestimated before writing this schema is how little wiring a new field needs. When I added the lang and slug fields as part of the bilingual routing work, here is what fell into place with nothing more than one-line edits at each callsite:

The blog index filter data.lang === "en" started working because TypeScript now knew lang was a literal union type, not a free string. The RU blog index used the same filter with "ru". The static path generators for /blog/[slug] and /ru/blog/[slug] switched from post.id to post.data.slug as the URL parameter - one field change, one line change per route, zero breakage because the type system surfaced every consumer.

The hreflang and canonical link tags in the layout started receiving post.data.slug instead of the file-derived ID. The RSS feed updated to use post.data.slug for the item guid and link. The locale switcher in the post header used slug as the key to find the counterpart post. Every one of these was a small change - replace post.id with post.data.slug - that TypeScript guided me to because the field was now typed.
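The route change can be sketched as a pure function. toStaticPaths and the RoutedPost shape are hypothetical; in a real route the posts would come from getCollection, and the returned array is the shape a getStaticPaths function hands back to Astro.

```typescript
interface RoutedPost {
  id: string;
  data: { lang: "en" | "ru"; slug: string };
}

// Builds the params/props pairs for one locale's [slug] route.
function toStaticPaths(posts: RoutedPost[], lang: "en" | "ru") {
  return posts
    .filter((post) => post.data.lang === lang)
    .map((post) => ({
      params: { slug: post.data.slug }, // the one-line change: was post.id
      props: { post },
    }));
}

const routed: RoutedPost[] = [
  { id: "zod-contract", data: { lang: "en", slug: "typed-content" } },
  { id: "zod-contract-ru", data: { lang: "ru", slug: "typed-content" } },
];

console.log(toStaticPaths(routed, "en").map((path) => path.params.slug));
console.log(toStaticPaths(routed, "ru").map((path) => path.props.post.id));
```

Both locales produce the same slug parameter from different entries, which is what lets /blog/<slug>/ and /ru/blog/<slug>/ line up.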

The contrast with a hand-rolled pipeline is that in a hand-rolled setup, adding a new content field means auditing every file that reads posts and manually threading the new value through. With a typed schema, the audit is a type check. You add the field, run astro check (or tsc --noEmit for the plain TypeScript files), and the type errors tell you exactly which files consume posts and what needs to change.

The glob loader and file identity

One detail worth noting: the glob loader passes generateId an entry value that is the file path relative to the base directory, extension included - something like typed-content-with-zod-build-time-contract.mdx. Left to its default, the loader uses the slug frontmatter value as the entry id whenever a post declares one. This site overrides that via generateId: ({ entry }) => entry.replace(/\.[^.]+$/, ""), which derives the id from the path and strips the extension. So the actual id for this post is typed-content-with-zod-build-time-contract (no .mdx). That override is also what makes the bilingual model work: the EN and RU files for the same post share a slug value in frontmatter, so the default slug-derived id would collide and drop one locale. The route generator uses post.data.slug, not post.id. The filename and the slug are usually the same string (I match them deliberately), but they are separate values. The schema owns the slug; the loader owns the ID.

This matters if you rename a file. Renaming old-filename.mdx to new-filename.mdx does not change the slug in the frontmatter. The URL stays the same. The file ID changes, but nothing in the routing logic uses the file ID directly after the initial getCollection call. The separation between filename and frontmatter-declared slug is a small ergonomic win - you can reorganize the content directory without breaking URLs.
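The id derivation is just a regular expression, so it is easy to check in isolation. The expression below is the one from the config; stripExtension is a hypothetical wrapper name, and the ru/ subdirectory in the second call is an assumed layout used only to show that directory segments survive.

```typescript
// Same expression the config passes as generateId: drop the final extension only.
const stripExtension = (entry: string): string => entry.replace(/\.[^.]+$/, "");

console.log(stripExtension("typed-content-with-zod-build-time-contract.mdx"));
console.log(stripExtension("ru/typed-content-with-zod-build-time-contract.mdx"));
console.log(stripExtension("notes.v2.md"));
```

Only the last dot-separated segment is removed, so a filename with extra dots keeps everything before its real extension.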

The payoff: writing into a contract

The practical difference between having the schema and not having it is authoring cadence. Without a schema, adding a new post to a site has a background anxiety: did I get all the fields right, will the RSS break, will the meta description be blank. With the schema, the question is settled by astro build. If the build passes, the frontmatter is correct. You write into the contract instead of hoping you remembered the contract.

For a site where the blog is part of the trust signal - the site is the resume, as PRODUCT.md puts it - a broken meta description or a missing post on the RSS feed is not a minor annoyance. It is a defect in the signal. The schema makes that class of defect build-time visible rather than production-invisible.

The broader lesson is that a typed content layer and a typed application layer have the same property: early, loud failures are better than late, silent ones. The schema is not extra infrastructure. It is the smallest possible enforcement of the invariants the site depends on, and it fits in sixteen lines. The six months of posts I have shipped since adding it have had zero frontmatter-related production bugs. That is the return on sixteen lines.

If you want to see how this integrates with the bilingual routing model, the post on why I rewrote this portfolio from Nuxt to Astro covers the content-collections decision in the broader rewrite context. The official Astro content collections documentation and the Zod documentation are the canonical references for the API surface - read both if you are setting up a new collection; the Astro docs are unusually clear about the glob loader and the schema coercion behavior.