Skip to the content.

mdtoc – Specification (v1)

1. Purpose and core principles

mdtoc is a deterministic CLI tool for processing individual Markdown documents.

Functions:

Core principles:

Note: In this document, “formal” only means “clear enough for parsers, tests, and later code generation”. It does not mean a large architecture, but a small, robust contract framework.

2. Scope and non-goals

mdtoc intentionally processes only a small, unambiguous Markdown subset.

Supported in v1:

Not supported in v1:

Note: The restriction to a small Markdown subset is intentional. It keeps the parser, test cases, and debugging simple.

3. Explicit document structure

A document managed by mdtoc uses exactly this container structure:

<!-- mdtoc -->
[TOC CONTENT]
<!-- numbering=true min=2 max=4 slug=github anchor=true link=true toc=true bullets=auto -->
<!-- /mdtoc -->

Rules:

Note: The user can determine where the table of contents should appear by moving the ToC area.

Explanation: The complete container is the managed area. toc=off does not mean “no container”, but “an empty managed ToC area”.

Note: The explicit container structure is intentionally easier to read than implicit marker logic. It makes the area managed by mdtoc immediately visible.

4. Parsing rules

4.1 Principle

The specification describes managed behavior in a line- and position-oriented way.
An implementation MAY internally use a Markdown parser as long as the external behavior matches this specification exactly.

Explanation: For implementation in Go, an internal parser such as goldmark is useful, even though the managed rewrite rules remain described in a line-oriented way. Current implementation note: The current implementation uses a self-contained line parser plus a small inline-text extractor; an alternative parser is still allowed as long as the external behavior remains identical.

4.2 Ignored regions

These regions are ignored when detecting markers and headings:

  1. Fenced code blocks with backticks:
    • Start: a backtick fence according to the supported Markdown parser or supported v1 subset (a line beginning with three backticks)
    • End: the corresponding closing backtick fence (the next line beginning with three backticks)
  2. Fenced code blocks with tilde:
    • Start: a tilde fence according to the supported Markdown parser or supported v1 subset (a line beginning with three tildes (~~~))
    • End: the corresponding closing tilde fence (the next line beginning with three tildes (~~~))
  3. Inline code spans:
    • region between two backticks on the same line
  4. HTML comments:
    • <!-- ... -->
    • exception: <!-- mdtoc -->, <!-- /mdtoc -->, <!-- mdtoc off -->, and <!-- mdtoc on -->

Not ignored:

  1. Blockquotes

Blockquotes are normal input lines.
They are not treated as a special region.

Practical consequence:

Interpretation:

4.3 Parsing order

Processing logically runs in this order:

  1. Determine ignored regions or Markdown context.
  2. Recognize the outer mdtoc container and config block only outside ignored regions.
  3. Recognize headings only outside ignored regions.
  4. Semantically normalize managed artifacts.
  5. Derive the target state.
  6. Render the output.

Explanation:

5. Heading syntax

5.1 Candidates for headings

Only lines that begin directly at the start of the line with one of the following prefixes are headings for mdtoc at all:

hashes := "# " | "## " | "### " | "#### " | "##### " | "###### "

This also means:

Note: The space is intentionally part of hashes here. This simplifies the parser: after the prefix, either the number, the anchor, or the title begins immediately.

5.2 Structure of a managed heading

Managed headings use exactly this schema:

heading_line := hashes [number SP] [anchor] title
number       := DIGIT+ ("." DIGIT+)* "."
anchor       := "<a id=\"anchor_id\"></a>"
title        := NONEMPTY_TEXT
SP           := exactly one U+0020 space

Additional rules:

Explanation:

Examples of valid managed headings:

# Title
## 1. Introduction
## <a id="introduction"></a>Introduction
### 2.1. <a id="api-overview"></a>API Overview

Examples that mdtoc does not treat as a managed structure:

 # Title
##  1. Introduction
### 1.2 Introduction
### <a id="x"></a> Introduction

5.3 Meaning of the syntax

Note: The pattern ### 2.1. API is therefore intentionally reserved for mdtoc. Anyone writing a free heading in exactly this format is using the same syntax as the tool.

5.4 Supported Markdown subset

mdtoc is not a general Markdown parser.

For headings, v1 supports:

The practical prefilter is therefore at least:

^#{1,6} 

And the actual rewrite logic applies only to lines that also satisfy the remaining positional logic.

6. Small formal model

This section describes the minimal internal view that is helpful for clean implementation and tests.

6.1 Managed heading

Internally, this model is sufficient for a managed heading:

ManagedHeading
- line_index
- level
- title_markup  // Title area as it appears in the document, but without managed numbering and without the managed inline anchor
- title_text    // Plain-text interpretation of title_markup; source for ToC link text and anchor ID
- number        // derived or empty
- anchor_id     // derived or empty

Semantically important are only:

Derived from these are:

Explanation:

6.2 Document state

For mdtoc, a document is practically in one of these states:

mdtoc does not persist a state field. A stripped document is still a managed document, but it does not match the generated target and check therefore returns a mismatch until regen is run.

6.3 Processing pipeline

Processing always follows the same simple pattern:

parse -> normalize -> derive -> render

This means:

Note: This is not meant to force a large AST architecture. It only defines which pieces of information are semantically relevant and which are render artifacts only.

6.4 Validity range of min and max

This version uses the following simple, easy-to-understand rule:

Practical consequence:

Cross-reference:

7. Config block

The config block is an optional HTML comment placed directly before <!-- /mdtoc -->. It stores whitespace-separated key=value fields:

<!-- numbering=true min=2 max=4 slug=github anchor=true link=true toc=true bullets=auto -->

The same config may be written across multiple lines:

<!--
numbering=true min=2 max=4
slug=github anchor=true link=true toc=true bullets=auto
-->

Rules:

There is no state field and no container-version field. Legacy <!-- mdtoc-config ... --> blocks are not part of this specification.

8. CLI interface

8.1 Commands

Option Description
mdtoc --version Prints short version information.
mdtoc --version --verbose Prints detailed version information.
   
mdtoc --help Prints short help text.
mdtoc --help --verbose Prints long help text.
   
mdtoc [--file <name>\|<name>] [GENERATE OPTIONS] root mode: uses regen for valid managed input without generate overrides, otherwise generate.
mdtoc [GENERATE OPTIONS] < INPUT.md root mode on stdin; same dispatch rule as above.
   
mdtoc generate [--verbose] [OPTIONS] generates/updates ToC, numbers, anchors.
mdtoc generate --help Prints long help text specifically for generate.
   
mdtoc regen [--verbose] regenerates from the persisted container config.
mdtoc refresh [--verbose] alias for regen.
mdtoc regen --help Prints long help text specifically for regen.
mdtoc refresh --help Prints the same help text as regen.
   
mdtoc strip [--verbose] [--raw] removes ToC, numbers, anchors and optionally config.
mdtoc strip --help Prints long help text specifically for strip.
   
mdtoc check [--verbose] checks whether the document matches regenerated output.
mdtoc check --help Prints long help text specifically for check.

8.2 Options for generate

Option Default Meaning
--numbering <on\|off\|true\|false> on enable or disable heading numbering
--min-level <N> 2 minimum managed heading level (>=1)
--max-level <N> 4 maximum managed heading level (<=6)
--slug <github\|gitlab\|crossnote> github select the slug algorithm for inline anchors and ToC link targets
--anchor <on\|off\|true\|false> on enable or disable managed inline anchors
--link <on\|off\|true\|false> on render ToC entries as Markdown links when enabled
--toc <on\|off\|true\|false> on renders the managed ToC area when on, leaves it empty when off
--bullets <auto\|*\|-\|+> auto choose the generated unordered-list bullet style
--file <name> read and overwrite file
--verbose off diagnostic and progress messages on stderr
--help show help

Input form rules:

Short forms:

Option Short form
--numbering -n
--anchor -a
--bullets -b
--file -f
--verbose -v
--help -h

Compatibility note:

8.3 I/O and logging behavior

Root-mode dispatch rules:

Generate-control flags:

9. Commands

9.1 generate

Behavior:

  1. Parse the document.
  2. If no managed container is present, create the complete container at the beginning of the file.
  3. If marker structure or config is invalid: error and no change.
  4. Semantically remove existing managed artifacts:
    • ToC content
    • managed heading numbers
    • managed inline anchors
  5. Determine relevant headings.
  6. Recalculate numbers if numbering=true.
  7. Recalculate anchor_id for all relevant headings using slug.
  8. Render managed inline anchors only if anchor=true.
  9. Re-render the ToC if toc=true; otherwise render the managed ToC area empty.
  10. Re-render headings.
  11. Re-render config according to section 7.
  12. Write the document back.

Additional rules:

Example of a rendered heading:

### 4.1. <a id="open-source"></a>Open source

Explanation:

9.2 strip

Behavior:

After strip, this structure is still valid:

<!-- mdtoc -->
<!-- numbering=true min=2 max=4 slug=github anchor=true link=true toc=true bullets=auto -->
<!-- /mdtoc -->

Error case:

9.3 strip --raw

Behavior:

Conservative rule:

Use cases:

9.4 regen

Behavior:

Error case:

9.5 check

Behavior:

No side effects:

Interpretation:

Note: “byte-for-byte” sounds more formal than it is in practice. What it means is: check computes the same text that generate or strip would write, and compares exactly that.

10. ToC rules

The ToC is based on all managed headings within min to max, inclusive.

Render rules:

Example:

* [1. Introduction](#introduction)
  * [1.1. API](#api)

Bullet selection:

Displayed in the link text:

Link target when link=true:

Behavior of anchor:

Behavior of slug:

Explanation:

Cross-reference:

11. Slug and anchor ID specification

slug selects the algorithm used for managed inline anchor IDs and generated ToC link targets.

Inline anchor IDs are deterministically derived from the unnumbered title. ToC targets with anchor=false are derived from the rendered heading source, including a managed number prefix when numbering is enabled.

11.1 Goal

The generated values should be:

11.2 Input for the derivation

For every managed heading, the following applies:

slug_source := title_text     // github, gitlab
slug_source := title_markup   // crossnote
anchor_id   := slugify(slug_source)

The following also applies:

Explanation:

11.3 GitHub-compatible basic rules

The function slugify MUST perform at least these steps:

  1. Input is title_text.
  2. Letters are converted to lowercase using Unicode lowercasing.
  3. Markdown formatting characters and inline markup do not contribute literal characters to the slug; only their visible text content counts.
  4. Unicode letters and Unicode decimal digits are preserved.
  5. Runs of whitespace and punctuation between preserved text parts are normalized to exactly one -.
  6. Leading and trailing runs of whitespace or punctuation do not create a leading or trailing -.
  7. If the resulting slug already exists in the same document, -1, -2, -3, … is appended.

Interpretation:

11.4 Explicit decisions for edge cases

Additionally, the following applies in mdtoc v1:

Explanation:

11.5 Relationship to inline anchor syntax

If anchor=true, mdtoc renders exactly this form:

<a id="anchor_id"></a>

The following applies:

Cross-reference:

11.6 Examples

Example 1

### Open source

open-source

Example 2

### This'll be a _Helpful_ Section About the Greek Letter Θ!

thisll-be-a-helpful-section-about-the-greek-letter-θ

Example 3

### Übergrößenträger & naïve façade – déjà vu!

übergrößenträger-naïve-façade-déjà-vu

Example 4

### 中文 русский عربى

中文-русский-عربى

Example 5

### 🚀 !!!

section

Example 6

Two identical headings ### API result in:

api
api-1

11.7. GitLab slug profile

If slug=gitlab, mdtoc MUST derive IDs according to the GitLab heading-ID rules documented for GLFM.

The GitLab profile applies these steps:

  1. Input is title_text.
  2. All text is converted to lowercase.
  3. All non-word text is removed.
  4. Spaces are converted to -.
  5. Two or more adjacent hyphens are collapsed to one.
  6. If the resulting ID already exists in the same document, -1, -2, -3, … is appended.

For mdtoc, the GitLab profile is interpreted as follows:

The GitLab profile therefore differs from the GitHub-compatible profile in important edge cases:

Examples:

## Version 3.5
## A+B
## foo_bar baz

In GitLab mode, these headings yield:

version-35
ab
foo_bar-baz

11.8. Crossnote / Markdown Preview Enhanced slug profile

If slug=crossnote, mdtoc derives IDs with the Crossnote / Markdown Preview Enhanced style used by the github-slugger plus uslug pipeline.

For mdtoc, this profile is interpreted as follows:

Important examples:

## 1.1. API
## An ATX title with closing hash markers  ####

In Crossnote mode, these headings yield:

11-api
an-atx-title-with-closing-hash-markers--

12. Error behavior, logging, and exit codes

Error cases:

Basic rules:

Recommended exit codes:

13. Idempotence

Idempotence is part of the contract.

Examples:

mdtoc generate
mdtoc generate

=> no further change on the second run

mdtoc strip
mdtoc strip

=> no further change on the second run

mdtoc strip --raw
mdtoc strip --raw

=> no further change on the second run

Cross-reference:

14. Extensibility

Possible later extensions:

Note: These points are explicitly extensions. They should not make v1 unnecessarily complex.

15. Current implementation basis (informative)

The current Go implementation is intentionally self-contained.

Current basis:

Current implementation notes:

Alternative implementations:

Explanation: