# mdtoc – Specification (v1)

## 1. Purpose and core principles

`mdtoc` is a deterministic CLI tool for processing individual Markdown documents.

Functions:

* generation of a table of contents (ToC)
* consistent heading numbering
* generation of stable anchor IDs and ToC link targets according to a selected slug profile
* removal of all artifacts generated by `mdtoc`
* validation of a document against its regenerated output, for use in CI

Core principles:

* The visible heading text is the only semantic source of truth.
* Heading numbers are derived and not persistent.
* Inline anchor IDs are computed from the unnumbered title according to `slug`.
* ToC link targets use the same `slug` profile, independently of whether inline anchors are enabled.
* Generated content is fully reconstructible.
* `mdtoc` changes a document only on the basis of a clearly defined managed structure.
* The tool is idempotent.

_Note:_ In this document, "formal" only means "clear enough for parsers, tests, and later code generation". It does not mean a large architecture, but a small, robust contract framework.

## 2. Scope and non-goals

`mdtoc` intentionally processes only a small, unambiguous Markdown subset.

Supported in v1:

* single Markdown file
* ATX headings from `#` to `######`
* defined ToC markers
* defined config block
* defined inline anchor form in headings

Not supported in v1:

* Setext headings
* GUI automation
* PDF generation
* multi-file processing
* specifying a complete Markdown AST as part of `mdtoc` itself
* partial processing such as `--toc-only` or `--anchors-only`

_Note:_ The restriction to a small Markdown subset is intentional. It keeps the parser, test cases, and debugging simple.

## 3. Explicit document structure

A document managed by `mdtoc` uses exactly this container structure:

```md
<!-- mdtoc -->
[TOC CONTENT]
<!-- numbering=true min=2 max=4 slug=github anchor=true link=true toc=true bullets=auto -->
<!-- /mdtoc -->
```

Rules:

* The outer container consists of start marker, ToC area, optional config block, and end marker.
* If present, the config block must appear immediately before `<!-- /mdtoc -->`.
* `<!-- mdtoc -->` may occur at most once.
* `<!-- /mdtoc -->` may occur at most once.
* The config block may occur at most once.
* If the config block is absent, all default config values apply.
* If neither outer marker is present, `generate` inserts the complete container at the beginning of the file.
* If only one of the outer markers is present, or if the start marker appears after the end marker, this is a parsing error.
* Everything between `<!-- mdtoc -->` and the beginning of the config block is the managed ToC area.
* If there is no config block, everything between `<!-- mdtoc -->` and `<!-- /mdtoc -->` is the managed ToC area.
* Foreign content in the ToC area is not deleted by `generate`, but preserved as an HTML comment.

_Note:_ The user can determine where the table of contents should appear by moving the ToC area.

_Explanation:_ The complete container is the managed area. `toc=off` does not mean "no container", but "an empty managed ToC area".

_Note:_ The explicit container structure is intentionally easier to read than implicit marker logic. It makes the area managed by `mdtoc` immediately visible.
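
The marker rules above can be sketched as a small validation step. This is an illustrative Python sketch, not the tool's implementation: `locate_container` is a hypothetical name, markers are assumed to occupy a line of their own, and the ignored regions from section 4.2 are deliberately not handled here.

```python
START_MARKER = "<!-- mdtoc -->"
END_MARKER = "<!-- /mdtoc -->"


def locate_container(lines):
    """Return (start, end) line indexes of the container, or None if unmanaged.

    Hypothetical helper; raises ValueError for the parsing errors named above.
    """
    starts = [i for i, line in enumerate(lines) if line.strip() == START_MARKER]
    ends = [i for i, line in enumerate(lines) if line.strip() == END_MARKER]
    if not starts and not ends:
        return None  # unmanaged: generate would insert a fresh container
    if len(starts) > 1 or len(ends) > 1:
        raise ValueError("an outer marker occurs more than once")
    if not starts or not ends:
        raise ValueError("only one of the outer markers is present")
    if starts[0] > ends[0]:
        raise ValueError("start marker appears after end marker")
    return starts[0], ends[0]
```

Returning `None` for the unmanaged case keeps "no container" distinct from "broken container", which matters because only the former may be repaired by `generate`.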

## 4. Parsing rules

### 4.1 Principle

The specification describes managed behavior in a line- and position-oriented way.  
An implementation MAY internally use a Markdown parser as long as the external behavior matches this specification exactly.

_Explanation:_ For implementation in Go, an internal parser such as `goldmark` is useful, even though the managed rewrite rules remain described in a line-oriented way.

_Current implementation note:_ The current implementation uses a self-contained line parser plus a small inline-text extractor; an alternative parser is still allowed as long as the external behavior remains identical.

### 4.2 Ignored regions

These regions are ignored when detecting markers and headings:

1. Fenced code blocks with backticks:
   * Start: a backtick fence as recognized by the Markdown parser in use or by the supported v1 subset, that is, a line beginning with three backticks
   * End: the corresponding closing backtick fence, that is, the next line beginning with three backticks
2. Fenced code blocks with tildes:
   * Start: a tilde fence as recognized by the Markdown parser in use or by the supported v1 subset, that is, a line beginning with three tildes (`~~~`)
   * End: the corresponding closing tilde fence, that is, the next line beginning with three tildes (`~~~`)
3. Inline code spans:
   * region between two backticks on the same line
4. HTML comments:
   * `<!-- ... -->`
   * exception: `<!-- mdtoc -->`, `<!-- /mdtoc -->`, `<!-- mdtoc off -->`, and `<!-- mdtoc on -->`

Not ignored:

* Blockquotes

Blockquotes are normal input lines.  
They are not treated as a special region.

Practical consequence:

* A blockquote line begins with optional spaces and then `>`.
* A heading recognized by `mdtoc` must begin with a `hashes` prefix directly in column 1.
* Therefore, blockquotes cannot match the heading syntax and need no special treatment.

_Interpretation:_

* "Do not ignore blockquotes" explicitly does not mean that headings are created from them.
* It only means that `mdtoc` does not need a dedicated blockquote mode.
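
The fence handling described in this section can be sketched as a simple line filter. This is an illustrative Python sketch under the v1 subset definition (a fence is a line beginning with three backticks or three tildes); `lines_outside_fences` is a hypothetical name, and inline code spans and HTML comments are not handled here.

```python
def lines_outside_fences(lines):
    """Yield (index, line) for every line outside backtick and tilde fences."""
    open_fence = None  # "```" or "~~~" while inside a fenced block
    for index, line in enumerate(lines):
        if open_fence is None:
            if line.startswith("```"):
                open_fence = "```"
            elif line.startswith("~~~"):
                open_fence = "~~~"
            else:
                yield index, line  # blockquotes pass through as normal lines
        elif line.startswith(open_fence):
            open_fence = None  # the closing fence line itself is skipped too
```

Marker and heading detection would then only look at the yielded lines, so a `# comment` inside a code block can never become a heading candidate.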

### 4.3 Parsing order

Processing logically runs in this order:

1. Determine ignored regions or Markdown context.
2. Recognize the outer `mdtoc` container and config block only outside ignored regions.
3. Recognize headings only outside ignored regions.
4. Semantically normalize managed artifacts.
5. Derive the target state.
6. Render the output.

_Explanation:_

* Without this order, markers or headings inside a code fence would be ambiguous.
* This exact ambiguity is intentionally excluded here.

## 5. Heading syntax

### 5.1 Candidates for headings

`mdtoc` treats a line as a heading candidate only if it starts, in column 1, with one of the following prefixes:

```text
hashes := "# " | "## " | "### " | "#### " | "##### " | "###### "
```

This also means:

* exactly one space must follow the `#` characters
* no spaces may appear before the `#` characters

_Note:_ The space is intentionally part of `hashes` here. This simplifies the parser: after the prefix, either the number, the anchor, or the title begins immediately.

### 5.2 Structure of a managed heading

Managed headings use exactly this schema:

```text
heading_line := hashes [number SP] [anchor] title
number       := DIGIT+ ("." DIGIT+)* "."
anchor       := "<a id=\"anchor_id\"></a>"
title        := NONEMPTY_TEXT
SP           := exactly one U+0020 space
```

Additional rules:

* `number` is optional.
* If `number` occurs, it appears directly after `hashes` and is followed by exactly one space.
* `anchor` is optional.
* If `anchor` occurs, it appears directly after `hashes` or directly after `number SP`.
* There is **no** space between `</a>` and the first character of the title.
* Inside the title, spaces and characters remain unchanged.
* Only headings that exactly follow this positional logic may be rewritten by `mdtoc`.

_Explanation:_

* The missing space between `</a>` and the title is intentionally preserved because it is part of the managed render format.
* The motivation is no longer `dumeng` compatibility, but an unambiguous and idempotent render schema.

Examples of valid managed headings:

```md
# Title
## 1. Introduction
## <a id="introduction"></a>Introduction
### 2.1. <a id="api-overview"></a>API Overview
```

Examples that `mdtoc` does not treat as a managed structure:

```md
 # Title
##  1. Introduction
### 1.2 Introduction
### <a id="x"></a> Introduction
```

### 5.3 Meaning of the syntax

* `### 2024 roadmap` is **not** a number because the first token does not end with `.`.
* `### 3D graphics` is **not** a number because the first token is not a pure `x.y.z.` pattern.
* `### 2.1. API` is a managed numbering syntax.

_Note:_ The pattern `### 2.1. API` is therefore intentionally reserved for `mdtoc`. Anyone writing a free heading in exactly this format is using the same syntax as the tool.
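
The grammar from section 5.2 and the cases in this section can be checked with one regular expression. This is an illustrative Python sketch, not the tool's parser: `parse_heading` is a hypothetical name, and the requirement that the title start with a non-space character is an assumption drawn from the negative examples in section 5.2.

```python
import re

# hashes [number SP] [anchor] title, following the grammar in section 5.2.
HEADING_RE = re.compile(
    r'^(?P<hashes>#{1,6}) '
    r'(?:(?P<number>[0-9]+(?:\.[0-9]+)*\.) )?'
    r'(?:<a id="(?P<anchor_id>[^"]*)"></a>)?'
    r'(?P<title>\S.*)$'  # assumption: the title never starts with whitespace
)


def parse_heading(line):
    """Return the managed parts of a heading line, or None if it is no candidate."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    return {
        "level": len(match["hashes"]),
        "number": match["number"],        # None when no managed number is present
        "anchor_id": match["anchor_id"],  # None when no managed anchor is present
        "title": match["title"],
    }
```

With this pattern, `### 2024 roadmap` and `### 1.2 Introduction` parse as headings without a managed number, because the optional number group only matches when the token ends with `.` and is followed by exactly one space.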

### 5.4 Supported Markdown subset

`mdtoc` is not a general Markdown parser.

For headings, v1 supports:

* ATX headings only
* only the heading syntax defined above
* no Setext headings
* no implicit or ambiguous special cases

The practical prefilter is therefore at least:

```text
^#{1,6} 
```

The actual rewrite logic applies only to lines that also satisfy the remaining positional rules.

## 6. Small formal model

This section describes the minimal internal view that is helpful for clean implementation and tests.

### 6.1 Managed heading

Internally, this model is sufficient for a managed heading:

```text
ManagedHeading
- line_index
- level
- title_markup  // Title area as it appears in the document, but without managed numbering and without the managed inline anchor
- title_text    // Plain-text interpretation of title_markup; source for ToC link text and anchor ID
- number        // derived or empty
- anchor_id     // derived or empty
```

Semantically important are only:

* `level`
* `title_markup`
* `title_text`

Derived from these are:

* `number`
* `anchor_id`

_Explanation:_

* The distinction between `title_markup` and `title_text` remains useful even though `title_text` is derived separately from the raw line.
* In the current implementation, `title_text` is derived by a self-contained inline-text extraction step.
* The authoritative value is the deterministic result of that extraction logic, not the output of an external Markdown renderer.

### 6.2 Document state

For `mdtoc`, a document is practically in one of these states:

* `unmanaged`
  No valid `mdtoc` container is present.

* `managed`
  A valid `mdtoc` container is present. The container may include a config block; if it does not, defaults apply.

* `generated target`
  The document byte-for-byte matches the output that `regen` would produce from the current content and container config.

`mdtoc` does not persist a `state` field. A stripped document is still a managed document, but it does not match the generated target and `check` therefore returns a mismatch until `regen` is run.

### 6.3 Processing pipeline

Processing always follows the same simple pattern:

```text
parse -> normalize -> derive -> render
```

This means:

* **parse**: recognize container, config, and headings
* **normalize**: semantically remove managed numbers and managed anchors
* **derive**: recalculate numbers, anchor IDs, and ToC
* **render**: write the document back deterministically

_Note:_ This is not meant to force a large AST architecture. It only defines which pieces of information are semantically relevant and which are render artifacts only.

### 6.4 Validity range of `min` and `max`

This version uses one simple rule:

* `min` and `max` in config, and `--min-level` and `--max-level` in the CLI, filter the same set of headings for
  * ToC generation
  * numbering
  * anchor generation

Practical consequence:

* During `generate`, all managed numbers and managed anchors are first removed from all managed headings.
* Then numbers and anchors are only re-applied to headings within the active level range.
* Headings outside the range stay in the document with their title text untouched, but they are no longer actively managed and carry no managed number or anchor.

_Cross-reference:_

* The same rule is used again in section 10 for the ToC.
* This is intentional; both places describe the same contract from two different perspectives.
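
The range rule can be illustrated with a small numbering sketch in Python. The counting scheme itself (one counter per level, deeper counters reset when a shallower heading appears) is an assumption inferred from number examples such as `1.` and `1.1.` elsewhere in this document; this section only fixes the range filter. `derive_numbers` is a hypothetical name, and the behavior for skipped levels is not specified here.

```python
def derive_numbers(levels, min_level=2, max_level=4):
    """Return one number string per heading level; "" for headings outside the range."""
    counters = [0] * (max_level - min_level + 1)
    numbers = []
    for level in levels:
        if not min_level <= level <= max_level:
            numbers.append("")  # outside the active range: no managed number
            continue
        depth = level - min_level
        counters[depth] += 1
        for deeper in range(depth + 1, len(counters)):
            counters[deeper] = 0  # a new parent restarts its sub-numbering
        numbers.append(".".join(str(c) for c in counters[: depth + 1]) + ".")
    return numbers
```

For example, with the default range 2 to 4, the level sequence `1, 2, 3, 3, 2` yields an empty number for the level-1 heading and `1.`, `1.1.`, `1.2.`, `2.` for the rest.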

## 7. Config block

The config block is an optional HTML comment placed directly before `<!-- /mdtoc -->`.
It stores whitespace-separated `key=value` fields:

```html
<!-- numbering=true min=2 max=4 slug=github anchor=true link=true toc=true bullets=auto -->
```

The same config may be written across multiple lines:

```html
<!--
numbering=true min=2 max=4
slug=github anchor=true link=true toc=true bullets=auto
-->
```

Rules:

* The config block may be deleted completely; then all defaults apply.
* All fields use `key=value`.
* Field order is arbitrary.
* Unknown keys are ignored so newer generated configs remain readable by older versions.
* Duplicate known keys are invalid.
* Invalid known values are invalid.
* Boolean values accept `true|false|on|off`; normalized output writes `true|false`.
* Allowed known keys:
  * `numbering=true|false|on|off`
  * `min=<N>`
  * `max=<N>`
  * `slug=github|gitlab|crossnote`
  * `anchor=true|false|on|off`
  * `link=true|false|on|off`
  * `toc=true|false|on|off`
  * `bullets=auto|*|-|+`
* `min` and `max` are positive integers.
* `min` must not be greater than `max`.
* `max` must not be greater than 6.
* `anchor` is strictly Boolean. It only controls whether managed inline anchors are rendered.
* `slug` defines the anchor/link slug algorithm globally, independently of `anchor`.
* `link` controls whether ToC entries are Markdown links or plain text list items.
* `generate` writes all generator options into the config block when the output has no prior container, when a config block already existed, or when non-default config must be persisted.
* If a managed container has no config block and the effective config is still default, rewrites preserve the absent config block.
* `--file`, `--help`, `--version`, `--verbose`, and `--raw` are not persisted.
* `strip` keeps the config block if it exists.
* `strip --raw` removes the complete container, including any config block.
* `toc=off` means: the managed ToC area remains part of the container, but is rendered empty.

There is no `state` field and no `container-version` field. Legacy `<!-- mdtoc-config ... -->` blocks are not part of this specification.
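
The config rules above can be sketched as a small parser. This is an illustrative Python sketch, not the tool's implementation: `parse_config` is a hypothetical name, the defaults are taken from the option table in section 8.2, and rejecting a field without `=` is an assumption, since the rules only say that all fields use `key=value`.

```python
DEFAULTS = {"numbering": True, "min": 2, "max": 4, "slug": "github",
            "anchor": True, "link": True, "toc": True, "bullets": "auto"}
BOOL_KEYS = {"numbering", "anchor", "link", "toc"}
BOOLS = {"true": True, "on": True, "false": False, "off": False}
ENUMS = {"slug": {"github", "gitlab", "crossnote"},
         "bullets": {"auto", "*", "-", "+"}}


def parse_config(comment):
    """Parse a single- or multi-line config comment into a normalized dict."""
    body = comment.strip()
    if not (body.startswith("<!--") and body.endswith("-->")):
        raise ValueError("config block must be an HTML comment")
    config, seen = dict(DEFAULTS), set()
    for field in body[4:-3].split():
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {field}")  # assumption, see lead-in
        if key not in DEFAULTS:
            continue  # unknown keys are ignored
        if key in seen:
            raise ValueError(f"duplicate key: {key}")
        seen.add(key)
        if key in BOOL_KEYS:
            if value not in BOOLS:
                raise ValueError(f"invalid Boolean for {key}: {value}")
            config[key] = BOOLS[value]
        elif key in ENUMS:
            if value not in ENUMS[key]:
                raise ValueError(f"invalid value for {key}: {value}")
            config[key] = value
        else:  # min / max
            if not (value.isascii() and value.isdigit()) or int(value) < 1:
                raise ValueError(f"{key} must be a positive integer")
            config[key] = int(value)
    if config["min"] > config["max"] or config["max"] > 6:
        raise ValueError("require min <= max <= 6")
    return config
```

Storing Booleans as `True`/`False` internally makes the normalized `true|false` output a pure rendering concern.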

## 8. CLI interface

### 8.1 Commands

| Command                                            | Description                                                                                       |
|----------------------------------------------------|---------------------------------------------------------------------------------------------------|
| `mdtoc --version`                                  | Prints short version information.                                                                 |
| `mdtoc --version --verbose`                        | Prints detailed version information.                                                              |
|                                                    |                                                                                                   |
| `mdtoc --help`                                     | Prints short help text.                                                                           |
| `mdtoc --help --verbose`                           | Prints long help text.                                                                            |
|                                                    |                                                                                                   |
| `mdtoc [--file <name>\|<name>] [GENERATE OPTIONS]` | root mode: uses `regen` for valid managed input without generate overrides, otherwise `generate`. |
| `mdtoc [GENERATE OPTIONS] < INPUT.md`              | root mode on `stdin`; same dispatch rule as above.                                                |
|                                                    |                                                                                                   |
| `mdtoc generate [--verbose] [OPTIONS]`             | generates/updates ToC, numbers, anchors.                                                          |
| `mdtoc generate  --help`                           | Prints long help text specifically for generate.                                                  |
|                                                    |                                                                                                   |
| `mdtoc regen    [--verbose]`                       | regenerates from the persisted container config.                                                  |
| `mdtoc refresh  [--verbose]`                       | alias for `regen`.                                                                                |
| `mdtoc regen     --help`                           | Prints long help text specifically for regen.                                                     |
| `mdtoc refresh   --help`                           | Prints the same help text as `regen`.                                                             |
|                                                    |                                                                                                   |
| `mdtoc strip    [--verbose] [--raw]`               | removes ToC, numbers, anchors and optionally config.                                              |
| `mdtoc strip     --help`                           | Prints long help text specifically for strip.                                                     |
|                                                    |                                                                                                   |
| `mdtoc check    [--verbose]`                       | checks whether the document matches regenerated output.                                           |
| `mdtoc check     --help`                           | Prints long help text specifically for check.                                                     |

### 8.2 Options for `generate`

| Option                                   | Default  | Meaning                                                            |
|------------------------------------------|----------|--------------------------------------------------------------------|
| `--numbering <on\|off\|true\|false>`     | `on`     | enable or disable heading numbering                                |
| `--min-level <N>`                        | `2`      | minimum managed heading level (>=1)                                |
| `--max-level <N>`                        | `4`      | maximum managed heading level (<=6)                                |
| `--slug <github\|gitlab\|crossnote>`     | `github` | select the slug algorithm for inline anchors and ToC link targets  |
| `--anchor <on\|off\|true\|false>`        | `on`     | enable or disable managed inline anchors                           |
| `--link <on\|off\|true\|false>`          | `on`     | render ToC entries as Markdown links when enabled                  |
| `--toc <on\|off\|true\|false>`           | `on`     | renders the managed ToC area when `on`, leaves it empty when `off` |
| `--bullets <auto\|*\|-\|+>`              | `auto`   | choose the generated unordered-list bullet style                   |
| `--file <name>`                          | –        | read and overwrite file                                            |
| `--verbose`                              | `off`    | diagnostic and progress messages on `stderr`                       |
| `--help`                                 | –        | show help                                                          |

Input form rules:

* File-backed commands accept either a positional file argument or `--file <name>`.
* The positional-file shorthand applies both to explicit subcommands and to root mode.
* Exactly one input source is allowed per invocation.

Short forms:

| Option        | Short form |
|---------------|------------|
| `--numbering` | `-n`       |
| `--anchor`    | `-a`       |
| `--bullets`   | `-b`       |
| `--file`      | `-f`       |
| `--verbose`   | `-v`       |
| `--help`      | `-h`       |

Compatibility note:

* The current CLI also tolerates the Go `flag` package's one-dash long-option spellings such as `-toc`, `-anchor`, `-slug`, or `-numbering`.
* These are accepted as compatibility aliases for the documented double-dash generate option forms.
* Documentation and examples should still prefer the canonical double-dash form.

### 8.3 I/O and logging behavior

* With a positional file or `--file`, the file is read and overwritten.
* Without file input, document input comes from `stdin` and document output goes to `stdout`.
* If neither file input nor piped `stdin` is provided, the command fails with an input-source error.
* If more than one input source is provided, the command fails with an input-source conflict error.
* Apart from document output on `stdout` in `stdin` mode, successful commands print nothing unless `--help`, `--version`, or `--verbose` is given.
* Errors and diagnostic messages are written exclusively to `stderr`.
* Collected warnings are only printed in verbose mode.

Root-mode dispatch rules:

* If the input contains a valid managed container and no generate-control flags are explicitly set, root mode behaves like `regen`.
* If the input does not contain a valid managed container, root mode behaves like `generate`.
* If at least one generate-control flag is explicitly set, root mode behaves like `generate` even when a valid managed container exists.

Generate-control flags:

* `--numbering`, `-n`
* `--min-level`
* `--max-level`
* `--slug`
* `--anchor`, `-a`
* `--link`
* `--toc`
* `--bullets`, `-b`
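
The root-mode dispatch rules can be summarized in a few lines. This is an illustrative Python sketch; `root_mode_command` is a hypothetical name, and the caller is assumed to have already determined whether the input holds a valid managed container and which flags were set explicitly.

```python
GENERATE_CONTROL_FLAGS = {
    "--numbering", "-n", "--min-level", "--max-level", "--slug",
    "--anchor", "-a", "--link", "--toc", "--bullets", "-b",
}


def root_mode_command(has_valid_container, explicit_flags):
    """Return the subcommand that root mode behaves like."""
    if not has_valid_container:
        return "generate"
    if GENERATE_CONTROL_FLAGS & set(explicit_flags):
        return "generate"  # an explicit override wins over the persisted config
    return "regen"
```

Flags such as `--file` or `--verbose` are not in the set, so they never switch a managed document from `regen` to `generate`.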

## 9. Commands

### 9.1 `generate`

Behavior:

1. Parse the document.
2. If no managed container is present, create the complete container at the beginning of the file.
3. If marker structure or config is invalid: error and no change.
4. Semantically remove existing managed artifacts:
   * ToC content
   * managed heading numbers
   * managed inline anchors
5. Determine relevant headings.
6. Recalculate numbers if `numbering=true`.
7. Recalculate `anchor_id` for all relevant headings using `slug`.
8. Render managed inline anchors only if `anchor=true`.
9. Re-render the ToC if `toc=true`; otherwise render the managed ToC area empty.
10. Re-render headings.
11. Re-render config according to section 7.
12. Write the document back.

Additional rules:

* Numbering and anchor ID are strictly decoupled.
* Inline anchor IDs are computed from the unnumbered title.
* Duplicate IDs are resolved deterministically.
* Foreign content in the ToC area is not deleted, but preserved as an HTML comment.
* `--anchor on` and `--anchor true` enable inline anchors; normalized config writes `anchor=true`.
* `--anchor off` and `--anchor false` disable inline anchors; normalized config writes `anchor=false`.
* On success, the result is idempotent.

Example of a rendered heading:

```md
### 4.1. <a id="open-source"></a>Open source
```

_Explanation:_

* Additional user-defined inline elements in the title are not fundamentally forbidden.
* For the normative derivation of `anchor_id`, however, it is not the raw markup that counts, but `title_text` according to section 6 and section 11.

### 9.2 `strip`

Behavior:

* requires a valid managed container
* removes managed ToC content
* removes managed heading numbers
* removes managed inline anchors
* keeps the outer container
* keeps the config block if it exists

After `strip`, this structure is still valid:

```md
<!-- mdtoc -->
<!-- numbering=true min=2 max=4 slug=github anchor=true link=true toc=true bullets=auto -->
<!-- /mdtoc -->
```

Error case:

* no valid managed container -> error
* no implicit repair

### 9.3 `strip --raw`

Behavior:

* first attempts the normal structural parse
* if that succeeds, it removes the complete managed container, if present:
  * `<!-- mdtoc -->`
  * ToC content
  * optional config block
  * `<!-- /mdtoc -->`
* additionally removes managed heading numbers
* additionally removes managed inline anchors
* if strict parsing fails, it falls back to locating only the outer managed markers and removes the container by marker bounds
* after fallback container removal, heading normalization is attempted again on the remaining body text

Conservative rule:

* If it cannot be determined with certainty whether a number or an inline anchor is managed, the content remains unchanged.
* If fallback parsing was needed, a warning is collected; this warning is only emitted in verbose mode.
* If fallback container removal succeeds but heading parsing still fails afterward, `strip --raw` returns the original parsing error.

Use cases:

* damaged config
* migration
* complete removal of `mdtoc` management
* tests

### 9.4 `regen`

Behavior:

* requires a valid managed container
* reads the persisted normalized config from the existing managed container, or uses defaults when no config block is present
* regenerates the document into the generated target state
* writes the updated normalized config according to section 7
* `refresh` is a supported command alias with the same behavior

Error case:

* no valid managed container -> error

### 9.5 `check`

Behavior:

* requires a valid managed container
* reconstructs the target state from the current document content and config
* compares target state and actual state byte-for-byte
* returns exit code `0` if both are identical
* returns exit code `2` on mismatch

No side effects:

* `check` never modifies the document

_Interpretation:_

* `check` always reconstructs the generated target state.
* A stripped managed document is therefore valid input but fails `check` until `regen` is run.

_Note:_ "Byte-for-byte" sounds more formal than it is in practice. It means that `check` computes the same text that `regen` would write and compares the document against exactly that.
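
The exit-code contract of `check` is small enough to state as code. In this illustrative Python sketch, `regenerate` is a hypothetical callable standing in for the whole regeneration pipeline; only the comparison and the two exit codes come from this section.

```python
def check_exit_code(document: bytes, regenerate) -> int:
    """Return 0 if the document already equals its regenerated form, else 2."""
    return 0 if regenerate(document) == document else 2
```

Because the function only reads its input, it mirrors the rule that `check` never modifies the document.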

## 10. ToC rules

The ToC is based on all managed headings within `min` to `max`, inclusive.

Render rules:

* One heading produces exactly one ToC entry.
* The hierarchy follows the heading level.
* Each additional level relative to `min` is indented by two spaces.
* Every entry is a Markdown list item.
* If `link=true`, the list item text is a Markdown link.
* If `link=false`, the list item contains plain text only.
* The list marker is chosen according to `bullets`.

Example:

```md
* [1. Introduction](#introduction)
  * [1.1. API](#api)
```
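
The render rules above can be sketched as follows. This is an illustrative Python sketch, not the tool's renderer: `render_toc` is a hypothetical name, and each entry is assumed to carry an already derived number, title text, and link target.

```python
def render_toc(entries, min_level=2, bullet="*", link=True):
    """Render ToC lines from (level, number, title_text, target) tuples."""
    lines = []
    for level, number, title_text, target in entries:
        text = f"{number} {title_text}" if number else title_text
        if link:
            text = f"[{text}](#{target})"
        indent = "  " * (level - min_level)  # two spaces per level below min
        lines.append(f"{indent}{bullet} {text}")
    return "\n".join(lines)
```

With `link=False` the same entries render as plain list items, for example `* 1. Introduction`.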

Bullet selection:

* with `bullets=*`, `-`, or `+`, the configured marker is used exactly
* with `bullets=auto`, `mdtoc` counts unordered-list markers in the body text outside fences, generic HTML comments, and excluded regions
* recognized body list markers are `*`, `-`, and `+` followed by one space
* if one marker has the highest count, it is used
* ties are resolved in the fixed order `*` > `-` > `+`
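
Bullet selection can be sketched as a count with a fixed tie-break. This is an illustrative Python sketch: `choose_bullet` is a hypothetical name, allowing leading indentation before a marker is an assumption (the rules only require one space after it), and the caller is expected to pass only body lines outside the excluded regions.

```python
import re
from collections import Counter

LIST_ITEM_RE = re.compile(r"^\s*([*+-]) ")  # marker followed by one space


def choose_bullet(bullets, body_lines):
    """Return the list marker to use for the generated ToC."""
    if bullets != "auto":
        return bullets  # an explicit marker is used exactly as configured
    counts = Counter()
    for line in body_lines:
        match = LIST_ITEM_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    # max() returns the first maximal item, so the tuple order encodes the tie-break.
    return max(("*", "-", "+"), key=lambda marker: counts[marker])
```

A document without any list items produces an all-zero tie, which this sketch resolves to `*` through the same fixed order.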

Displayed in the link text:

* with `numbering=true`: `number + title`
* with `numbering=false`: title only

Link target when `link=true`:

* with `anchor=true`: `#` + `anchor_id`
* with `anchor=false` and `numbering=false`: `#` + slug(`title_source`)
* with `anchor=false` and `numbering=true`: `#` + slug(`number + " " + title_source`)
* collision handling follows the same per-document slugger behavior described in section 11

Behavior of `anchor`:

* with `anchor=true`, `mdtoc` renders a managed inline anchor
* with `anchor=false`, `mdtoc` does not render a managed inline anchor; ToC links target the renderer-derived heading ID based on the rendered heading text
* `anchor` does not select the slug algorithm

Behavior of `slug`:

* `slug=github` uses the GitHub-compatible rules in section 11.3
* `slug=gitlab` uses the GitLab rules in section 11.7
* `slug=crossnote` uses the Crossnote / Markdown Preview Enhanced rules in section 11.8
* For `slug=github` and `slug=gitlab`, `title_source` is `title_text`.
* For `slug=crossnote`, `title_source` is `title_markup`.
* ATX closing hash markers are stripped from `title_text`, but remain visible to the Crossnote `title_markup` slug path. Therefore `## An ATX title with closing hash markers  ####` targets `#an-atx-title-with-closing-hash-markers--` when `slug=crossnote`, `anchor=false`, and `link=true`.

_Explanation:_

* `anchor=false` is therefore renderer-dependent because there is no managed inline anchor to pin the target.
* Fully portable and renderer-independent ToC links are guaranteed only with `anchor=true`.

_Cross-reference:_

* The actual norm for `anchor_id` appears exclusively in section 11.
* Section 10 only describes the use of the already computed ID in the ToC.

## 11. Slug and anchor ID specification

`slug` selects the algorithm used for managed inline anchor IDs and generated ToC link targets.

Inline anchor IDs are deterministically derived from the **unnumbered title**. ToC targets with `anchor=false` are derived from the rendered heading source, including a managed number prefix when numbering is enabled.

### 11.1 Goal

The generated values should be:

* stable
* deterministic
* readable
* compatible with the selected renderer/profile
* identical for inline anchors and ToC links when `anchor=true`

### 11.2 Input for the derivation

For every managed heading, the following applies:

```text
slug_source := title_text     // github, gitlab
slug_source := title_markup   // crossnote
anchor_id   := slugify(slug_source)
```

The following also applies:

* `title_text` is **not** the raw title string from the line.
* `title_text` is the plain-text interpretation of `title_markup`.
* Managed numbering and the managed inline anchor are **not** part of `title_text`.
* ATX closing hash markers are stripped when deriving `title_text`.
* The current implementation derives `title_text` with these inline rules:
  * backtick code spans contribute only their visible content
  * Markdown links and images contribute only their visible label or alt text
  * HTML tags are removed
  * inline formatting markers `*`, `_`, and `~` are removed
  * whitespace is collapsed to single spaces and trimmed at the ends

_Explanation:_

* The current implementation intentionally keeps this extraction logic small and self-contained.
* This self-contained extraction is what keeps slug/anchor generation, ToC link text, and profile behavior consistent.

### 11.3 GitHub-compatible basic rules

The function `slugify` MUST perform at least these steps:

1. Input is `title_text`.
2. Letters are converted to lowercase using Unicode lowercasing.
3. Markdown formatting characters and inline markup do not contribute literal characters to the slug; only their visible text content counts.
4. Unicode letters and Unicode decimal digits are preserved.
5. Runs of whitespace and punctuation **between** preserved text parts are normalized to exactly one `-`.
6. Leading and trailing runs of whitespace or punctuation do **not** create a leading or trailing `-`.
7. If the resulting slug already exists in the same document, `-1`, `-2`, `-3`, ... is appended.

_Interpretation:_

* These rules follow GitHub’s documented basic rules in a form that is explicitly testable for `mdtoc`.
* For edge cases not documented there, `mdtoc` makes additional decisions in the following subsections.

### 11.4 Explicit decisions for edge cases

Additionally, the following applies in `mdtoc` v1:

* Symbols, emojis, and other non-letter/non-digit characters are removed.
* Runs of whitespace/punctuation are not rendered as multiple `--`, `---`, etc., but collapsed to exactly one `-`.
* Collision resolution starts at the **second** occurrence with `-1`.
* If the normalized slug becomes empty, `mdtoc` uses the fallback `section`.
* Further collisions on this fallback are resolved as `section-1`, `section-2`, ...

_Explanation:_

* The fallback `section` is a deliberate `mdtoc` decision.
* GitHub’s public basic rules do not explicitly describe this empty-slug edge case.
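
The rules from sections 11.3 and 11.4 can be sketched in Python as follows. This is an illustrative sketch, not the tool's implementation: `github_slug` and `Slugger` are hypothetical names, the input is assumed to be an already extracted `title_text`, and dropping apostrophes instead of treating them as separators is an assumption made only to agree with Example 2 in section 11.6. Re-checking suffixed candidates against already used IDs is likewise an assumption; the rules only require deterministic resolution.

```python
def github_slug(title_text):
    """Slugify title_text following the GitHub-compatible rules of this section."""
    parts, pending_separator = [], False
    for ch in title_text.lower():
        if ch.isalpha() or ch.isdecimal():  # Unicode letters and decimal digits
            if pending_separator and parts:
                parts.append("-")  # one "-" per run, never leading
            parts.append(ch)
            pending_separator = False
        elif ch in "'\u2019":
            continue  # assumption: apostrophes vanish instead of separating
        else:
            pending_separator = True  # whitespace, punctuation, symbols, emoji
    return "".join(parts) or "section"  # empty-slug fallback


class Slugger:
    """Per-document collision handling: api, api-1, api-2, ..."""

    def __init__(self):
        self._used = set()

    def unique(self, slug):
        candidate, suffix = slug, 0
        while candidate in self._used:
            suffix += 1
            candidate = f"{slug}-{suffix}"
        self._used.add(candidate)
        return candidate
```

A pending separator is only emitted when another preserved character follows, which is how the sketch avoids both trailing hyphens and runs such as `--`.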

### 11.5 Relationship to inline anchor syntax

If `anchor=true`, `mdtoc` renders exactly this form:

```html
<a id="anchor_id"></a>
```

The following applies:

* the string in `id="..."` MUST exactly match the `anchor_id` computed according to this section
* with `anchor=true`, `anchor_id` and the ToC link target are therefore the same string

_Cross-reference:_

* Section 5 only defines the position and render format of the inline anchor.
* The string inside `id="..."` is normalized exclusively here in section 11.

### 11.6 Examples

#### Example 1

```md
### Open source
```

→

```text
open-source
```

#### Example 2

```md
### This'll be a _Helpful_ Section About the Greek Letter Θ!
```

→

```text
thisll-be-a-helpful-section-about-the-greek-letter-θ
```

#### Example 3

```md
### Übergrößenträger & naïve façade – déjà vu!
```

→

```text
übergrößenträger-naïve-façade-déjà-vu
```

#### Example 4

```md
### 中文 русский عربى
```

→

```text
中文-русский-عربى
```

#### Example 5

```md
### 🚀 !!!
```

→

```text
section
```

#### Example 6

Two identical headings `### API` result in:

```text
api
api-1
```
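
The GitHub-compatible rules and the examples above can be approximated with the following Go sketch. `githubSlug` is a hypothetical helper, not the function used by `mdtoc`. It assumes that its input is already `title_text` (inline markup stripped) and that lowercasing is part of the profile, as the examples suggest. Example 2 implies that an apostrophe inside a word is dropped rather than turned into a separator; the sketch treats that as an explicit assumption. Collision handling is left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// githubSlug sketches the GitHub-compatible profile for a single title.
// Input is assumed to be title_text; collisions are not handled here.
func githubSlug(titleText string) string {
	var b strings.Builder
	pendingSep := false // a whitespace/punctuation run was seen
	for _, r := range strings.ToLower(titleText) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			// Emit one '-' only between two preserved text parts.
			if pendingSep && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingSep = false
			b.WriteRune(r)
		case r == '\'' || r == '’':
			// Assumption taken from Example 2: dropped, no separator.
		default:
			// Whitespace, punctuation, symbols, emojis.
			pendingSep = true
		}
	}
	if b.Len() == 0 {
		return "section" // empty-slug fallback
	}
	return b.String()
}

func main() {
	for _, t := range []string{"Open source", "中文 русский عربى", "🚀 !!!"} {
		fmt.Println(githubSlug(t))
	}
}
```

Tracking a pending separator instead of writing `-` immediately is what prevents leading, trailing, and repeated hyphens without a second cleanup pass.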

### 11.7. <a id="gitlab-slug-profile"></a>GitLab slug profile

If `slug=gitlab`, `mdtoc` MUST derive IDs according to the heading-ID rules documented for GitLab Flavored Markdown (GLFM).

The GitLab profile applies these steps:

1. Input is `title_text`.
2. All text is converted to lowercase.
3. All non-word text is removed.
4. Spaces are converted to `-`.
5. Two or more adjacent hyphens are collapsed to one.
6. If the resulting ID already exists in the same document, `-1`, `-2`, `-3`, ... is appended.

For `mdtoc`, the GitLab profile is interpreted as follows:

* Unicode letters and Unicode decimal digits are preserved.
* `_` is preserved as part of a word.
* Existing `-` characters are preserved and then normalized by the hyphen-collapse step.
* Punctuation between preserved text parts is removed, not converted to a separator.
* Leading and trailing hyphens are trimmed after normalization.
* If the normalized ID becomes empty, `mdtoc` uses the fallback `section`.

The GitLab profile therefore differs from the GitHub-compatible profile in important edge cases:

* `3.5` becomes `35` in GitLab mode, but `3-5` in GitHub mode.
* `A+B` becomes `ab` in GitLab mode, but `a-b` in GitHub mode.
* `foo_bar` stays `foo_bar` in GitLab mode, but becomes `foo-bar` in GitHub mode.

Examples:

```md
## Version 3.5
## A+B
## foo_bar baz
```

In GitLab mode, these headings yield:

```text
version-35
ab
foo_bar-baz
```
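
A minimal Go sketch of the GitLab interpretation above, under stated assumptions: `gitlabSlug` is a hypothetical name, the input is already `title_text`, any Unicode whitespace counts as a space, and collision handling is omitted. It is meant to illustrate the steps, not to reproduce the `mdtoc` implementation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// gitlabSlug sketches the GitLab profile for a single title.
func gitlabSlug(titleText string) string {
	var b strings.Builder
	prevHyphen := false
	for _, r := range strings.ToLower(titleText) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_':
			// Word characters, including '_', are preserved.
			b.WriteRune(r)
			prevHyphen = false
		case r == '-' || unicode.IsSpace(r):
			// Spaces become '-'; adjacent hyphens collapse to one.
			if !prevHyphen {
				b.WriteByte('-')
				prevHyphen = true
			}
		default:
			// Other punctuation is removed without a separator.
		}
	}
	slug := strings.Trim(b.String(), "-")
	if slug == "" {
		return "section" // empty-ID fallback
	}
	return slug
}

func main() {
	for _, t := range []string{"Version 3.5", "A+B", "foo_bar baz"} {
		fmt.Println(gitlabSlug(t))
	}
}
```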

### 11.8. <a id="crossnote-slug-profile"></a>Crossnote / Markdown Preview Enhanced slug profile

If `slug=crossnote`, `mdtoc` derives IDs with the Crossnote / Markdown Preview Enhanced style used by the `github-slugger` plus `uslug` pipeline.

For `mdtoc`, this profile is interpreted as follows:

* The input is `title_markup`, not `title_text`.
* Text is trimmed and lowercased.
* `~` and `。` are removed before slugging.
* Whitespace is converted to a temporary separator before punctuation stripping.
* Unicode letters, Unicode decimal digits, combining marks, `_`, `-`, and the temporary separator are preserved.
* Other punctuation is removed.
* Repeated `-` characters are collapsed.
* The temporary separator is rendered as `-`.
* If the normalized ID becomes empty, `mdtoc` uses the fallback `section`.
* Collision handling appends `-1`, `-2`, `-3`, ... starting at the second occurrence.

Important examples:

```md
## 1.1. API
## An ATX title with closing hash markers  ####
```

In Crossnote mode, these headings yield:

```text
11-api
an-atx-title-with-closing-hash-markers--
```
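
The trailing `--` in the second result appears to follow from the rule order: the closing `####` is stripped as punctuation, the two spaces before it each become a temporary separator, and the collapse step only affects literal `-` characters. The Go sketch below reproduces that reading. `crossnoteSlug` and `tempSep` are hypothetical names, the private-use code point is an arbitrary choice, and the closing hash markers are assumed to be part of `title_markup`. Collision handling is omitted.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tempSep stands in for whitespace until punctuation has been stripped.
// Assumption: a private-use code point that never survives as input.
const tempSep = '\uE000'

// crossnoteSlug sketches the Crossnote profile for a single title.
func crossnoteSlug(titleMarkup string) string {
	s := strings.ToLower(strings.TrimSpace(titleMarkup))
	var b strings.Builder
	prevHyphen := false
	for _, r := range s {
		switch {
		case r == '~' || r == '。':
			// Removed before slugging.
		case unicode.IsSpace(r):
			// Each whitespace character gets its own separator.
			b.WriteRune(tempSep)
			prevHyphen = false
		case r == '-':
			// Repeated literal hyphens collapse to one.
			if !prevHyphen {
				b.WriteRune('-')
			}
			prevHyphen = true
		case r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r) || unicode.IsMark(r):
			b.WriteRune(r)
			prevHyphen = false
		default:
			// Other punctuation is removed.
		}
	}
	// Only now is the temporary separator rendered as '-'.
	slug := strings.ReplaceAll(b.String(), string(tempSep), "-")
	if slug == "" {
		return "section"
	}
	return slug
}

func main() {
	fmt.Println(crossnoteSlug("1.1. API"))
	fmt.Println(crossnoteSlug("An ATX title with closing hash markers  ####"))
}
```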

## 12. Error behavior, logging, and exit codes

Error cases:

* missing or incomplete `mdtoc` container
* invalid config block
* parsing error
* invalid options

Basic rules:

* Errors are written to `stderr`.
* On errors, `mdtoc` performs no implicit repair. The only exception is the explicitly allowed creation of a new container by `generate` when the document is not yet managed by `mdtoc`.
* Successful commands write no status messages to `stdout`.

Recommended exit codes:

* `0` -> success
* `1` -> parsing, config, or CLI error
* `2` -> `check` found a mismatch
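
One way to map results to the recommended exit codes is a single translation function at the edge of the CLI. The Go sketch below is illustrative only: the constants, the sentinel errors `ErrCheckMismatch` and `ErrInvalidConfig`, and the `exitCode` function are hypothetical names, not part of `mdtoc`.

```go
package main

import (
	"errors"
	"fmt"
)

// Exit codes as recommended by this section.
const (
	exitOK       = 0
	exitError    = 1 // parsing, config, or CLI error
	exitMismatch = 2 // check found a mismatch
)

// Hypothetical sentinel errors for this sketch.
var (
	ErrCheckMismatch = errors.New("check: document differs from generated output")
	ErrInvalidConfig = errors.New("invalid config block")
)

// exitCode translates a command result into a process exit code.
func exitCode(err error) int {
	switch {
	case err == nil:
		return exitOK
	case errors.Is(err, ErrCheckMismatch):
		return exitMismatch
	default:
		return exitError
	}
}

func main() {
	wrapped := fmt.Errorf("README.md: %w", ErrCheckMismatch)
	fmt.Println(exitCode(nil), exitCode(ErrInvalidConfig), exitCode(wrapped))
}
```

Using `errors.Is` keeps the mapping stable even when the mismatch error is wrapped with file context on its way up.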

## 13. Idempotence

Idempotence is part of the contract.

Examples:

```bash
mdtoc generate
mdtoc generate
```

=> no further change on the second run

```bash
mdtoc strip
mdtoc strip
```

=> no further change on the second run

```bash
mdtoc strip --raw
mdtoc strip --raw
```

=> no further change on the second run

_Cross-reference:_

* Idempotence is already defined in section 1 as a core principle and in section 9 as command semantics.
* Section 13 intentionally restates the contract in test form.
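
The same contract can be expressed as a small property check. The Go sketch below is a generic illustration: `isIdempotent` is a hypothetical helper, and the two transforms are stand-ins for `generate` or `strip`, not the real commands.

```go
package main

import (
	"fmt"
	"strings"
)

// isIdempotent reports whether a second application of transform
// leaves the result of the first application unchanged.
func isIdempotent(transform func(string) string, doc string) bool {
	once := transform(doc)
	return transform(once) == once
}

// stripTrailingSpaces is a stand-in for an idempotent command.
func stripTrailingSpaces(doc string) string {
	lines := strings.Split(doc, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimRight(line, " \t")
	}
	return strings.Join(lines, "\n")
}

// appendMarker is deliberately not idempotent: every run adds a line.
func appendMarker(doc string) string {
	return doc + "\n<!-- marker -->"
}

func main() {
	doc := "# Title  \ntext\t\n"
	fmt.Println(isIdempotent(stripTrailingSpaces, doc))
	fmt.Println(isIdempotent(appendMarker, doc))
}
```

A test suite could run such a check for each command against every fixture document, so that a regression shows up as a changed second run.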

## 14. Extensibility

Possible later extensions:

* alternative anchor styles
* alternative ToC formats
* versioning in the config block
* additional output formats

_Note:_ These points are explicitly extensions. They should not make v1 unnecessarily complex.

## 15. Current implementation basis (informative)

The current Go implementation is intentionally self-contained.

Current basis:

* a line-oriented parser for
  * the managed container
  * fenced code blocks
  * generic HTML comments
  * exclusion regions
  * heading candidates
* a small inline-text extractor for deriving `title_text`
* an internal slugger implementation for the GitHub, GitLab, and Crossnote slug profiles

Current implementation notes:

* heading recognition is intentionally restricted to the explicit ATX subset from section 5
* the normative slug and anchor rules from section 11 are implemented directly in `mdtoc`
* no external Markdown renderer is the normative source of `anchor_id`
* the current code does not require a full Markdown AST to preserve the documented behavior

Alternative implementations:

* Another implementation MAY use a Markdown parser library internally.
* Such an implementation MUST still preserve the external behavior defined by this specification.
* In particular, `title_text`, slug generation, collision handling, ignored regions, and marker/config handling MUST remain behavior-compatible with the current implementation.

_Explanation:_

* The current implementation favors a narrow, deterministic parser over a full Markdown dependency.
* The actual domain logic of `mdtoc` remains small and explicit: finding containers, normalizing managed headings, deriving numbers and IDs, and rendering deterministically.

---
