spec: formal vimdoc specification document #61

Open
opened 2026-03-18 02:28:14 +00:00 by barrettruth · 7 comments
barrettruth commented 2026-03-18 02:28:14 +00:00

Problem

The formatter, diagnostics, and parser all enforce implicit rules about what
valid vimdoc looks like — but those rules are nowhere written down
prescriptively. This blocks spec-enforcement diagnostics and makes it
impossible to confidently answer "is this correct vimdoc?"

Empirical grounding

The corpus work on 135 Neovim 0.12 runtime help files provides ground truth:

  • Line width: tw=78 is universal; 96.2% of prose lines already fit
  • Tab stops: ts=8; tab characters are structural (command-reference
    column alignment), not prose whitespace
  • Sentence spacing: double-space after ./?/! is a live convention
    (confirmed in api.txt and others)
  • Separator threshold: = or - repeated ≥10 times on its own line;
    threshold avoids false-positives on things like -- comment or option
    strings
  • Code fences: both > (trailing on a prose line) and >language (alone
    on its own line) start code blocks; blank line or < ends them; an
    unindented non-< line also ends the block
  • List items: `- `, `* `, `• ` (unordered) and `N. ` (ordered); * tag *
    without adjacent space is a tag definition, not a list item
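
As a rough illustration of the separator rule above, a check could look like the sketch below. The helper name `is_separator` is made up for this example, and it assumes "on its own line" means the run is the entire line with no surrounding whitespace; the eventual spec may define that differently.

```bash
# Hypothetical helper, not part of any existing tool.
# Succeeds when the line consists only of a run of 10 or more '=' characters
# or 10 or more '-' characters (assumption: no leading/trailing whitespace).
is_separator() {
    printf '%s\n' "$1" | grep -Eq '^(={10,}|-{10,})$'
}

is_separator '==============================' && echo "separator"
is_separator '-- comment' || echo "not a separator"
```

Anchoring the pattern at both ends is what keeps short runs such as `-- comment` from matching.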

Scope

A prescriptive spec document (docs/spec.md) covering:

  • Block structure: separators, headings (text + right-justified *tag*),
    code blocks, list items, blank lines
  • Inline spans: tag definitions (*tag*), tag references (|taglink|),
    code spans (`code`), tab-aligned columns
  • Whitespace conventions: tw=78, ts=8, sentence spacing, trailing
    whitespace
  • Disambiguation rules: * tag * vs list item, -> not a fence, ordered
    list vs TOC heading annotation, valid taglink syntax (no spaces, no pipe
    inside backtick span)
  • Edge cases documented by corpus work: blank line terminates code block
    (and its diagnostic implications), >language code fence, pipe characters
    in prose that are not taglinks
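
To make the "no spaces" part of the taglink rule concrete, here is a minimal sketch that pulls taglink candidates out of text. The function name `find_taglinks` is hypothetical, and the pattern covers only the whitespace rule; it does not handle pipes inside backtick spans, which a real definition would still need to exclude.

```bash
# Hypothetical helper: print every |taglink| candidate found on stdin,
# one per line. A candidate is a pair of pipes enclosing one or more
# characters that are neither a pipe nor whitespace.
find_taglinks() {
    grep -oE '\|[^|[:space:]]+\|'
}

printf '%s\n' 'See |nvim_buf_set_lines()| or use a | b | c here' | find_taglinks
# prints: |nvim_buf_set_lines()|
```

The spaced pipes in `a | b | c` are skipped because the text between them contains whitespace.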

Prerequisite for

  • Spec-enforcement diagnostics (unclosed delimiters, malformed headings,
    invalid code fence syntax)
  • Proposing a formal spec upstream to Neovim
  • Resolving #51 (false-positive unresolved-tag warnings) with a precise
    definition of what constitutes a valid taglink
eamonburns commented 2026-03-25 19:06:16 +00:00

> Sentence spacing: double-space after ./?/! is a live convention (confirmed in api.txt and others)

I was skeptical of this assertion. In api.txt specifically, there are only 8 instances of double spaces, but 396 instances of single spaces.

I wrote a bash script to get some actual statistics (uses rg, awk, and mlr):

```bash
# Change this to the location of your runtime/doc directory
pushd /usr/share/nvim/runtime/doc
{
    echo "file,type,matches"
    # Sentence-ending punctuation followed by at least two spaces
    rg '[.?!]  ' --include-zero --count-matches | awk -F: '{ print $1 ",double," $2 }'
    # Sentence-ending punctuation followed by one space and then a non-space character
    rg '[.?!] [^ ]' --include-zero --count-matches | awk -F: '{ print $1 ",single," $2 }'
} > ~/match_counts.csv
# Pivot to one row per file and add the truncated percentage of double spaces
mlr --csv reshape -s type,matches \
    then put '$["%double"] = ($double + $single > 0) ? int(100 * $double / ($double + $single)) : "N/A"' \
    then cut -o -f file,double,single,"%double" \
    then sort -f file \
    ~/match_counts.csv > ~/stats.csv
popd
```

file: help file
double: number of double spaces after ./?/!
single: number of single spaces after ./?/!
%double: percentage of double spaces out of the total (100 * $double / ($double + $single))

file double single %double
api.txt 8 396 1
arabic.txt 23 20 53
autocmd.txt 198 97 67
change.txt 308 99 75
channel.txt 12 41 22
cmdline.txt 192 22 89
credits.txt 12 18 40
debug.txt 18 23 43
deprecated.txt 0 17 0
dev_arch.txt 0 8 0
dev_style.txt 4 118 3
dev_theme.txt 0 14 0
dev_tools.txt 0 13 0
dev_vimpatch.txt 0 32 0
develop.txt 27 87 23
diagnostic.txt 64 129 33
diff.txt 76 23 76
digraph.txt 35 8 81
editing.txt 301 74 80
editorconfig.txt 0 18 0
faq.txt 0 61 0
filetype.txt 106 67 61
fold.txt 115 15 88
ft_ada.txt 17 39 30
ft_hare.txt 0 8 0
ft_ps1.txt 0 2 0
ft_raku.txt 0 23 0
ft_rust.txt 2 55 3
ft_sql.txt 109 44 71
gui.txt 129 35 78
health.txt 1 11 8
hebrew.txt 11 5 68
help.txt 5 3 62
helphelp.txt 63 17 78
if_perl.txt 17 10 62
if_pyth.txt 37 40 48
if_ruby.txt 11 10 52
indent.txt 97 41 70
index.txt 6 9 40
insert.txt 309 84 78
intro.txt 121 51 70
job_control.txt 0 9 0
lsp.txt 52 255 16
lua-bit.txt 0 72 0
lua-guide.txt 0 49 0
lua-plugin.txt 0 26 0
lua.txt 6 451 1
luaref.txt 10 757 1
luvref.txt 1 261 0
map.txt 253 65 79
mbyte.txt 120 11 91
message.txt 149 27 84
mlang.txt 37 7 84
motion.txt 248 33 88
news-0.10.txt 0 57 0
news-0.9.txt 2 32 5
news.txt 0 39 0
nvim.txt 1 11 8
options.txt 1134 253 81
pattern.txt 223 43 83
pi_gzip.txt 5 2 71
pi_msgpack.txt 17 13 56
pi_paren.txt 13 0 100
pi_spec.txt 16 4 80
pi_tar.txt 16 11 59
pi_tutor.txt 0 7 0
pi_zip.txt 13 15 46
provider.txt 8 26 23
quickfix.txt 267 142 65
quickref.txt 3 22 12
recover.txt 36 5 87
remote.txt 10 8 55
remote_plugin.txt 0 18 0
repeat.txt 162 34 82
rileft.txt 17 0 100
russian.txt 6 4 60
scroll.txt 43 10 81
sign.txt 34 19 64
spell.txt 378 78 82
starting.txt 240 159 60
support.txt 0 4 0
syntax.txt 609 220 73
tabpage.txt 52 29 64
tags 0 0 N/A
tagsrch.txt 192 40 82
terminal.txt 45 29 60
testing.txt 3 7 30
tips.txt 38 26 59
treesitter.txt 1 207 0
tui.txt 23 22 51
uganda.txt 62 2 96
ui.txt 7 152 4
undo.txt 72 19 79
userfunc.txt 82 28 74
usr_01.txt 16 4 80
usr_02.txt 120 22 84
usr_03.txt 139 9 93
usr_04.txt 135 20 87
usr_05.txt 108 19 85
usr_06.txt 39 3 92
usr_07.txt 104 8 92
usr_08.txt 89 0 100
usr_09.txt 67 1 98
usr_10.txt 188 15 92
usr_11.txt 60 16 78
usr_12.txt 73 0 100
usr_20.txt 96 0 100
usr_21.txt 95 11 89
usr_22.txt 64 6 91
usr_23.txt 57 1 98
usr_24.txt 124 6 95
usr_25.txt 112 19 85
usr_26.txt 54 3 94
usr_27.txt 107 1 99
usr_28.txt 87 0 100
usr_29.txt 119 0 100
usr_30.txt 143 10 93
usr_31.txt 63 4 94
usr_32.txt 31 0 100
usr_40.txt 144 8 94
usr_41.txt 369 58 86
usr_42.txt 96 2 97
usr_43.txt 43 8 84
usr_44.txt 178 2 98
usr_45.txt 123 1 99
usr_toc.txt 0 0 N/A
various.txt 64 51 55
vi_diff.txt 87 13 87
vietnamese.txt 4 8 33
vim_diff.txt 13 89 12
vimeval.txt 424 138 75
vimfn.txt 969 498 66
visual.txt 70 19 78
vvars.txt 99 44 69
windows.txt 233 103 69

Totals:

  • Double spaces: 12346
  • Single spaces: 6927
  • Percentage of double spaces: 64%

So, yeah, overall there are more double spaces than single spaces, by roughly 64% to 36%.
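
For anyone who wants to re-check the overall percentage from the two totals, the arithmetic can be sketched as below. `pct_double` is just a name picked for this example; it truncates with `int()` the same way the mlr expression in the script does.

```bash
# Hypothetical helper: integer percentage of double-spaced sentence ends.
# Usage: pct_double DOUBLE_COUNT SINGLE_COUNT
pct_double() {
    awk -v d="$1" -v s="$2" 'BEGIN {
        if (d + s > 0) print int(100 * d / (d + s))   # truncate, do not round
        else print "N/A"                              # no sentence ends at all
    }'
}

pct_double 12346 6927   # prints 64
```

Because the value is truncated rather than rounded, a file like api.txt (8 double, 396 single) shows 1 even though the exact figure is closer to 2%.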

barrettruth commented 2026-03-25 19:33:41 +00:00

Yes. thanks for this commentary; busy with school but all of these standards need to be set.

I was very surprised by the "typewriter-esque" style spacing.

sigh... just complicates everything just a bit more 😭

barrettruth commented 2026-03-25 19:34:42 +00:00

thinking of making a report that canonicalizes all sorts of statistics and visualizes them like exactly what you just did - great stuff.

eamonburns commented 2026-03-25 21:03:21 +00:00

Possibly interesting observation: It seems like the proportion of double spaces after a sentence is related to how old the help files are. For example, all the user manual files (usr_*.txt) have a high percentage of double spaces (lowest percentage is 78%, and over two-thirds of them are above 90%). But, I never studied statistics, and I could just be misreading it 🤷‍♂️
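
A quick way to check a claim like this against the generated CSV is sketched below. The function name `summarize_prefix` is made up, and it assumes the `file,double,single,%double` column order that the script above writes to `~/stats.csv`.

```bash
# Hypothetical helper: read the stats CSV on stdin and print
# "<files counted> <lowest %double> <files above 90%>" for every
# file whose name starts with the given prefix.
summarize_prefix() {
    awk -F, -v p="$1" '
        NR == 1 { next }              # skip the header row
        index($1, p) != 1 { next }    # keep only names starting with the prefix
        $4 == "N/A" { next }          # skip files with no sentence ends
        {
            n++
            if (min == "" || $4 + 0 < min) min = $4 + 0
            if ($4 + 0 > 90) high++
        }
        END { print n + 0, (min == "" ? "N/A" : min), high + 0 }
    '
}

# Example (assumes the CSV from the earlier script exists):
# summarize_prefix usr_ < ~/stats.csv
```

Run against the table posted earlier in this thread, the `usr_` rows come out as 31 files counted, a lowest value of 78, and 22 files above 90.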

eamonburns commented 2026-03-25 21:10:26 +00:00

There should probably be a difference between the specification, and a style guide/formatter.

Something that violates the spec would not be valid vimdoc. Something that violates the style guide would still be valid.

If you were to create a style guide, I doubt it would be adopted very widely in the plugin ecosystem, unless Neovim adopts it for the official help files. A proper style guide checker would help with that (i.e. vimdoc_ls) and so would actually opening an issue in the Neovim repo asking if the maintainers would like to start using a style guide, and if they would, what rules should be included.

barrettruth commented 2026-03-25 21:29:24 +00:00

yep. planning to make a post @ ing people who have been participating in the conversation for exactly that. feel free to email me br@barrettruth.com or msg on any platform (username barrettruth) if you'd like to be involved for sure.

that's an important distinction. and, im only interested in vim/neovim core for sure - the community always falls in line with these things (eventually), especially if the tooling is good enough.

barrettruth commented 2026-03-25 21:30:23 +00:00

also, so funny - i was intuiting that the typewriter-style spacing was indeed an old, ancient practice 💀 and it was

Reference
barrettruth/vimdoc-language-server#61