Cspell: Check Typos Only, Ignore Unknown Words
Let's dive into configuring cspell to focus solely on typo detection while ignoring unknown words. This can significantly reduce noise in your spell-checking workflow, especially when dealing with codebases that include domain-specific terminology or uncommon identifiers. The user, @gaby, is facing a common problem: an overwhelming number of "unknown word" errors, making it difficult to identify actual typos. They want to streamline their cspell configuration to only flag genuine misspellings. So, if you're in the same boat, keep reading!
Understanding the Problem
The core issue is that cspell, by default, flags any word not found in its dictionaries as an "unknown word." While this can be helpful in some contexts, it becomes a hindrance when dealing with code, documentation, or any text containing specialized terms. The goal is to tell cspell to be less strict and only report words that are likely to be misspelled, based on common typo patterns.
Analyzing the Configuration
Before we get to the solution, let's break down the existing configuration provided by @gaby. This will help us understand where adjustments need to be made.
Workflow (.github/workflows/spellcheck.yml)
The workflow is set up to run on pull requests and pushes to the main branch. It uses the streetsidesoftware/cspell-action@v7 action to perform spell checking. Here's a snippet of the workflow configuration:
```yaml
name: Spell check

on:
  pull_request:
    types:
      - opened
      - synchronize
      - reopened
      - ready_for_review
  push:
    branches:
      - main

permissions:
  contents: read
  pull-requests: read

jobs:
  cspell:
    name: cspell
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5

      - name: Set up Node.js
        uses: actions/setup-node@v6
        with:
          node-version: "20.x"

      - name: Install cspell dictionaries
        run: |
          npm install --no-save \
            @cspell/dict-en_us \
            @cspell/dict-en-gb \
            @cspell/dict-software-terms \
            @cspell/dict-golang \
            @cspell/dict-fullstack \
            @cspell/dict-docker \
            @cspell/dict-k8s \
            @cspell/dict-node \
            @cspell/dict-npm \
            @cspell/dict-typescript \
            @cspell/dict-html \
            @cspell/dict-css \
            @cspell/dict-shell \
            @cspell/dict-python \
            @cspell/dict-redis \
            @cspell/dict-sql \
            @cspell/dict-filetypes \
            @cspell/dict-companies \
            @cspell/dict-markdown \
            @cspell/dict-en-common-misspellings \
            @cspell/dict-people-names \
            @cspell/dict-data-science

      - name: Run cspell
        uses: streetsidesoftware/cspell-action@v7
        with:
          incremental_files_only: false
          check_dot_files: explicit
```
The workflow correctly installs a wide range of cspell dictionaries, which is a good start. However, the issue lies in how cspell is configured to handle unknown words.
cspell Configuration (cspell.json)
The cspell.json file is where the behavior of cspell is defined. Here's the relevant part of the configuration:
```json
{
  "version": "0.2",
  "language": "en, en-gb, en-us",
  "useGitignore": true,
  "caseSesnsitive": false,
  "unknownWords": "report-common-typos",
  "import": [
    "@cspell/dict-en_us/cspell-ext.json",
    "@cspell/dict-en-gb/cspell-ext.json",
    "@cspell/dict-software-terms/cspell-ext.json",
    "@cspell/dict-golang/cspell-ext.json",
    "@cspell/dict-fullstack/cspell-ext.json",
    "@cspell/dict-docker/cspell-ext.json",
    "@cspell/dict-k8s/cspell-ext.json",
    "@cspell/dict-node/cspell-ext.json",
    "@cspell/dict-npm/cspell-ext.json",
    "@cspell/dict-typescript/cspell-ext.json",
    "@cspell/dict-html/cspell-ext.json",
    "@cspell/dict-css/cspell-ext.json",
    "@cspell/dict-shell/cspell-ext.json",
    "@cspell/dict-python/cspell-ext.json",
    "@cspell/dict-redis/cspell-ext.json",
    "@cspell/dict-sql/cspell-ext.json",
    "@cspell/dict-filetypes/cspell-ext.json",
    "@cspell/dict-companies/cspell-ext.json",
    "@cspell/dict-markdown/cspell-ext.json",
    "@cspell/dict-en-common-misspellings/cspell-ext.json",
    "@cspell/dict-people-names/cspell-ext.json"
  ],
  "dictionaries": [
    "en_us",
    "en-gb",
    "softwareTerms",
    "web-services",
    "networking-terms",
    "software-term-suggestions",
    "software-services",
    "software-terms",
    "software-tools",
    "coding-compound-terms",
    "golang",
    "fullstack",
    "docker",
    "k8s",
    "node",
    "npm",
    "typescript",
    "html",
    "css",
    "shell",
    "python",
    "redis",
    "sql",
    "filetypes",
    "companies",
    "markdown",
    "en-common-misspellings",
    "people-names",
    "data-science",
    "data-science-models",
    "data-science-tools"
  ],
  "ignorePaths": [
    ".git",
    "node_modules",
    "vendor",
    "internal",
    ".github",
    "**/*.svg",
    "**/*.png",
    "**/*.jpg",
    "**/*.jpeg",
    "**/*.gif",
    "**/*.ico",
    "**/*.lock",
    "**/*_gen.go",
    "**/*_msgp.go",
    "**/*_msgp_test.go",
    "**/*_test.go",
    "go.mod",
    "go.sum",
    ".golangci.yml",
    ".markdownlint.yml",
    "AGENTS.md"
  ]
}
```
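The key line here is `"unknownWords": "report-common-typos"`. With this setting, cspell reports an unknown word only when it matches a known common typo; flagged words are still reported as well. This is the right setting for cutting the noise from unknown words while still catching likely misspellings. If the noise persists, check two things first. The configuration contains a misspelled key, `caseSesnsitive`, which cspell does not recognize. This is harmless here because `caseSensitive` already defaults to `false`, but it should be corrected. Also, `unknownWords` is a relatively recent setting, so confirm that the cspell version used by `cspell-action@v7` supports it. If you still get too many unknown words after that, the remaining option is to stop reporting unknown words altogether.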
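For reference, the documented `unknownWords` levels run from strictest to loosest: `report-all` (the default), `report-simple`, `report-common-typos`, and `report-flagged`. You might instead find that `report-common-typos` misses typos you care about. In that case, here is a minimal sketch of stepping back one level to `report-simple`, which also reports unknown words that look like simple spelling errors. Only the relevant keys are shown, and the rest of the configuration is assumed to stay unchanged:

```json
{
  "version": "0.2",
  "caseSensitive": false,
  "unknownWords": "report-simple"
}
```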
Solution: Configuring cspell to Ignore All Unknown Words
To stop reporting unknown words altogether while keeping the rest of the setup, adjust the `unknownWords` setting in your `cspell.json` file. Here's how:

1. Set `unknownWords` to `"report-flagged"`: cspell has no `"ignore"` value for this setting. The loosest level is `"report-flagged"`, which reports only words that have been explicitly flagged (for example, through `flagWords`) and stays silent about every other unknown word. Modify your `cspell.json` file as follows, leaving the other keys (`import`, `dictionaries`, `ignorePaths`) unchanged:

   ```json
   {
     "version": "0.2",
     "language": "en, en-gb, en-us",
     "useGitignore": true,
     "caseSensitive": false,
     "unknownWords": "report-flagged"
   }
   ```

   Keep the trade-off in mind. `report-common-typos` still catches common misspellings, while `report-flagged` catches only what you flag. Use it only if `report-common-typos` remains too noisy.

2. (Optional) Fine-tune dictionaries: While unknown words are suppressed, make sure the dictionaries you include cover most of the correct words in your codebase. You can add or remove entries in the `dictionaries` array of your `cspell.json` file (the list is shortened here):

   ```json
   {
     "dictionaries": ["en_us", "en-gb", "softwareTerms"]
   }
   ```

3. (Optional) Add a `words` section: If you have project-specific words that you want cspell to always accept, list them in a `words` array in your `cspell.json`:

   ```json
   {
     "words": ["mycustomword", "anothercustomword"]
   }
   ```
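When `unknownWords` is `"report-flagged"`, only flagged words are reported. A `flagWords` list is therefore how you keep catching the specific misspellings that matter to your project. Below is a minimal sketch; the three misspellings are hypothetical examples to replace with your own:

```json
{
  "version": "0.2",
  "unknownWords": "report-flagged",
  "flagWords": ["hte", "recieve", "seperate"]
}
```

With this setup, cspell should report `recieve` wherever it appears and stay silent about other words missing from its dictionaries.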
Applying the Solution
1. Modify `cspell.json`: Update your `cspell.json` file with the changes described above, making sure `unknownWords` is set to the level you chose.
2. Commit and push: Commit the changes to your `cspell.json` file and push them to your repository.
3. Trigger the workflow: The cspell workflow runs automatically on the next pull request or push to the `main` branch. Review the output to confirm that only genuine typos are being reported.
Additional Tips
- Custom dictionaries: If your project has many domain-specific terms, consider a custom dictionary. Create a `.txt` file with one word per line and register it under `dictionaryDefinitions` with a `name` and a `path`. Then add that name to the `dictionaries` array of your `cspell.json` file.
- Exclusion rules: Use the `ignorePaths` array to exclude files or directories that you don't want cspell to check, such as generated code, vendor directories, or other areas where spell checking is not relevant.
- Incremental checking: Setting `incremental_files_only: true` in the workflow can speed up spell checking by limiting the check to the files changed in the pull request or push. For the initial setup and after major changes, it's best to run a full check with `incremental_files_only: false`.
- Verbose logging: Although @gaby mentioned that verbose logging didn't provide much information, it's still worth running cspell from the command line with `-v` or `--verbose` to diagnose issues, since the `cspell-action` may not expose the same flag.
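Here is a sketch of the custom-dictionary tip. It assumes a hypothetical word list named `project-words.txt` next to `cspell.json` (one word per line) and a hypothetical dictionary name, `project-words`. The dictionary is declared in `dictionaryDefinitions` and then enabled by name in `dictionaries`. Setting `addWords: true` tells editor integrations, such as the VS Code extension, that new words may be added to this file.

```json
{
  "version": "0.2",
  "dictionaryDefinitions": [
    {
      "name": "project-words",
      "path": "./project-words.txt",
      "addWords": true
    }
  ],
  "dictionaries": ["project-words"]
}
```

In an existing configuration, append `project-words` to the current `dictionaries` array rather than replacing the array.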
By following these steps, you can effectively configure cspell to focus on typo detection and ignore unknown words, making your spell-checking workflow more efficient and less noisy. Remember to fine-tune the configuration to suit the specific needs of your project.