@knguyen1 knguyen1 commented Oct 25, 2025

This PR closes: #3837

What does this PR do?

Implements a modern spellcheck plugin that provides "Try searching for:" suggestions for misspelled search queries. The plugin supports two providers:

  • pyspellchecker: Local spell checking (privacy-friendly)
  • google: Google Autocomplete API (default, requires internet)

The plugin follows modern SearXNG plugin architecture with SOLID principles, protocol-based design, and comprehensive unit testing.
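As a rough illustration of what a protocol-based provider design can look like, here is a minimal sketch. It is not the code from this PR: `SpellcheckProvider`, `StaticProvider`, `PROVIDERS`, and `build_provider` are hypothetical names, and the two registry entries are toy lookup tables standing in for the real pyspellchecker and Google-backed providers.

```python
from typing import Callable, Dict, Optional, Protocol


class SpellcheckProvider(Protocol):
    """Hypothetical interface: return a corrected query, or None if unchanged."""

    def suggest(self, query: str) -> Optional[str]:
        raise NotImplementedError


class StaticProvider:
    """Toy stand-in for a real provider; looks up whole-query corrections."""

    def __init__(self, corrections: Dict[str, str]) -> None:
        self._corrections = corrections

    def suggest(self, query: str) -> Optional[str]:
        return self._corrections.get(query.lower())


# Registry keyed by the `provider` setting. Both entries are placeholders
# (assumptions for this sketch), not the real provider implementations.
PROVIDERS: Dict[str, Callable[[], SpellcheckProvider]] = {
    "pyspellchecker": lambda: StaticProvider({"vaccum cleaner": "vacuum cleaner"}),
    "google": lambda: StaticProvider({"pocasi praha zitra": "počasí Praha zítra"}),
}


def build_provider(name: str) -> SpellcheckProvider:
    """Instantiate the provider selected in the settings, or fail loudly."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown spellcheck provider: {name!r}") from None
```

Because the plugin only depends on the `suggest` signature, swapping providers would be a one-line settings change under this kind of design.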

Why is this change important?

  • SearXNG currently lacks built-in spell checking functionality for search queries
  • Users often misspell search terms without getting helpful corrections
  • Provides a configurable solution that respects user privacy preferences
  • Supports both local (privacy-friendly) and cloud-based spell checking options

How to test this PR locally?

  1. Enable the plugin in :

    searx.plugins.spellcheck.SXNGPlugin:
      active: true
      parameters:
        provider: pyspellchecker  # or 'google'
  2. Install dependencies:

    uv pip install -r ./requirements.txt
  3. Test with misspelled queries:

    • Try "vaccum cleaner" (should suggest "vacuum cleaner")
    • If using the google provider, it supports languages that pyspellchecker doesn't:
      • (Vietnamese) Try "lich phat song vtv3 hom nay" (should suggest "lịch phát sóng vtv3 hôm nay")
      • (Czech) Try "pocasi praha zitra" (should suggest "počasí Praha zítra")
  4. Run unit tests:

    uv run nose2 -v tests.unit.test_spellcheck_plugin
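To make the expected behaviour in step 3 concrete, the following sketch shows one way word-by-word correction can produce "vacuum cleaner" from "vaccum cleaner". It is an assumption-laden stand-in: it uses the standard library's `difflib` rather than pyspellchecker, and `correct_query`, the tiny vocabulary, and the 0.75 cutoff are all hypothetical choices, not the plugin's actual logic.

```python
import difflib
from typing import Iterable, Optional


def correct_query(
    query: str, vocabulary: Iterable[str], cutoff: float = 0.75
) -> Optional[str]:
    """Return a corrected query, or None when no word would change."""
    vocab = list(vocabulary)
    known = set(vocab)
    words = query.split()
    corrected = []
    for word in words:
        lower = word.lower()
        if lower in known:
            # Known words are kept exactly as the user typed them.
            corrected.append(word)
            continue
        # Closest vocabulary entry by difflib similarity, if any clears the cutoff.
        matches = difflib.get_close_matches(lower, vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    # Only report a suggestion when at least one word was replaced.
    return " ".join(corrected) if corrected != words else None


if __name__ == "__main__":
    print(correct_query("vaccum cleaner", ["vacuum", "cleaner", "weather"]))
```

Returning `None` for an unchanged query mirrors the idea that a "Try searching for:" hint should only appear when there is something to suggest.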

Related issues

Screenshots

image image

Introduces a new spellcheck plugin that suggests corrections for user queries. The plugin can utilize either the `pyspellchecker` library or Google Autocomplete for suggestions, configurable via settings. Updates to `requirements.txt` include the addition of `pyspellchecker`. Unit tests for the new plugin functionality have also been added.
@return42
Member

The plugin follows modern SearXNG plugin architecture with SOLID principles, protocol-based design, and comprehensive unit testing.

I've only had a quick look at it so far, but I can already say that it's great work. Thank you very much 👍

Within the review of PR #3837, one important question has not yet been answered: we're not sure about the memory footprint of using pyspellchecker

- Fix Black formatting issues (dictionary comprehensions, function signatures)
- Move imports to module level to resolve pylint C0415 warnings
- Replace Protocol ellipsis with NotImplementedError for better type safety
- Fix test mocking paths after import reorganization
- Add module docstring to test file
@knguyen1
Author

knguyen1 commented Oct 26, 2025

Consider adding .pre-commit-config.yaml to run these lints/CI cmds. I had no idea about them; my usual workflow is ruff and uv. Also I don't quite agree with a lot of these lint rules, e.g.

EDIT: I see PR #3761 (comment) and issue #3760. Hard disagree on the final decision. But okay.

image

Something like this...

fail_fast: true
default_install_hook_types: [pre-commit, pre-push]

repos:
  # General hooks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-json
      - id: check-merge-conflict
      - id: check-added-large-files
        args: ['--maxkb=5000']
      - id: check-case-conflict
      - id: mixed-line-ending
        args: ['--fix=lf']
  - repo: local
    hooks:
      - id: format-python
        name: format Python code
        entry: ./manage format.python
        language: system
        types: [python]
        pass_filenames: false
        always_run: true
      - id: test-pylint
        name: run pylint
        entry: ./manage test.pylint
        language: system
        types: [python]
        pass_filenames: false
        always_run: true
  - repo: local
    hooks:
      - id: nose2
        name: run nose2 tests
        entry: uv run nose2 -F tests.unit
        # tests don't work with the mp plugin
        # entry: uv run --frozen nose2 -qq -F --plugin=nose2.plugins.mp --processes=$(nproc) tests.unit
        language: python
        types: [python]
        pass_filenames: false
        always_run: true
        stages: [pre-push]  # takes a long time, only run on pre-push

- Extract correction logic into _get_correction helper method
- Reduce return statements and improve code organization
- Improve spellcheck plugin query correction logic
@inetol
Member

inetol commented Oct 26, 2025

Amazing work! I've tested it with very good results.

we're not sure about the memory footprint of using pyspellchecker

It adds at most (based on my "trust me bro" tests) 20 MB additional memory usage with pyspellchecker enabled.

The issue here is that this library is CPU-intensive. Queries should be capped at 80 characters per search, because longer ones overload the synchronous WSGI workers and make the experience with concurrent requests very poor.
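A length cap like the one proposed above could be sketched as a thin guard around whatever correction function the plugin uses. This is only an illustration: `guarded_suggest` and `MAX_QUERY_CHARS` are hypothetical names, and the limit of 80 is simply the figure from this comment.

```python
from typing import Callable, Optional

MAX_QUERY_CHARS = 80  # figure taken from the comment above, not a tuned value


def guarded_suggest(
    query: str,
    suggest: Callable[[str], Optional[str]],
    max_chars: int = MAX_QUERY_CHARS,
) -> Optional[str]:
    """Skip the (potentially CPU-heavy) spell check for overly long queries."""
    if len(query) > max_chars:
        # Too long: return no suggestion without calling the checker at all.
        return None
    return suggest(query)
```

The point of the guard is that the expensive call is never made for long inputs, so a single oversized query cannot tie up a worker.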

@return42 return42 self-requested a review October 26, 2025 14:55
@return42
Member

The issue here is that this library is CPU intensive,

Memory and/or CPU intensive tasks -- resource intensive tasks -- should not be performed by the SearXNG server. SearXNG is an aggregator and such tasks should be provided by other services to integrate them into SearXNG.

For pyspellchecker, it would be possible (not part of SearXNG) to implement a (local) service that is integrated into SearXNG via HTTP (like other engines, answerer or plugin).
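A very small sketch of that idea, using only the Python standard library, might look like the following. Everything here is an assumption for illustration: the `/suggest?q=...` endpoint, the JSON shape, the function names, and the toy `CORRECTIONS` table (which stands in for a real checker such as pyspellchecker) are hypothetical, and the client uses `urllib` rather than SearXNG's own network layer.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import ProxyHandler, build_opener

# Toy correction table standing in for a real spell checker.
CORRECTIONS = {"vaccum cleaner": "vacuum cleaner"}

# Talk to the local service directly, ignoring any proxy set in the environment.
_OPENER = build_opener(ProxyHandler({}))


class SuggestHandler(BaseHTTPRequestHandler):
    """Answers GET /suggest?q=<query> with {"suggestion": <str or null>}."""

    def do_GET(self) -> None:
        parsed = urlparse(self.path)
        if parsed.path != "/suggest":
            self.send_error(404)
            return
        query = parse_qs(parsed.query).get("q", [""])[0]
        payload = {"suggestion": CORRECTIONS.get(query.lower())}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet


def start_service(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the service in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), SuggestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_suggestion(base_url: str, query: str, timeout: float = 2.0) -> Optional[str]:
    """Client side: what a plugin could call instead of checking in-process."""
    url = f"{base_url}/suggest?{urlencode({'q': query})}"
    with _OPENER.open(url, timeout=timeout) as response:
        return json.load(response)["suggestion"]
```

With a split like this, the CPU and memory cost lives in the separate process, and the SearXNG side only pays for one short HTTP request with a timeout.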

@inetol
Member

inetol commented Oct 31, 2025

For pyspellchecker, it would be possible (not part of SearXNG) to implement a (local) service that is integrated into SearXNG via HTTP (like other engines, answerer or plugin).

Working on a solution
