Fixed conflicting documentation about DBSCAN #32882
Open
+1
−1
Reference Issues/PRs
What does this implement/fix? Explain your changes.
This pull request fixes conflicting information between the User Guide and the API regarding DBSCAN.
In the User Guide, the current wording implies that points exactly `eps` distance away from any core sample are considered outliers, but this is contradicted by the API, which refers to `eps` as "The maximum distance between two samples for one to be considered as in the neighborhood of the other." DBSCAN uses the `radius_neighbors` method in `sklearn.neighbors.NearestNeighbors`, which states in its API documentation, "Points lying on the boundary are included in the results," implying that points exactly `eps` distance away are not outliers.

I edited the wording in the User Guide so that it agrees with the wording in both API pages.
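The boundary behavior can be checked with a small sketch, assuming NumPy and scikit-learn are installed. The toy data and variable names below are illustrative only and are not part of this pull request.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Toy 1-D data (illustrative): 0, 1 and 2 are spaced exactly eps apart; 5 is far away.
X = np.array([[0.0], [1.0], [2.0], [5.0]])
eps = 1.0

# Query the neighborhood of the sample at 1.0 using the same radius as DBSCAN.
nn = NearestNeighbors(radius=eps).fit(X)
neighbors_of_1 = sorted(nn.radius_neighbors(X[[1]], return_distance=False)[0].tolist())

# With min_samples=3, the sample at 1.0 can only be a core sample if the two
# samples at distance exactly eps are counted as its neighbors.
db = DBSCAN(eps=eps, min_samples=3).fit(X)
labels = db.labels_.tolist()
core = db.core_sample_indices_.tolist()

print("neighbors of sample 1:", neighbors_of_1)
print("labels:", labels)
print("core sample indices:", core)
```

If points at distance exactly `eps` were excluded, the sample at 1.0 would have only itself as a neighbor and every sample would be labeled `-1` (noise).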
AI usage disclosure
I used AI assistance for:
Any other comments?
It would be helpful for readers with a background in mathematics if the User Guide stated that the neighborhoods used in the DBSCAN algorithm are closed, meaning they include points at distance exactly `eps`, and that this contrasts with the open balls conventionally used to define neighborhoods in topology.
Example wording: "Neighborhoods in the DBSCAN algorithm are closed, unlike neighborhoods in mathematics which are open."
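To make the distinction concrete, the two kinds of neighborhood could be written as follows. The notation ($N_\varepsilon$, $B_\varepsilon$, $d$, and $\varepsilon$ standing for `eps`) is my own and is not taken from the User Guide.

```latex
% Closed neighborhood, as used by DBSCAN (boundary points included):
N_\varepsilon(p) = \{\, q : d(p, q) \le \varepsilon \,\}

% Open ball, as conventionally used in topology (boundary points excluded):
B_\varepsilon(p) = \{\, q : d(p, q) < \varepsilon \,\}
```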
This is not included in my pull request because I believe this merits further discussion.