

@Tim123467

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This pull request fixes conflicting information about DBSCAN between the User Guide and the API reference.

The current User Guide wording implies that points exactly eps away from every core sample are considered outliers. The DBSCAN API reference contradicts this: it describes eps as "The maximum distance between two samples for one to be considered as in the neighborhood of the other." In addition, DBSCAN relies on the radius_neighbors method of sklearn.neighbors.NearestNeighbors, whose API documentation states, "Points lying on the boundary are included in the results." A point exactly eps away from a core sample therefore lies in that core sample's neighborhood and is not an outlier.
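The boundary behavior can be checked with a small toy example. This is a minimal sketch under assumptions of my own: the data set is made up, and the coordinates are integers so that the pairwise distances are exactly 1.0 in floating point, i.e. exactly eps. With min_samples=2 (which counts the point itself), points 0, 1 and 2 can only be core samples if neighbors at distance exactly eps are counted.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Toy data (assumption): integer coordinates keep distances exact,
# so neighbors at distance 1.0 sit exactly on the eps boundary.
X = np.array([[0.0], [1.0], [2.0], [10.0]])

# radius_neighbors: the training point at distance exactly 1.0 is returned.
nn = NearestNeighbors(radius=1.0).fit(X)
dist, ind = nn.radius_neighbors([[0.0]])
neighbors_of_origin = sorted(ind[0].tolist())
print(neighbors_of_origin)  # [0, 1]

# DBSCAN: if boundary points were excluded, every point would be noise.
# Because they are included, points 0, 1, 2 are core samples of one cluster.
db = DBSCAN(eps=1.0, min_samples=2).fit(X)
print(db.labels_.tolist())               # [0, 0, 0, -1]
print(db.core_sample_indices_.tolist())  # [0, 1, 2]
```

Only the isolated point at 10.0 is labeled as noise (-1).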

I edited the User Guide wording so that it agrees with both API pages.

AI usage disclosure

I used AI assistance for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Any other comments?

It would help readers with a mathematics background if the User Guide stated that the neighborhoods used in the DBSCAN algorithm are closed, in contrast with the usual convention in topology, where the ε-neighborhood of a point is an open ball.

Example wording: "Neighborhoods in the DBSCAN algorithm are closed, unlike neighborhoods in mathematics, which are open."

This is not included in my pull request because I believe it merits further discussion.
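For concreteness, the distinction could be written as follows. The notation here is introduced only for illustration: X is the data set, d is the distance metric, N_ε(p) is the neighborhood DBSCAN uses (boundary included, consistent with radius_neighbors), and B(p, ε) is the open ball from the usual topological convention.

```latex
\begin{aligned}
N_{\varepsilon}(p) &= \{\, q \in X : d(p, q) \le \varepsilon \,\} && \text{closed (DBSCAN)} \\
B(p, \varepsilon)  &= \{\, q \in X : d(p, q) < \varepsilon \,\}   && \text{open ball}
\end{aligned}
```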
