[DomCrawler] Use the native HTML5 parser on PHP 8.4 #61475
…nks to the native DOM parser (nicolas-grekas)

This PR was merged into the 8.0 branch.

Discussion
----------

[DomCrawler] Always parse according to HTML5 rules thanks to the native DOM parser

| Q | A
| ------------- | ---
| Branch? | 8.0
| Bug fix? | no
| New feature? | yes
| Deprecations? | no
| Issues | -
| License | MIT

Follows #61475

Commits
-------

0425b2a [DomCrawler] Always parse according to HTML5 rules thanks to the native DOM parser
Just a note/heads-up for others who find themselves here: I upgraded to v7.4.0-BETA1 and ran my unit tests, which passed on 7.3.x, and now they fail :)
Messages like:
A reproducer is here: https://github.com/PhilETaylor/symfony-7.4-alpine-reproducer
I think this is a
The fact that Alpine.js makes you write code that triggers warnings in spec-compliant HTML parsers looks like an issue for Alpine.js instead. However, it looks like the new code path does not use
Totally agree. Just mentioning it as I'm probably the first to see this since upgrading to and testing the beta.
I only brought it up because the PR description says this removes any BC breaks, and then my tests broke on upgrade. I'm not expecting a fix, unless you think one is needed.
I suggest opening an issue instead of discussing that on a merged PR, to give it more visibility.
This PR keeps the `DOM*`-based API but uses the native HTML5 parser on PHP 8.4 instead of masterminds/html5.

This works by parsing HTML strings using `Dom\HTMLDocument`, then serializing to XML, and loading again using `DOMDocument::loadXML()`.

This basically replaces #61356 since it removes any BC breaks.

The drawback compared to a more native approach is the double-parsing that happens.
This could be worked on later by providing a way to leverage the new `Dom\*` API directly.
To be proved worth it before.
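
For readers who want to see the round trip described above in isolation, here is a minimal sketch. It is not the code from this PR: the function name `parseHtml5ToLegacyDom` is hypothetical, and the choice of the `LIBXML_NOERROR` and `Dom\HTML_NO_DEFAULT_NS` options is an assumption made so that parse warnings stay quiet and elements come back without the XHTML namespace. It requires PHP 8.4 or later.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (an illustration, not Symfony's implementation):
 * parse HTML with the native HTML5 parser, then hand the result to the
 * legacy DOMDocument API through an XML round trip.
 */
function parseHtml5ToLegacyDom(string $html): \DOMDocument
{
    if (!class_exists(\Dom\HTMLDocument::class)) {
        throw new \RuntimeException('Dom\HTMLDocument requires PHP 8.4 or later.');
    }

    // Step 1: parse according to HTML5 rules.
    // Assumption: LIBXML_NOERROR silences parse warnings, and
    // Dom\HTML_NO_DEFAULT_NS keeps elements out of the XHTML namespace.
    $html5 = \Dom\HTMLDocument::createFromString(
        $html,
        \LIBXML_NOERROR | \Dom\HTML_NO_DEFAULT_NS
    );

    // Step 2: serialize the parsed tree as XML.
    $xml = $html5->saveXml();

    // Step 3: load the XML with the legacy parser (the second parse).
    $legacy = new \DOMDocument();
    if (false === $xml || !$legacy->loadXML($xml)) {
        throw new \RuntimeException('The XML round trip failed for this document.');
    }

    return $legacy;
}

// Usage: unclosed <li> tags are closed by the HTML5 tree builder.
$dom = parseHtml5ToLegacyDom('<ul><li>one<li>two</ul>');
echo $dom->getElementsByTagName('li')->length, "\n";
```

The second parse in step 3 is the double-parsing cost mentioned above. The sketch throws when the serialized markup cannot be loaded as XML, so callers see the failure explicitly.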