-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
Problem
Supported languages such as JSP (and unsupported languages such as PHP, VisualForce and Lightning) allow to embed snippets of code in supported languages such as JavaScript (and unsupported languages such as CSS).
However, the parser and rules applied per file are determined by file extension. That means, that given a project using a ruleset with both JS rules and JSP rules, inconsistent rule application can occur:
File test.jsp, jsp rules applied:
<script href="/test.js"></script> <!-- test.js will be parsed and analyzed by JS rules -->
<script> <!-- This snippet may contain whatever, no JS rules will ever check it -->
var test = "something";
</script>Objective
Embedded code snippets should be parsed and analyzed the same way stand-alone files are.
Proposal
Each language parser should, upon detecting an embeded snippet, make sure to call a proper PMD helper with the snippet contents & detected language to use. Such helper will:
- Make sure the detected language is supported, showing a warning otherwise
- Synchronously start a new PMD analysis using the same ruleset on the contents of the snippet. This way the same rules are applied on the snippet that are applied to standalone files. The source file and line numbers should not be altered. It may be of use to allow the
Contextto know it's an embedded snippet. - Resume parsing of the holder file as is.
By having this logic in the parser, it's enabled by default for all supported languages, and guaranteed to occur only once per file.
Difficulties
The holder languages tend to allow to mix the embedded snippets with expressions in the original language, such as:
<script>
var s = "<c:out val="${name}"/>";
</script>These should probably be analyzed from the perspective of both languages.
Regarding the holder language, these expressions are of interest to be analyzed on their own (ie: don't allow unescaped values). They should probably be transformed into child nodes of the <script> tag in the AST which rules can visit and report on.
Regarding to the snippet, these expressions should be either removed or replaced with literals that won't change the syntax. This way, the embedded language will have no idea those values came from the holder language. This could be challenging, since the developer could use them anywhere, and we don't want these placeholders introducing false positives in the embedded language ruleset. For instance:
var s = "${stringVar}";
var i = ${intVar};
${methodNameSentFromController}();
${anotherVar}[4] = "test me";What value could be passed as placeholder for all 4 alternatives? From a parsers' perspective, the only alternative I can think of valid in all 4 contexts is (function() {}). Where the last alternative would produce a runtime error, but the parser, and available rules will probably never notice.
These snippets are specific to each embedded language. This however means:
- The embedded language opts-in to being embedded, providing a placeholder factory.
- The holder language opts-in to parsing and analyzing embedded snippets by handling them properly in their parsers.
Also of note, since the embedded language has no idea that those are placeholders coming from a host language (as to not modify the parsers for the embedded languages), this means we can go as far as enforcing:
<script>
var userData = '{! JSENCODE(userData)}';
</script>but not actually enforcing:
<script>
var userData = JSON.parse('{! JSENCODE(userData)}');
</script>I'm not really happy with this, but can't think of a better way around it short of having he embedded language also know of the holder language and this means making changes to the parsers that will be hard to mantain and extend in the future.