diff --git a/FAQ.md b/FAQ.md new file mode 100644 index 0000000..02c4206 --- /dev/null +++ b/FAQ.md @@ -0,0 +1,37 @@ +# Frequently Asked Questions - CodeQL :artificial_satellite: workshops + +## General +- **Will the slides be available?** + - Yes, [here](https://github.com/githubsatelliteworkshops/codeql/blob/master/satellite-2020-workshops-codeql.pdf) + +## CodeQL setup +- **I’m getting `could not resolve module java` and queries don’t seem to be running… did I miss something obvious in setting this up?** + - Make sure to get all submodules: `git clone --recursive https://github.com/github/vscode-codeql-starter/` + - You might have an old version of the `codeql` CLI on your `PATH`. Remove it and let the VS Code extension install the CLI for you. + +## CodeQL + +- **Is it possible to create custom CodeQL queries that run as part of CI/CD?** + - Yes. For open-source projects, you can configure the CodeQL GitHub Action to include custom queries you have added to your repository. For closed-source/enterprise code, you can do something similar once you have a license. The enterprise deployment of GitHub Advanced Security allows custom queries to be added and can be integrated into developer workflows. +- **Can CodeQL queries be run on the output of binary tools, such as LLVM or IDA, rather than on source code?** + - Usually no. CodeQL databases are produced by extracting the source code during the build process: the CLI monitors the compiler and processes all source code that is compiled and built. +- **Is there human-readable documentation outside VS Code where one can browse the available API (methods, class hierarchy, etc.)?** + - The queries and standard libraries are open-sourced at https://github.com/github/codeql, and the documentation is available at https://codeql.github.com/docs/ and https://codeql.github.com/codeql-standard-libraries/.
+- **Is there a repo/report/archive somewhere showing CodeQL being run against a large number of open-source repos and the vulnerabilities it found, or have you only run it on a few?** + - https://securitylab.github.com has comprehensive information on vulnerabilities discovered in OSS with CodeQL. + - https://LGTM.com runs CodeQL analysis for free on over 130k open-source repos. + - This scanning is now also coming directly to the GitHub.com platform for our users, via the Code Scanning Action. +- **Is it possible to analyze dependencies as such, namely to identify all code that has a specific dependency?** + - In general it is possible to identify dependencies. The exact mechanism depends on the language being analysed. For example, we have the content of Maven POM files in Java databases, and `package.json` in JavaScript databases, and you can query those to find out what your code depends on. Identifying specifically which code uses those dependencies is more involved, though I think it is mostly possible. From a security scanning point of view, there are some other complementary GitHub tools (dependency graph, dependency insights) that give you an overview of this information on your repositories. + - For the GitHub security feature: https://help.github.com/en/github/visualizing-repository-data-with-graphs/listing-the-packages-that-a-repository-depends-on + - For the CodeQL Java library that lets you examine POM files: https://help.semmle.com/qldoc/java/semmle/code/xml/MavenPom.qll/type.MavenPom$Dependency.html or https://github.com/github/codeql/blob/master/java/ql/src/semmle/code/xml/MavenPom.qll. +- **Is there a preference between `getName() = string` and `hasName(string)`?** + - Both are available for convenience. `hasName` with a specific string is shorter, but `getName` lets you keep refining the condition, for example if you want to restrict the name with a regex: `.getName().regexpMatch(...)`.
+- **What's the best way to extend a backend library to identify a new source of untrusted user input (in order to hopefully benefit from all the existing CodeQL queries)?** + - The main out-of-the-box definition of untrusted input in the Java QL libraries is called `RemoteFlowSource`, and it is defined in `FlowSources.qll`. You can extend this class with more cases by following the pattern in that file. + - Custom extensions can be conveniently put in the file `Customizations.qll`, where they'll be visible to all queries. +- **CodeQL has a few "common" concepts, but they are all named differently. This makes the learning curve steeper (for example Java `IfStmt/Block/getNumStmts` vs JavaScript `IfStmt/BasicBlock/getNumLines`). I wish there was a higher level of abstraction so that queries were a bit more portable.** + - Actually, in this case JavaScript does have the same classes as Java (both `Block` and `getNumStmts()`). + - One challenge is that different languages have different standard names for the same things. For example, the Java language spec defines a "call" as a "method access", which is why the class name is `MethodAccess`. + - The other problem is that the concept may not be identical. In particular, a JavaScript `CallExpr` is quite different from a Java method call, because the target of the call can be defined dynamically. + diff --git a/README.md b/README.md index eecfb34..b7509b6 100644 --- a/README.md +++ b/README.md @@ -19,14 +19,14 @@ The following links contain the content that will be covered during the workshop ## :mega: Prerequisites - Install [Visual Studio Code](https://code.visualstudio.com/). -- Install the [CodeQL extension for Visual Studio Code](https://help.semmle.com/codeql/codeql-for-vscode/procedures/setting-up.html). +- Install the [CodeQL extension for Visual Studio Code](https://docs.github.com/en/code-security/codeql-for-vs-code/getting-started-with-codeql-for-vs-code/installing-codeql-for-vs-code).
- You do _not_ need to install the CodeQL CLI: the extension will handle this for you. -- Set up the [CodeQL starter workspace](https://help.semmle.com/codeql/codeql-for-vscode/procedures/setting-up.html#using-the-starter-workspace). +- Set up the [CodeQL starter workspace](https://github.com/github/vscode-codeql-starter). - **Important:** Don't forget to use `git clone --recursive` or `git submodule update --init --remote` to update the submodules when you clone this repository. This allows you to obtain the standard CodeQL query libraries. - Open the starter workspace in Visual Studio Code: **File** > **Open Workspace** > Browse to `vscode-codeql-starter/vscode-codeql-starter.code-workspace` in your checkout of the starter workspace. - Download and add the CodeQL database to be used in the workshop: - - If you are attending **Finding security vulnerabilities in Java with CodeQL**, please download [this CodeQL database](https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-91ae344-CVE-2017-9805.zip). - - If you are attending **Finding security vulnerabilities in JavaScript with CodeQL**, please visit [this project page on LGTM.com](https://lgtm.com/projects/g/esbena/bootstrap-pre-27047/ci/#ql), create an account (you can log in via OAuth using a GitHub account), and click to download the latest database for JavaScript. + - If you are attending **Finding security vulnerabilities in Java with CodeQL**, please download [this CodeQL database](https://github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/apache_struts_cve_2017_9805.zip). + - If you are attending **Finding security vulnerabilities in JavaScript with CodeQL**, please download [this CodeQL database](https://github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/esbena_bootstrap-pre-27047_javascript.zip). - Unzip the database. - Import the unzipped database into Visual Studio Code: - Click the CodeQL icon in the left sidebar.
@@ -34,9 +34,9 @@ The following links contain the content that will be covered during the workshop - Choose the unzipped database directory on your filesystem. ## :books: Resources -- [Learning CodeQL](https://help.semmle.com/QL/learn-ql) -- [Learning CodeQL for Java](https://help.semmle.com/QL/learn-ql/java/ql-for-java.html) -- [Learning CodeQL for JavaScript](https://help.semmle.com/QL/learn-ql/javascript/ql-for-javascript.html) -- [Using the CodeQL extension for VS Code](https://help.semmle.com/codeql/codeql-for-vscode.html) -- More about CodeQL on [GitHub Security Lab](https://securitylab.github.com/tools/codeql) +- [CodeQL docs](https://codeql.github.com/docs/) +- [CodeQL for Java](https://codeql.github.com/docs/codeql-language-guides/codeql-for-java/) +- [CodeQL for JavaScript](https://codeql.github.com/docs/codeql-language-guides/codeql-for-javascript/) +- [CodeQL for Visual Studio Code](https://codeql.github.com/docs/codeql-for-visual-studio-code/) +- More about CodeQL on [GitHub Security Lab](https://securitylab.github.com/get-involved/) - CodeQL on [GitHub Learning Lab](https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++)) diff --git a/java.md b/java.md index 602758d..1529f02 100644 --- a/java.md +++ b/java.md @@ -44,7 +44,7 @@ To take part in the workshop you will need to follow these steps to get the Code 3. [Set up the starter workspace](https://help.semmle.com/codeql/codeql-for-vscode/procedures/setting-up.html#using-the-starter-workspace). - ****Important****: Don't forget to `git clone --recursive` or `git submodule update --init --remote`, so that you obtain the standard query libraries. 4. Open the starter workspace: File > Open Workspace > Browse to `vscode-codeql-starter/vscode-codeql-starter.code-workspace`. -5. Download and unzip the [apache-struts-91ae344-CVE-2017-9805 database](https://downloads.lgtm.com/snapshots/java/apache/struts/apache-struts-91ae344-CVE-2017-9805.zip). +5. 
Download and unzip the [apache_struts_cve_2017_9805.zip database](https://github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/apache_struts_cve_2017_9805.zip). 6. Choose this database in CodeQL (using `Ctrl + Shift + P` to open the command palette, then selecting "CodeQL: Choose Database"). 7. Create a new file in the `codeql-custom-queries-java` directory called `UnsafeDeserialization.ql`. @@ -405,6 +405,13 @@ The answer to this is to convert the query to a _path problem_ query. There are import semmle.code.java.dataflow.DataFlow import DataFlow::PathGraph + predicate isXMLDeserialized(Expr arg) { + exists(MethodAccess fromXML | + fromXML.getMethod().getName() = "fromXML" and + arg = fromXML.getArgument(0) + ) + } + /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */ class ContentTypeHandler extends RefType { ContentTypeHandler() { @@ -445,8 +452,10 @@ For more information on how the vulnerability was identified, you can read the [ Although we have created a query from scratch to find this problem, it can also be found with one of our default security queries, [UnsafeDeserialization.ql](https://github.com/github/codeql/blob/master/java/ql/src/Security/CWE/CWE-502/UnsafeDeserialization.ql). You can see this on a [vulnerable copy of Apache Struts](https://github.com/m-y-mo/struts_9805) that has been [analyzed on LGTM.com](https://lgtm.com/projects/g/m-y-mo/struts_9805/snapshot/31a8d6be58033679a83402b022bb89dad6c6e330/files/plugins/rest/src/main/java/org/apache/struts2/rest/handler/XStreamHandler.java?sort=name&dir=ASC&mode=heatmap#x121788d71061ed86:1), our free open source analysis platform. 
-## Follow up material - - - [Tutorial: Analyzing data flow in Java](https://help.semmle.com/QL/learn-ql/java/dataflow.html) - - [CodeQL training for Java](https://help.semmle.com/QL/learn-ql/ql-training.html#codeql-and-variant-analysis-for-java) - - [GitHub Security Lab research blog](https://securitylab.github.com/research) \ No newline at end of file +## What's next? +- Read the [tutorial on analyzing data flow in Java](https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-java/#analyzing-data-flow-in-java). +- Go through more [CodeQL training materials for Java](https://codeql.github.com/docs/codeql-language-guides/codeql-for-java/). +- Try out the latest CodeQL Java Capture-the-Flag challenge on the [GitHub Security Lab website](https://securitylab.github.com/ctf) for a chance to win a prize! Or try one of the older Capture-the-Flag challenges to improve your CodeQL skills. +- Try out a CodeQL course on [GitHub Learning Lab](https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++)). +- Read about more vulnerabilities found using CodeQL on the [GitHub Security Lab research blog](https://securitylab.github.com/research). +- Explore the [open-source CodeQL queries and libraries](https://github.com/github/codeql), and [learn how to contribute a new query](https://github.com/github/codeql/blob/master/CONTRIBUTING.md). diff --git a/javascript.md b/javascript.md index b83839d..f77fd3e 100644 --- a/javascript.md +++ b/javascript.md @@ -45,9 +45,7 @@ To take part in the workshop you will need to follow these steps to get the Code 1. [Set up the starter workspace](https://help.semmle.com/codeql/codeql-for-vscode/procedures/setting-up.html#using-the-starter-workspace). - **Important**: Don't forget to `git clone --recursive` or `git submodule update --init --remote`, so that you obtain the standard query libraries. 1. 
Open the starter workspace: File > Open Workspace > Browse to `vscode-codeql-starter/vscode-codeql-starter.code-workspace`. -1. Create an account on LGTM.com if you haven't already. You can log in via OAuth using your Google or GitHub account. -1. Visit the [database downloads page for the vulnerable version of Bootstrap on LGTM.com](https://lgtm.com/projects/g/esbena/bootstrap-pre-27047/ci/#ql). -1. Download the latest database for JavaScript. +1. Download the [esbena_bootstrap-pre-27047_javascript CodeQL database](https://github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/esbena_bootstrap-pre-27047_javascript.zip). 1. Unzip the database. 1. Import the unzipped database into Visual Studio Code: - Click the **CodeQL** icon in the left sidebar. @@ -80,7 +78,7 @@ Each step has a **Solution** that indicates one possible answer. Note that all q
Solution - ``` + ```ql from CallExpr dollarCall select dollarCall ``` @@ -98,7 +96,7 @@ Each step has a **Solution** that indicates one possible answer. Note that all q
Solution - ``` + ```ql from CallExpr dollarCall, Expr dollarArg where dollarArg = dollarCall.getArgument(0) select dollarArg @@ -116,7 +114,7 @@ Each step has a **Solution** that indicates one possible answer. Note that all q
Solution - ``` + ```ql from CallExpr dollarCall, Expr dollarArg where dollarArg = dollarCall.getArgument(0) and @@ -134,11 +132,11 @@ Each step has a **Solution** that indicates one possible answer. Note that all q - Calling the predicate `jquery()` returns all values that refer to the `$` function. - To find all calls to this function, use the predicate `getACall()`. - Notice that when you call `jquery()`, `getACall()`, and `getAnArgument()` in succession, you get return values of type `DataFlow::Node`, not `Expr`. These are **data flow nodes**. They describe a part of the source program that may have a value, and let us do more complex reasoning about this value. We'll learn more about these in the next section. - - You can convert the data flow node back into an `Expr` using the predicate `asExpr()`. + - You can change your `dollarArg` variable to have type `DataFlow::Node`, or convert the data flow node back into an `Expr` using the predicate `asExpr()`.
Solution - ``` + ```ql from Expr dollarArg where dollarArg = jquery().getACall().getArgument(0).asExpr() @@ -147,7 +145,7 @@ Each step has a **Solution** that indicates one possible answer. Note that all q OR - ``` + ```ql from DataFlow::Node dollarArg where dollarArg = jquery().getACall().getArgument(0) @@ -170,14 +168,14 @@ Consider creating a new query for these next few steps, or commenting out your e 1. You have already seen how to find references to the jQuery `$` function. Now find all places in the code that read the property `$.fn`.
Hint - + - Declare a new variable of type `DataFlow::Node` to hold the results. - Notice that `jQuery()` returns a value of type `DataFlow::SourceNode`. Source nodes are places in the program that introduce a new value, from which the flow of data may be tracked. - `DataFlow::SourceNode` has a predicate named `getAPropertyRead(string)`, which finds all reads of a particular property on the same object. The string argument is the name of the property.
Solution - ``` + ```ql from DataFlow::Node n where n = jquery().getAPropertyRead("fn") select n @@ -226,8 +224,8 @@ Consider creating a new query for these next few steps, or commenting out your e
Solution - ``` - from DataFlow::FunctionNode plugin + ```ql + from DataFlow::Node plugin where plugin = jquery().getAPropertyRead("fn").getAPropertySource() select plugin ``` @@ -246,7 +244,7 @@ Consider creating a new query for these next few steps, or commenting out your e
Solution - ``` + ```ql from DataFlow::FunctionNode plugin, DataFlow::ParameterNode optionsParam where plugin = jquery().getAPropertyRead("fn").getAPropertySource() and @@ -280,9 +278,7 @@ class Config extends TaintTracking::Configuration { ) } override predicate isSink(DataFlow::Node sink) { - exists(/** TODO fill me in **/ | - sink = /** TODO fill me in from Section 1 **/ - ) + sink = /** TODO fill me in from Section 1 **/ } } @@ -321,7 +317,8 @@ select sink, source, sink, "Potential XSS vulnerability in plugin." Hint - Complete the same process as above. - - Remember that the argument of a call to `$` is a sink for XSS vulnerabilities. + - In Section 1, we already found this `DataFlow::Node` by calling `jquery()` and then chaining predicates on its result. + - Remember that the first argument of a call to `$` is a sink for XSS vulnerabilities.
@@ -369,10 +366,16 @@ select sink, source, sink, "Potential XSS vulnerability in plugin." select sink, source, sink, "Potential XSS vulnerability in plugin." ```
+ +We have created a query from scratch to find this problem. A production version of this query can be found as part of the default set of CodeQL security queries: [UnsafeJQueryPlugin.ql](https://github.com/github/codeql/blob/master/javascript/ql/src/Security/CWE-079/UnsafeJQueryPlugin.ql). You can [see the results on a vulnerable copy of Bootstrap](https://lgtm.com/projects/g/esbena/bootstrap-pre-27047?mode=tree&ruleFocus=1511421786841) that has been analyzed on LGTM.com, our free open-source analysis platform. -## Follow-up material -- [Tutorial: Analyzing data flow in JavaScript and TypeScript](https://help.semmle.com/QL/learn-ql/javascript/dataflow.html) +## What's next? +- Read the [tutorial on analyzing data flow in JavaScript and TypeScript](https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-javascript-and-typescript/). +- Try out the latest CodeQL Capture-the-Flag challenge on the [GitHub Security Lab website](https://securitylab.github.com/ctf) for a chance to win a prize! Or try one of the older Capture-the-Flag challenges to improve your CodeQL skills. +- Try out a CodeQL course on [GitHub Learning Lab](https://lab.github.com/githubtraining/codeql-u-boot-challenge-(cc++)). +- Read about more vulnerabilities found using CodeQL on the [GitHub Security Lab research blog](https://securitylab.github.com/research). +- Explore the [open-source CodeQL queries and libraries](https://github.com/github/codeql), and [learn how to contribute a new query](https://github.com/github/codeql/blob/master/CONTRIBUTING.md). ## Acknowledgements -This is a reduced version of a Capture-the-Flag challenge devised by @esbena, available at https://securitylab.github.com/ctf/jquery. Try out the full version! +This is a reduced version of a Capture-the-Flag challenge devised by @esbena, available at https://securitylab.github.com/ctf/jquery. Try out the full version! Thanks to our moderators for valuable feedback on the workshop.
diff --git a/satellite-2020-workshops-codeql.pdf b/satellite-2020-workshops-codeql.pdf new file mode 100644 index 0000000..073c2a9 Binary files /dev/null and b/satellite-2020-workshops-codeql.pdf differ
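
The FAQ answer on `getName() = string` versus `hasName(string)` is easiest to see side by side in a small Java query. This is a minimal sketch. It assumes the standard `java` library and a Java database such as the Apache Struts one used in the Java workshop. The method name `fromXML` is borrowed from that workshop, and the regular expression is only an illustration.

```ql
/**
 * Sketch: two ways to constrain a method's name with the CodeQL Java library.
 * The name "fromXML" and the regex below are illustrative choices only.
 */
import java

from Method m
where
  // Exact match: `hasName` is the shorter form and is equivalent to
  // writing `m.getName() = "fromXML"`.
  m.hasName("fromXML")
  or
  // `getName()` returns a string, so the condition can be refined further
  // with string built-ins such as `regexpMatch` (which must match the whole name).
  m.getName().regexpMatch("from(XML|Json)")
select m, "Method named " + m.getName()
```

Either condition on its own makes a complete query. The `or` simply shows both forms in one place.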