Skip to main content

Control Flow Toolkit

Project description

Combined CI status Supported Python Versions

Introduction

This is a Toolkit for getting control flow information from Python bytecode.

Specifically:

  • Creates basic blocks from Python bytecode.

  • Creates a control-flow graph from the basic blocks.

  • Creates dominator trees and dominator regions for the control flow.

  • Graphs via dot the control-flow graph and dominator tree.

I’ve used some routines from Romain Gaucher’s equip as a starting point.

This code is alpha. There may be some bugs in the code. And, right now, we always produce dot graphs, which can be a problem with large bytecode. Inserting pseudo bytecode instructions was designed to be used with the newer grammar-based Python decompiler project. That code may change a bit.

Example

For now, the Python in test/test_bb2.py shows what’s up the best.

Consider this simple Python program taken from my BlackHat Asia 2024 talk:

# Program to count the number of bits in the integer 6.
i: int = 6
zero_bits = 0
one_bits = 0
while i > 0:  # loop point
   # loop alternative
   if i % 0:
       # first alternative
       one_bits += 1
   else:
       # second alternative
       zero_bits += 1
   # join point
   i << 1
# loop-end join point

You can find this byte-compiled to Python 3.8 bytecode in doc-example/count-bits.cpython-38.pyc. We can get control flow information for this program using:

python ./test/test-bb2.py doc-example/count-bits.cpython-38.pyc

After running, in /tmp you’ll find some .dot files and some .png images generated for the main routine.

flow-3.8--count-bits.cpython-38--module.png is a PNG image for the control flow.

https://github.com/rocky/python-control-flow/blob/master/doc-example/flow-3.8-count-bits.cpython-38--module.png

Here is what the colors on the arrows indicate:

red

The first alternative of a group of two alternatives

blue

The second alternative of a group of two alternatives

green

A looping (backward) jump

Here is what the line styles on the arrows indicate:

solid

an unconditional (and forward) jump

dashed

This should always be shown as a straight line centered from one block on top to the next block below it. It is the block that follows in the bytecode sequentially. If there is an arrowhead, there is a fall-through path from the upper block to the lower block. If there is no arrowhead, then either the last instruction of the upper basic block is an unconditional jump or this instruction is a return instruction or an explicit exception-raising instruction.

dotted

The jump path of a conditional jump. This is usually curved and appears on the side of a box.

We align blocks linearly using the offset addresses. You can find the offset ranges listed inside the block. The entry block is marked with an additional border. We also show the basic block number and block flags.

Any block that is ghost-like or has a white-background box with a dashed border is dead code.

Control-Flow with Dominator Regions

In addition to the basic control flow, we mark and color boxes with dominator regions.

https://github.com/rocky/python-control-flow/blob/master/doc-example/flow%2Bdom-3.8-count-bits.cpython-38--module.png

Regions with the same nesting level have the same color. So Basic blocks 3 and 7 are at the same nesting level. Blocks 4 and 5 are at the same nesting level and color.

Block 6 has two jumps into the block, so it is neither “inside” nor blocks 4 or 5. Block 6 is the “join point” block after an if/else:

# block 3
if i % 0:
    # block 4
    one_bits += 1
else:
    # block 5
    zero_bits += 1
# join point
i << 1  # This is block 6

The collection of blocks, 4, 5, and 6, are all dominated by the block region head Block 3, which has a border around it to show it is the head of a block region.

A border is put around a block only if it dominates some other block. So while technically block 4 dominates itself, and block 5 dominates itself, that fact is not interesting.

Colors get darker as the region is more nested.

In addition, if a jump or fallthrough jumps out of its dominator region, the arrowhead of the jump is shown in brown. Note that a jump arrow from an “if”-like statement or “for”-like to its end will not be in brown. Only the “fallthrough” arrow will be in brown. This is why the arrowhead of the jump from block to block 7 is blue, not brown.

If any basic block is jumped to using a jump-out (or end scope) kind of edge, then the box has a brown outline.

Inside the block text, we now add the dominator region number for a block in parentheses. For example, Basic blocks 4 and 5 are in dominator region 3, and so are marked “(3)” after their basic block number. The dominator number for a basic block is the same as its basic block number. So Basic Block 3 is also Dominator Region 3.

Note that even though basic blocks 4 and 5 are at the same indentation level, they are in different scopes under basic block 3.

In this example, all conditional jumps were taken if the condition was false. When the condition is true, we bold the dotted blue arrow. By doing this and by showing whether the jump condition is true or false, you can see in the control flow whether the source text contains an “and” type of condition or an “or” type of condition.

Here is the graph for x and y:

https://github.com/rocky/python-control-flow/blob/master/doc-example/flow%2Bdom-3.9-and-lambda%3Ax-y.png

Note the same graph would be the same as if x: if y: ...`.

The graph for a or b is almost the same except the style of the blue dotted arrow:

https://github.com/rocky/python-control-flow/blob/master/doc-example/flow%2Bdom-3.9-or-lambda%3Aa-b.png

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

python_control_flow-1.0.0.tar.gz (37.5 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

python_control_flow-1.0.0-py313-none-any.whl (45.0 kB view details)

Uploaded Python 3.13

python_control_flow-1.0.0-py312-none-any.whl (45.0 kB view details)

Uploaded Python 3.12

python_control_flow-1.0.0-py311-none-any.whl (45.0 kB view details)

Uploaded Python 3.11

python_control_flow-1.0.0-py310-none-any.whl (44.9 kB view details)

Uploaded Python 3.10

python_control_flow-1.0.0-py39-none-any.whl (44.9 kB view details)

Uploaded Python 3.9

python_control_flow-1.0.0-py38-none-any.whl (44.9 kB view details)

Uploaded Python 3.8

File details

Details for the file python_control_flow-1.0.0.tar.gz.

File metadata

  • Download URL: python_control_flow-1.0.0.tar.gz
  • Upload date:
  • Size: 37.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.8

File hashes

Hashes for python_control_flow-1.0.0.tar.gz
Algorithm Hash digest
SHA256 d0744f907f124565eaae02c38562f453a2b8c4e6af5cb3154db63abbda5c9e1a
MD5 be58ff76696b51e5b7c63ca5cb6046f5
BLAKE2b-256 a891c530df097c1b540137b82cba2d3f07f76786dcb48e745da7aad3c2f8e4a1

See more details on using hashes here.

File details

Details for the file python_control_flow-1.0.0-py313-none-any.whl.

File metadata

File hashes

Hashes for python_control_flow-1.0.0-py313-none-any.whl
Algorithm Hash digest
SHA256 7556ac7a55606af6c688b23ec494c46155e627209343b9c1e68d2e9b9290adb9
MD5 2f262ce056eb0224c70a3f30b89c0442
BLAKE2b-256 ff3608f7ece23de06967f7b54d4eb57689d033c5287b34c8dfd9cf1e5ec39b5c

See more details on using hashes here.

File details

Details for the file python_control_flow-1.0.0-py312-none-any.whl.

File metadata

File hashes

Hashes for python_control_flow-1.0.0-py312-none-any.whl
Algorithm Hash digest
SHA256 53839f6fc92560ddb32ba7551231a8e48b9221a8e735befff9ca132569a8f89a
MD5 1d04eacb573d43e1cc8bce40195ee56d
BLAKE2b-256 41c768762d367c5fe685c48a786d7c52dda14efe9cff3ce79fc76e66d5c1b20d

See more details on using hashes here.

File details

Details for the file python_control_flow-1.0.0-py311-none-any.whl.

File metadata

File hashes

Hashes for python_control_flow-1.0.0-py311-none-any.whl
Algorithm Hash digest
SHA256 67a5ed0b7e59c6c3a6ccb7b54d2285f53067ea626ccd95450548cf8907119f95
MD5 c056eed30d5203f2c576e2b3fb7cfae7
BLAKE2b-256 9bda07ff8408a2087bd466da952dad4c19deeb5aa6a9e2f89e77d8b2e77a2877

See more details on using hashes here.

File details

Details for the file python_control_flow-1.0.0-py310-none-any.whl.

File metadata

File hashes

Hashes for python_control_flow-1.0.0-py310-none-any.whl
Algorithm Hash digest
SHA256 30767a01216832c113897d0362f873331aa225dca3f6533178a94a3fd4026402
MD5 8eafbc2bc8603477b65834d7d9be6652
BLAKE2b-256 5408e1d9c4d8f7a856f2e746f5df6024c7faa7c0623b4da8bff43cb3e72239db

See more details on using hashes here.

File details

Details for the file python_control_flow-1.0.0-py39-none-any.whl.

File metadata

File hashes

Hashes for python_control_flow-1.0.0-py39-none-any.whl
Algorithm Hash digest
SHA256 962ddec19ec6599407b468aaa9d26de2fb86b7c71849570e4f40305aba410935
MD5 a66dd50723e1539b1510644fa632021b
BLAKE2b-256 553d13b37d05de53b68dbdd9fb83611ddc35385a9f65b80801d0d518031bbe6f

See more details on using hashes here.

File details

Details for the file python_control_flow-1.0.0-py38-none-any.whl.

File metadata

File hashes

Hashes for python_control_flow-1.0.0-py38-none-any.whl
Algorithm Hash digest
SHA256 86a46ef9bee2627dc6956075f1abfb2f652be53a92c70f76aff1d8f72c168272
MD5 80684e2f5abee8597f39698dcc64c176
BLAKE2b-256 37b25aebee9d0470ba3141becdda3d7c80cbd9b322af1439742ca21a4587a997

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page