Update 2021: installable Python package
Python implementation of some common Connectionist Temporal Classification (CTC) decoding algorithms. A minimalistic language model is provided.
- Go to the root level of the repository
- Execute `pip install .`
- Go to `tests/` and execute `pytest` to check if the installation worked
Here is a minimalistic executable example:
```python
import numpy as np
from ctc_decoder import best_path, beam_search

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
labels = 'ab'

print(f'Best path: "{best_path(mat, labels)}"')
print(f'Beam search: "{beam_search(mat, labels)}"')
```

The output `mat` of the CTC-trained neural network (a numpy array with the softmax already applied) is expected to have shape TxC
and is passed as the first argument to the decoders.
T is the number of time-steps, and C the number of characters (the CTC-blank is the last element).
The characters that the neural network can predict (excluding the CTC-blank) are passed as the labels string to the decoder.
Decoders return the decoded string.
This should output:
Best path: ""
Beam search: "a"
To see more examples of how to use the decoders, please have a look at the scripts in the `tests/` folder.
Beam search can use a character-level language model. Text statistics (bigrams) are used by beam search to improve reading accuracy.
```python
from ctc_decoder import beam_search, LanguageModel

# create language model instance from a (large) text
lm = LanguageModel('this is some text', labels)

# and use it in the beam search decoder
res = beam_search(mat, labels, lm=lm)
```
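To give a rough idea of what such a character-level bigram model contributes, the following sketch (not the package's actual LanguageModel implementation; the helper name `bigram_probabilities` is made up for illustration) estimates how likely one character follows another in the training text. Beam search can weight candidate text extensions with probabilities of this kind:

```python
from collections import defaultdict


def bigram_probabilities(text, labels):
    """Rough sketch: estimate P(second | first) for all character pairs taken from labels."""
    counts = defaultdict(int)
    for first, second in zip(text, text[1:]):
        counts[(first, second)] += 1
    probs = {}
    for first in labels:
        total = sum(counts[(first, second)] for second in labels)
        for second in labels:
            # fall back to a uniform distribution if 'first' never occurs in the text
            probs[(first, second)] = counts[(first, second)] / total if total else 1.0 / len(labels)
    return probs


probs = bigram_probabilities('this is some text', 'abcdefghijklmnopqrstuvwxyz ')
print(probs[('i', 's')])  # 'i' is always followed by 's' in this text -> 1.0
```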
The lexicon search decoder computes a first approximation with best path decoding. Then it uses a BK-tree to retrieve similar words, scores them, and finally returns the best-scoring word. The BK-tree is created by providing a list of dictionary words. A tolerance parameter defines the maximum edit distance from the query word to the returned dictionary words.

```python
from ctc_decoder import lexicon_search, BKTree

# create BK-tree from a list of words
bk_tree = BKTree(['words', 'from', 'a', 'dictionary'])

# and use the tree in the lexicon search
res = lexicon_search(mat, labels, bk_tree, tolerance=2)
```
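To make the tolerance parameter concrete, here is a stand-in sketch of the retrieval step: a textbook Levenshtein distance plus a plain linear scan instead of an actual BK-tree, and without the package's scoring of the candidates. It returns all dictionary words within the given edit distance of the best-path approximation (the query word 'wrds' below is a made-up example of such an approximation):

```python
def edit_distance(a, b):
    """Levenshtein (edit) distance between strings a and b, computed with a full DP table."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,  # delete a[i-1]
                          d[i][j - 1] + 1,  # insert b[j-1]
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitute or match
    return d[len(a)][len(b)]


def query(dictionary, word, tolerance):
    """Return all dictionary words within the given edit distance of word (linear scan)."""
    return [w for w in dictionary if edit_distance(w, word) <= tolerance]


print(query(['words', 'from', 'a', 'dictionary'], 'wrds', tolerance=2))  # ['words']
```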
Some notes:
- No adapter for TensorFlow or PyTorch is provided
- Apply the softmax already in the model; the decoders expect character probabilities, not raw logits
- Convert the output to a numpy array
- Usually, the output of an RNN layer `rnn_output` has shape TxBxC, with B the batch dimension
- Decoders work on single batch elements of shape TxC
- Therefore, iterate over all batch elements and apply the decoder to each of them separately (see the sketch after this list)
- Example: extract the matrix of batch element 0 with `mat = rnn_output[:, 0, :]`
- The CTC-blank is expected to be the last element along the character dimension
- TensorFlow has the CTC-blank as the last element, so there is nothing to do here
- PyTorch, however, has the CTC-blank as the first element by default, so you have to move it to the end, or change the default setting
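Here is a minimal sketch that combines these notes, under the assumption that `rnn_output` is a softmax-normalized numpy array of shape TxBxC that still has the CTC-blank at index 0 (the PyTorch default); the toy tensor below is made up for illustration:

```python
import numpy as np
from ctc_decoder import best_path

# made-up example output: T=2 time-steps, B=2 batch elements, C=3 (2 characters + CTC-blank),
# softmax already applied, CTC-blank at index 0 as in PyTorch
rnn_output = np.array([
    [[0.6, 0.4, 0.0], [0.1, 0.2, 0.7]],
    [[0.6, 0.4, 0.0], [0.1, 0.1, 0.8]],
])
labels = 'ab'

# move the CTC-blank from the first to the last position along the character dimension
rnn_output = np.roll(rnn_output, shift=-1, axis=-1)

# the decoders work on single batch elements of shape TxC, so iterate over the batch
for b in range(rnn_output.shape[1]):
    mat = rnn_output[:, b, :]
    print(f'Batch element {b}: "{best_path(mat, labels)}"')
```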
Recommended decoders:
- `best_path`: best path (or greedy) decoder, the fastest of all algorithms; however, other decoders often perform better
- `beam_search`: beam search decoder, optionally integrates a character-level language model, can be tuned via the beam width parameter (see the snippet below)
- `lexicon_search`: lexicon search decoder, returns the best-scoring word from a dictionary
Other decoders, in my experience not really suited for practical purposes, but possibly useful for experiments or research:
- `prefix_search`: prefix search decoder
- `token_passing`: token passing algorithm
- Best path decoder implementation using OpenCL (see the `extras/` folder)
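The beam width trades accuracy against speed: a wider beam keeps more candidate texts per time-step. A small usage sketch follows; note that the keyword name `beam_width` is an assumption here and should be checked against the signature of `beam_search`:

```python
import numpy as np
from ctc_decoder import beam_search

# same toy input as in the minimal example above
mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])
labels = 'ab'

# a wider beam keeps more candidate texts per time-step: usually more accurate, but slower
# (the keyword name beam_width is an assumption; check the signature of beam_search)
print(beam_search(mat, labels, beam_width=5))
print(beam_search(mat, labels, beam_width=50))
```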
This paper gives suggestions on when to use best path decoding, beam search decoding and token passing.
- Documentation of test cases
- Documentation of the data
- Graves - Supervised sequence labelling with recurrent neural networks
- Hwang - Character-level incremental speech recognition with recurrent neural networks
- Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
- Marti - The IAM-database: an English sentence database for offline handwriting recognition
- Beam Search Decoding in CTC-trained Neural Networks
- An Intuitive Explanation of Connectionist Temporal Classification
- Scheidl - Comparison of Connectionist Temporal Classification Decoding Algorithms
- Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm