Skip to content

FelipeCRamos/huffman-compression

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Huffman Coding Project

The implementation of the Huffman Coding Concept using C++14, aiming to achieve a stable compression/uncompression algorithm.

Compilation

In order to compile this project, you just need the bare gcc/clang packages installed in your computer.

You can compile by typing onto your favorite terminal:

# once already in project root folder
make

Execution

After compiled, you can execute the project by running a simple command, like:

./huf [input] [output]

Making sure that:

  • input file exists
  • You have permissions to write on output file

Details of implementation

This project uses concepts of tries to generate the shorter possible binary path to a given char, based on how many occurrences it has on the given input. With this being done, we can represent a normal 8 bits char with 4~5 bits (depending on the input).

Here's a briefly explanation about how the prefixed tree (trie):

Suppose we have a small text like: "bom esse bombom", we can extract the following info:
    char 'b' appears 3 times
    char 'o' appears 3 times
    char 'm' appears 3 times
    char 'e' appears 2 times
    char 's' appears 2 times
    char ' ' appears 2 times

And then, we can create a Prefixed Digital Tree like this:
                        ( *)
                       /    \
                      /      \
                   ( *)      ( *)
                  /   \        /\
                 /     \      /  \          ( sorry for the bad ascii art )
              ( *)     ( *) ('o') ('m')
              /  \     /  \  
             /   |     |   \
          ('b') ('s') ('e')(' ')

Where the path for i.e. char 'o' from the root can be represented by 2 bits,
one bit for first direction and another for second direction, like '10'
(
    1: dig right sub-tree
    0: dig left sub-tree
)

With that in mind, how the 'm' char would be coded? :P

So the 'key' to use this compression method is to have the tree and their paths.

That's why we will store on the compressed file our tree by pre-order traversal path, with the following syntax:

0001b1s01e1_01o1m

Where:
    Each '0' is an internal node of the tree
    Each '1' represent that the next byte will be read as a char

to be continued...

Other things

This was a project made by me for the Basic Data Structure II course on UFRN. This is far from being 100% polished, but i'll be very happy if you can contribute with anything to this project.

Authorship

Project made by Felipe Ramos with the MIT License.

Releases

No releases published

Packages

No packages published