image - Huffman code tables -
i didn't understand huffman tables of jpeg contain, explain me? thanks
huffman encoding variable-length data compression method. works assigning frequent values in input stream encodings smallest bit lengths.
for example, input seems every eel eeks elegantly.
may encode letter e
binary 1
, other letters various other longer codes, starting 0
. way, resultant bit stream smaller if every letter fixed size. way of example, let's examine quantities of each character , construct tree puts common ones @ top.
letter count ------ ----- e 10 <spc> 4 l 3 sy 2 smvrkgant. 1 <eof> 1
the end of file marker eof
there since have have multiple of 8 bits in file. it's stop padding @ end being treated real character.
__________#__________ ________________/______________ \ ________/________ ____\____ e __/__ __\__ __/__ \ / \ / \ / \ / \ / \ / \ / spc l s / \ / \ / \ / \ / \ y s m v / k g \ n t /\ / \ r . eof
now isn't most efficient tree it's enough establish how encodings done. let's first @ uncompressed data. assuming eight-bit encoding, thirty-one characters (we don't need eof
uncompressed data) going take 248 bits.
but, if use tree above locate characters, outputting 0 bit if take left sub-tree , 1 bit if take right, following:
section encoding ---------- -------- seems<spc> 00001 1 1 00010 0111 0101 (20 bits) every<spc> 1 00011 1 001000 00000 0101 (22 bits) eel<spc> 1 1 0110 0101 (10 bits) eeks<spc> 1 1 00101 0111 0101 (15 bits) elegantly 1 0110 1 00110 001110 01000 01001 0110 00000 (36 bits) .<eof> 001001 001111 (12 bits)
that gives grand total of 115 bits, rounded 120 since needs multiple of byte, that's still half size of uncompressed data.
now that's not worth small file this, since have add space taken actual tree itself(a), otherwise cannot decode @ other end. certainly, larger files distribution of characters isn't even, can lead impressive savings in space.
so, after that, huffman tables in jpeg tables allow uncompress stream usable information.
the encoding process jpeg consists of few different steps (color conversion, chroma resolution reduction, block-based discrete cosine transforms, , on) final step lossless huffman encoding on each block tables used reverse when reading image.
(a) best case minimal storage of table like:
size of length section (8-bits) = 3 (longest bit length of 6 takes 3 bits) repeated each byte: actual length (3 bits, holding value between 1..6 inclusive) encoding (n bits, n actual length) byte (8 bits) end of table marker (3 bits) = 0 distinguish actual length above
for text above, be:
00000011 8 bits n bits byte --- ------ ----- 001 1 'e' 12 bits 100 0101 <spc> 15 bits 101 00001 's' 16 bits 101 00010 'm' 16 bits 100 0111 's' 15 bits 101 00011 'v' 16 bits 110 001000 'r' 17 bits 101 00000 'y' 16 bits 101 00101 'k' 16 bits 100 0110 'l' 15 bits 101 00110 'g' 16 bits 110 001110 'a' 17 bits 101 01000 'n' 16 bits 101 01001 't' 16 bits 110 001001 '.' 17 bits 110 001111 <eof> 17 bits 000 3 bits
that makes table 264 bits totally wipes out savings compression. however, stated, impact of table becomes far less input file becomes larger , there's way avoid table altogether.
that way involves use of variant of huffman, called adaptive huffman. table isn't stored in compressed data.
instead, during compression, table starts eof
, special bit sequence meant introduce new real byte table.
when introducing new byte table, output introducer bit sequence followed full 8 bits of byte.
then, after each byte output , counts updated, table/tree rebalanced based on new counts space-efficient (though rebalancing may deferred improve speed, have ensure same deferral happens during decompression, example being every time add byte first 1k of input, every 10k of input after that, assuming you've added new bytes since last rebalance).
this means table can built in exactly same way @ other end (decompression), starting same minimal table eof
, introducer sequence.
during decompression, when see introducer sequence, can add byte following (the next 8 bits) table count of zero, output byte, adjust count , re-balance (or defer mentioned).
that way, not have have table shipped compressed file. this, of course, costs little more time during compression , decompression in you're periodically rebalancing table but, things in life, it's trade-off.
Comments
Post a Comment