Friday 15 March 2013

algorithm - Average number of bits required to store one letter of British English using perfect compression in Python


I have an assignment which asks:

What is the average number of bits required to store one letter of British English if perfect compression is used?

Since the entropy of an experiment can be interpreted as the minimum number of bits required to store its results, I tried to write a program that calculates the entropy of each letter and then adds them all up. It gives me 4.17 bits, but according to the linked page a perfect compression algorithm should only require 2 bits per character!

How do I implement the correct compression algorithm for this?

import math

letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
           'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

def find_perc(s):
    # Relative frequency of each letter in English text, in the same order as `letters`
    perc = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.02, 0.061, 0.07, 0.002,
            0.008, 0.04, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060, 0.063, 0.091,
            0.028, 0.01, 0.023, 0.001, 0.02, 0.001]
    temp = s.upper()
    pos = 0
    for x in xrange(len(letters)):
        if temp == letters[x]:
            pos = x
    return perc[pos]

def calc_ent(s):
    # Entropy contribution of the current letter: p * log2(1/p)
    p = find_perc(s)
    temp = p * (math.log(1 / p) / math.log(2))
    # Does the same thing, just for binary entropy (I think):
    # temp = (-p * (math.log(p) / math.log(2))) - ((1 - p) * (math.log(1 - p) / math.log(2)))
    return temp

total = 0
for x in xrange(len(letters)):
    total = total + calc_ent(letters[x])

print "minimum bits: %f" % total
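As a quick sanity check, the same number falls out of summing -p * log2(p) directly over the frequency table above; this is a minimal sketch, assuming the same 26 probabilities, and it lands at roughly the same 4.17-4.18 bits:

import math

# Same single-letter frequency table as in the question (A..Z order).
perc = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.02, 0.061, 0.07, 0.002,
        0.008, 0.04, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060, 0.063, 0.091,
        0.028, 0.01, 0.023, 0.001, 0.02, 0.001]

# Shannon entropy of the single-letter distribution: H = -sum(p * log2 p)
entropy = -sum(p * math.log(p, 2) for p in perc)
print("H = %.2f bits per letter" % entropy)    # roughly 4.18 bits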

If you read the linked page carefully, the entropy there is not calculated from the occurrence probability of each individual letter. Instead, a human subject was shown the previous 100 characters of text and asked to guess the next character.

So I don't think you are wrong; the method used is simply different. Using only the single-letter occurrence probabilities you cannot get much further, but if you take the context into account there is a lot of redundant information to exploit. For example, 'e' has a probability of 0.127 on its own, but after 'th' it is probably something more like 0.3.
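To make the context point concrete, here is a minimal sketch of that idea (it is not the experiment from the linked page, just an illustration): it estimates both the plain letter entropy H(X) and the conditional entropy H(X | previous letter) from bigram counts of whatever English sample text you point it at ('sample.txt' below is a placeholder). The conditional figure comes out noticeably lower, and with longer contexts (the linked experiment used the previous 100 characters) it keeps dropping toward the roughly 2 bits per character mentioned in the assignment.

import math
from collections import Counter

def entropies(text):
    # Keep only letters, ignore case.
    letters = [c for c in text.upper() if c.isalpha()]
    pairs = list(zip(letters, letters[1:]))

    single = Counter(letters)             # counts of each letter
    bigram = Counter(pairs)               # counts of each (previous, next) pair
    pred = Counter(x for x, _ in pairs)   # counts of each letter as a predecessor

    n = float(len(letters))
    m = float(len(pairs))

    # H(X) = -sum p(x) * log2 p(x)
    h_single = -sum((c / n) * math.log(c / n, 2) for c in single.values())

    # H(X2 | X1) = -sum p(x1, x2) * log2 p(x2 | x1)
    h_cond = 0.0
    for (x1, x2), c in bigram.items():
        p_pair = c / m
        p_cond = c / float(pred[x1])
        h_cond -= p_pair * math.log(p_cond, 2)

    return h_single, h_cond

sample = open('sample.txt').read()     # placeholder: any reasonably long English text
h1, h2 = entropies(sample)
print("H(letter)            = %.2f bits" % h1)
print("H(letter | previous) = %.2f bits" % h2)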
