Morse code is a noiseless coding method. The following table gives the order-0 histogram of the letters (and the space character) in book1 of the Calgary corpus (English text):
| Letter | Count | Letter | Count | Letter | Count |
|---|---|---|---|---|---|
| ' ' | 125551 | a | 48803 | b | 10595 |
| c | 13265 | d | 26892 | e | 72875 |
| f | 12650 | g | 12878 | h | 38538 |
| i | 39906 | j | 721 | k | 5039 |
| l | 23491 | m | 14609 | n | 41421 |
| o | 45651 | p | 10025 | q | 534 |
| r | 33134 | s | 37638 | t | 51993 |
| u | 16134 | v | 5446 | w | 14824 |
| x | 866 | y | 12402 | z | 264 |
The order-0 Shannon entropy of this distribution is about 4.12 bits per byte, while the average codeword length of the Huffman code (the optimal symbol-by-symbol noiseless code for this distribution) is about 4.15 bits per byte. We assume the following timing properties of Morse code:
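Both quantities can be computed directly from the table. The following is a minimal sketch, assuming the 27 counts above are the complete order-0 distribution; the names `COUNTS`, `entropy_bits`, `huffman_code_lengths`, and `average_length` are hypothetical. The entropy comes from the counts themselves, and the Huffman lengths come from repeatedly merging the two lightest subtrees, so that every merge adds one bit to each symbol it contains.

```python
import heapq
import math

# Order-0 counts copied from the table above (27 symbols: space plus a-z).
COUNTS = {
    " ": 125551, "a": 48803, "b": 10595, "c": 13265, "d": 26892,
    "e": 72875, "f": 12650, "g": 12878, "h": 38538, "i": 39906,
    "j": 721, "k": 5039, "l": 23491, "m": 14609, "n": 41421,
    "o": 45651, "p": 10025, "q": 534, "r": 33134, "s": 37638,
    "t": 51993, "u": 16134, "v": 5446, "w": 14824, "x": 866,
    "y": 12402, "z": 264,
}


def entropy_bits(counts):
    """Order-0 Shannon entropy, in bits per symbol."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def huffman_code_lengths(counts):
    """Return the Huffman codeword length of every symbol."""
    # Heap entries: (subtree weight, unique tiebreak, symbols in the subtree).
    # The tiebreak keeps heapq from ever comparing the symbol lists.
    heap = [(c, i, [s]) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in counts}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in left + right:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, left + right))
        next_id += 1
    return lengths


def average_length(counts, lengths):
    """Expected codeword length, in bits per symbol."""
    total = sum(counts.values())
    return sum(counts[s] * lengths[s] for s in counts) / total


h = entropy_bits(COUNTS)
avg_len = average_length(COUNTS, huffman_code_lengths(COUNTS))
print(f"entropy: {h:.3f} bits/symbol, Huffman average: {avg_len:.3f} bits/symbol")
```

Whatever the exact figures, the Huffman average always satisfies H ≤ L < H + 1, which is why the two numbers quoted above are so close.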
- A dash (-) is three dits long; a dot (.) is one dit long.
- Code symbols within a letter are separated by silence one dit long.
- Letters of the same word are separated by silence three dits long.
- Words are separated by silence seven dits long.
Under these assumptions, Morse code requires, on average, 9.26 dits' worth of time per letter of English text.
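One way to turn the timing rules into an average is to charge each character a fixed cost and weight it by the histogram. The following is a minimal sketch under a stated assumption about how silences are attributed. Each letter is charged its element durations, the one-dit gaps between its elements, and a trailing three-dit letter gap. A space is charged the four extra dits that stretch the preceding letter gap to seven. The codewords are the standard International Morse code letters, which the text above does not list. The names `MORSE`, `letter_cost`, and `average_dits` are hypothetical. Because charging the silences differently changes the average, this sketch need not reproduce the 9.26-dit figure exactly.

```python
# Order-0 counts copied from the table above.
COUNTS = {
    " ": 125551, "a": 48803, "b": 10595, "c": 13265, "d": 26892,
    "e": 72875, "f": 12650, "g": 12878, "h": 38538, "i": 39906,
    "j": 721, "k": 5039, "l": 23491, "m": 14609, "n": 41421,
    "o": 45651, "p": 10025, "q": 534, "r": 33134, "s": 37638,
    "t": 51993, "u": 16134, "v": 5446, "w": 14824, "x": 866,
    "y": 12402, "z": 264,
}

# International Morse code for the 26 letters (not given in the text above).
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".",
    "f": "..-.", "g": "--.", "h": "....", "i": "..", "j": ".---",
    "k": "-.-", "l": ".-..", "m": "--", "n": "-.", "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.", "s": "...", "t": "-",
    "u": "..-", "v": "...-", "w": ".--", "x": "-..-", "y": "-.--",
    "z": "--..",
}

DOT, DASH = 1, 3   # element durations, in dits
SYMBOL_GAP = 1     # silence between elements of one letter
LETTER_GAP = 3     # silence between letters of the same word
WORD_GAP = 7       # silence between words


def letter_cost(ch):
    """Dits charged to one character under the attribution described above."""
    if ch == " ":
        # Assumption: the preceding letter already paid LETTER_GAP, so a space
        # only adds the extra silence that stretches it to WORD_GAP.
        return WORD_GAP - LETTER_GAP
    code = MORSE[ch]
    elements = sum(DOT if sym == "." else DASH for sym in code)
    # Element durations + gaps between elements + trailing letter gap.
    return elements + SYMBOL_GAP * (len(code) - 1) + LETTER_GAP


def average_dits(counts):
    """Histogram-weighted average cost, in dits per character (spaces included)."""
    total = sum(counts.values())
    return sum(c * letter_cost(ch) for ch, c in counts.items()) / total


print(f"{average_dits(COUNTS):.2f} dits per character")
```

For example, `e` (`.`) costs 1 + 3 = 4 dits and `a` (`.-`) costs 1 + 1 + 3 + 3 = 8 dits under this accounting.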
