MSP430 Background: Encoding an ASCII to Morse Conversion

This page is not a complete example, just some background on an ASCII to Morse Code translation method that will be used in future examples centered on low-power timing and interrupt control. This is far from the most clever encoding one could choose for this task (in particular, 61 of the 128 bytes of the encoding table are empty, which is a real concern on a device that may have only a couple of kilobytes of FLASH!), but it is extremely simple and, I think, easy to understand. Lookup is very cheap, and conversion to timings is simple. As Donald Knuth wrote in “Structured Programming with go to Statements” (1974), “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” If you are experiencing memory pressure, a significant portion of this table (52 bytes) can be cheaply eliminated with just two range checks in the lookup.

The Encoding

Figure 1. Encoding example for the character 'C'.

The basic premise of this encoding is that a single Morse Code character (with the possible exception of some prosigns) can be trivially encoded into 8 bits, representing a dot as a 0 and a dash as a 1 and indicating the length of the character with a single marker bit.

The format of each Morse character is as follows: zero or more 0 bits, followed by a single 1 bit, followed by the Morse Code representation of the character in binary, with a 0 indicating a dit and a 1 indicating a dah. This length encoding (zero padding followed by a single 1 marker bit) allows sequences of up to seven dits and dahs in an 8-bit byte. Because “normal” Morse Code sequences occupy anywhere from one (E and T) to six (most punctuation) symbols, a separate numeric length field is not practical: counting up to six takes three bits, which leaves only five bits for the symbols themselves. The trade-off is that parsing a one-symbol character requires examining all eight bits of the encoded byte in a loop, exactly as for a six-symbol character.

In this encoding, the character E is encoded as 0b0000 0010, or 0x02. The six leading 0 bits are nothing but padding, and the 1 bit indicates that the remaining bits represent dits and dahs. The Morse Code representation of E is a single dit, so the final bit is a 0. The character C (Fig. 1) is dah-dit-dah-dit in Morse, so its representation in this encoding is 0b0001 1010, or 0x1A. Again, the three leading 0s are padding, and the 1 in the next bit marks the beginning of the bits of interest. 1010 maps to dah-dit-dah-dit, which is precisely C.

The Lookup

Once we have encoded the entire alphabet and some convenient punctuation, we simply place the encodings in an ASCII-ordered table for lookup. Since each ASCII character has a well-defined numeric code (for example, 'A' is 0x41, or 65, and lowercase 'a' is 0x61, or 97), we store the Morse encoding of each letter, number, and punctuation mark at the table index equal to its ASCII code. Morse Code has no concept of case, so the encoding of each letter is duplicated in both its uppercase and lowercase ASCII positions.

The completed table is:

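#define MORSE_NONE 0x00  /* assumed value: a byte with no marker bit encodes no symbols */

/* One entry per ASCII code, 0x00-0x7F; '_' in the comments marks an unused slot. */
const unsigned char morse_ascii[] = {
    /* 0x00-0x2B: control codes, space, and the symbols before ',' (not encoded) */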
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    0x73, MORSE_NONE, 0x55, 0x32,                   /* , _ . / */
    0x3F, 0x2F, 0x27, 0x23,                         /* 0 1 2 3 */
    0x21, 0x20, 0x30, 0x38,                         /* 4 5 6 7 */
    0x3C, 0x3E, MORSE_NONE, MORSE_NONE,             /* 8 9 _ _ */
    MORSE_NONE, 0x31, MORSE_NONE, 0x4C,             /* _ = _ ? */
    MORSE_NONE, 0x05, 0x18, 0x1A,                   /* _ A B C */
    0x0C, 0x02, 0x12, 0x0E,                         /* D E F G */
    0x10, 0x04, 0x17, 0x0D,                         /* H I J K */
    0x14, 0x07, 0x06, 0x0F,                         /* L M N O */
    0x16, 0x1D, 0x0A, 0x08,                         /* P Q R S */
    0x03, 0x09, 0x11, 0x0B,                         /* T U V W */
    0x19, 0x1B, 0x1C, MORSE_NONE,                   /* X Y Z _ */
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
    MORSE_NONE, 0x05, 0x18, 0x1A,                   /* _ A B C */
    0x0C, 0x02, 0x12, 0x0E,                         /* D E F G */
    0x10, 0x04, 0x17, 0x0D,                         /* H I J K */
    0x14, 0x07, 0x06, 0x0F,                         /* L M N O */
    0x16, 0x1D, 0x0A, 0x08,                         /* P Q R S */
    0x03, 0x09, 0x11, 0x0B,                         /* T U V W */
    0x19, 0x1B, 0x1C, MORSE_NONE,                   /* X Y Z _ */
    MORSE_NONE, MORSE_NONE, MORSE_NONE, MORSE_NONE,
};

Note: Previous versions of this table had an error in the entry for the character '9', resulting in dash-dot-dash-dash-dash being sent instead of dash-dash-dash-dash-dot. Thanks to Daniel Baehr for pointing this out and providing the correction.

Compiler Concerns

Note that the array is declared const. This is vitally important, because it tells the compiler that the array may be placed in FLASH memory and left there for the duration of execution. Because the MSP430 part you are using may have only 128 bytes of RAM to begin with, keeping this 128-byte array out of RAM is critical! Furthermore, declaring it of type char * rather than char [] seems to be necessary in some versions of gcc to keep the array out of RAM; an initialized array that does not land in read-only memory goes into the .data section, which is copied into RAM at startup. The linker map file shows where the table actually ended up. Sometimes, for things like this, you just have to play with it a little.