Software method generates 5 bit Baudot codes

This idea describes a technique for generating 5 bit Baudot codes (also known as the CCITT International Telegraph Alphabet No 2) using a small computer's own serial (asynchronous) interface; no additional hardware is required. The BBC BASIC program listed here (Fig. 1) can be easily adapted for use with other machines.

In asynchronous transmission data are sent one character at a time with variable gaps between characters (Figs. 2a and 2b). During a gap (the 'idle' time) the serial output line remains in the 'mark' state, ie, set to a '1'. To signal the start of the transmission of a character a 'start' bit (a '0') is sent in the bit interval before the data bits. In the bit intervals immediately following the data bits a '1' is sent lasting for either 1, 1.5 or 2 bit periods to indicate the end of that character's transmission. The number of stop bits sets the minimum interval between the last data bit and the start bit of the next character. Data bits are transmitted starting with the LSB. (After the MSB and before the first stop bit a parity bit may be sent, but in this application parity generation must be disabled and no parity sent.)
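The framing just described can be sketched in a few lines of Python. Python is used here purely for illustration; the article's own program is in BBC BASIC. The function name frame_bits and its parameters are assumptions of this sketch, and only whole numbers of stop bits are modelled.

```python
def frame_bits(value, data_bits=8, stop_bits=1):
    """Return the line states for one asynchronous character, in the order sent."""
    start = [0]                                          # start bit: a '0'
    data = [(value >> i) & 1 for i in range(data_bits)]  # data bits, LSB first
    stop = [1] * stop_bits                               # stop bit(s): a '1' (mark)
    return start + data + stop

print(frame_bits(0x41))
```

For the 8 bit value &41 the sequence on the line is 0 1 0 0 0 0 0 1 0 1: the start bit, the eight data bits with the LSB first, and one stop bit.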

The Baudot code is used in the transmission of Telex and Radio Teletype; the code uses only five data bits and has a character set limited to upper case letters, numerals, some punctuation and special codes. The total number of characters is extended from the 32 possible combinations by using two "shift" codes (31 = 'shift to letters', 27 = 'shift to figures'); these are non-printing and extend the character set to 50 (plus two control codes). Thus each 5 bit code can represent two possible characters; which character a code corresponds to is determined by which shift character was most recently transmitted. Line feed, carriage return, space and the shift codes themselves are exceptions and are common to both the figure and letter character sets.
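How the shift state selects between the two meanings of a code can be illustrated from the receiving side with a small Python sketch (not part of the original program). The code values are copied from the table in Fig. 1; the name decode, the use of '?' for unlisted codes, and the restriction of the figures table to the ten numerals are assumptions made to keep the example short.

```python
LTRS, FIGS = 31, 27                      # 'shift to letters', 'shift to figures'

# 5 bit codes for A..Z and 0..9, taken from the DATA statements in Fig. 1
LETTERS = dict(zip([3, 25, 14, 9, 1, 13, 26, 20, 6, 11, 15, 18, 28,
                    12, 24, 22, 23, 10, 5, 16, 7, 30, 19, 29, 21, 17],
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
FIGURES = dict(zip([22, 23, 19, 1, 10, 16, 21, 7, 6, 24], "0123456789"))
COMMON = {4: " ", 8: "\r", 2: "\n"}      # same meaning in both shift states

def decode(codes, shift="LET"):
    """Turn a sequence of 5 bit codes into text, tracking the shift state."""
    text = []
    for c in codes:
        if c == LTRS:
            shift = "LET"                # non-printing: only changes the state
        elif c == FIGS:
            shift = "FIG"
        elif c in COMMON:
            text.append(COMMON[c])
        else:
            table = LETTERS if shift == "LET" else FIGURES
            text.append(table.get(c, "?"))
    return "".join(text)

print(decode([23, 19, 4, 27, 23, 19]))
```

The codes 23 and 19 appear twice in the example: before the figure shift they are read as Q and W, after it as 1 and 2.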

The serial interface of computers can be programmed to produce either 7 or 8 bit codes but rarely is it possible to select only 5 bits. Comparison of 8 bit and 5 bit formats in Figs. 2a and 2b shows that by forcing the three MSBs of the 8 bit code to the idle (1) state the word can be made to resemble a 5 bit one.

The program generates an 8 bit code (array variable S%( . )) from the 5 bit code (P%( . )) (program lines: 600-760) according to the equation:

S%( . ) = P%( . ) + 224   (where 0 ≤ P%( . ) ≤ 31)

ie, the 8 bit serial output has the last three (most significant) bits set to '1'.
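The conversion, and the reason it works on the line, can be checked with a short Python sketch (an illustration only; the names to_8bit and line_bits are assumptions, not part of the BBC BASIC program). Adding 224 to a value below 32 is the same as setting the top three bits, and the resulting 8 bit frame is bit-for-bit the same as a 5 bit frame followed by three extra bit periods of mark.

```python
def to_8bit(p):
    """Pad a 5 bit code (0..31) so that its three most significant bits are '1'."""
    if not 0 <= p <= 31:
        raise ValueError("5 bit code must be in the range 0..31")
    return p + 224                       # equals p | 0b11100000 when p < 32

def line_bits(value, data_bits, stop_bits):
    # start bit, data bits LSB first, then stop bit(s)
    return [0] + [(value >> i) & 1 for i in range(data_bits)] + [1] * stop_bits

for p in range(32):
    assert to_8bit(p) == p | 0b11100000
    # 8 data bits + 1 stop bit looks like 5 data bits + 4 bit periods of mark
    assert line_bits(to_8bit(p), 8, 1) == line_bits(p, 5, 4)

print(to_8bit(31), to_8bit(27))          # the two shift codes as sent by the program
```

The two values printed, 255 and 251, are the letter shift and figure shift bytes that appear in the listing.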

When the program is started the code-conversion table is displayed on the screen: the first column is the ASCII code, the second the 5 bit Baudot code, the third the character itself and the fourth the 8 bit Baudot code. Zero entries in column two indicate that there is no corresponding Baudot code for that character and that it will be coded as 224 (which is an undefined Baudot code).

Once a key is pressed the corresponding 8 bit code is found (lines 500-590) and is sent to the serial interface. If necessary an appropriate shift character is sent first. Special characters such as CR and BEL are dealt with separately by lines 220-350. Since the Baudot alphabet is limited to upper case letters any lower case codes are first converted to upper case ones by lines 770-810.
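The key-handling steps described above can be followed in the Python sketch below, which mirrors the order of tests in the listing. It is an illustration only: the table values are copied from the DATA statements, but the function name encode is an assumption, the BEL, ENQ and cursor-key cases are omitted, and the initial shift state is assumed to be letters, as set up by lines 470-480.

```python
# 5 bit codes for ASCII &20..&60 from the DATA statements; 0 = no Baudot equivalent
P = [4, 9, 0, 0, 0, 13, 0, 5, 15, 18, 0, 17, 12, 30, 28, 29,  # SP ! " # $ % & ' ( ) * + , - . /
     22, 23, 19, 1, 10, 16, 21, 7, 6, 24,                     # 0..9
     14, 0, 0, 0, 0, 25, 26,                                  # : ; < = > ? @
     3, 25, 14, 9, 1, 13, 26, 20, 6, 11, 15, 18, 28,          # A..M
     12, 24, 22, 23, 10, 5, 16, 7, 30, 19, 29, 21, 17,        # N..Z
     0, 0, 0, 0, 3, 20]                                       # [ \ ] ^ _ and &60
LTRS, FIGS = 255, 251                    # shift codes, already padded to 8 bits

def encode(text, state="LET"):
    """Return the list of 8 bit values sent to the serial port for text."""
    out = []
    for ch in text:
        a = ord(ch)
        if 0x60 < a < 0x7B:
            a -= 0x20                    # lower case to upper case (lines 770-810)
        if a == 0x0D:
            out += [232, 226]            # CR then LF (line 220)
            continue
        if a == 0x20:
            out.append(228)              # space, common to both shifts (line 260)
            continue
        if a < 0x20 or a > 0x60:
            continue                     # no table entry: ignored (lines 360-370)
        i = a - 0x20                     # index into the table (line 390)
        want = "LET" if 0x20 < i < 0x3B else "FIG"
        if want != state:
            out.append(LTRS if want == "LET" else FIGS)
            state = want
        out.append(P[i] + 224)           # a zero entry is sent as 224
    return out

print(encode("Hi 12"))
```

For the input "Hi 12" the values sent are 244, 230, 228, 251, 247 and 243: H, I, space, figure shift, 1, 2.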

L Mayes, Rochdale, Lancs

Figure 1: Program Listing

100 REM PROGRAM TO PRODUCE 5 BIT BAUDOT CODE ON SERIAL O/P FROM BBC K/B
101 REM ===============================================================
102 REM
103 REM LAWRENCE MAYES
104 REM
110 DIM P%(&40), S%(&40)
120 *TV0,1
130 *FX4,1
140 REM ENABLE CURSOR KEYS
150 MODE 3
160 *FX3,0
170 VDU3: REM TURN OFF SERIAL PORT
180 GOSUB 600: REM SET UP TABLE
190 GOSUB 420: REM SET UP SERIAL INTERFACE
200 A0% = GET: REM GET ASCII CODE FROM K/B
210 GOSUB 770: REM CHANGE LC TO UC
220 IF A0%=&0D VDU2: VDU 1,232: VDU 1,226: VDU3: VDU &D: VDU &A: GOTO 200
230 REM TRANSMIT [CR] + [LF]
240 IF A0%=&88 VDU2: VDU1,232: VDU3: VDU &D: GOTO 200
250 REM TRANSMIT [CR] ONLY
260 IF A0%=&20 VDU2: VDU 1,228: VDU3: VDU &20: GOTO 200
270 REM TRANSMIT [SP]
280 IF (A0%=&0A OR A0%=&8A) VDU2: VDU 1,226: VDU3:VDU &0A:GOTO 200
290 REM TRANSMIT [LF] ONLY
300 IF A0%=7 AND S$="FIG" VDU2: VDU1,235: VDU3: VDU7: GOTO 200
310 IF A0%=7 AND S$="LET" VDU2: VDU1,251: VDU1,235: VDU3: S$="FIG": VDU7: GOTO 200
320 REM 7 IS THE BEL CODE
330 IF A0%=5 AND S$="FIG" VDU2: VDU1,233: VDU3: GOTO 200
340 IF A0%=5 AND S$="LET" VDU2: VDU1,251: VDU1,233: VDU3: S$="FIG":GOTO 200
350 REM 5 IS THE ENQ CODE: GENERATES THE WRU CODE
360 IF A0% < &20 GOTO 200
370 IF A0% > &60 GOTO 200
380 VDU A0%: REM ECHOES CHARACTER ON SCREEN
390 A0% = A0% - &20
400 IF A0% > &20 AND A0% < &3B GOSUB 500 ELSE GOSUB 550
410 GOTO 200
419 REM **********************************
420 REM * S/R TO SET UP SERIAL INTERFACE *
421 REM **********************************
430 *FX5,2
440 REM ENABLE SERIAL INTERFACE
450 *FX8,1
460 REM SET TO 75 BAUD
470 VDU2: VDU1,255: VDU3: REM XMIT LETTER SHIFT
480 S$ = "LET": REM S$ STORES CURRENT SHIFT STATE
490 RETURN
499 REM *******************************
500 REM * S/R THAT DEALS WITH LETTERS *
501 REM *******************************
510 VDU2
520 IF S$="FIG" VDU 1,255: S$ = "LET"
530 VDU 1,S%(A0%): VDU3
540 RETURN
549 REM *******************************
550 REM * S/R THAT DEALS WITH FIGURES *
551 REM *******************************
560 VDU2
570 IF S$="LET" VDU 1,251: S$ = "FIG"
580 VDU 1,S%(A0%): VDU3
590 RETURN
599 REM ************************
600 REM * S/R TO SET UP TABLES *
601 REM ************************
610 FOR I% = 0 TO &40
620 PRINT I%+32;: READ P%(I%): PRINT P%(I%);SPC(5);: S%(I%)=P%(I%)+224
630 READ Z$: PRINT Z$; TAB(45); S%(I%): NEXT I%
640 DATA 4,"[SP]", 9,"!", 0,"[QUOTE MARK]", 0,"[HASH]", 0,"$", 13,"%"
650 DATA 0,"&", 5,"'", 15,"(", 18,")", 0,"*", 17,"+", 12,","
660 DATA 30,"-", 28,".", 29,"/"
670 DATA 22,"0", 23,"1", 19,"2", 1,"3", 10,"4", 16,"5"
680 DATA 21,"6", 7,"7", 6,"8", 24,"9"
690 DATA 14,":", 0,";", 0,"<", 0,"=", 0,">", 25,"?", 26,"@"
700 DATA 3,"A", 25,"B", 14,"C", 9,"D", 1,"E", 13,"F", 26,"G", 20,"H", 6,"I"
710 DATA 11,"J", 15,"K", 18,"L", 28,"M", 12,"N"
720 DATA 24,"O", 22,"P", 23,"Q", 10,"R"
730 DATA 5,"S", 16,"T", 7,"U", 30,"V", 19,"W", 29,"X", 21,"Y", 17,"Z"
740 DATA 0,"[", 0,"[BACKSLASH]", 0,"]", 0,"[UPARROW]", 3,"_" , 20,"[POUND]"
750 REM THERE ARE TWO WAYS TO GENERATE THE WRU CODE: 1/ ENQ 2/ "!"
760 RETURN
769 REM **************************
770 REM * S/R TO TURN LC INTO UC *
771 REM **************************
780 IF A0% > &60 AND A0% < &7B THEN 790 ELSE RETURN
790 REM SELECTS LOWER CASE LETTER CODES
800 A0% = A0% - &20: REM CONVERSION TO UC
810 RETURN
820 END

Figures 2a & 2b: 8 bit and 5 bit serial formats



Notes

  1. There is some debate as to the correct designation of this 5 bit code. It is commonly known as Baudot, although some prefer Murray; however, the correct designation is usually understood to be International Telegraph Alphabet No 2 (ITA2). (The ASCII code is International Alphabet No 5 (IA5), according to 'A Basic Guide to Data Communications', published by BACT/OFTEL 1993.)
  2. The look-up table was derived from the code table on page 204 of my copy of 'The Hacker's Handbook III' by Hugo Cornwall. (Electronic versions of the book are available on the net at: Hacker's Handbook Version 1, Hacker's Handbook Version 2 and Hacker's Handbook Version 3. Unfortunately it appears that the ITA2 code has been left out of these, but see the next link at 3 below.) There is a serious error in my paper edition of this book: on page 13 the author implies that the bits are sent MSB first, followed by the rest in decreasing order of significance. This is wrong; my article and software are correct (as are the notes appended to the code listing in the ARRL handbook found by following the link in note 3 below).
  3. The ITA2 code table listing from the ARRL handbook (see section 30.2 of the linked file) can be obtained - note: this is a pdf file.
    Alternatively another table listing is available (as part of a communications/computing dictionary). Yet another table listing: this one includes an interesting variant of the code which was used in the Illiac computer.
  4. None of the data rates used in RTTY or Telex corresponds exactly to those commonly used by computer serial links. To add to the confusion, the speeds are usually quoted in words per minute (a word is equal to 5 printing characters + 1 space character) and three different nominal stop bit lengths are used: 1, 1.42 and 1.5 bit periods. See my table for more details. A speed of 75 baud was chosen for the serial port here because (of all the choices) this comes closest to a speed used in RTTY, viz. 74.2 baud, which corresponds to 100 wpm with a stop bit = 1.42 bit periods or 98.93 wpm with a stop bit = 1.5 bit periods.

  5. There are several minor variants of the code where some pairs of characters may be swapped or substitutions made with more useful characters (the listing linked at 3 above includes the US TTY code variant). However, the most significant differences occur with the use of the SHIFT characters. Normally, once a SHIFT has been received all following characters are interpreted as either LETTERS or FIGURES depending on the most recent SHIFT character received. An alternative mode is 'UNSHIFT ON SPACE': at the receiver the SHIFT state reverts to LETTERS whenever a SPACE character is received. I presume that this is to reduce the effect of errors (say an error causes a SHIFT to be received when none was sent: this would cause all subsequent characters to be garbled until a correct SHIFT character is received). This is clearly a problem when, for example, only letters are sent and an erroneous FIGURE SHIFT is received in the middle. In the case of 'UNSHIFT ON SPACE' such a garble would be rectified at the end of the word in which the error occurred. A similar argument can be applied to strings of groups of figures, since a FIGURE SHIFT would need to be sent after every space. This strategy will, of course, only work where both the transmitter and receiver are set to this mode.

    It is a reasonably simple exercise to incorporate 'UNSHIFT ON SPACE' into the software so that a FIGURE SHIFT is automatically sent after every space when the most recent SHIFT from the keyboard was a FIGURE one.

  6. This article was published in ELECTRONIC PRODUCT DESIGN, 14, ISSUE 9; SEPTEMBER 1993 p 22
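
The 'UNSHIFT ON SPACE' modification suggested in note 5 might look something like the following Python sketch. It is an illustration rather than a change to the BBC BASIC program: the function name encode and its unshift_on_space flag are assumptions, only letters, numerals and space are handled, and the output is the list of 5 bit codes before 224 is added. The code values are those of the table in Fig. 1.

```python
LTRS, FIGS, SPACE = 31, 27, 4

LETTERS = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                   [3, 25, 14, 9, 1, 13, 26, 20, 6, 11, 15, 18, 28,
                    12, 24, 22, 23, 10, 5, 16, 7, 30, 19, 29, 21, 17]))
FIGURES = dict(zip("0123456789", [22, 23, 19, 1, 10, 16, 21, 7, 6, 24]))

def encode(text, unshift_on_space=False):
    """Return 5 bit codes for text, inserting shift codes where needed."""
    codes, shift = [], "LET"
    for ch in text.upper():
        if ch == " ":
            codes.append(SPACE)
            if unshift_on_space and shift == "FIG":
                # the receiver has reverted to letters, so restore figures
                codes.append(FIGS)
            continue
        if ch in LETTERS:
            want, code = "LET", LETTERS[ch]
        elif ch in FIGURES:
            want, code = "FIG", FIGURES[ch]
        else:
            continue                     # characters outside this sketch are skipped
        if want != shift:
            codes.append(LTRS if want == "LET" else FIGS)
            shift = want
        codes.append(code)
    return codes

print(encode("12 34"))
print(encode("12 34", unshift_on_space=True))
```

With the flag set, the second group of figures is preceded by its own figure shift, so a receiver working in 'UNSHIFT ON SPACE' mode still prints 34 rather than the letters that share those codes.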



Last updated: 15 November 2001;    © Lawrence Mayes, 1993 & 2001