Setswana Scrabble

21 Aug 2013

I am foolish: not in the general Sonny Serite sense of lacking good sense and judgement, which has hitherto been ascribed to certain persons in the political domain, but in another sense, namely the sheer belief that much of what exists in the English language can be replicated in the Setswana language. This belief has led to work on the Setswana Google (www.google.co.bw), the Setswana Rhyming Dictionary (published by CASAS) and the current work on Setswana lexicography which has produced Tlhalosi ya Medi ya Setswana. The current development of Setswana debates, Setswana poetry competitions, Setswana quizzes and Setswana essay competitions in schools comes from this excitement about possibilities in the Setswana language.

In the past few weeks I have been dreaming of developing Setswana Scrabble. The word game Scrabble was first developed by an American architect, Alfred Mosher Butts, in 1938. When it was first developed it was called Criss-Crosswords. It was named Scrabble in 1948 by James Brunot, who bought the rights to manufacture the game in exchange for granting Butts a royalty on every unit sold. The game is played on a square board with a 15 x 15 grid of cells. In an English-language set, the game contains 100 tiles, 98 of which are marked with a letter and a point value ranging from 1 to 10. The number of points for each lettered tile is based on the letter's frequency in standard English writing; commonly used letters such as E or O are worth one point, while less common letters score higher, with Q and Z each worth 10 points. The game also has two blank tiles that are unmarked and carry no point value. Other language sets use different letter distributions with different point values. While Scrabble has been developed in a number of languages such as English, Afrikaans, Swedish, Turkish, Spanish, Russian, Slovak, Portuguese, Romanian and many others, it hasn’t been developed for the Setswana language; actually it hasn’t been developed for any African language.
Therefore in the past few weeks I have been foolish enough to dream up and design Setswana Scrabble.

The results of my work were shared with my peers at the Department of English seminar series on August 16th. Since the game of Scrabble is based on letter frequencies, Setswana letter frequencies were computed. A database of about 20 million words was used. In total about 90 million Setswana characters/letters were analysed. Our study used corpus-querying software to compute the frequencies. The principal goal was to establish a frequency for each Setswana orthographic letter and to use that information to allocate a point value to each letter. We were quick in our analysis to eliminate five characters that are not essential to Setswana spelling. These are C, V, Z, X and Q. This elimination was in recognition of the fact that these characters are rarely used in the Setswana language. This is not to say that these characters never occur. Some of these letters are used in the representation of certain sounds such as nx-nx-nx-nx or nc-nc-nc. There are also a number of borrowings such as the notorious vuvuzela or zama or even the super colloquial vaya (go). However, in keeping with the design of Scrabble in other languages, these characters were discarded as non-Setswana letters. The analysis revealed that A was the most common letter, with a frequency of about 14 million. J was found to be the least frequent character, occurring 400 thousand times in our database. All the letters were therefore arranged on the basis of frequency, starting with the most frequent letter and ending with the least frequent letter: AEOLT NGSIMK BRHWDY UFJ.
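The counting step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the corpus software used in the study: the function names and the sample phrase are made up for the example, and the sketch assumes plain unaccented letters A to Z.

```python
from collections import Counter

# Letters set aside in the article as non-essential to Setswana spelling.
EXCLUDED = set("CVZXQ")


def letter_frequencies(text):
    """Count letters case-insensitively, skipping the excluded ones.

    Assumption: only unaccented A-Z letters are counted; anything else
    (spaces, punctuation, digits, accented characters) is ignored.
    """
    counts = Counter()
    for ch in text.upper():
        if "A" <= ch <= "Z" and ch not in EXCLUDED:
            counts[ch] += 1
    return counts


def rank_letters(counts):
    """Return the letters ordered from most to least frequent.

    Ties are broken alphabetically so the result is deterministic.
    """
    return "".join(sorted(counts, key=lambda c: (-counts[c], c)))


# A short illustrative phrase standing in for the 20-million-word corpus.
sample = "Dumela rra, le kae? Re teng."
print(rank_letters(letter_frequencies(sample)))
```

Run over a real corpus, the ranked string would play the role of the frequency sequence quoted above.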

This sequencing is different from that of English and Afrikaans. The English letter frequency sequence is ETAON RISHD LFCMU GYPWB VKJXQZ. The most common letter in English is E, while E is the second most common letter in the Setswana language. The study also examined the letters which usually occur at the beginning of words, and these, ordered by frequency, are: BMTGKALSDNEROYFIPJWCHU. This means that most words, or the most frequently repeated words, start with B and very few words begin with U. Scrabble tiles can carry one of eight point values: 0, 1, 2, 3, 4, 5, 8 and 10. Both the number of tiles and the point value of a letter are based on its frequency of occurrence. Letters that are highly frequent have the largest number of tiles but the smallest number of points. Letters that are very rare, such as Q & Z in English and J & F in Setswana, have the fewest tiles but the largest number of points. On the number of tiles, the following were decided: 16 tiles for A; 12 tiles for E; 11 tiles for O; 9 tiles for L; 6 tiles each for N, G & T; 5 tiles each for S & I; 4 tiles each for M & K; 3 tiles each for B, R & H; 2 tiles each for W & D; and 1 tile each for P, Y, U, F and J. Points were allocated in the following manner: 1 point for S, N, G, T, L, O, E & A; 2 points for I; 3 points for M & K; 4 points for B; 5 points for W, D, R & H; 8 points for the four letters P, Y, U & F; and 10 points for J only. Point allocation was based purely on frequency of occurrence.
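The tile counts and point values listed above can be written down as two small lookup tables. The Python sketch below is only an illustration: the names TILES, POINT_GROUPS, POINTS and word_score are invented for the example, and the scoring deliberately ignores blank tiles and the premium squares on the board.

```python
# Tile counts per letter, as listed in the article.
TILES = {
    "A": 16, "E": 12, "O": 11, "L": 9,
    "N": 6, "G": 6, "T": 6,
    "S": 5, "I": 5,
    "M": 4, "K": 4,
    "B": 3, "R": 3, "H": 3,
    "W": 2, "D": 2,
    "P": 1, "Y": 1, "U": 1, "F": 1, "J": 1,
}

# Point values grouped as in the article: points -> letters.
POINT_GROUPS = {
    1: "SNGTLOEA",
    2: "I",
    3: "MK",
    4: "B",
    5: "WDRH",
    8: "PYUF",
    10: "J",
}

# Flatten the groups into a letter -> points lookup.
POINTS = {
    letter: points
    for points, letters in POINT_GROUPS.items()
    for letter in letters
}


def word_score(word):
    """Sum the face value of each letter in a word.

    Assumption: no blanks and no premium squares; letters outside the
    Setswana set (C, V, Z, X, Q) are rejected.
    """
    total = 0
    for ch in word.upper():
        if ch not in POINTS:
            raise ValueError(f"{ch!r} is not a letter in this tile set")
        total += POINTS[ch]
    return total


print(sum(TILES.values()))   # lettered tiles implied by the counts above
print(word_score("pula"))    # P(8) + U(8) + L(1) + A(1)
```

By the counts given, the lettered tiles add up to 102, and a word such as pula would score 18 before any board bonuses.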

The design of Setswana Scrabble is based on analysing large amounts of text. Since the game of Scrabble is fairly well known and common amongst Tswana-speaking people who have been to school, it is hoped that the creation of Setswana Scrabble will extend the use of the Setswana language to areas which have hitherto been the preserve of English and other European languages.