Tuesday, October 3, 2023

How much Setswana is on the internet?

The question posed above is a difficult one to answer, in part because it is hard to determine which words are Setswana. For instance, is the word gore Setswana, English, Sesotho sa Leboa (Sepedi) or Sesotho (Southern Sotho)? What about go, bale, lone, gone, and bone? If we wanted to count Setswana words, we could download all the Setswana pages on the internet and then count all the words in those pages. Such a process is fairly straightforward, and I have attempted it with the help of Kevin Scannell, a mathematician at Saint Louis University. That investigation mined over half a million words from the web, which were subsequently used in the compilation of a large Setswana corpus in 2005. Our interest in knowing how much Setswana exists on the internet comes from wanting to gauge how much the language is used on a platform that represents so much technological progress. The question therefore seeks to find out whether Setswana is reaching domains which were hitherto predominantly English.
Internet World Stats has shown that the top 10 internet languages are English, Chinese, Spanish, Japanese, Portuguese, German, Arabic, French, Russian and Korean. There is no African language among them. In 2010 over 540 million internet users used English, while 445 million used Chinese. If you combine all the languages outside the top 10, you get 350 million users. In March 2011 internet users in Africa accounted for 5.7% of the world's internet users. Around the same time, the top ten African countries by number of internet users were Nigeria (44 million), Egypt (20.1 million), Morocco (13.2 million), South Africa (6.8 million), Algeria (4.7 million), Sudan (4.2 million), Kenya (4 million), Tunisia (3.6 million), Uganda (3.2 million) and Zimbabwe (1.4 million).
While there is some activity on the web from many African countries, much of it is in colonial languages, in particular English and French. African languages are largely poorly represented on the internet. This is not good for the position and future of African languages. For African languages to survive and flourish, they have to be used in a number of domains, one of which is the internet.
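To make the counting idea described above concrete, here is a minimal sketch in Python. It is not the method used to build the 2005 corpus; it only assumes that the text of some pages has already been downloaded into a list of strings. The names count_words, ambiguous_share and AMBIGUOUS are hypothetical, the ambiguous word list is just the six examples mentioned earlier, and the two sample sentences are illustrative only.

```python
import re
from collections import Counter

# Assumption: these are only the example forms named in the article, i.e. words
# that are also spelled the same way in English or in related Sotho languages.
AMBIGUOUS = {"gore", "go", "bale", "lone", "gone", "bone"}

# A token is a run of letters (any script), optionally joined by an
# apostrophe or hyphen, so accented letters such as "š" are kept.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def count_words(pages):
    """Return a Counter of lowercased word tokens across all page texts."""
    counts = Counter()
    for text in pages:
        counts.update(token.lower() for token in TOKEN_RE.findall(text))
    return counts


def ambiguous_share(counts):
    """Return the fraction of all tokens whose form is in AMBIGUOUS."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    ambiguous = sum(n for word, n in counts.items() if word in AMBIGUOUS)
    return ambiguous / total


# Illustrative sample text standing in for downloaded pages.
pages = ["Ke itse gore o a tla.", "Go na le batho ba le bantsi."]
counts = count_words(pages)
print(sum(counts.values()), "tokens,", len(counts), "distinct words")
print("share of ambiguous forms:", round(ambiguous_share(counts), 3))
```

Counting is the easy part. The share of ambiguous forms hints at why a raw word count cannot by itself say how much of a page is Setswana rather than English or a related language.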

We must accept that it is not all doom and gloom for African languages on the internet. A couple of years ago, we gathered in Niamey, Niger, to consider this very matter at a conference which brought together research from various parts of Africa. Considerable work has gone into translating Google Search into Setswana. The Setswana search engine is found at www.google.co.bw, and this groundbreaking work was done mainly at the University of Botswana. The good thing is that the same Setswana Google search is also available on the South African Google page, www.google.co.za. This is important since Setswana is used in both Botswana and South Africa, so there is no need for South Africans to produce their own version. The search engine is critical in giving Setswana native speakers an opportunity to search for information in their own language. In many ways this demonstrates the growth of the Setswana language into new domains. A Setswana Wikipedia also exists at http://tn.wikipedia.org, and it is frequently updated by volunteers from across the globe. It is important that the world's knowledge exists in Setswana, since if it existed only in other languages such as English, it would send the false message that knowledge is synonymous with foreign languages and not Setswana. Microsoft has also released the Microsoft Office Language Interface Pack 2010 in Setswana, which means that Batswana can use Microsoft Office with a Setswana interface. There are also online Setswana radio stations such as Motsweding FM, which broadcasts from Mafikeng, South Africa, exclusively in Setswana, and Radio Botswana, which broadcasts from Botswana. Gabs FM, which broadcasts in both Setswana and English, may also be accessed online. The Setswana text of the Botswana Daily News is also available online at www.dailynews.gov.bw. In 2004 Kevin Scannell and I produced a Setswana spellchecker for OpenOffice.
This resource is freely available online and may be downloaded from http://pkgs.org/opensuse-11.4/opensuse-oss-x86_64/aspell-tn-1.0.1-24.1.x86_64.rpm.html. Certain websites, such as that of the North West University of South Africa, are available in Setswana (see www.nwu.ac.za). There are also multiple Facebook pages, such as Lekgotla la Setswana, Puo Letlotlo, and Setswana se se kwenneng, which exist to promote the Setswana language. There are Setswana blogs such as Puo ya rona, and Setswana pages at www.setswana.info which introduce one to the language. There is also evidence of Setswana use on Twitter: http://indigenoustweets.com records the number of tweets in indigenous languages, and http://indigenoustweets.com/tn/ shows an interface in the Setswana language. Indigenous Tweets is the result of clever computing by Kevin Scannell. Translate.org.za (www.translate.org.za) is a non-profit organization focused on the localization, or translation, of open source software into South Africa's 11 official languages, including Setswana. It has created fonts for South African languages and a South African keyboard.

It is clear that there is a fair amount of Setswana material online, most of which is uncoordinated. There is therefore a need to create cross-border groups which could produce and publish material online for the benefit of Setswana-speaking communities wherever they are. There is an encouraging use of Setswana on social media such as Facebook and Twitter, which must be supported and promoted. Currently only one online newspaper carries some Setswana material. There is a need to increase this and to create a monolingual Setswana newspaper online which will publish news and cultural material of the Batswana. There is also a need to coordinate corpus compilation for the Setswana language and to make the corpora freely available to researchers. On a technical level, taggers and parsers for Setswana corpora need to be developed so that the corpora can be exploited more efficiently. A new Setswana online hub is being developed which will house Setswana texts and information to benefit Setswana education and general knowledge. So how much Setswana is online? We are not sure, but it is growing steadily.

