By Martin Benjamin | Executive Director
Dear Kamusi Supporters,
Time for a quarterly update, after a busy three months. The most intriguing development is that we have our KamusiGames system running in beta for 12 languages. The Games are intended to bring in consistent data across languages, with a base parallel vocabulary of some 200,000 concepts. We're adding new languages as quickly as volunteers complete the translations of the interface. We expect to go public with the initial games sometime in the northern autumn. Meanwhile, please let me know your Facebook ID if you would like to be included as a tester.
In April, we started importing over a million new terms in 20 languages, but we froze the process because it was too server-intensive (about 10 seconds per record while also handling normal lookups and fending off DDoS attacks). We're therefore going to turn off public Kamusi services from about July 20 to August 11, traditionally our lowest-demand period, so we can use all our server power to finish the import.
If you take a quick peek at the Thai entry for "salt" before we power down, you'll see the cool way that languages link together. But you'll also see that the existing data we can use as a starting point is really inadequate: all that we have is basic word forms, with none of the extended information (like definitions) that makes for a truly informative knowledge resource. KamusiGames will change that for many, many languages. Over time, expect to see much richer data for each entry, and many more languages linked together in the knowledge matrix.
We have a lot more programming improvements in the pipeline. Our South African lead programmer, Greg McKeen, was able to spend a week at HQ in Switzerland in June, and we made fantastic progress on the coding. However, the task list toward dictionary nirvana only grew, while available funds did not, so implementation is necessarily slow.
We've also been moving ahead on the Human Languages Project consortium, with about 75 collaborating organizations currently on board. Once we find funding, these groups will work together to create a global language data infrastructure. Activity will involve both the production and deployment of language data, especially for languages that have previously been excluded from world knowledge and technology systems. To see why we're so passionate about making this a reality, please take a few minutes to watch this award-winning video (not by Kamusi), in which students from Zanzibar show what happens when adequate language resources are not available: https://youtu.be/5M_bPt85MNo
Other activities have included meetings to figure out the place of Kamusi within the White House Big Data Initiative, the UNESCO Information For All Programme, and the activities of the official language body of the African Union. Mohomodou Houssouba (a scholar and Kamusi board member) and I travelled to Mali to launch three languages within the Mali Numérique 2020 agenda, with more intensive work forthcoming. There have also been some interesting developments regarding Asian languages, as well as an initiative to produce emergency vocabularies for humanitarian organizations, both of which I plan to expand upon in the next newsletter.
Wishing you a happy mid-2015, and thanks for your continued support,
Martin
Project reports on GlobalGiving are posted directly to globalgiving.org by Project Leaders as they are completed, generally every 3-4 months. To protect the integrity of these documents, GlobalGiving does not alter them; therefore you may find some language or formatting issues.