By Martin Benjamin | Executive Director
2017 is shaping up as the watershed year in which many of Kamusi's claims about our potential to document "every word in every language" become demonstrable. While the goal will always remain out of reach, our recent activities show energetic progress toward it. We have released public tools that already provide much better vocabulary translation than Google Translate for the languages they include. Our open search is on track to cover around 60 languages by the end of the year, yielding 3500 highly accurate bilingual dictionaries. At lower accuracy, we are importing data from about 7000 languages, integrated with tools that let users participate in precision alignment; some of these data sets contain hundreds of thousands of terms, while a few thousand languages have only a smattering. By year's end we will have as many as 30 million terms, along with several new free tools for sharing that knowledge.
On the data front, our latest news is the addition of 5 languages from South Africa to the system, and 18 languages from India will be online very soon.
Our progress is due to a new approach to moving the project forward. After years of fruitless efforts to find funds to pay for basic development, we decided to finish proving the concept first and find funding later. (In industry, this is called reaching "minimum viable product"; in non-profit management, it is called "insanity".) Kamusi Labs is now an international "virtual" laboratory for computational linguistics. Graduate and undergraduate students from near and far join the project for summer or term-time internships. The students gain experience, satisfy credit requirements at their home universities, and see immediate results from their work. This summer we have 20 students from nine countries, and slots are filling rapidly for the autumn.
Several of the students are working on core development (language data, database design, input and output systems), but many are pushing forward on new elements that we could not undertake if we first sank months into seeking grants to pay for them.
Most of our recent focus has been more technical than linguistic. As the bolts are tightened on our tools for collecting, managing, and sharing data, we anticipate concerted activities to enhance the resources for individual languages. Several of these are in the pipeline for autumn, with computer science students tasked to work with data sets and language communities for their mother tongues. Opportunities abound to use our platform to develop any particular language. We especially invite linguistically oriented faculty to encourage their students to talk with Kamusi Labs about interesting projects for their thesis research.
None of this activity can continue indefinitely without funding, of course. One intern is working on a platform for supporters to sponsor individual words, and we are exploring other ways to generate revenue within our non-profit mission. We are also continuing to pursue grants. Hopefully, funding will be easier to come by now that we have exciting results to show, but so far we have not found any government agency or philanthropy that considers language infrastructure worthwhile. If you have any ideas or contacts, please let us know.
Meanwhile, we will keep pressing on. We invite you to use our free services:
Multilingual Dictionaries:
Swahili Dictionary:
72-language Emoji Dictionary:
Project reports on GlobalGiving are posted directly to globalgiving.org by Project Leaders as they are completed, generally every 3-4 months. To protect the integrity of these documents, GlobalGiving does not alter them; therefore you may find some language or formatting issues.
