By Martin Benjamin | Executive Director
I have a few updates to share with you on what's been happening at Kamusi since the New Year. It has been an exciting few months for both the current incarnation of the project and its upcoming evolution.
We are currently importing more than a million new records that link more than twenty languages. Because the import is slow and resource-intensive, you might find the kamusi.org website switched off while we are running the script, but you'll soon have a first glimpse of the intricate language matrix we are developing. You'll see that we have about 200,000 tentative English definitions from an open-source resource called WordNet, which are mapped to hundreds of thousands of terms in other languages. Upcoming Kamusi tasks will involve games for English word mavens to produce much better English definitions, and work with other language communities to add definitions and a lot more detail for their languages. Even in its initial state, inclusion in Kamusi should serve as a substantial improvement in how people can use the Multilingual WordNet data for the languages involved, and a quantum leap in demonstrating where we are headed.
While I could go on at length about our data model and technical features, the key to our success will be involving language communities in producing their own resources using the project's systems. To that end, we held a training session in Hanoi in February for our Vietnamese team, with the first 10,000 terms scheduled to come online between now and the end of 2015. I was then able to present Kamusi at the International Conference on Language Documentation and Conservation and at a couple of universities in the western US, as a result of which several Native American groups are now poised to join the project. At some point soon, we will also begin intensive work with three languages of Mali as part of that country's official Mali Numérique 2020 program. In combination with the technical platform, this activist approach to working with partners and languages far and wide will lead before long to a lot of new resources for the many languages in the pipeline.
On the evolutionary front, we are now preparing a major new incarnation: the Human Languages Project. Similar to the way the Human Brain Project is uniting every aspect of neuroscience across numerous institutions, HLP will unite groups working on numerous aspects of language documentation, use, education, and technology. Led by Kamusi, we will be working toward a global language data infrastructure, a common resource to which communities can contribute their linguistic knowledge in a structured format that can be put into service for greater societal and technological purposes. Nearly 50 organizations from around the world have already signed letters of intent to join the consortium, including those that will be providing data for their languages and those that will be developing technologies to put that data to powerful use.
More on the HLP in the summer update. Meanwhile, it is time to finish importing the WordNet data, start bringing in some data sets from other languages that we have waiting in the wings, keep pressing toward the launch of our games platform to produce and refine language data with the crowd, continue with several other back-end technical developments, keep working with existing language and technical partners while finding new ones, and try to find ways to pay for it all.
Many thanks for your continued support,
Martin
