
Together for Burundi!

by Kamusi Project USA

The biggest Kamusi news of the quarter is that we have received a grant, from a foundation that wishes to remain anonymous, to launch a project called "Digital Yiddish". This will fund about 20% of our operating costs for the next three years - so we are still scrambling to finance other languages, but at least we know we'll be able to make headway in one direction.

We are also about to launch service in French. The data is imperfect, but we've decided that it is better to make it public and improve it as we go, rather than keep it offline until it is great, because not having French has been preventing too many people from using our resources in Africa and Europe.

A few projects in the lab look like they should launch in the near future, but I've learned not to predict release dates. As teasers:
• A picture is worth a thousand words
• WeChat is used by 900 million people in China. Kamusi has 100,000 terms in Chinese. WeChat supports bot technology that is similar to what we have launched on Facebook.

I'll leave the news brief this quarter, and invite you to see what's going on behind the scenes by looking at our whiteboard: http://bit.ly/kamusilabs

Happy Year of the Dog!
Martin

The highlight of this quarter has been the v1.0 completion of our bot on Facebook. This makes Kamusi the "smallest biggest dictionary" - smallest because students can access it with the least possible effort at the least possible cost, and biggest because we've now got the most precise links in a matrix of 43 languages and counting.

To use the new service, just go to Facebook Messenger and send a message to kamusiproject, as you would send any other chat. A message such as go/spanish/zulu/coche will set your languages and search for your word, whereas simply sending a message such as "coche" will look for that word using your previous settings. No bookstore, no library, no website, not even an app - you just type your word in Facebook, and presto, full info! We have students in Kamusi Labs who are working on porting the bot to several other platforms over the coming months.
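
For readers curious how a chat service might tell those two kinds of message apart, here is a minimal sketch in Python. It is not the actual Kamusi bot code: the names Session and parse_message are hypothetical, and the only thing taken from the description above is the message format (go/source/target/word, or a bare word that reuses the previous settings).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Remembers one user's language pair between messages (hypothetical helper)."""
    source: Optional[str] = None
    target: Optional[str] = None


def parse_message(text: str, session: Session) -> dict:
    """Interpret a chat message as 'go/<source>/<target>/<word>' or as a bare word."""
    text = text.strip()
    parts = text.split("/")
    if len(parts) == 4 and parts[0].lower() == "go":
        # Full command: update the stored language pair, then search for the word.
        _, source, target, word = (p.strip() for p in parts)
        session.source, session.target = source.lower(), target.lower()
    else:
        # Bare word: fall back on whatever pair was set earlier.
        word = text
    if session.source is None or session.target is None:
        raise ValueError("no language pair set; send go/<source>/<target>/<word> first")
    return {"source": session.source, "target": session.target, "word": word}
```

With a fresh Session, parse_message("go/spanish/zulu/coche", session) sets Spanish and Zulu and looks up "coche"; a later parse_message("coche", session) reuses that pair. What happens after parsing (the dictionary lookup itself) is left out of this sketch.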

In the last quarterly report, I promised we'd have 18 languages from India online "very soon". Promise fulfilled! Before we make a big deal about this, though, we are working to complete a unique universal transliteration system among alphabets, because it isn't much use to know that "coche" is in Malayalam if you can't sound out the letters of that script. Indian languages are written in many different scripts, so transliteration is really a key to making a socially useful dictionary for the sub-continent. As is common with Kamusi Labs projects, the reason a universal transliteration system hasn't been tackled before is that its relentless complexity is too insane to even contemplate. Look for our first implementation very soon.

Another exciting recent development has been the spontaneous emergence of a vibrant group of young users for the Fon language of Benin. Unfortunately, the group is using the WhatsApp messaging platform, which does not support bots, so we have to transfer their enthusiasm to Facebook when we've added data collection features to the current bot. This could happen soon, or could drag out for a while. This group will be a model for many other languages. Right now we are focused on expanding in the West Africa region, and then hopefully we can bring the model back to Burundi and the Swahili zone.

I'll look forward to seeing which of the fun things coming down the pike I can tell you about next time. Meanwhile, I'll share this piece of fan mail, which I think gives some insight about our persistent difficulties in attracting funding:

Subject: Regarding Kamusi
I have seen your project Kamusi Gold. I am just wondering about this.
It's mentioned that there are 7000 languages spoken and your vision is to bring most of the content online.
I personally feel like most of the languages should die fast because lots of things can be made easier. The languages issues like working from different cultures, trade related issues etc., will be gone.
Thanks and regards, MS
Mobile knowledge for 5 South African languages

2017 is shaping up as the watershed year in which many of the claims that Kamusi has been making about our potential to document "every word in every language" become demonstrable. While the goal will always remain unreachable, our recent activities show energetic progress in moving toward it. We have released tools for the public that already provide much better vocabulary translation than Google Translate among included languages. Our open search is on track to cover around 60 languages by the end of the year, yielding about 3500 highly accurate bilingual dictionaries. At lesser accuracy, we are importing data from about 7000 languages, integrated with tools for users to participate in precision alignment; some of the data sets have hundreds of thousands of terms, while a few thousand languages have only a smattering. By year's end we will have as many as 30 million terms, and several free new tools for sharing that knowledge.
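
One way to see where a figure like 3500 dictionaries can come from, sketched in Python. This assumes that each ordered source-to-target pair counts as its own bilingual dictionary, which is an assumption made for illustration and not something the report states; the function name directed_pairs is likewise hypothetical.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n distinct languages.

    Assumption for illustration: Spanish-to-Zulu and Zulu-to-Spanish
    are counted as two separate bilingual dictionaries.
    """
    return n_languages * (n_languages - 1)


print(directed_pairs(60))  # 60 * 59 = 3540
```

Under that assumption, 60 languages give 3540 pairs, in line with the rounded figure above.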

In terms of data, our latest news is the inclusion of 5 languages from South Africa within the system. 18 languages from India will be online very soon.

Our progress is due to a new approach toward moving the project. After years of fruitless efforts to find funds to pay for basic development, we decided to finish proving the concept first, and find funding later. (In industry, this is called reaching "minimum viable product", while in non-profit management it is called "insanity".) Kamusi Labs is now an international "virtual" laboratory for computational linguistics. Graduate and undergraduate students from near and far join the project for summer or term-time internships. The students gain experience, satisfy credit obligations from their home universities, and see immediate results from their work. This summer we have 20 students from nine countries, with slots filling rapidly for the autumn.

Several of the students are working on core development (language data, database design, input and output systems), but many are pushing forward on new elements that we could not undertake if we first sank months into seeking grants to pay for them.

  • The transliterator will convert phonetically among dozens of written alphabets, solving a fundamental communications problem for places like India by enabling a speaker of, for example, Hindi, to plausibly read text in, say, Tamil, without needing to recognize the characters from that other script. The transliterator will be built into dictionary search results and also made available as a stand-alone web app for users to convert free-form text.
  • "EatUp" will soon be the app that takes the guesswork out of ordering in a foreign restaurant. This solves a huge problem in Europe, where people speak dozens of languages and travel frequently, but menus only have space for one or two languages. Google Translate is hopeless in this domain, while Kamusi's approach should enable diners to confidently order the food they want.
  • "Pre-D" is our system to reliably determine the correct vocabulary for sophisticated machine translation via user-managed source-side pre-disambiguation, including a comprehensive new approach to multiword expressions. This is more complicated than can be described in this space, but will make more sense when the prototype is online, and is aimed to eventually result in significantly improved translation among numerous languages.
  • We are upgrading KamusiTERMS, a system we designed for the African Network for Localization for participatory community terminology development across languages and domains. The new TERMS will open the millions of domain-specific terms produced for the European Union to systematic extension to non-European languages, with a special target of terms that can improve students' scores in their technical courses.
  • Language wheels are a new graphic method we have devised for identifying and selecting languages on websites and software, providing each known language variety with a distinctive multi-colored icon. We will propose the wheels as an international standard within the ISO after a comment period from language specialist communities.
  • "EmojiWorldBot" was introduced last year as a dictionary on the Telegram chat platform between Emojis and 72 languages. Now we are greatly expanding the features so that it can be a full-fledged dictionary using all our multilingual data. The bot is now being integrated with Facebook Messenger. WeChat development is planned to start in August, to extend the bot to the Chinese market.
  • We are developing a variety of games to elicit language data from members of the public, via the web and mobile apps. We will open the first games for play when we resolve some complex networking and data coordination issues.

Most of our recent focus has been more technical than linguistic. As the bolts are tightened on our tools for collecting, managing, and sharing data, we anticipate concerted activities to enhance the resources for individual languages. Several of these are in the pipeline for autumn, with computer science students tasked to work with data sets and language communities for their mother tongues. Opportunities are legion to use our platform for the development of any particular language. We especially invite linguistically-oriented faculty to have their students talk with Kamusi Labs about interesting projects for their thesis research.

None of this activity can continue indefinitely without funding, of course. One intern is working on a platform for supporters to sponsor individual words, and we are exploring other ways to generate revenue within our non-profit mission. We are also continuing to pursue grants. Hopefully, funding will be easier to come by now that we have exciting results to show, but so far we have not located any government agency or philanthropy that finds language infrastructure worthwhile. If you have any ideas or contacts, please let us know.

Meanwhile, we will keep pressing on. We invite you to use our free services:

Multilingual Dictionaries:

Swahili Dictionary:

72-language Emoji Dictionary:

  •  http://telegram.me/emojiworldbot

Which checkers have made the most interesting progress up the Kamusi board in the past few months?

The introduction of our mobile app on iPhone is a strong contender. We've spent a lot of time getting the technology right for both iPhone and Android. Now we've got something that anyone can install on almost any phone (with a known Android 4.X issue, fixed in next week's update), that clearly gives better vocabulary results than Google Translate in head-to-head comparisons of the languages we both cover - watch the video! Now we are adding new features, and getting ready to float a raft of new languages. For the moment, we're keeping the rollout low key, but look for a lot of hoopla in September, when it all comes together in a back-to-school package that will be free to students around the world.

Please download your copy now, and make sure to give us high ratings in the iTunes and Google Play stores!
iPhone: https://is.gd/PDXJ15
Android: https://is.gd/IyODZl

Now we are embarking on a new project that I hope will make a significant contribution to preserving endangered languages around the world. You've probably come across articles about endangered languages, usually playing the same sad song about the last few grandparents in a remote village, talking into the tape recorder in a race to save a few words for posterity. Several thousand languages are in peril of disappearing in coming decades, in some cases because of active "linguicide" (for example, children being punished in school for speaking their mother tongue), and in other cases because younger generations take on other languages when they seek work away from their birth communities.

In many cases these days, grandchildren are realizing what they are about to lose, and "revitalization" efforts are underway from the Arctic to Australia. I've been to several meetings in the past couple of years, including one in March, where I've met with leaders from these communities, and learned about their struggles and their needs. I've also talked with a lot of field linguists with experience documenting small languages, and people developing language technology. Based on what these groups have reported, we are now putting together a package that can put Kamusi to the service of preserving and revitalizing endangered languages. 

We're calling it "Box-o-Lex", unless you can help us with a better name.

Box-o-Lex is designed to solve a number of problems that prevent languages from being documented. Researchers around the world have the desire to sit with an endangered language community and help them record their words on paper and audio. However, each researcher or community activist has to figure out their own software setup, what terms they are going to collect, what tools they will use to collect them, and what to do with the data once they've got it. By the time they've gotten the process figured out, they've lost the time to sit with the speakers and do the actual language work. We are designing Box-o-Lex as a kit that researchers can take to the field, flip on their phone, and start collecting words. Then they come back to the grid, connect to a network, and their results are immediately uploaded and on a server for everyone to use, fully integrated with all the other languages within the Kamusi framework.

The intent is to shave months of labor off each individual documentation project. In this way, it will become much more viable for a linguistics Master's student to document an endangered language in the short research window they have between their first and their second year, or for a community organization to start working with their elders without having to first master complex technological and linguistic issues. There are lots of students looking for good research projects - for example, many of our partners are African universities in countries with dozens of languages, and the faculty would eagerly send their students to document them if the tasks were well defined. By making the process easy, we have a clear path toward helping preserve many languages - instead of a dirge, a happy jig for didgeridoos and thumb pianos.

Beyond Box-o-Lex, we have students working on about 15 different projects during the next several months. That's a lot of checkers moving along the board, so next quarter's report should be about something completely different. Language wheels? Universal transliteration? Bots? Stay tuned...

Thanks for your continued support,
Martin

Kamusi Here! for Android

The visible highlight of this past quarter was the release of our mobile app for Android. The initial launch has "only" eleven languages, and a limited feature set, but it is a prime demonstration of what is going on behind the scenes. When it comes to vocabulary selection between any two non-English languages in the 1.0 release, the app demonstrably beats the socks off Google Translate.

You can try this at home: install our app, install the Google Translate app, and run them head to head for a pair where you can probably make some sense of the results. Romanian to Catalan, with the word "casa" or "iute", is a good example, even if you speak neither language. You can download the app at the Google Play store, linked from http://kamusigold.org/info/here_android

iOS users, patience please, we are slated to have a version for you by late March.

This is a demonstration of the technology, but the heart of our work is the data - particularly the data for African and other excluded languages. We are waiting for a partner in South Africa to put the finishing touches on 5 languages that we will be able to bring right into the current system, possibly this quarter, along with 18 languages from India that are ready save one technical index file. The first portion of our Kinyarwanda data is also ready to import; Kinyarwanda is mutually intelligible with Kirundi, where we've been stuck because we haven't raised enough funds yet to reactivate the student stipends, so meanwhile we've taken an alternative path toward getting a working resource onto devices that can be used in both Burundi and Rwanda.

We also prepared a large and complicated data set for the Fula language, spoken by 25 million people from Mauritania to Cameroon. This data now needs to be aligned with our main concept set using our "DUCKS" system, along with about 65 other languages beyond the first 60 for which we have a pre-existing index. The 126 languages in various stages on the path to inclusion are shown at http://kamusigold.org/info/gold_languages_info, including a fair number of African and other non-lucrative languages.

I can already predict next quarter's update: more data for more languages, on a better technological backbone. Thanks for your continued support in pushing us on that path.

Happy 2017,
Martin


About Project Reports

Project Reports on GlobalGiving are posted directly to globalgiving.org by Project Leaders as they are completed, generally every 3-4 months. To protect the integrity of these documents, GlobalGiving does not alter them; therefore you may find some language or formatting issues.

If you donate to this project or have donated to this project, you will get an e-mail when this project posts a report. You can also subscribe for reports via e-mail without donating.


Organization Information

Kamusi Project USA

Location: Brooklyn, NY - USA
Twitter: @kamusi
Project Leader:
Martin Benjamin
Brooklyn, NY United States
