From Amharic to Xhosa, introducing Translate in 13 new languages — now over 100 in total!
February 17th, 2016 | Published in Google Translate
In 2006, we started with machine learning-based translations between English and Arabic, Chinese and Russian. Almost 10 years later, with today’s update, we now offer 103 languages that cover 99% of the online population.
The 13 new languages — Amharic, Corsican, Frisian, Kyrgyz, Hawaiian, Kurdish (Kurmanji), Luxembourgish, Samoan, Scots Gaelic, Shona, Sindhi, Pashto and Xhosa — help bring a combined 120 million new people to the billions who can already communicate with Translate all over the world.
So what goes into adding a new language? Beyond the basic criterion that it must be a written language, we also need a significant amount of translations in the new language to be available on the web. From there, we use a combination of machine learning, licensed content and Translate Community.
As we scan the web for billions of already translated texts, we use machine learning to identify statistical patterns at enormous scale, so our machines can "learn" the language. But because existing documents can’t cover the breadth of a language, we also rely on people like you in Translate Community to help improve current Google Translate languages and add new ones, like Frisian and Kyrgyz. So far, over 3 million people have contributed approximately 200 million translated words.
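To give a feel for what "identifying statistical patterns" in translated texts can mean, here is a minimal toy sketch. It is not how Translate works internally; it only assumes a handful of hand-written English and Spanish sentence pairs and scores word pairs by how often they appear together, using the Dice coefficient. The names `PAIRS`, `dice_scores` and `best_match` are made up for this illustration.

```python
from collections import Counter

# Toy, hand-written sentence pairs (English, Spanish); illustrative data only.
PAIRS = [
    ("the house", "la casa"),
    ("the white house", "la casa blanca"),
    ("the white cat", "el gato blanco"),
    ("a cat", "un gato"),
    ("the moon", "la luna"),
]


def dice_scores(pairs):
    """Score each (source word, target word) pair by co-occurrence.

    Dice = 2 * (pairs containing both words) / (pairs with source + pairs with target).
    """
    src_count, tgt_count, cooc = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each word at most once per sentence pair.
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        cooc.update((s, t) for s in src_words for t in tgt_words)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in cooc.items()
    }


def best_match(word, scores):
    """Return the target word with the highest score for `word`, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


scores = dice_scores(PAIRS)
for word in ("house", "cat", "moon"):
    print(word, "->", best_match(word, scores))
```

With only five sentence pairs the counts are tiny, and some words end up with tied scores. The same idea becomes far more reliable as the number of translated texts grows, which is one reason a large amount of existing translations matters.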
Before you dive into translating, here are a few fun facts about the new languages:
- Amharic (Ethiopia) is the second most widely spoken Semitic language after Arabic
- Corsican (Island of Corsica, France) is closely related to Italian and was Napoleon's first language
- Frisian (Netherlands and Germany) is the native language of over half the inhabitants of the Friesland province of the Netherlands
- Kyrgyz (Kyrgyzstan) is the language of the Epic of Manas, which is 20x longer than the Iliad and the Odyssey put together
- Hawaiian (Hawaii) has lent several words to the English language, such as ukulele and wiki
- Kurdish (Kurmanji) (Turkey, Iraq, Iran and Syria) is written with Latin letters while the other two varieties of Kurdish are written with Arabic script
- Luxembourgish (Luxembourg) completes the list of official EU languages Translate covers
- Samoan (Samoa and American Samoa) is written using only 14 letters
- Scots Gaelic (Scottish highlands, UK) was introduced by Irish settlers in the 4th century AD
- Shona (Zimbabwe) is the most widely spoken of the hundreds of languages in the Bantu family
- Sindhi (Pakistan and India) was the native language of Muhammad Ali Jinnah, the "Father of the Nation" of Pakistan
- Pashto (Afghanistan and Pakistan) is written in Perso-Arabic script with an additional 12 letters, for a total of 44
- Xhosa (South Africa) is the second most common native language in the country after Zulu and features three kinds of clicks, represented by the letters x, q and c
For each new language, we make our translations better over time, both by improving our algorithms and systems and by learning from your translations with Translate Community. Today's update will be rolling out over the coming days.
No matter what language you speak, we hope today’s update makes it easier to communicate with millions of new friends and break language barriers one conversation at a time.