More Publications (24)
-
1. An Empirical Study on Web Mining of Parallel Data
   Beijing, August 2010 An Empirical Study on Web Mining of Parallel Data Gumwon Hong 1, Chi-Ho Li 2, Ming Zhou 2 and Hae-Chang...
-
2. Harvesting Parallel Text in Multiple Languages with Limited Supervision
   webpages, we extend our harvesting technique to create parallel data for resource limited languages in an unsupervised manner... /09/06/september-2011-web-server-survey.html 201 language pairs. As a result, web harvesting of parallel text has been...
-
3. JMaxAlign: A Maximum Entropy Parallel Sentence Alignment Tool
   2012 - Max Kaufmann
   corpora for certain language pairs, such as Spanish or French, are widely available, but for many language pairs, such as Bengali...
-
4. Automatic Bilingual Phrase Extraction from Comparable Corpora
   using English-German, English-Greek and English-Latvian language pairs. The performance of our classifier on the test sets is...
-
5. Machine Transliteration: Leveraging on Third Languages
   resource requirement of training corpus for low-density language pairs while the model-based pivot strategy performs worse than...
-
6. Cross-Lingual Identification of Ambiguous Discourse Connectives for Resource-Poor Language
   utilizing bilingual dictionaries, Penn Discourse Treebank and parallel data between English and Chinese. We start from translating...
-
7. Sub-corpora Sampling with an Application to Bilingual Lexicon Extraction
   2012 - Ivan Vulić, Marie-Francine Moens
   induction, particularly in a setting where only limited parallel data are available. Word translation pairs are extracted from... corpora either do not exist or are of limited size for most language pairs. Therefore the focus of the researchers has turned towards...
-
8. Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation
   2010 - Dmitriy Genzel
   machine translation. We learn rules for 8 different language pairs, showing BLEU improvements for all of them, and demonstrate...
-
9. Efficient Discrimination Between Closely Related Languages
   2012 - Jörg Tiedemann, Nikola Ljubešić
   observe the amount of symmetry in the token overlap inside language pairs. The largest difference in the token overlap is shown... identification relies on the prerequisite of possessing parallel data of closely related languages. Since parallel texts communicate...
-
10. Inducing Crosslingual Distributed Representations of Words
   is derived from co-occurrence statistics in bilingual parallel data. These representations can be used for a number of cr...
-
11. Local lexical adaptation in Machine Translation through triangulation: SMT helping SMT
   languages which can be translated by systems for different language pairs and whose outputs can be successfully combined into better...
-
12. A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT
   reordered alternatives? Do these improvements persist across language pairs that exhibit significantly different reordering effects...
-
13. Large Scale Parallel Document Mining for Machine Translation
   provides an abundance of readily available monolingual text, parallel data is still a comparatively scarce resource, yet plays a...
-
14. Determining Recurrent Sound Correspondences by Inducing Translation Models
   2002 - Grzegorz Kondrak
   English–German and the French–Latin pairs, all remaining language pairs are quite challenging for a cognate identification program...
-
15. Phrase Clustering for Smoothing TM Probabilities - or, How to Extract Paraphrases from Phrase Tables
   language models: the first on the English side of the parallel data, and the second on the English Gigaword corpus. Our...
-
16. A Power Mean Based Algorithm for Combining Multiple Alignment Tables
   Urdu. We trained such alignments using GIZA++ on parallel data with partial words for Pashto sentences. The fourth type...
-
17. Simple and Effective Parameter Tuning for Domain Adaptation of Statistical Machine Translation
   obtain sufficient amounts of in-domain data (in particular parallel data required for translation and distortion models) to train... Environment (env) and Labour Legislation (lab) and two language pairs: English–French and English–Greek (both directions) and...
-
18. Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models
   the English–French (En–Fr) and English–German (En–De) language pairs. We use three different freely available parallel corpora... of having poor word alignments due to small amounts of parallel data in every iteration. Word-alignments are known to benefit...
-
19. Two Methods for Extending Hierarchical Rules from the Bilingual Chart Parsing
   2010 - Martin Cmejrek, Bowen Zhou
   hierarchical phrase based MT systems directly from the parallel data, independently of bilingual word alignments. Let us have... the behavior of the rule arithmetic on two different language pairs: German-English and Farsi-English. We also propose an...
-
20. Mining New Word Translations from Comparable Corpora
   2004 - Li Shao, Hwee Tou Ng
   corpora are scarce resources, especially for uncommon language pairs. Comparable corpora refer to texts that are not direct...
-
21. Constituent Reordering and Syntax Models for English-to-Japanese Statistical Machine Translation
   effective in capturing the long-range reordering between language pairs with very different word orders like Japanese-English... automatically learned from the word-aligned source-parsed parallel data and incorporated as a tree-to-string grammar for decoding...
-
22. Notes on the Evaluation of Dependency Parsers Obtained Through Cross-Lingual Projection
   2010 - Kathrin Spreyer
   corpus (Koehn, 2005) as our parallel corpus. It comprises parallel data from 11 languages; in this paper, we present experiments... Dutch and Italian as TLs. First, the bitexts for the language pairs under consideration (English-Dutch, English-Italian,...
-
23. Extraction of Domain-Specific Bilingual Lexicon from Comparable Corpora: Compositional Translation and Ranking
   candidate translation. The method was tested on two language pairs (English-French and English-German) and with a small... domain-specific bilingual lexicons when there is no parallel data available (e.g. translation memories, multilingual t...
-
24. Approximate Sentence Retrieval for Scalable and Efficient Example-Based Machine Translation
   machine translation (MT) mostly depends on the size of parallel data available for training. Although statistical MT is considered... data sets used for all our experiments represent two language pairs with parallel data of different size and type. Statistics...