A Collection of CAT Tools and Libraries

Online Terminology Databases

  1. Keywords to Understand China: http://www.china.org.cn/chinese/china_key_words/
  2. Standardized Terminology Database for the Translation of Discourse with Chinese Characteristics: http://210.72.20.108/index/index.jsp
  3. China Core Vocabulary: https://www.cnkeywords.net/index
  4. Key Concepts in Chinese Thought and Culture: http://www.chinesethought.cn/TermBase.aspx
  5. United Nations terminology database (UNTERM): https://unterm.un.org/UNTERM/portal/welcome
  6. TermOnline: http://termonline.cn/index.htm
  7. National Academy for Educational Research terminology database: http://terms.naer.edu.tw/download/
  8. Blockchain-related terminology: http://8btc.com/thread-17286-16-1.html
  9. Chinese-English Dictionary of Ming Government Official Titles: https://escholarship.org/uc/item/2bz3v185
  10. China Standard Terminology (CNKI): http://shuyu.cnki.net/index.aspx
  11. Grand Dictionnaire Terminologique: http://www.granddictionnaire.com/
  12. TERMIUM: http://www.btb.termiumplus.gc.ca/tpv2alpha/alpha-eng.html?lang=eng
  13. LingoSail TermBox: http://termbox.lingosail.com/
  14. Microsoft terminology database (Chinese portal): https://www.microsoft.com/zh-cn/language
  15. World Health Organization terminology database: http://www.who.int/substance_abuse/terminology/zh/
  16. Electronic engineering glossary: https://www.maximintegrated.com/cn/glossary/definitions.mvp/terms/all
  17. MDict offline dictionary downloads (100 GB collection): https://downloads.freemdict.com/
  18. OneDict: http://www.onedict.com/
  19. National standard "Logistics Terms": http://zizhan.mot.gov.cn/zhuantizhuanlan/gonglujiaotong/shoufeigongluzmk/zhengcefagui/201508/t20150814_1863913.html
  20. Winter Olympics terminology lookup site: http://owgt.lingosail.com/
  21. Music terminology lookup: http://dictionary.t-classical.com/
  22. European Union Language and Terminology: https://europa.eu/european-union/documents-publications/language-and-terminology_en
  23. IATE (Interactive Terminology for Europe), the EU’s terminology database: https://iate.europa.eu/home
  24. Hong Kong legal glossary (Chinese-English): https://www.elegislation.gov.hk/glossary/chi
  25. Magic Search: http://magicsearch.org
  26. Microsoft Language Portal: https://www.microsoft.com/en-us/language
  27. Linguee: https://www.linguee.com/
  28. The Free Dictionary: http://www.thefreedictionary.com/
  29. Glosbe: https://glosbe.com/tmem/

Online Corpora (China)

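  1. Corpus: http://yulk.org/
  2. BCC Corpus: http://bcc.blcu.edu.cn/
  3. Corpus Online: http://www.cncorpus.org/
  4. Center for Chinese Linguistics, Peking University: http://ccl.pku.edu.cn/corpus.asp
  5. BFSU Corpus Linguistics: http://www.bfsu-corpus.org/
  6. Balanced Corpus of Modern Chinese: http://www.sinica.edu.tw/SinicaCorpus/
  7. Ancient Chinese corpus: http://www.sinica.edu.tw/ftms-bin/ftmsw
  8. Tagged Corpus of Early Mandarin Chinese: http://www.sinica.edu.tw/Early_Mandarin/
  9. Treebank database: http://treebank.sinica.edu.tw/
  10. Souwen Jiezi (搜文解字): http://words.sinica.edu.tw/
  11. Electronic texts of Chinese classics: http://www.sinica.edu.tw/~tdbproj/handy1/
  12. Communication University of China text corpus retrieval system: http://ling.cuc.edu.cn/RawPub/
  13. Shared corpus resources of the HIT Information Retrieval Lab: http://ir.hit.edu.cn/demo/ltp/Sharing_Plan.htm
  14. Research Centre on Linguistics and Language Information Sciences, Hong Kong Institute of Education, and its corpus lab: http://www.livac.org/index.php?lang=sc
  15. Chinese Linguistic Data Consortium: http://www.chineseldc.org/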

Online Corpora (International)

  1. BNC (British National Corpus): http://www.natcorp.ox.ac.uk/
  2. BoE (the Bank of English, Collins): http://www.collinslanguage.com/language-resources/dictionary-datasets/
  3. ANC (American National Corpus): http://www.anc.org/
  4. Lancaster Corpus of Mandarin Chinese (LCMC): http://ota.oucs.ox.ac.uk/scripts/download.php?otaid=2474
  5. Sketch Engine multilingual corpora: www.sketchengine.co.uk
  6. BASE (British Academic Spoken English Corpus): http://www2.warwick.ac.uk/fac/soc/celte/research/base/
  7. Lextutor: http://www.lextutor.ca/
  8. My Memory: https://mymemory.translated.net/
  9. TAUS: http://www.tausdata.org/index.php/language-search-engine
  10. TTMEM: https://www.ttmem.com/terminology/download-translation-memory/
  11. TinyTM: http://tinytm.sourceforge.net/
  12. DGT Translation Memory: https://magmatranslation.com/en/free-translation-memory/
  13. European Parliament Proceedings Parallel Corpus 1996-2011: http://statmt.org/europarl/
  14. University of Maryland Parallel Corpus Project: The Bible: http://users.umiacs.umd.edu/~resnik/parallel/bible.html
  15. Aligned Hansards of the 36th Parliament of Canada: https://www.isi.edu/natural-language/download/hansard/
  16. EU Publications Office: https://publications.europa.eu/en/web/general-publications/publications
  17. Wikimedia Downloads: https://dumps.wikimedia.org/backup-index.html
  18. Open Subtitles: https://www.opensubtitles.org/en/search/subs
  19. United Nations Parallel Corpus: https://cms.unov.org/UNCorpus/
  20. European language pairs: http://www.statmt.org/wmt13/translation-task.html#download
  21. Parallel corpus search: http://paralela.clarin-pl.eu/#
  22. UM-Corpus: A Large English-Chinese Parallel Corpus: http://nlp2ct.cis.umac.mo/um-corpus/um-corpus-license.html
  23. CLARIN parallel corpora: https://www.clarin.eu/resource-families/parallel-corpora
  24. The PKU 863 Chinese-English Parallel Corpus: https://www.lancaster.ac.uk/fass/projects/corpus/863parallel/
  25. Chinese-English parallel corpus of Dream of the Red Chamber: http://corpus.usx.edu.cn/hongloumeng/images/shiyongshuoming.htm
  26. Academia Sinica Tagged Corpus of Early Mandarin Chinese: http://lingcorpus.iis.sinica.edu.tw/early/
  27. BYU corpora: https://corpus.byu.edu/

Other Sub-corpora

  1. Books – A collection of translated literature
  2. DGT – A collection of EU Translation Memories provided by the JRC
  3. DOGC – Documents from the Catalan Government
  4. ECB – European Central Bank corpus
  5. EMEA – European Medicines Agency documents
  6. The EU bookshop corpus
  7. EUconst – The European constitution
  8. EUROPARL v7 – European Parliament Proceedings
  9. giga-fren – French-English Giga-Word Corpus
  10. GNOME – GNOME localization files
  11. Global Voices – News stories in various languages
  12. The Croatian – English WaC corpus
  13. JRC-Acquis – legislative EU texts
  14. KDE4 – KDE4 localization files (v.2)
  15. KDEdoc – the KDE manual corpus
  16. MBS – Belgisch Staatsblad corpus
  17. memat – Xhosa/English parallel data
  18. MontenegrinSubs – Montenegrin movie subtitles
  19. MultiUN – Translated UN documents
  20. News Commentary, v9.0, v9.1
  21. OfisPublik – Breton – French parallel texts
  22. OO – the OpenOffice.org corpus
  23. OpenOffice.org 3 corpus
  24. OpenSubtitles – the opensubtitles.org corpus
  25. OpenSubtitles2011, OpenSubtitles2012, OpenSubtitles2013
  26. OpenSubtitles2016 – snapshot from 2016
  27. OpenSubtitles2018 – new complete version
  28. ParaCrawl corpus
  29. ParCor – A Parallel Pronoun-Coreference Corpus
  30. PHP – the PHP manual corpus
  31. Regeringsförklaringen – a tiny example corpus
  32. SETIMES – A parallel corpus of the Balkan languages
  33. SPC – Stockholm Parallel Corpora
  34. Tatoeba – A DB of translated sentences
  35. TedTalks hr-en
  36. TED Talks 2013
  37. Tanzil – A collection of Quran translations
  38. TEP – The Tehran English-Persian subtitle corpus
  39. Ubuntu – Ubuntu localization files
  40. UN – Translated UN documents
  41. Wikipedia – translated sentences from Wikipedia
  42. WikiSource – small en-sv sample only
  43. WMT News Test Sets
  44. The Xhosa – English Navy corpus

Mainstream CAT Tools

  1. SDL Trados: https://www.sdltrados.cn/cn/products/trados-studio/free-trial.html
  2. Déjà Vu: https://dejavux4.com/installers/DejaVuX3.Setup.exe
  3. MemoQ: https://www.memoq.com/downloads
  4. Snowman CAT: http://www.gcys.cn/
  5. OmegaT: http://omegat.org/download
  6. Across: https://www.across.net/
  7. Transmate: http://www.uedrive.com/
  8. Wordfast: http://www.wordfast.net/
  9. Yaxin CAT: http://www.yxcat.com/
  10. Wordbee: https://www.wordbee.com
  11. SmartCAT: https://www.smartcat.ai/
  12. MateCAT: https://www.matecat.com/

Alignment Tools

  1. WinAlign: https://fix4dll.com/winalign_dll
  2. Abbyy Aligner: https://www.abbyy.com/en-eu/support/linguistic/aligner2/info/sr/
  3. TmxEditor: https://sourceforge.net/projects/tmxeditor/
  4. Okapi Olifant: http://okapi.sourceforge.net/downloads.html
  5. You Align: https://youalign.com/
  6. Transmate Aligner: http://5icat.cn/thread-4246-1-1.html
  7. BasicCAT Aligner: https://www.basiccat.org/zh/new-tool-bitext-aligner/
  8. MemoQ LiveDocs: https://www.memoq.com/en/livedocs
  9. Super Align: http://sourceforge.net/projects/superalign
  10. hunalign (LGPL): http://mokk.bme.hu/resources/hunalign
  11. Europarl sentence aligner
  12. http://code.google.com/p/corpus-tools/downloads/list
  13. http://search.cpan.org/~achimru/Text-GaleChurch-1.00/lib/Text/GaleChurch.pm
  14. Gale & Church in Python: https://github.com/vchahun/galechurch
  15. Gargantua: http://sourceforge.net/projects/gargantua/
  16. Melamed’s GMA (GPL): http://nlp.cs.nyu.edu/GMA/
  17. Bob Moore’s sentence aligner (Microsoft license): http://research.microsoft.com/en-us/downloads/aafd5dcf-4dcc-49b2-8a22-f7055113e656/
  18. LF aligner (based on hunalign)
  19. http://sourceforge.net/projects/aligner/
  20. http://traduccionymundolibre.com/wiki/LF_Aligner
  21. Bleualign: https://github.com/rsennrich/bleualign
  22. maligna: http://sourceforge.net/projects/align/
  23. tca-align: http://freeterm.wordpress.com/2010/06/30/tca2-parallel-text-processing-at-uib-no/
  24. Champollion in scala: https://github.com/jhclark/akerblad
  25. Sentence aligner from Uplug: http://sourceforge.net/projects/uplug/
  26. Movie subtitle alignment: http://opus.lingfil.uu.se/tools.php
  27. AlignFactory: http://www.terminotix.com/index.asp?name=AlignFactory
  28. Free online aligner: http://www.youalign.com/
  29. Comparisons of alignment performance:
  30. http://www.ims.uni-stuttgart.de/~fraser/pubs/braune_coling2010.pdf
  31. http://lium3.univ-lemans.fr/mtmarathon2010/ProjectFinalPresentation/SentenceAlignment/sentence_alignment.pdf
  32. Tools for book alignment: http://search.cpan.org/~andrefs/
  33. Extract parallel sentences from comparable corpora: http://jgosme.perso.info.unicaen.fr/sentpair.html
  34. Accurat toolkit: http://www.accurat-project.eu/index.php?p=accurat-toolkit
  35. yalign: https://github.com/machinalis/yalign

Machine Translation Tools (statistical)

  1. Moses: http://www.statmt.org/moses/
  2. SMT toolkit: http://www-i6.informatik.rwth-aachen.de/jane/
  3. cdec SMT decoder: http://cdec-decoder.org
  4. NiuTrans: http://www.nlplab.com/NiuPlan/NiuTrans.html
  5. Sinuhe:
  6. http://www.cs.helsinki.fi/u/mtkaaria/
  7. http://www.cs.helsinki.fi/u/mtkaaria/sinuhe/sinuhe_v1.3_rc2.1.tar.gz
  8. http://www.cs.helsinki.fi/u/mtkaaria/sinuhe/models/
  9. Syntax-augmented SMT (SAMT): http://www.cs.cmu.edu/~zollmann/samt/
  10. Docent: https://github.com/chardmeier/docent/wiki
  11. A decoder in Perl: http://staff.science.uva.nl/~christof/html/software.html
  12. Apertium: http://wiki.apertium.org/wiki/Main_Page
  13. Thot (GPL): http://thot.sourceforge.net/
  14. Mood/MISTRAL/Ramses (GPL): http://smtmood.sourceforge.net/about
  15. svn co https://smtmood.svn.sourceforge.net/svnroot/smtmood/trunk/mood
  16. Joshua: http://cs.jhu.edu/~ccb/joshua/
  17. Thrax: http://cs.jhu.edu/~jonny/thrax/
  18. Phramer: http://www.phramer.org/
  19. OpenMaTrEx: http://www.openmatrex.org/
  20. n-code (n-gram based SMT): http://perso.limsi.fr/Individu/jmcrego/bincoder/
  21. Other interesting stuff:
  22. http://www.worldwidelexicon.org/api
  23. http://blog.worldwidelexicon.org/
  24. http://code.google.com/p/m4loc/
  25. Phrase extraction toolkit: http://code.google.com/p/geppetto/

MT Evaluation Tools

  1. NIST BLEU: ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v11b.pl
  2. METEOR: http://www.cs.cmu.edu/~alavie/METEOR/
  3. The Asiya Open Toolkit for Automatic MT (Meta-)Evaluation: http://www.lsi.upc.edu/~nlp/Asiya/
  4. TER: http://www.umiacs.umd.edu/~snover/terp/
  5. http://sourceforge.net/projects/tercpp
  6. Different metrics & significance testing: https://github.com/jhclark/multeval
  7. Combining various metrics in a simple script: http://kheafield.com/code/scoring.tar.gz
  8. visualization: https://github.com/mjdenkowski/meteor/tree/master/xray

Other tools and links

  1. significance tests: http://projectile.sv.cmu.edu/research/public/tools/bootStrap/tutorial.htm
  2. interactive BLEU: http://code.google.com/p/ibleu/
  3. XML wrapper: http://kheafield.com/code/scoring.tar.gz
  4. Apertium: http://wiki.apertium.org/wiki/Main_Page
  5. convert bitexts to tmx: http://sourceforge.net/projects/bitext2tmx/

This collection is compiled, maintained, and updated by Nansey. If you repost it, please credit nansey.me.
