NLP

By admin, 9 June, 2017

This is the n-gram dataset Google generated from the books it scanned: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html

The 1-gram dataset gives the frequency of each single word, broken down by year. For example:

circumvallate   1978   313    215   85
circumvallate   1979   183    147   77

The first line tells us that in 1978, the word "circumvallate" (which means "surround with a rampart or other fortification", in case you were wondering) occurred 313 times overall, on 215 distinct pages and in 85 distinct books from our sample.
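A minimal sketch of reading records in this format, assuming each line holds exactly five fields in the order described in the quote: the word, the year, the total occurrence count, the distinct page count and the distinct book count. The class and function names (`OneGramRecord`, `parse_1gram_lines`, `total_matches`) and the field names are illustrative labels of my own, not part of Google's release. The parser splits on any run of whitespace, so it handles the sample lines above whether the real files separate fields with tabs or spaces.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class OneGramRecord:
    # Hypothetical field names for the five columns described above.
    ngram: str
    year: int
    match_count: int   # total occurrences in that year
    page_count: int    # number of distinct pages
    volume_count: int  # number of distinct books


def parse_1gram_lines(lines: Iterable[str]) -> Iterator[OneGramRecord]:
    """Yield one record per non-empty line (assumes 5 whitespace-separated fields)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last four fields are treated as numbers.
        ngram, year, matches, pages, volumes = line.rsplit(None, 4)
        yield OneGramRecord(ngram, int(year), int(matches), int(pages), int(volumes))


def total_matches(records: Iterable[OneGramRecord], word: str) -> int:
    """Sum the yearly occurrence counts of one word across all its records."""
    return sum(r.match_count for r in records if r.ngram == word)


sample = [
    "circumvallate   1978   313    215   85",
    "circumvallate   1979   183    147   77",
]
records = list(parse_1gram_lines(sample))
print(total_matches(records, "circumvallate"))  # → 496
```

Summing `match_count` over years gives the word's overall occurrence count in the sampled period; the page and book counts cannot be summed the same way to get distinct totals, because a book may appear under several years only if it is counted per year, so they are kept per record here.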
