Excerpt


NLTK conveniently provides a stopwords corpus for English, but the Chinese word segmentation tool Jieba does not include a stopwords corpus for Chinese. A stopword list is usually just a Python list of strings, so it is easy to build our own, and the BaiduGuide also provides some Chinese stop words for programmers. The stopwords_zh project collects these words and saves them to a single file, which you can download if you need to remove Chinese stop words from your text.
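A minimal sketch of how such a file might be used, assuming it is UTF-8 text with one stop word per line. The file format and the helper names `load_stopwords` and `remove_stopwords` are assumptions for illustration, not part of stopwords_zh or Jieba. In practice the tokens would come from Jieba (for example `jieba.lcut(text)`); a hand-segmented list stands in here so the sketch runs without Jieba installed.

```python
from pathlib import Path
from typing import Iterable, List, Set


def load_stopwords(path: str) -> Set[str]:
    """Read a stopwords file (assumed: UTF-8, one word per line)."""
    text = Path(path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and skip blank lines.
    return {line.strip() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop stop words and whitespace-only tokens from segmented text."""
    return [t for t in tokens if t.strip() and t not in stopwords]


if __name__ == "__main__":
    # Hand-segmented example tokens; with Jieba installed these would
    # typically come from jieba.lcut(...) instead.
    tokens = ["我", "是", " ", "一个", "学生"]
    demo_stopwords = {"的", "了", "是"}  # tiny illustrative set, not the real corpus
    print(remove_stopwords(tokens, demo_stopwords))  # ['我', '一个', '学生']
```

Keeping the words in a `set` rather than a list makes each membership check constant-time on average, which helps when filtering long documents.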



Other

This post was automatically generated by the 21QA cloud bookmarking tool.


asked 12 Sep, 17:01


路人甲

