Two Small Python Text-Processing Examples (Text Sniffing and Keyword Ratio Statistics)
2017-06-05

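Problem: given a set of sentences and a set of keywords, we want to find the sentences that contain at least one keyword (text sniffing); see the code before print('='*30). To go further and compute the keyword ratio of each sentence (the total length of all keyword occurrences in the sentence divided by the length of the sentence), see the code after it. Keyword ratio is a commonly used criterion for text classification; if you want to classify sentences by their keyword ratio, you can add that code yourself.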

This article mainly demonstrates list comprehensions, string methods, generator expressions, and built-in functions.

def check(sentences, words):
    '''Return the list of sentences that contain at least one keyword.'''
    return [sentence
            for sentence in sentences
            if sum(sentence.count(word) for word in words) > 0]

sentences = ['This is a test.',
             'Beautiful is better than ugly.',
             'Explicit is better than implicit.',
             'Simple is better than complex.',
             'Sparse is better than dense.',
             'Readability counts.',
             'Now is better than never.']
words = ['test', 'count', 'dense', 'is', 'simple']
result = check(sentences, words)
for item in result:
    print(item)

print('='*30)
# For each sentence, compute the total length of all keyword
# occurrences as a fraction of the sentence length
d = {sentence: round(sum(sentence.count(word)*len(word)
                         for word in words) / len(sentence), 3)
     for sentence in result}
for item in d.items():
    print(item)
Output:
This is a test.
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Sparse is better than dense.
Readability counts.
Now is better than never.
==============================
('This is a test.', 0.533)
('Beautiful is better than ugly.', 0.067)
('Explicit is better than implicit.', 0.061)
('Simple is better than complex.', 0.067)
('Sparse is better than dense.', 0.25)
('Readability counts.', 0.263)
('Now is better than never.', 0.08)
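
The problem description suggests using the keyword ratio as a classification criterion but leaves that code to the reader. One possible sketch follows. The helper names `keyword_ratio` and `classify`, the 0.2 threshold, and the two labels are assumptions made for illustration and are not part of the original article.

```python
def keyword_ratio(sentence, words):
    """Total length of all keyword occurrences divided by sentence length."""
    if not sentence:
        # Avoid division by zero for empty sentences
        return 0.0
    return sum(sentence.count(word) * len(word) for word in words) / len(sentence)


def classify(sentences, words, threshold=0.2):
    """Label each sentence by comparing its keyword ratio with a threshold.

    The threshold and the label names are hypothetical; tune them for real data.
    """
    return {
        sentence: ('keyword-heavy'
                   if keyword_ratio(sentence, words) >= threshold
                   else 'keyword-light')
        for sentence in sentences
    }


sentences = ['This is a test.',
             'Sparse is better than dense.',
             'Now is better than never.']
words = ['test', 'count', 'dense', 'is', 'simple']
for sentence, label in classify(sentences, words).items():
    print(label, sentence)
```

With the ratios shown in the output above (0.533, 0.25 and 0.08), the first two sentences fall on or above the assumed threshold and the third falls below it.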

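
Note that `str.count` matches substrings and is case-sensitive: 'This is a test.' scores 0.533 partly because `is` is also counted inside `This`, while the keyword `simple` never matches `Simple`. If whole-word, case-insensitive matching is wanted instead, a sketch using the standard `re` module might look like the following. The function name `whole_word_ratio` is hypothetical, and this is a variant of the article's approach, not the author's method.

```python
import re


def whole_word_ratio(sentence, words):
    """Keyword ratio that counts only whole-word, case-insensitive matches."""
    if not sentence:
        return 0.0
    total = 0
    for word in words:
        # \b anchors the match at word boundaries; re.escape guards against
        # keywords that contain regex metacharacters.
        pattern = r'\b' + re.escape(word) + r'\b'
        matches = re.findall(pattern, sentence, flags=re.IGNORECASE)
        total += len(matches) * len(word)
    return round(total / len(sentence), 3)


words = ['test', 'count', 'dense', 'is', 'simple']
print(whole_word_ratio('This is a test.', words))
print(whole_word_ratio('Simple is better than complex.', words))
print(whole_word_ratio('Readability counts.', words))
```

Under this variant `count` no longer matches inside `counts`, so whether whole-word matching is appropriate depends on how the keywords are meant to be used.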
