When writing a Python crawler you will often run into exceptions that interrupt it and make it exit unexpectedly; an ideal crawler should be able to keep running when these exceptions occur. Below are a few common exceptions and how to handle them:
Exception 1: requests.exceptions.ProxyError
For this error, the explanation given on Stack Overflow is:
The ProxyError exception is not actually the requests.exceptions exception; it is an exception with the same name from the embedded urllib3 library, and it is wrapped in a MaxRetryError exception.
In other words, this is not actually the exception from requests.exceptions; it is an exception of the same name from the embedded urllib3 library, and it is wrapped inside a MaxRetryError. One more note: this exception usually appears when the proxy server is unreachable.
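As a minimal sketch (the proxy address and target URL below are placeholders, not taken from the original post), the crawler can catch this exception directly so that an unreachable proxy does not kill it:

import requests

proxies = {'http': 'http://127.0.0.1:8888', 'https': 'http://127.0.0.1:8888'}  # placeholder proxy
url = 'http://example.com'  # placeholder target

try:
    r = requests.get(url, proxies=proxies, timeout=10)
except requests.exceptions.ProxyError as e:
    # raised when the proxy server cannot be reached; log it and keep crawling
    print(e)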
Exception 2: requests.exceptions.ConnectionError
For this error, the explanation given on Stack Overflow is:
In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.
In other words, this exception is raised in the event of a network problem (such as a DNS failure or a refused connection); it is an exception that ships with the Requests library itself.
One solution is to catch the base-class exception, which covers all of these cases:
import sys
import requests

try:
    r = requests.get(url, params={'s': thing})  # url and thing are defined elsewhere
except requests.exceptions.RequestException as e:  # This is the correct syntax
    print(e)
    sys.exit(1)
Another solution is to handle each kind of exception separately; three exceptions are distinguished here:
try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    # maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # tell the user their URL was bad and try a different one
    pass
except requests.exceptions.RequestException as e:
    # catastrophic error, bail out
    print(e)
    sys.exit(1)
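Since the goal is a crawler that keeps running, a common pattern (just a sketch; the urls list and the retry count of 3 are placeholders) is to wrap each request in this kind of handling and continue or retry instead of calling sys.exit:

import time
import requests

urls = ['http://example.com/1', 'http://example.com/2']  # placeholder URL list

for url in urls:
    for attempt in range(3):  # up to 3 attempts per URL
        try:
            r = requests.get(url, timeout=10)
            break  # success, move on to the next URL
        except requests.exceptions.Timeout:
            time.sleep(2)  # wait a moment, then retry this URL
        except requests.exceptions.RequestException as e:
            print(e)  # log the failure and skip this URL
            break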
Exception 3: requests.exceptions.ChunkedEncodingError
For this error, the explanation given on Stack Overflow is:
The link you included in your question is simply a wrapper that executes urllib’s read() function, which catches any incomplete read exceptions for you. If you don’t want to implement this entire patch, you could always just throw in a try/catch loop where you read your links.
In other words, the link given in the question is simply a wrapper that calls urllib's read() function and catches any incomplete-read exception for you. If you do not want to implement that whole patch, just wrap the place where you read your link in a try/except:
import urllib2
import httplib

try:
    page = urllib2.urlopen(urls).read()  # urls is defined elsewhere (Python 2)
except httplib.IncompleteRead as e:
    # keep the partial data that was received before the read broke off
    page = e.partial
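If you are using the requests library directly (as in the earlier examples), the same idea applies: catch requests.exceptions.ChunkedEncodingError at the point where the response is read. This is only a sketch with a placeholder URL, not code from the original answer:

import requests

url = 'http://example.com'  # placeholder URL

try:
    page = requests.get(url, timeout=10).content
except requests.exceptions.ChunkedEncodingError as e:
    # the server closed the connection before the chunked body finished; log and retry or skip
    print(e)
    page = None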