urllib.request and requests in Python: usage and differences explained

retouched @ 2020-05-06

urllib.request
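As we all know, urlopen() can issue the most basic kind of request, but in practice that is rarely enough: we often need to add parameters such as headers, and for that we build the request with the more capable Request class.
When no extra configuration is needed, a simple web request can be sent directly with urlopen().
Sending a simple request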
import urllib.request

url = 'https://www.douban.com'
webPage = urllib.request.urlopen(url)
print(webPage)
data = webPage.read()
print(data)
print(data.decode('utf-8'))
urlopen() returns an http.client.HTTPResponse object, which has to be processed further with read(). read() returns bytes, so we normally follow it with decode() (usually with utf-8) and only then have the page text we want.
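Not every page is UTF-8. The following is a minimal sketch, assuming the server declares its charset in the Content-Type header, of decoding the body with that charset and falling back to UTF-8 otherwise. The names `decode_body` and `fetch_text` are illustrative helpers, not part of urllib.

```python
import urllib.request
from email.message import Message


def decode_body(raw: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode raw response bytes with the charset named in Content-Type, if any."""
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors="replace")


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL with urlopen() and return the body as str."""
    # The with-block closes the connection once the body has been read.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decode_body(response.read(), response.headers)
```

Calling `fetch_text('https://www.douban.com')` would then return the decoded page in one step.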
Adding headers
import urllib.request

url = 'https://www.douban.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
}
# Request() only builds the request object; urlopen() sends it
req = urllib.request.Request(url=url, headers=headers)
webPage = urllib.request.urlopen(req)
print(webPage.read().decode('utf-8'))
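Constructing the Request class gives us a urllib.request.Request object. Nothing is sent at that point; the object is passed to urlopen(), which performs the actual request.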
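To see that a Request object merely describes a request, it can be built and inspected without any network access. This is a small sketch; `build_request` and the shortened User-Agent value are illustrative assumptions.

```python
import urllib.request

USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # shortened example value


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a Request carrying a custom User-Agent."""
    return urllib.request.Request(url=url, headers={"User-Agent": USER_AGENT})


req = build_request("https://www.douban.com")
# Nothing has been sent yet; these calls only read the stored description.
print(type(req).__name__)
print(req.full_url)
print(req.get_method())
# Request stores header names capitalized ("User-agent"), so look them up that way.
print(req.get_header("User-agent"))
```

Note the lookup key: `req.get_header("User-Agent")` returns None, because the name is stored as "User-agent".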
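When crawling pages, we usually have to attach extra information to the HTTP request, such as a User-Agent or cookies, or route the request through a proxy server. These are often necessary to get past a site's anti-crawling measures.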
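With the standard library, cookies and proxies are handled by an opener rather than by urlopen() directly. The sketch below assumes a hypothetical proxy at 127.0.0.1:8080; `make_opener` is an illustrative name.

```python
import http.cookiejar
import urllib.request

USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"  # shortened example value


def make_opener(proxy_url=None):
    """Return (opener, cookie_jar); the opener keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if proxy_url is not None:
        # Route both http and https traffic through the given proxy.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    opener = urllib.request.build_opener(*handlers)
    # Default headers sent with every request made through this opener.
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


# The proxy address is a made-up example; nothing is contacted here.
opener, jar = make_opener(proxy_url="http://127.0.0.1:8080")
```

`opener.open(url)` is then used in place of `urllib.request.urlopen(url)`, and any cookies the server sets end up in `jar`.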
requests
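Generally speaking, the requests library is the recommended choice for Python crawlers because it is more convenient than urllib: requests builds and sends a GET or POST request in a single call, whereas with urllib.request you first build the request and then send it in a separate step.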
import requests

url = 'https://www.douban.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36',
}
get_response = requests.get(url, headers=headers, params=None)
post_response = requests.post(url, headers=headers, data=None, json=None)
print(post_response)
print(get_response.text)
print(get_response.content)
# json is a method and must be called; it raises an error when the body is not
# JSON (as with this HTML page), so call it only on endpoints that return JSON.
# print(get_response.json())
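get_response.text is the body as a str.
get_response.content is the body as bytes, which you have to decode yourself; apart from that it serves the same purpose as get_response.text.
get_response.json() parses the body as JSON and returns the resulting Python object. Note that it is a method: get_response.json without parentheses is only a reference to the method, not the data.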
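The three accessors can be compared side by side. This is a hedged sketch: `summarize` is an illustrative helper, and it treats a body that is not valid JSON as "no JSON" rather than as an error.

```python
import requests


def summarize(response: requests.Response) -> dict:
    """Collect the bytes, str and JSON views of one response."""
    response.raise_for_status()  # raise on 4xx/5xx status codes
    summary = {
        "raw_bytes": response.content,  # bytes, exactly as received
        "text": response.text,          # str, decoded with response.encoding
    }
    try:
        summary["json"] = response.json()  # parsed Python object
    except ValueError:
        # The body was not valid JSON (for example an HTML page).
        summary["json"] = None
    return summary
```

For a real call, `summarize(requests.get(url, headers=headers))` would return all three views at once.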
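In short, requests offers a higher-level interface than urllib.request (it is built on the third-party urllib3 library, not on the standard library's urllib), which makes it noticeably more convenient. Prefer requests in real projects whenever you can.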
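Part of that convenience is that requests encodes form data and sets the matching Content-Type for you. The sketch below builds a request without sending it, so the result can be inspected; the URL, the header value and `prepare_form_post` are placeholders chosen for illustration.

```python
import requests

URL = "https://example.com/post"  # placeholder endpoint, never contacted here


def prepare_form_post(url, fields):
    """Build a POST request with form data, without sending it."""
    request = requests.Request(
        "POST",
        url,
        data=fields,
        headers={"User-Agent": "example-client/0.1"},  # made-up value
    )
    return request.prepare()


prepared = prepare_form_post(URL, {"i": "hello", "doctype": "json"})
print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])
print(prepared.body)
```

Sending it would be a matter of `requests.Session().send(prepared)`; `requests.post(url, data=fields)` performs both steps in one call.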
Supplement: the difference between urllib.request.Request() and urllib.request.urlopen() in Python
Compared with urllib.request.urlopen(), urllib.request.Request wraps the request one step further. Here is an excerpt from the source of the Request class:
class Request:
    # The part to look at: the constructor shows which arguments Request accepts
    # when wrapping a request (url, data, headers, origin_req_host,
    # unverifiable, method)
    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False,
                 method=None):
        self.full_url = url
        self.headers = {}
        self.unredirected_hdrs = {}
        self._data = None
        self.data = data
        self._tunnel_host = None
        for key, value in headers.items():
            self.add_header(key, value)
        if origin_req_host is None:
            origin_req_host = request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable
        if method:
            self.method = method
        pass
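The `data` and `method` arguments together decide which HTTP method is used: with no `data` the request is a GET, with `data` it becomes a POST, and an explicit `method` overrides both. A short sketch with a placeholder URL:

```python
import urllib.request

URL = "http://example.com/api"  # placeholder URL, never contacted here

get_req = urllib.request.Request(URL)
post_req = urllib.request.Request(URL, data=b"key=value")
put_req = urllib.request.Request(URL, data=b"key=value", method="PUT")

# get_method() reports the HTTP method that urlopen() would use.
for req in (get_req, post_req, put_req):
    print(req.get_method(), req.data)
```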
We can use it like this (the following simulates the request sent by Youdao Dictionary's translation feature):
import urllib.parse
import urllib.request

# Text to translate (example value; the original snippet left `word` undefined)
word = "hello"

# Request URL
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"
# Request headers
request_headers = {
    'Host': 'fanyi.youdao.com',
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
}
# Form data sent to the server
form_data = {
    "i": word,
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTIME",
    "typoResult": "false"
}
# POST data must be bytes or an iterable of bytes; it cannot be a str
form_data = urllib.parse.urlencode(form_data).encode()
# Build the Request object
req = urllib.request.Request(url, data=form_data, headers=request_headers)
# Send the request
response = urllib.request.urlopen(req)
data = response.read().decode()
print(data)
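The encoding step is the part that is easiest to get wrong, so it can be worth isolating. The following sketch wraps it in a helper (`build_form_request` is an illustrative name and the URL is a placeholder) and shows what the encoded body looks like for non-ASCII input.

```python
import urllib.parse
import urllib.request


def build_form_request(url, fields, headers=None):
    """Return a POST Request whose body is the URL-encoded form fields."""
    # urlencode() yields a str; Request needs bytes, hence the encode().
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers or {})


req = build_form_request(
    "http://example.com/translate",  # placeholder URL, never contacted here
    {"i": "你好", "doctype": "json"},
)
print(req.get_method())
print(req.data)
# Decoding the body again recovers the original fields.
print(urllib.parse.parse_qs(req.data.decode("utf-8")))
```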
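To sum up: if the request does not need many parameters, we can call urllib.request.urlopen() directly; if the request needs further wrapping (headers, form data, an explicit method), we build it with urllib.request.Request() and then pass that object to urlopen().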


Article URL: http://coctec.com/docs/python/shhow-post-233055.html
