Problems with Lagou's (lagou.com) anti-scraping mechanism
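How do I get past the response {"status": false, "msg": "您操作过于频繁,请稍后再访问", "clientip": "182.129.38.91", "state": 2408} ("you are operating too frequently, please try again later") when scraping? I have added every request header and set up a rotating User-Agent, but it still doesn't work:
DEFAULT_REQUEST_HEADERS = {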
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
"accept": "application/json, text/javascript, */*; q=0.01",
"accept-encoding": "gzip, deflate, br",
"accept-language": "zh-CN,zh;q=0.9",
"content-type": "application/x-www-form-urlencoded; charset=UTF-8",
"cookie": "JSESSIONID=ABAAAECABIEACCA6B0B35CC82843AFC00F32AC6B45A76AE; WEBTJ-ID=20200520170246-172315226f3e9-0d022377fe6f86-366b4108-921600-172315226f6c9; RECOMMEND_TIP=true; _ga=GA1.2.1963165012.1589965367; _gid=GA1.2.1320686678.1589965367; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1589965367; user_trace_token=20200520170247-3f09971d-19df-4827-9bcf-74c99b5c9dcf; LGUID=20200520170247-da3d96d1-a445-42ae-a7f4-9374f320e5e4; index_location_city=%E5%85%A8%E5%9B%BD; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%22172315288e8488-081e1b46a8d664-366b4108-921600-172315288eaac4%22%2C%22%24device_id%22%3A%22172315288e8488-081e1b46a8d664-366b4108-921600-172315288eaac4%22%7D; sajssdk_2015_cross_new_user=1; TG-TRACK-CODE=search_code; X_MIDDLE_TOKEN=fbb896af01a0adbc319581251f75b474; X_HTTP_TOKEN=8f6ad38bd41b517633076998513a00d575eaa5b241; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1589967033; LGRID=20200520173033-36acdbfa-3b82-4c5a-aa26-281603de6f11; SEARCH_ID=c6ae56ca6a0644a78264dbc3d61c1e4c",
"referer":"https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=",
# "anit" (not "anti") is the spelling Lagou's own pages use for these header names
"x-anit-forge-code": 0,
"x-anit-forge-token": None,
"x-requested-with": "XMLHttpRequest"
}
agent1 = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36'
agent2 = 'E808 SAMSUNG-SGH-E808/1.0*MzU0MTk0MDAwNTgzMDgx UP.Browser/6.2.2.6 (GUI) MMP/1.0'
agent3 = ('D500C SAMSUNG-SGH-D500C/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 '
          'UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2')
agent4 = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362')
agent5 = 'E100A SAMSUNG-SGH-E100A/T2 UP.Browser/6.1.0.6 (GUI) MMP/1.0'
# Pool of User-Agent strings to pick from on each request
UserAgent = [agent1, agent2, agent3, agent4, agent5]
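The UserAgent list by itself does not rotate anything: Scrapy sends the single USER_AGENT setting unless something changes the header per request. Below is a minimal sketch of a downloader middleware that does that, assuming the list is copied into settings.py under a hypothetical USER_AGENT_LIST setting. The class name and the "myproject.middlewares" module path are placeholders as well.

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware: a random User-Agent per request.

    Assumes a USER_AGENT_LIST setting holding a list of UA strings,
    e.g. the UserAgent list from the question.
    """

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("user_agents must not be empty")
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds the middleware through this hook; read the pool from settings.
        return cls(crawler.settings.getlist("USER_AGENT_LIST"))

    def process_request(self, request, spider):
        # Assign (not setdefault) so every request can carry a different UA.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # let Scrapy continue processing the request


# settings.py (placeholder module path):
# USER_AGENT_LIST = UserAgent
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.RandomUserAgentMiddleware": 400,
# }
```

Priority 400 runs it before Scrapy's built-in UserAgentMiddleware (500), which only fills in the header when it is missing, so the random value is kept. Rotating the UA alone will not help once the block is tied to your IP, as the error's clientip field suggests.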
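Throttle your request rate and slow down a little, or use an IP proxy pool. Your IP has been blocked; a proxy pool should most likely fix it.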
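Both suggestions can be done in Scrapy. The commented settings are standard Scrapy throttling options, but the numbers are guesses to tune, not values known to stay under Lagou's limit. The middleware is a minimal sketch of a proxy pool. It assumes a hypothetical PROXY_LIST setting of proxy URLs and that a blocked response is JSON with "status": false, as in the question. It sets request.meta["proxy"], which Scrapy's built-in HttpProxyMiddleware uses to route the request. When a response looks blocked, it re-schedules the request through another proxy, up to max_retries times.

```python
import json
import random

# settings.py: slow the crawl down first (values are guesses to tune):
# DOWNLOAD_DELAY = 5
# RANDOMIZE_DOWNLOAD_DELAY = True
# CONCURRENT_REQUESTS_PER_DOMAIN = 1
# AUTOTHROTTLE_ENABLED = True
# PROXY_LIST = ["http://host1:port", "http://host2:port"]  # hypothetical setting
# DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomProxyMiddleware": 350}


def is_rate_limited(body):
    """True if the body is JSON with "status": false, like the error above."""
    try:
        data = json.loads(body)
    except ValueError:  # HTML or other non-JSON bodies are not treated as blocks
        return False
    return isinstance(data, dict) and data.get("status") is False


class RandomProxyMiddleware:
    """Hypothetical middleware: random proxy per request, retry when blocked."""

    def __init__(self, proxies, max_retries=3):
        self.proxies = list(proxies)
        self.max_retries = max_retries

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Keep a proxy already chosen by a retry; otherwise pick one at random.
        if self.proxies and "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxies)
        return None

    def process_response(self, request, response, spider):
        if not is_rate_limited(response.body):
            return response
        retries = request.meta.get("block_retries", 0)
        if retries >= self.max_retries or not self.proxies:
            return response  # give up; the spider receives the blocked response
        # Prefer a proxy other than the one that was just blocked.
        current = request.meta.get("proxy")
        candidates = [p for p in self.proxies if p != current] or self.proxies
        meta = dict(request.meta, block_retries=retries + 1,
                    proxy=random.choice(candidates))
        # dont_filter=True stops the duplicate filter from dropping the retry.
        return request.replace(meta=meta, dont_filter=True)
```

Priority 350 places it before HttpProxyMiddleware (750), so meta["proxy"] is already set when that middleware runs.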