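Python Web Scraping, Approach Two: The Selenium Framework, Explained with Examples
Author: Python可樂
This article walks through the second approach to Python web scraping, the Selenium framework, using a series of worked examples. I hope you find it useful!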
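Selenium workflow:
1. Install the environment:
- pip install selenium (the first example also parses HTML with lxml, so run pip install lxml as well)
2. Download the driver program for your browser (here, ChromeDriver for Google Chrome). Its major version must match the installed Chrome. Selenium 4.6 and later can also fetch a matching driver automatically through Selenium Manager.
3. Instantiate a browser object.
Basic usage
The example below opens the page, takes the browser-rendered HTML from page_source, parses it with lxml, and prints the title of each list entry.
Code: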
```python
from time import sleep

from lxml import etree
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

if __name__ == '__main__':
    # Selenium 4: the driver path is passed through a Service object
    service = Service(r"E:\google\Chrome\Application\chromedriver.exe")
    bro = webdriver.Chrome(service=service)
    bro.get(url='http://scxk.nmpa.gov.cn:81/xk/')
    # page_source holds the HTML as rendered by the browser
    page_text = bro.page_source
    tree = etree.HTML(page_text)
    li_list = tree.xpath('//*[@id="gzlist"]/li')
    for li in li_list:
        name = li.xpath('./dl/@title')[0]
        print(name)
    sleep(5)
    bro.quit()
```
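Once Selenium has handed over page_source, the etree.HTML(...) and xpath(...) step is ordinary HTML parsing. The sketch below shows the same extraction using only the standard library.

- The input is a hypothetical, well-formed snippet shaped like the gzlist list. The real page markup may differ.
- It uses xml.etree.ElementTree in place of lxml. ElementTree supports only a subset of XPath and needs well-formed markup, whereas lxml's HTML parser tolerates messy pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for bro.page_source; the real page may differ
SAMPLE_PAGE = """
<html><body>
  <ul id="gzlist">
    <li><dl title="Company A"><a href="#">A</a></dl></li>
    <li><dl title="Company B"><a href="#">B</a></dl></li>
  </ul>
</body></html>
"""


def extract_titles(page_text: str) -> list[str]:
    """Return the title attribute of the <dl> inside each <li> under #gzlist."""
    root = ET.fromstring(page_text.strip())
    titles = []
    # ElementTree's limited XPath: descendant search plus an attribute predicate
    for li in root.findall(".//*[@id='gzlist']/li"):
        dl = li.find("dl")
        if dl is not None and dl.get("title") is not None:
            titles.append(dl.get("title"))
    return titles


if __name__ == "__main__":
    print(extract_titles(SAMPLE_PAGE))  # ['Company A', 'Company B']
```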
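Browser automation operations
Writing code that drives the browser:
- Send a request: get(url)
- Locate elements: the find family of methods (in Selenium 4, find_element(By.XXX, value) and find_elements(By.XXX, value))
- Interact with elements: send_keys('xxx'), click()
- Run JavaScript: execute_script('jsCode')
- Go back / forward: back(), forward()
- Close the browser: quit()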
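The execute_script item simply runs a JavaScript string inside the page. Pages that lazy-load content often need to be scrolled in stages rather than jumped straight to the bottom.

The sketch below uses a hypothetical helper, scroll_scripts, which is not part of Selenium. It builds the window.scrollTo calls that you would pass one at a time to bro.execute_script(...), with a short pause between calls.

```python
def scroll_scripts(total_height: int, step: int) -> list[str]:
    """Build window.scrollTo() snippets that scroll down in fixed steps.

    Hypothetical helper: each string is meant to be passed to
    driver.execute_script(), with a short pause between calls.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    positions = list(range(step, total_height, step))
    positions.append(total_height)  # always finish exactly at the bottom
    return [f"window.scrollTo(0, {y})" for y in positions]


if __name__ == "__main__":
    for js in scroll_scripts(2500, 1000):
        print(js)
    # window.scrollTo(0, 1000)
    # window.scrollTo(0, 2000)
    # window.scrollTo(0, 2500)
```

In a real script, execute_script returns the value of a JavaScript return statement. You can therefore obtain total_height with bro.execute_script('return document.body.scrollHeight').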
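The example below opens Taobao, scrolls to the bottom with JavaScript, searches for a keyword, then visits Baidu and steps back and forward through the history.
Target page: https://www.taobao.com/
Code:
```python
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4: the driver path is passed through a Service object
service = Service(r"E:\google\Chrome\Application\chromedriver.exe")
bro = webdriver.Chrome(service=service)
bro.get(url='https://www.taobao.com/')
# Locate the element: the search box
search_input = bro.find_element(By.ID, 'q')
sleep(2)
# Run a piece of JavaScript that scrolls the page to the bottom
bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')
sleep(2)
# Interact with the element: type a search keyword ("women's clothing")
search_input.send_keys('女裝')
button = bro.find_element(By.CLASS_NAME, 'btn-search')
button.click()
# Open another page, then go back and forward through the history
bro.get('https://www.baidu.com')
sleep(2)
bro.back()
sleep(2)
bro.forward()
sleep(5)
bro.quit()
```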
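The fixed sleep(2) calls work, but they either waste time or fail when the page loads more slowly than expected. For real scripts, Selenium provides explicit waits, for example WebDriverWait(bro, 10).until(EC.presence_of_element_located((By.ID, 'q'))).

The sketch below only illustrates the polling idea behind such waits. It uses a hypothetical stdlib helper, wait_until, which is not Selenium's API. The helper calls a condition repeatedly until it returns something truthy or a timeout expires.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               interval: float = 0.5) -> T:
    """Poll condition() until it returns a truthy value, then return that value.

    Hypothetical helper illustrating explicit waits; raises TimeoutError
    if the condition never becomes truthy within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


if __name__ == "__main__":
    calls = {"n": 0}

    def element_ready():
        # Simulated stand-in for "is the search box on the page yet?"
        calls["n"] += 1
        return "search box" if calls["n"] >= 3 else None

    print(wait_until(element_ready, timeout=5, interval=0.1))  # search box
    print(calls["n"])  # 3
```

With Selenium, the condition could be lambda: bro.find_elements(By.ID, 'q'). find_elements returns an empty (falsy) list until the element appears.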
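Handling iframes in Selenium:
- If the element you want to locate sits inside an iframe, you must first switch into that frame with switch_to.frame(id) (an id or name also works).
- Action chains (dragging): from selenium.webdriver import ActionChains
- Instantiate an action chain object: action = ActionChains(bro)
- click_and_hold(div): press and hold the mouse button on the element
- move_by_offset(x, y): move the pointer by x, y pixels
- perform(): execute the queued actions immediately
- action.release(): release the held mouse button; like the other actions it is only queued, so it also needs perform()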
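Target page (runoob's jQuery UI droppable demo, which runs inside an iframe): https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable
Code:
```python
from time import sleep

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4: the driver path is passed through a Service object
service = Service(r"E:\google\Chrome\Application\chromedriver.exe")
bro = webdriver.Chrome(service=service)
bro.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
# The draggable element lives inside the iframe named 'iframeResult'
bro.switch_to.frame('iframeResult')
div = bro.find_element(By.ID, 'draggable')
# Action chain: press and hold the element, then drag it in small steps
action = ActionChains(bro)
action.click_and_hold(div)
for _ in range(5):
    # perform() executes the queued actions
    action.move_by_offset(17, 0).perform()
    sleep(0.3)
# Release the mouse button; release() is only queued until perform() runs
action.release().perform()
bro.quit()
```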
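The loop above drags the element in five 17-pixel steps, 85 px in total, pausing between steps. To drag a chosen total distance in a fixed number of steps, you can compute the per-step x offsets so that they add up exactly.

The sketch below uses a hypothetical helper, split_offsets, which is not part of Selenium. Each value it returns would then be fed to action.move_by_offset(dx, 0).perform().

```python
def split_offsets(total: int, steps: int) -> list[int]:
    """Split a total pixel distance into `steps` integer offsets summing to `total`.

    Hypothetical helper: the remainder is spread one pixel at a time
    over the first steps, so no pixel is lost to rounding.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    base, remainder = divmod(total, steps)
    return [base + 1 if i < remainder else base for i in range(steps)]


if __name__ == "__main__":
    print(split_offsets(85, 5))   # [17, 17, 17, 17, 17]
    print(split_offsets(100, 6))  # [17, 17, 17, 17, 16, 16]
```

Spreading the remainder this way keeps every step within one pixel of the others, so the drag moves at an even pace.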
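Simulating a QQ Zone (Qzone) login with Selenium
Target page: https://qzone.qq.com/
Code:
```python
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4: the driver path is passed through a Service object
service = Service(r"E:\google\Chrome\Application\chromedriver.exe")
bro = webdriver.Chrome(service=service)
bro.get('https://qzone.qq.com/')
# The login form sits inside the iframe with id 'login_frame'
bro.switch_to.frame("login_frame")
# Switch from QR-code login to account/password login
switcher = bro.find_element(By.ID, 'switcher_plogin')
switcher.click()
user_tag = bro.find_element(By.ID, 'u')
password_tag = bro.find_element(By.ID, 'p')
# Placeholder account and password: replace them with your own
user_tag.send_keys('1234455')
password_tag.send_keys('qwer123')
sleep(1)
but = bro.find_element(By.ID, 'login_button')
but.click()
```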
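The login example hard-codes the account and password in the script. A common alternative is to read them from environment variables, so they never end up in the source file or in version control.

The sketch below assumes hypothetical variable names, QZONE_USER and QZONE_PASSWORD. The returned values would go to user_tag.send_keys(...) and password_tag.send_keys(...).

```python
import os


def load_credentials(user_var: str = "QZONE_USER",
                     password_var: str = "QZONE_PASSWORD") -> tuple[str, str]:
    """Read login credentials from environment variables (hypothetical names).

    Raises KeyError naming the missing variables instead of silently
    sending an empty string to the login form.
    """
    missing = [name for name in (user_var, password_var) if not os.environ.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return os.environ[user_var], os.environ[password_var]


# Usage inside the login script (hypothetical):
#   user, password = load_credentials()
#   user_tag.send_keys(user)
#   password_tag.send_keys(password)
```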
Headless browser and evading detection
Code:
```python
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes a single options= argument (chrome_options= is gone),
# so both settings go on one ChromeOptions object
option = webdriver.ChromeOptions()
# Headless mode: run without a visible browser window
option.add_argument('--headless')
option.add_argument('--disable-gpu')
# Evade detection: launch Chrome without the 'enable-automation' switch
option.add_experimental_option('excludeSwitches', ['enable-automation'])

service = Service(r"E:\google\Chrome\Application\chromedriver.exe")
bro = webdriver.Chrome(service=service, options=option)
bro.get('https://www.baidu.com')
# With no window to look at, print the rendered HTML to confirm the page loaded
print(bro.page_source)
sleep(2)
bro.quit()
```
Editor in charge: 姜華
Source: 今日頭條 (Toutiao)