Scraping Bilibili Video Danmaku with Python 3

Author: Anonymous 2018-01-04 09:20:55

This article shows, in 8 steps, how to scrape the danmaku (bullet comments) of a Bilibili video with Python 3. Read on.

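What you need:

A Bilibili account. You must log in first, otherwise you cannot view the historical danmaku records.
A computer with an internet connection and a browser you are comfortable with; I use Chrome.
A Python 3 environment with the requests module. Install it with the command below; switching to a mirror is faster:

pip3 install requests -i http://pypi.douban.com/simple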

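Scraping steps:

1. After logging in, open the page of the video you want to scrape and open the developer tools (F12 in Chrome), then select the Network tab to monitor requests.

2. Click "View historical danmaku" and capture the request it sends.

In that request, the number after rolldate is the danmaku ID of the video. In the returned data, timestamp is the date of the danmaku and new is the number of danmaku.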
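As a rough illustration of how such a response could be consumed, the sketch below parses a hand-written sample in the shape described above: a JSON list whose entries carry `timestamp` and `new`. The sample values and the helper names `extract_timestamps` and `total_new` are invented for illustration, and the real response may contain other fields.

```python
import json

# Hypothetical sample mimicking the rolldate response described above;
# the values are invented for illustration.
SAMPLE_ROLLDATE = (
    '[{"timestamp": "1507564800", "new": "120"},'
    ' {"timestamp": "1507651200", "new": "45"}]'
)


def extract_timestamps(raw_json):
    """Return the timestamp of every entry in a rolldate-style JSON list."""
    return [entry["timestamp"] for entry in json.loads(raw_json)]


def total_new(raw_json):
    """Sum the 'new' counts, casting to int in case they arrive as strings."""
    return sum(int(entry["new"]) for entry in json.loads(raw_json))


timestamps = extract_timestamps(SAMPLE_ROLLDATE)
print(timestamps)
print(total_new(SAMPLE_ROLLDATE))
```

Keeping the parsing in small functions makes it possible to test against a saved response without hitting the site on every run.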

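4. Pick any day in the historical danmaku list and view it; this sends a new request.

The request has the form dmroll,<timestamp>,<danmaku ID> and fetches the danmaku for that date. 1507564800 corresponds to 2017/10/10 0:0:0 (UTC+8).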
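The URL format and the timestamp conversion can be sketched as follows. This is a minimal example assuming the date quoted above is Beijing time (UTC+8); the helper names `dmroll_url` and `to_beijing_date` are hypothetical, and 2043618 is the danmaku ID used in the reference code of this article.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the date quoted in the article is Beijing time (UTC+8).
BEIJING = timezone(timedelta(hours=8))


def dmroll_url(timestamp, danmaku_id):
    """Build the per-day danmaku URL in the dmroll,<timestamp>,<id> format."""
    return 'https://comment.bilibili.com/dmroll,{0},{1}'.format(timestamp, danmaku_id)


def to_beijing_date(timestamp):
    """Convert a Unix timestamp (str or int) to a UTC+8 date string."""
    moment = datetime.fromtimestamp(int(timestamp), tz=BEIJING)
    return moment.strftime('%Y-%m-%d %H:%M:%S')


print(dmroll_url(1507564800, 2043618))
print(to_beijing_date(1507564800))  # 2017-10-10 00:00:00
```

Using an explicit time zone keeps the result independent of the machine's local settings, whereas `time.localtime` gives different dates on machines in other zones.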

The request returns XML data.

5. Use a regular expression to extract all danmaku messages. The pattern is:

'<d p=".*?">(.*?)</d>'
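To see what this pattern captures, here is a small sketch run against a hand-written sample. The sample XML (including the `<i>` root element and the `p` attribute values) is invented for illustration, and decoding entities with `html.unescape` is an addition that the original article does not use.

```python
import html
import re

# Hypothetical sample in the shape of the XML returned by the dmroll request;
# the p attribute values and comment texts are invented for illustration.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?><i>'
    '<d p="1.5,1,25,16777215,1507564900,0,abc,1">first comment</d>'
    '<d p="3.0,1,25,16777215,1507565000,0,def,2">a &amp; b</d>'
    '</i>'
)

# Same pattern as in the article: capture the text between <d p="..."> and </d>.
DANMAKU_PATTERN = re.compile(r'<d p=".*?">(.*?)</d>')


def extract_danmaku(xml_text):
    """Return the text of every <d> element, with XML entities decoded."""
    return [html.unescape(text) for text in DANMAKU_PATTERN.findall(xml_text)]


comments = extract_danmaku(SAMPLE_XML)
print(comments)
```

Without the unescape step, a comment containing `&` would be saved as `&amp;`, because the regular expression returns the raw XML text.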

6. Concatenate the strings and save all the danmaku to a local file:

with open('content.txt', mode='w+', encoding='utf8') as f:
    f.write(content)
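The concatenation itself can be sketched with `str.join`, shown here writing to a temporary directory so the example leaves no files behind. The helper name `save_danmaku` and the sample comments are hypothetical.

```python
import os
import tempfile


def save_danmaku(lines, file_path):
    """Join the comments with newlines and write them to file_path as UTF-8."""
    content = '\n'.join(lines) + '\n'
    with open(file_path, mode='w', encoding='utf8') as f:
        f.write(content)
    return content


# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'content.txt')
    save_danmaku(['first comment', 'second comment'], path)
    with open(path, encoding='utf8') as f:
        saved = f.read()

print(saved)
```

Joining a list once is usually cheaper than repeated `+=` on a string when there are many comments, though either approach works for a single day's worth.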

7. The reference code follows. It saves the danmaku into one file per date, because there are too many of them for a single file.

"""Scrape the danmaku of a Bilibili video."""
import os
import re
import time

import requests

# 2043618 is the danmaku ID of the video; this URL returns the list of dates
# https://www.bilibili.com/video/av1349282
url = 'https://comment.bilibili.com/rolldate,2043618'

# Extract the danmaku ID, 2043618
video_id = url.split(',')[-1]
print(video_id)

# Fetch the JSON response
html = requests.get(url)
# print(html.json())

# Build the list of timestamps
time_list = [i['timestamp'] for i in html.json()]
# print(time_list)

# Danmaku URL format: 'https://comment.bilibili.com/dmroll,<timestamp>,<danmaku ID>'

# Make sure the output directory exists before writing into it
os.makedirs('txt', exist_ok=True)

# The total volume is large, so each day's danmaku is saved to its own file
for i in time_list:
    content = ''
    j = 'https://comment.bilibili.com/dmroll,{0},{1}'.format(i, video_id)
    print(j)
    text = requests.get(j).text

    # Match the danmaku text
    res = re.findall('<d p=".*?">(.*?)</d>', text)

    # Convert the timestamp to a date; the string must be cast to int first
    timeArray = time.localtime(int(i))
    date_time = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
    print(date_time)
    content += date_time + '\n'
    for k in res:
        content += k + '\n'
    content += '\n'
    file_path = 'txt/{}.txt'.format(time.strftime("%Y_%m_%d", timeArray))
    print(file_path)

    with open(file_path, mode='w+', encoding='utf8') as f:
        f.write(content)

8. Final result

Editor: 龐桂玉. Source: 程序員共讀

python, web crawler, video danmaku
