
Python: Batch-Scraping Danmu (Tencent Video) and Making a Word Cloud

I. Scraping the Danmu
Pick a danmu from the video, then search for "danmu" in the browser DevTools (F12 → Network → All/JS). Sort the requests and look at the topmost one; opening several of these links shows how the danmu requests differ from each other.
# 1. Import modules
import csv

import requests


# 2. Send the requests
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'
}

# Write the header row once, so the 'content' column can be found by name later
with open('腾讯视频弹幕.csv', encoding='utf-8-sig', mode='w', newline='') as f:
    csv.writer(f).writerow(['commentid', 'uservip_degree', 'content'])

# Comparing the captured links shows that only the timestamp parameter changes:
# it starts at 15 and advances by 30 per request (here up to 15000).
# The useless callback parameter has been stripped from the URL, and the value
# after timestamp= is what {page} replaces.
for page in range(15, 15000, 30):
    url = f'https://mfm.video.qq.com/danmu?otype=json&target_id=7712619175%26vid%3Dk0042f69enx&session_key=0%2C420%2C1648389006&timestamp={page}'
    response = requests.get(url=url, headers=headers)
    json_data = response.json()
    print(response)  # <Response [200]> means the request went through

    # 3. Parse the data -- the nested structure is visible in the DevTools preview pane
    for comment in json_data['comments']:
        commentid = comment['commentid']
        uservip_degree = comment['uservip_degree']
        content = comment['content']

        with open('腾讯视频弹幕.csv', encoding='utf-8-sig', mode='a', newline='') as f:
            csv_writer = csv.writer(f)
            csv_writer.writerow([commentid, uservip_degree, content])
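
Before moving on, it's worth a quick sanity check that the CSV actually accumulated rows. A minimal sketch, assuming the scrape above completed:

import pandas as pd

df = pd.read_csv('腾讯视频弹幕.csv')
print(len(df))     # how many danmu were collected
print(df.head())   # eyeball a few rows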

II. Making the Word Cloud (run in Jupyter Notebook)
The code below has worked for me, so feel free to try it~
import jieba
import pandas as pd
from pyecharts.charts import WordCloud
from pyecharts import options as opts

# Load the danmu text; drop empty rows so ''.join() below doesn't choke on NaN
wordlist = []
data = pd.read_csv('腾讯视频弹幕.csv')['content'].dropna().astype(str)
data   # preview the column in Jupyter


# Join all danmu into one string and cut it into words with jieba
data_list = data.values.tolist()
data_str = ''.join(data_list)
words = jieba.lcut(data_str)

# Keep only words longer than one character, each with an initial count of 1
for word in words:
    if len(word) > 1:
        wordlist.append({'word': word, 'count': 1})
df = pd.DataFrame(wordlist)

# Sum the counts per word and sort in descending order
dfword = df.groupby('word')['count'].sum()
dfword2 = dfword.sort_values(ascending=False)
dfword2   # preview the frequency table in Jupyter


# Keep the 200 most frequent words
dfword3 = pd.DataFrame(dfword2.head(200), columns=['count'])
dfword3['word'] = dfword3.index
dfword3

# pyecharts expects a list of [word, count] pairs
word = dfword3['word'].tolist()
count = dfword3['count'].tolist()
a = [list(z) for z in zip(word, count)]

c = (
    WordCloud()
    .add('', a, word_size_range=[10, 50], shape='circle')
    .set_global_opts(title_opts=opts.TitleOpts(title="词云图"))
)
c.render_notebook()

(Source video: "Python爬取腾讯视频弹幕:采集《雪中悍刀行》弹幕,并且做词云图可视化分析" on bilibili)
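
If you want the chart outside the notebook too, pyecharts can write it to a standalone HTML file; a one-line sketch (the file name here is my own choice):

c.render('danmu_wordcloud.html')  # writes a standalone HTML file next to the notebook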

III. Making More Varied Word Clouds
1. Start with the text to analyze. Because the text comes in sentences and paragraphs, the first step is to split it into individual words; this process is called word segmentation and uses the Python segmentation library jieba.
2. After segmentation, generate the word cloud from the segmented words; this step uses wordcloud.
3. Finally, display the generated word cloud with the familiar matplotlib, as sketched below.
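
Putting the three steps together, here is a minimal sketch of the jieba → wordcloud → matplotlib pipeline, reusing the scraped CSV. Note that font_path is an assumption: wordcloud needs a font with CJK glyphs to draw Chinese characters, so point it at one that exists on your machine.

import jieba
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# 1. Segment: join the danmu into one string and cut it with jieba
text = ''.join(pd.read_csv('腾讯视频弹幕.csv')['content'].dropna().astype(str))
words = ' '.join(w for w in jieba.lcut(text) if len(w) > 1)

# 2. Generate: wordcloud needs a CJK-capable font to render Chinese
#    (this path is an assumption -- adjust it for your system)
wc = WordCloud(
    font_path='/System/Library/Fonts/PingFang.ttc',
    width=800,
    height=600,
    background_color='white',
).generate(words)

# 3. Display with matplotlib
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

The jieba output is joined with spaces on purpose: generate() does its own frequency counting on a space-separated string, so no manual word counts are needed here.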