Scraping a novel with Python and writing it to a txt file: fetching novel web pages and saving them locally

欧气 October 22, 2024, 02:31

Contents:

Preparation
Writing the Python script
Running the script

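Web novels have become a common form of leisure reading, but many of them can only be read online: if the connection is unstable or the site's server shuts down, the reader loses access to them. This article shows how to use Python to fetch the content of a novel's web pages and save it locally, so that it can be read offline at any time.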

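Preparation

1. Install Python: make sure Python is installed on your computer and that pip is available.

2. Install the required libraries: use pip to install requests and BeautifulSoup (published on PyPI as beautifulsoup4), which handle the page requests and the HTML parsing.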

3. Get the link to the novel: open the target novel's page in a browser and copy its URL.
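
Before writing the scraper, it can help to confirm that both libraries are actually importable. The following is a minimal sketch rather than part of the original script: the helper name missing_modules is hypothetical, and it relies only on importlib.util.find_spec from the standard library. Note that BeautifulSoup is imported under the module name bs4.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # requests and bs4 are the two third-party modules this article uses.
    missing = missing_modules(['requests', 'bs4'])
    if missing:
        print('Missing modules:', ', '.join(missing))
    else:
        print('All required modules are available.')
```

If the check reports a missing module, install it with pip before continuing.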

Writing the Python script

1. Import the required libraries

import requests
from bs4 import BeautifulSoup

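2. Define a function that fetches a page's content

def get_html(url):
    # Send a browser-like User-Agent; some sites reject the default one.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    try:
        # A timeout keeps the script from hanging on an unresponsive server.
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        return None
    if response.status_code != 200:
        return None
    # The page may not be UTF-8; let requests guess the encoding from the body.
    response.encoding = response.apparent_encoding
    return response.text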
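
A single failed request does not always mean the page is gone, so a caller may want to try again before giving up. The sketch below is one possible approach, not part of the original script: the name fetch_with_retries and its retries and delay parameters are hypothetical, and it assumes the fetch function signals failure by returning None, as get_html does for a non-200 status.

```python
import time


def fetch_with_retries(fetch, url, retries=3, delay=1.0):
    """Call fetch(url) up to `retries` times and return the first non-None result.

    `fetch` is any callable that returns the page text, or None on failure.
    """
    for attempt in range(retries):
        html = fetch(url)
        if html is not None:
            return html
        # Wait before the next attempt, but not after the last one.
        if attempt < retries - 1:
            time.sleep(delay)
    return None
```

With the function defined in this step, a call would look like fetch_with_retries(get_html, url).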

3. Parse the page and extract the chapter links

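from urllib.parse import urljoin

def get_chapter_links(url):
    html = get_html(url)
    if html is None:
        return []
    soup = BeautifulSoup(html, 'html.parser')
    chapter_links = []
    # 'chapter_link' is a placeholder class name; adjust it to the target site's markup.
    for link in soup.find_all('a', class_='chapter_link', href=True):
        # Resolve relative hrefs against the index page URL.
        chapter_links.append(urljoin(url, link['href']))
    return chapter_links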
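
The href values collected from an index page are often relative paths, and the same chapter can appear more than once on the page. The sketch below shows one way to normalize such a list using urljoin from the standard library; the helper name resolve_links and the example.com addresses are hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Turn href values into absolute URLs, dropping duplicates but keeping order."""
    seen = set()
    absolute = []
    for href in hrefs:
        full = urljoin(base_url, href)
        if full not in seen:
            seen.add(full)
            absolute.append(full)
    return absolute


if __name__ == '__main__':
    # Hypothetical index page and hrefs, for illustration only.
    for link in resolve_links('https://example.com/book/1/',
                              ['ch1.html', '/book/1/ch2.html', 'ch1.html']):
        print(link)
```

Resolving against the index page URL means the later requests work whether the site writes its links as relative or absolute.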

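4. Define a function that fetches a chapter's content

def get_chapter_content(url):
    html = get_html(url)
    if html is None:
        # The request failed; return an empty string so the caller can skip this chapter.
        return ''
    soup = BeautifulSoup(html, 'html.parser')
    # 'chapter_content' is a placeholder class name; adjust it to the target site's markup.
    content_div = soup.find('div', class_='chapter_content')
    return content_div.get_text() if content_div is not None else ''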
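
Text extracted from a chapter page usually carries the page's layout whitespace with it: indentation made of non-breaking or full-width spaces, and runs of blank lines. The sketch below is an optional cleanup step under that assumption; the helper name clean_chapter_text is hypothetical and it uses only built-in string methods.

```python
def clean_chapter_text(raw_text):
    """Normalize whitespace in text extracted from a chapter page."""
    lines = []
    for line in raw_text.splitlines():
        # Replace non-breaking and full-width spaces, then trim the line.
        line = line.replace('\xa0', ' ').replace('\u3000', ' ').strip()
        # Keep only lines that still contain text.
        if line:
            lines.append(line)
    return '\n'.join(lines)
```

Applying it to the extracted text before saving keeps one paragraph per line in the output file.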

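5. Define a function that saves a chapter's content locally

def save_chapter_content(content, chapter_num):
    # Append mode adds each chapter to the end of novel.txt.
    with open('novel.txt', 'a', encoding='utf-8') as f:
        # Heading line, "Chapter N" in Chinese.
        f.write(f'第{chapter_num}章\n')
        f.write(content + '\n')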
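
Because novel.txt is opened in append mode, running the script a second time writes every chapter again after the existing ones. One alternative, sketched below under the assumption that all chapter texts are collected first, is to open the file once in write mode so that each run replaces the previous output. The name write_novel and its parameters are hypothetical.

```python
def write_novel(path, chapters):
    """Write all chapters to `path`, replacing any existing file.

    `chapters` is an iterable of chapter texts in reading order.
    """
    with open(path, 'w', encoding='utf-8') as f:
        for chapter_num, content in enumerate(chapters, start=1):
            f.write(f'第{chapter_num}章\n')  # heading, "Chapter N" in Chinese
            f.write(content + '\n')
```

The trade-off is that nothing reaches the disk until every chapter has been fetched, whereas the append-mode version keeps whatever was downloaded before a failure.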

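6. Main function: fetch and save every chapter

def main():
    url = 'URL of the target novel page'  # replace with the URL of the target novel page
    chapter_links = get_chapter_links(url)
    for i, link in enumerate(chapter_links):
        chapter_content = get_chapter_content(link)
        if not chapter_content:
            # Nothing was extracted for this chapter, so skip it.
            continue
        save_chapter_content(chapter_content, i + 1)


if __name__ == '__main__':
    main()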
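
The loop in main requests the chapters back to back. A gentler variant, sketched below, pauses between requests and records the chapters it could not fetch. It is an illustration rather than the original method: the name scrape_chapters and its parameters are hypothetical, and it assumes the content function returns an empty string or None for a chapter it could not fetch.

```python
import time


def scrape_chapters(links, fetch_content, save, delay=1.0):
    """Fetch and save each chapter in order, pausing between requests.

    `links` is a list of chapter URLs, `fetch_content(link)` returns the
    chapter text (empty or None on failure), and `save(content, chapter_num)`
    stores it. Returns the numbers of the chapters that were skipped.
    """
    skipped = []
    for chapter_num, link in enumerate(links, start=1):
        content = fetch_content(link)
        if content:
            save(content, chapter_num)
        else:
            # Record the failure instead of writing an empty chapter.
            skipped.append(chapter_num)
        # Pause between requests so the server is not flooded.
        if chapter_num < len(links):
            time.sleep(delay)
    return skipped
```

With the functions defined in this article, a call would look like scrape_chapters(get_chapter_links(url), get_chapter_content, save_chapter_content), and the returned list shows which chapters to retry.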