Technology, November 19, 2022

Installing the module

pip3 install beautifulsoup4

Importing the module

from bs4 import BeautifulSoup

Sample HTML content

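gRPC is a popular RPC communication framework open-sourced by Google. It supports common languages such as Java, C++, and Python. This article mainly covers how to use gRPC for communication in a Python environment.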

The fetched HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript" src="/js/m.js"></script>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>经典小说排行榜-新笔趣阁</title>
<meta name="keywords" content="新笔趣阁,小说排行榜" />
<meta name="description" content="新笔趣阁是广大书友最值得收藏的小说排行榜阅读网,网站收录了当前最火热的小说排行榜,免费提供高质量的小说最新章节,是广大网络小说爱好者必备的小说阅读网。" />
<meta http-equiv="Cache-Control" content="no-siteapp" />
<meta http-equiv="Cache-Control" content="no-transform" />
<link rel="stylesheet" href="/css/xbqg.css" rel="external nofollow" />
<script type="text/javascript" src="http://libs.baidu.com/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript" src="/js/xbqg.js"></script>
</head>
<body>
<div id="wrapper">
<script>login();</script>
<div class="header">
<div class="header_logo">
<a href="/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >新笔趣阁</a>
</div>
<script>panel();</script>
</div>
<div class="clear"></div>
<div class="nav">
<ul>
<li><a href="/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >首页</a></li>
<li><a href="/evercase.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >永久书架</a></li>
<li><a href="/xclass/1/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >玄幻奇幻</a></li>
<li><a href="/xclass/2/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >武侠仙侠</a></li>
<li><a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a></li>
<li><a href="/xclass/4/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >历史军事</a></li>
<li><a href="/xclass/5/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >科幻灵异</a></li>
<li><a href="/xclass/6/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >网游竞技</a></li>
<li><a href="/xclass/7/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >女频频道</a></li>
<li><a href="/quanben/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >完本小说</a></li>
<li><a href="/xbqgph.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >排行榜单</a></li>
<li><a href="/xbqgcase.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >临时书架</a></li>
</ul>
</div>
<div id="main">
<div class="novelslist2">
<h2>小说排行榜列表</h2>
<ul>
<li><span class="s1"><b>作品分类</b></span><span class="s2"><b>作品名称</b></span><span class="s3"><b>最新章节</b></span><span class="s4"><b>作者</b></span><span class="s5"><b>更新时间</b></span><span class="s6"><b>状态</b></span></li> <li><span class="s1">[<a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a>]</span><span class="s2"><a href="/78_78760/" rel="external nofollow" target="_blank">我本港岛电影人</a></span><span class="s3"><a href="/78_78760/1203299.html" rel="external nofollow" target="_blank">今天有更</a></span><span class="s4">再来一盘菇凉</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/1/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >玄幻奇幻</a>]</span><span class="s2"><a href="/90_90002/" rel="external nofollow" target="_blank">艾泽拉斯新秩序</a></span><span class="s3"><a href="/90_90002/350275.html" rel="external nofollow" target="_blank">第一百三十六章 卡拉赞的收获</a></span><span class="s4">想静静的顿河</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a>]</span><span class="s2"><a 
href="/90_90842/" rel="external nofollow" target="_blank">超级狂婿</a></span><span class="s3"><a href="/90_90842/350271.html" rel="external nofollow" target="_blank">第654章:他不够格</a></span><span class="s4">我本幸运</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a>]</span><span class="s2"><a href="/90_90305/" rel="external nofollow" target="_blank">我在都市修个仙</a></span><span class="s3"><a href="/90_90305/339101.html" rel="external nofollow" target="_blank">完本感言</a></span><span class="s4">一剑荡清风</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a>]</span><span class="s2"><a href="/75_75283/" rel="external nofollow" target="_blank">都市超级医圣</a></span><span class="s3"><a href="/75_75283/4165727.html" rel="external nofollow" target="_blank">第2613章 战后处理</a></span><span class="s4">断桥残雪</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" 
rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a>]</span><span class="s2"><a href="/90_90235/" rel="external nofollow" target="_blank">祖传土豪系统</a></span><span class="s3"><a href="/90_90235/350262.html" rel="external nofollow" target="_blank">第二百零五章 我能试试吗</a></span><span class="s4">第九倾城</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a>]</span><span class="s2"><a href="/83_83534/" rel="external nofollow" target="_blank">都市红粉图鉴</a></span><span class="s3"><a href="/83_83534/838632.html" rel="external nofollow" target="_blank">第1510章 我,才是坐馆龙头!</a></span><span class="s4">秋江独钓</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/2/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >武侠仙侠</a>]</span><span class="s2"><a href="/89_89635/" rel="external nofollow" target="_blank">胜天传奇</a></span><span class="s3"><a href="/89_89635/998995.html" rel="external nofollow" target="_blank">第三百八十章 游历天宫</a></span><span class="s4">骑牛者</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" 
rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a>]</span><span class="s2"><a href="/88_88085/" rel="external nofollow" target="_blank">总裁爸比从天降</a></span><span class="s3"><a href="/88_88085/998993.html" rel="external nofollow" target="_blank">第1748章 奈何自己是婆婆</a></span><span class="s4">一碟茴香豆</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> <li><span class="s1">[<a href="/xclass/1/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >玄幻奇幻</a>]</span><span class="s2"><a href="/89_89996/" rel="external nofollow" target="_blank">太古魔帝</a></span><span class="s3"><a href="/89_89996/998988.html" rel="external nofollow" target="_blank">第一千三百二十四章 魂帝</a></span><span class="s4">草根</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li> </ul>
</div>
<div class="clear"></div>
</div>
</div>
<div class="footer">
<div class="footer_cont"><script>footer();dl();</script></div>
</div>
</body>
</html>

Building the BeautifulSoup object

Four commonly used parsers
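The source names this step without showing code. Here is a minimal sketch, assuming the page source is held in a string. The variable sample_html is a shortened, hypothetical stand-in for the full page above. It uses "html.parser", which ships with Python. The other parsers Beautiful Soup commonly works with ("lxml", "lxml-xml" and "html5lib") need separately installed packages. These are presumably the four the heading refers to, but that is an assumption.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the fetched page (hypothetical, for illustration only).
sample_html = """
<html><head><title>经典小说排行榜-新笔趣阁</title></head>
<body><div class="nav"><ul>
<li><a href="/">首页</a></li>
<li><a href="/xbqgph.html">排行榜单</a></li>
</ul></div></body></html>
"""

# The second argument selects the parser; "html.parser" needs no extra install.
soup = BeautifulSoup(sample_html, "html.parser")

print(soup.title.string)
print(len(soup.find_all("li")))
```

Swapping the parser name, for example to "lxml", only works once that package is installed, and different parsers may repair broken markup differently.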

Python Web Scraping Tutorial for Beginners: BeautifulSoup

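Common commands:

Install
pip install SomePackage # latest version
pip install SomePackage==1.0.4 # specific version
pip install 'SomePackage>=1.0.4' # minimum version
Uninstall a package
pip uninstall <package> or pip uninstall -r requirements.txt
Upgrade
pip install -U <package> or: pip install <package> --upgrade
List installed packages
pip freeze # show installed packages and their versions
List packages that can be upgraded
pip list -o
Search for a package
pip search <pkgName> # no longer works against PyPI, which has disabled its search API
Show package details
pip show <pkgName>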

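find_all: get nodes by condition

find_all( name , attrs , recursive , text , limit , **kwargs )
name: finds all tags whose name is name; string objects are ignored automatically;
attrs: filters by attributes, passed as a dictionary;
text: searches the string content of the document. Like name, it accepts a string, a regular expression, a list, or True (Beautiful Soup 4.4.0 and later call this parameter string and keep text as the older name);
recursive: by default, find_all() on a tag examines all of that tag's descendants; pass recursive=False to search only its direct children;
limit: find_all() returns every match, which can be slow on a large document tree. If you do not need all results, limit caps the number returned. It works like the LIMIT keyword in SQL: the search stops once limit results have been found;
class_: searches for tags with the given CSS class name. class_ accepts the same kinds of filter: a string, a regular expression, a function, or True.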
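The parameters above can be tried on a small document. The following is a sketch only: the HTML snippet, the book titles and the variable names are made up for illustration, and string= assumes Beautiful Soup 4.4.0 or later (older versions use text=).

```python
import re
from bs4 import BeautifulSoup

# Made-up miniature of the ranking list, for illustration only.
html = """
<ul id="top">
<li><span class="s1">cat-a</span><span class="s2"><a href="/1/">Book One</a></span></li>
<li><span class="s1">cat-b</span><span class="s2"><a href="/2/">Book Two</a></span></li>
<li><span class="s1">cat-a</span><span class="s2"><a href="/3/">Book Three</a></span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find_all("li")                     # name: match on tag name
by_attrs = soup.find_all(attrs={"class": "s2"})   # attrs: attribute dictionary
by_class = soup.find_all("span", class_="s1")     # class_: CSS class filter
first_two = soup.find_all("a", limit=2)           # limit: stop after two matches
# Keyword arguments filter on attributes; a compiled regex is matched with search().
by_regex = soup.find_all("a", href=re.compile(r"^/[23]/$"))
by_string = soup.find_all(string="cat-a")         # string: match text nodes

print(len(by_name), len(by_attrs), len(by_class))
print([a["href"] for a in first_two])
print([a.get_text() for a in by_regex])
print(len(by_string))
```

Note that searching by string alone returns the matching text nodes rather than the tags that contain them.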

Searching by tag name

lis = soup.find_all(name="li")
for item in lis:
    print(item)

The result is:
<li><a href="/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >首页</a></li>
<li><a href="/evercase.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >永久书架</a></li>
<li><a href="/xclass/1/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >玄幻奇幻</a></li>
<li><a href="/xclass/2/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >武侠仙侠</a></li>
<li><a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a></li>
<li><a href="/xclass/4/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >历史军事</a></li>
<li><a href="/xclass/5/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >科幻灵异</a></li>
<li><a href="/xclass/6/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >网游竞技</a></li>
<li><a href="/xclass/7/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >女频频道</a></li>
<li><a href="/quanben/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >完本小说</a></li>
<li><a href="/xbqgph.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >排行榜单</a></li>
<li><a href="/xbqgcase.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >临时书架</a></li>
<li><span class="s1"><b>作品分类</b></span><span class="s2"><b>作品名称</b></span><span class="s3"><b>最新章节</b></span><span class="s4"><b>作者</b></span><span class="s5"><b>更新时间</b></span><span class="s6"><b>状态</b></span></li>
<li><span class="s1">[<a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a>]</span><span class="s2"><a href="/90_90590/" rel="external nofollow" rel="external nofollow" target="_blank">我能举报万物</a></span><span class="s3"><a href="/90_90590/361969.html" rel="external nofollow" target="_blank">第九十六章 巡抚视察【第三更】</a></span><span class="s4">必火</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/5/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >科幻灵异</a>]</span><span class="s2"><a href="/81_81279/" rel="external nofollow" rel="external nofollow" target="_blank">女战神的黑包群</a></span><span class="s3"><a href="/81_81279/1140238.html" rel="external nofollow" target="_blank">第3046章 恶毒女配,在线提刀45</a></span><span class="s4">二谦</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/1/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >玄幻奇幻</a>]</span><span class="s2"><a href="/89_89699/" rel="external nofollow" rel="external nofollow" target="_blank">花岗岩之怒</a></span><span class="s3"><a href="/89_89699/999707.html" rel="external nofollow" target="_blank">第一百五十二章 意外到来的断剑</a></span><span class="s4">咱的小刀</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/6/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >网游竞技</a>]</span><span class="s2"><a href="/77_77363/" rel="external nofollow" rel="external nofollow" target="_blank">超神机械师</a></span><span class="s3"><a href="/77_77363/1338182.html" rel="external nofollow" target="_blank">1090 韭菜的自觉</a></span><span class="s4">齐佩甲</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/2/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >武侠仙侠</a>]</span><span class="s2"><a href="/59_59644/" rel="external nofollow" rel="external nofollow" target="_blank">无量真途</a></span><span class="s3"><a href="/59_59644/3199234.html" rel="external nofollow" target="_blank">第六百三十二章 突然出现的神智</a></span><span class="s4">燕十千</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/5/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >科幻灵异</a>]</span><span class="s2"><a href="/88_88061/" rel="external nofollow" rel="external nofollow" target="_blank">我的细胞监狱</a></span><span class="s3"><a href="/88_88061/999706.html" rel="external nofollow" target="_blank">第四百五十九章 白雾</a></span><span class="s4">穿黄衣的阿肥</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/2/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >武侠仙侠</a>]</span><span class="s2"><a href="/88_88375/" rel="external nofollow" rel="external nofollow" target="_blank">前任无双</a></span><span class="s3"><a href="/88_88375/999705.html" rel="external nofollow" target="_blank">第三百章 事急速办</a></span><span class="s4">跃千愁</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/2/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >武侠仙侠</a>]</span><span class="s2"><a href="/90_90719/" rel="external nofollow" rel="external nofollow" target="_blank">元阳道君</a></span><span class="s3"><a href="/90_90719/361968.html" rel="external nofollow" target="_blank">第四十章 洞开</a></span><span class="s4">剑扼虚空</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/4/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >历史军事</a>]</span><span class="s2"><a href="/88_88151/" rel="external nofollow" rel="external nofollow" target="_blank">逆成长巨星</a></span><span class="s3"><a href="/88_88151/999704.html" rel="external nofollow" target="_blank">655:不是办法的办法</a></span><span class="s4">葛洛夫街兄弟</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>
<li><span class="s1">[<a href="/xclass/4/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >历史军事</a>]</span><span class="s2"><a href="/89_89303/" rel="external nofollow" rel="external nofollow" target="_blank">承包大明</a></span><span class="s3"><a href="/89_89303/999703.html" rel="external nofollow" target="_blank">第一百九十三章 真会玩</a></span><span class="s4">南希北庆</span><span class="s5">2019-11-16</span><span class="s6">连载中</span></li>

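When developing Python applications, the system has only one version of Python 3 installed: 3.4. All third-party packages are installed by pip into the site-packages directory of that Python 3.

If we develop several applications at the same time, they all share the single Python 3 installed on the system. What if application A needs jinja 2.7 while application B needs jinja 2.6?

In that case, each application may need its own "independent" Python runtime environment. virtualenv is a tool for creating such an "isolated" Python runtime environment for an application.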
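The idea of one environment per application can be sketched in Python itself. Note the substitution: this uses the standard-library venv module, not the virtualenv tool the text names, because venv needs no extra install. The function name and directory names are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(parent_dir: str, name: str) -> Path:
    """Create a bare virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps creation fast; pass True to bootstrap pip as well.
    venv.create(env_dir, with_pip=False)
    return env_dir


# One environment per application, created in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    env_a = create_isolated_env(tmp, "app_a_env")
    env_b = create_isolated_env(tmp, "app_b_env")
    # Each environment gets its own pyvenv.cfg marker file and site-packages.
    cfg_a_exists = (env_a / "pyvenv.cfg").is_file()
    cfg_b_exists = (env_b / "pyvenv.cfg").is_file()
    print(cfg_a_exists, cfg_b_exists)
```

Packages installed into one environment are not visible from the other, which is what lets application A and application B pin different versions of the same library.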

Code for fetching the HTML

import requests

# Send a browser-like User-Agent header with the request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36 115Browser/9.0.0"
}
response = requests.get("https://www.xbiquge6.com/xbqgph.html", headers=headers)
response.encoding = "utf-8"  # decode the body as UTF-8, matching the page's charset
html = response.text
print(html)

Searching by tag attributes

Pass the attribute names and values as a dictionary.

lis = soup.find_all(attrs={"class": "s2"})
for item in lis:
    print(item)

The result is:
<span class="s2"><b>作品名称</b></span>
<span class="s2"><a href="/90_90590/" rel="external nofollow" rel="external nofollow" target="_blank">我能举报万物</a></span>
<span class="s2"><a href="/81_81279/" rel="external nofollow" rel="external nofollow" target="_blank">女战神的黑包群</a></span>
<span class="s2"><a href="/89_89699/" rel="external nofollow" rel="external nofollow" target="_blank">花岗岩之怒</a></span>
<span class="s2"><a href="/77_77363/" rel="external nofollow" rel="external nofollow" target="_blank">超神机械师</a></span>
<span class="s2"><a href="/59_59644/" rel="external nofollow" rel="external nofollow" target="_blank">无量真途</a></span>
<span class="s2"><a href="/88_88061/" rel="external nofollow" rel="external nofollow" target="_blank">我的细胞监狱</a></span>
<span class="s2"><a href="/88_88375/" rel="external nofollow" rel="external nofollow" target="_blank">前任无双</a></span>
<span class="s2"><a href="/90_90719/" rel="external nofollow" rel="external nofollow" target="_blank">元阳道君</a></span>
<span class="s2"><a href="/88_88151/" rel="external nofollow" rel="external nofollow" target="_blank">逆成长巨星</a></span>
<span class="s2"><a href="/89_89303/" rel="external nofollow" rel="external nofollow" target="_blank">承包大明</a></span>
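A natural next step is to pull the title and link out of each matched span. The sketch below assumes the same structure as the output above, reduced to three rows and with the repeated rel attributes left out. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Three rows copied from the structure shown above (attributes trimmed).
html = """
<ul>
<li><span class="s2"><b>作品名称</b></span></li>
<li><span class="s2"><a href="/90_90590/" target="_blank">我能举报万物</a></span></li>
<li><span class="s2"><a href="/81_81279/" target="_blank">女战神的黑包群</a></span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

books = []
for span in soup.find_all(attrs={"class": "s2"}):
    link = span.find("a")
    if link is None:
        # The header row holds a <b> tag instead of a link, so skip it.
        continue
    books.append((link.get_text(strip=True), link["href"]))

for title, href in books:
    print(title, href)
```

Checking for None before indexing avoids an error on rows such as the header, which match the class but contain no link.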

Limiting the search scope

The find_all method searches all descendants of the current tag. To search only its direct children, pass recursive=False.
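The difference is easiest to see with a nested list. This is a small sketch on made-up HTML, not on the page above:

```python
from bs4 import BeautifulSoup

# Made-up nested list: the first <li> contains another <ul>.
html = '<div id="main"><ul><li>outer<ul><li>inner</li></ul></li><li>second</li></ul></div>'
soup = BeautifulSoup(html, "html.parser")
outer_ul = soup.find("ul")

all_lis = outer_ul.find_all("li")                      # children and deeper descendants
direct_lis = outer_ul.find_all("li", recursive=False)  # direct children only
top_level = soup.find_all("li", recursive=False)       # soup's only direct child is <div>

print(len(all_lis))     # 3
print(len(direct_lis))  # 2
print(len(top_level))   # 0
```

The last call returns nothing because recursive=False is applied relative to the object find_all is called on, and no li sits directly under the document root.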

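Traversing child nodes

.contents: get all child nodes

Returns all child nodes as a list. Note that the list also contains '\n' strings, which are the line breaks between tags.

ul = soup.ul
print(ul)
print(ul.contents)

The result is: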
<ul>
<li><a href="/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >首页</a></li>
<li><a href="/evercase.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >永久书架</a></li>
<li><a href="/xclass/1/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >玄幻奇幻</a></li>
<li><a href="/xclass/2/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >武侠仙侠</a></li>
<li><a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a></li>
<li><a href="/xclass/4/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >历史军事</a></li>
<li><a href="/xclass/5/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >科幻灵异</a></li>
<li><a href="/xclass/6/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >网游竞技</a></li>
<li><a href="/xclass/7/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >女频频道</a></li>
<li><a href="/quanben/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >完本小说</a></li>
<li><a href="/xbqgph.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >排行榜单</a></li>
<li><a href="/xbqgcase.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >临时书架</a></li>
</ul>
['\n', <li><a href="/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >首页</a></li>, '\n', <li><a href="/evercase.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >永久书架</a></li>, '\n', <li><a href="/xclass/1/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >玄幻奇幻</a></li>, '\n', <li><a href="/xclass/2/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >武侠仙侠</a></li>, '\n', <li><a href="/xclass/3/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >都市言情</a></li>, '\n', <li><a href="/xclass/4/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >历史军事</a></li>, '\n', <li><a href="/xclass/5/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" 
rel="external nofollow" rel="external nofollow" rel="external nofollow" >科幻灵异</a></li>, '\n', <li><a href="/xclass/6/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >网游竞技</a></li>, '\n', <li><a href="/xclass/7/1.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >女频频道</a></li>, '\n', <li><a href="/quanben/" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >完本小说</a></li>, '\n', <li><a href="/xbqgph.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >排行榜单</a></li>, '\n', <li><a href="/xbqgcase.html" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" >临时书架</a></li>, '\n']
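When only the tags are wanted, the '\n' entries can be filtered out. This is a minimal sketch on a two-item stand-in for the navigation list. Filtering with isinstance is one reasonable approach, not the only one.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Two-item stand-in for the navigation list (attributes trimmed).
html = """<ul>
<li><a href="/">首页</a></li>
<li><a href="/evercase.html">永久书架</a></li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")
ul = soup.ul

raw = ul.contents  # tags interleaved with '\n' text nodes
# Keep Tag objects only; NavigableString nodes such as '\n' are dropped.
tags_only = [child for child in ul.contents if isinstance(child, Tag)]

print(len(raw))        # 5
print(len(tags_only))  # 2
print([li.a["href"] for li in tags_only])
```

Calling ul.find_all("li", recursive=False) gives the same two tags here and reads a little more directly when a specific tag name is known.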

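.children: get all child nodes

Returns an iterator over the child nodes rather than a list, so printing it directly shows only the iterator object. Wrap it in list() to see the nodes.

ul = soup.ul
print(ul.children)
print(list(ul.children))

The result is: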
['\n', <li><a href="/">首页</a></li>, '\n', <li><a href="/evercase.html">永久书架</a></li>, '\n', <li><a href="/xclass/1/1.html">玄幻奇幻</a></li>, '\n', <li><a href="/xclass/2/1.html">武侠仙侠</a></li>, '\n', <li><a href="/xclass/3/1.html">都市言情</a></li>, '\n', <li><a href="/xclass/4/1.html">历史军事</a></li>, '\n', <li><a href="/xclass/5/1.html">科幻灵异</a></li>, '\n', <li><a href="/xclass/6/1.html">网游竞技</a></li>, '\n', <li><a href="/xclass/7/1.html">女频频道</a></li>, '\n', <li><a href="/quanben/">完本小说</a></li>, '\n', <li><a href="/xbqgph.html">排行榜单</a></li>, '\n', <li><a href="/xbqgcase.html">临时书架</a></li>, '\n']
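
As the output above shows, .children returns the '\n' strings between the tags as well as the <li> tags themselves. One possible way to keep only the element nodes is to filter by type. This is a minimal sketch; the html string is a shortened, hypothetical stand-in for the page used in this article, and html.parser is an assumed parser choice.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical, shortened stand-in for the navigation list used in this article.
html = """<ul>
<li><a href="/">首页</a></li>
<li><a href="/evercase.html">永久书架</a></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.ul

# .children yields Tag objects and the whitespace strings between them.
all_children = list(ul.children)

# Keep only real element nodes by checking the type of each child.
li_tags = [child for child in ul.children if isinstance(child, Tag)]
hrefs = [li.a["href"] for li in li_tags]

print(len(all_children), len(li_tags))
print(hrefs)
```

Filtering with isinstance keeps the original order of the children, so the result can be indexed the same way as the visible list items.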

.descendants: iterate over all descendant nodes

ul = soup.ul
for item in ul.descendants:
    print(item)
The result is (I removed the many blank '\n' lines in between):
<li><a href="/">首页</a></li>
<a href="/">首页</a>
首页
<li><a href="/evercase.html">永久书架</a></li>
<a href="/evercase.html">永久书架</a>
永久书架
<li><a href="/xclass/1/1.html">玄幻奇幻</a></li>
<a href="/xclass/1/1.html">玄幻奇幻</a>
玄幻奇幻
<li><a href="/xclass/2/1.html">武侠仙侠</a></li>
<a href="/xclass/2/1.html">武侠仙侠</a>
武侠仙侠
<li><a href="/xclass/3/1.html">都市言情</a></li>
<a href="/xclass/3/1.html">都市言情</a>
都市言情
<li><a href="/xclass/4/1.html">历史军事</a></li>
<a href="/xclass/4/1.html">历史军事</a>
历史军事
<li><a href="/xclass/5/1.html">科幻灵异</a></li>
<a href="/xclass/5/1.html">科幻灵异</a>
科幻灵异
<li><a href="/xclass/6/1.html">网游竞技</a></li>
<a href="/xclass/6/1.html">网游竞技</a>
网游竞技
<li><a href="/xclass/7/1.html">女频频道</a></li>
<a href="/xclass/7/1.html">女频频道</a>
女频频道
<li><a href="/quanben/">完本小说</a></li>
<a href="/quanben/">完本小说</a>
完本小说
<li><a href="/xbqgph.html">排行榜单</a></li>
<a href="/xbqgph.html">排行榜单</a>
排行榜单
<li><a href="/xbqgcase.html">临时书架</a></li>
<a href="/xbqgcase.html">临时书架</a>
临时书架
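
The output above follows a depth-first order: each <li> is followed by its <a>, then by the text inside that <a>, before the next <li> appears. The sketch below makes that order easier to see by printing a short label per node. The html string and the describe helper are hypothetical, introduced only for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-item list written without whitespace between tags.
html = '<ul><li><a href="/">首页</a></li><li><a href="/quanben/">完本小说</a></li></ul>'
soup = BeautifulSoup(html, "html.parser")


def describe(node):
    """Return the tag name for tags, or the stripped text for string nodes."""
    if isinstance(node, Tag):
        return node.name
    return str(node).strip()


# .descendants walks the subtree depth-first, parents before their children.
order = [describe(node) for node in soup.ul.descendants]
# Drop empty labels, which would come from whitespace-only strings.
order = [label for label in order if label]

print(order)
```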

Getting the parent node

a = soup.li.a
print(a)
p = a.parent
print(p)
The result is:
<a href="/">首页</a>
<li><a href="/">首页</a></li>
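
.parent returns only the direct parent. To walk further up the tree, bs4 also provides .parents (all ancestors, nearest first) and find_parent() (the nearest ancestor matching a name). The following is a rough sketch on a hypothetical fragment, not the article's full page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the nav structure of the example page.
html = '<div class="nav"><ul><li><a href="/">首页</a></li></ul></div>'
soup = BeautifulSoup(html, "html.parser")
a = soup.a

# .parents yields every ancestor, ending with the BeautifulSoup object itself.
ancestor_names = [parent.name for parent in a.parents]

# find_parent() returns the closest ancestor with the given tag name.
nearest_ul = a.find_parent("ul")

print(ancestor_names)
print(nearest_ul.name)
```

The last entry produced by .parents is the document object, whose name is '[document]', so loops that expect ordinary tags may want to stop before it.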

Extracting node information

Node name

This does not seem very useful in practice.

title = soup.title
print(title.name)
The result is:
title

Node attributes

a = soup.li.a
print(a)
print(a.attrs)  # get all attributes, returned as a dict
print(a['href'])  # get the value of the a node's href attribute
The result is:
<a href="/">首页</a>
{'href': '/'}
/
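
Indexing with a['href'] raises KeyError when the attribute is missing, so for optional attributes .get() may be the safer choice. The sketch below also shows that class is returned as a list, because bs4 treats it as a multi-valued attribute. The html string and its class names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical link with a class attribute added for illustration.
html = '<a href="/" class="nav-link active">首页</a>'
soup = BeautifulSoup(html, "html.parser")
a = soup.a

href = a["href"]               # raises KeyError if the attribute is absent
target = a.get("target")       # returns None if the attribute is absent
classes = a.get("class")       # multi-valued attribute, returned as a list
has_href = a.has_attr("href")  # explicit existence check

print(href, target, classes, has_href)
```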

Node text

a = soup.li.a
print(type(a.string))  # type of the text inside the node
print(a.string)  # get the text inside the node
print(a.get_text())  # also gets the text inside the node
The result is:
<class 'bs4.element.NavigableString'>
首页
首页
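
.string and get_text() give the same result here because the <a> tag contains a single string. They differ once a tag has more than one child: .string then returns None, while get_text() joins the text of all descendants. A small sketch on a hypothetical fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical list item with two child tags separated by a space.
html = '<li><a href="/">首页</a> <span>(new)</span></li>'
soup = BeautifulSoup(html, "html.parser")
li = soup.li

single = li.a.string     # the <a> tag has exactly one string child
ambiguous = li.string    # None, because <li> has several children
joined = li.get_text()   # concatenates all text under <li>

print(single)
print(ambiguous)
print(joined)
```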

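Note: if the text inside a node is a comment, .string returns it with the comment markers automatically stripped.
The type of a comment is <class 'bs4.element.Comment'>, so you can detect this case by checking the type.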
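
The type check mentioned above could look like the following. This is only a sketch: the two <p> tags are hypothetical and are not taken from the example page.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical markup: one paragraph holds a comment, the other plain text.
html = "<p><!-- hidden note --></p><p>visible text</p>"
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("p")

# Comment is a subclass of NavigableString, so isinstance tells them apart.
first_is_comment = isinstance(first.string, Comment)
second_is_comment = isinstance(second.string, Comment)

print(repr(first.string), first_is_comment)
print(repr(second.string), second_is_comment)
```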

Iterating over the text in all descendant nodes

for string in soup.stripped_strings:  # skips whitespace-only strings and strips the rest
    print(repr(string))
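
To see what stripped_strings removes, it can be compared with .strings, which yields every string including the '\n' between tags. A minimal sketch, again using a hypothetical shortened list rather than the full page:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the navigation list.
html = """<ul>
<li><a href="/">首页</a></li>
<li><a href="/quanben/">完本小说</a></li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")

raw = list(soup.strings)             # includes the '\n' between tags
clean = list(soup.stripped_strings)  # whitespace-only strings are skipped

print(raw)
print(clean)
```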