Need help with lxml.html and XPath - 小白教程

添方夜弹 posted on 2021-2-23 09:42:38

Need help with lxml.html and XPath

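data = response.text
tree = html.fromstring(data)
Services_Product = tree.xpath("//dt/following-sibling::dd")

This needs more work. This field is ...

from lxml import html

with open('html_01.txt', 'r') as file:
    data = file.read()
tree = html.fromstring(data)
# tree.xpath() returns a list of <dd> elements, so query each one in turn
Services_Product = tree.xpath("//dt/following-sibling::dd")
for dd in Services_Product:
    # './/li' is relative to this <dd>; '//li' would search the whole document
    for elem in dd.xpath(".//li"):
        print(elem.text)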

远方的树 posted on 2021-2-25 10:36:41

lxml can be used as the parser for BeautifulSoup (BS); I always do it that way.
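soup = BeautifulSoup(response.content, 'lxml')

from lxml import html
import requests

response = requests.get(url)
tree = html.fromstring(response.content)
# xpath() returns a list of matching <ul> elements, not a single element
prod = tree.xpath('//*[@id="business-info"]/dl/dd/ul')
for ul in prod:
    # iterating over an element yields its direct children
    for tag in ul:
        print(tag.text)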
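
The difference between a list of matches and a single element is easier to see on a small inline document. The following is only a minimal sketch, not the original page: the HTML string and the names `SAMPLE` and `list_items` are made up for illustration, and only the `business-info` id is borrowed from the XPath above.

```python
from lxml import html

# Hypothetical markup; only the "business-info" id comes from the thread.
SAMPLE = """
<div id="business-info">
  <dl>
    <dt>Services/Products</dt>
    <dd>
      <ul>
        <li>Item 1</li>
        <li>Item 2</li>
      </ul>
    </dd>
  </dl>
</div>
"""


def list_items(source):
    """Return the text of every <li> under a <dd> that follows a <dt>."""
    tree = html.fromstring(source)
    items = []
    # xpath() always returns a list, even when only one <dd> matches.
    for dd in tree.xpath("//dt/following-sibling::dd"):
        # The leading dot keeps the search inside this <dd>.
        for li in dd.xpath(".//li"):
            items.append(li.text)
    return items


print(list_items(SAMPLE))  # ['Item 1', 'Item 2']
```

Looping over the result of `xpath()` and querying each element with a relative path avoids calling element methods on the list itself.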

蓝精灵童鞋 posted on 2021-3-16 15:36:46

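When the content mixes different tags and you just want the text out of it, it is easier to run html2text on the part you have already found.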
import html2text

data = '''\
<dd>
A block of text here.... bla bla bla....
<ul>
<li><p>Item 1.for some reason they wraped this in a p</p></li>
<li><strong>And this item is important</strong>bla bla bla</li>
<li>And just more info here...</li>
</ul>
And finally more stuff here...
</dd>'''

text = html2text.HTML2Text()
text.mark_code = True
text.ignore_emphasis = True
text.single_line_break = True
text.ignore_links = True
text = text.handle(data)
print(text.strip())
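
Plain `elem.text` falls short on markup like this because it only holds the text that comes before an element's first child tag. As a rough sketch of that behaviour, the snippet below reuses the `<li>` items from the sample and compares `.text` with lxml's own `text_content()`. Note that `text_content()` is a swapped-in alternative, not what the html2text answer above uses, and `FRAGMENT`, `direct` and `full` are illustrative names.

```python
from lxml import html

# The <li> items from the sample above, reused as a standalone fragment.
FRAGMENT = """
<ul>
<li><p>Item 1.for some reason they wraped this in a p</p></li>
<li><strong>And this item is important</strong>bla bla bla</li>
<li>And just more info here...</li>
</ul>
"""

ul = html.fromstring(FRAGMENT)
items = ul.xpath(".//li")

# .text is only the text before the first child tag, so it is None
# when an <li> starts with <p> or <strong>.
direct = [li.text for li in items]

# text_content() joins the text of the element and all its descendants.
full = [li.text_content() for li in items]

print(direct)
print(full)
```

Unlike html2text, `text_content()` returns the raw concatenated text with no Markdown-style layout, so which one fits depends on how much structure the output should keep.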
