PHP: Markdown to HTML; Python: HTML to Markdown

罗兴运
2023-12-01


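Introduction: Although I mostly handle Markdown with PHP, Python's library ecosystem really is remarkably rich. PHP's Composer also learned from other languages' package managers, and that is what gives PHP the same building-blocks feel.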

Converting Markdown to HTML with Python is the more common case. Today we will use a different library to go the other way and convert HTML to Markdown.

html2text

Installation

1. Using pip

pip install html2text  # with Python 3, use pip3

2. Installing from source

If you are on Python 3, use python3 instead of python in the commands below.

git clone --depth 1 https://github.com/Alir3z4/html2text.git
cd html2text
python setup.py build
python setup.py install
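Whichever installation method you use, a quick way to confirm that the package is visible to your interpreter is to ask for its installed version. The sketch below assumes Python 3.8+ (for the standard library's importlib.metadata); installed_version is a hypothetical helper name introduced only for this example.

```python
# Sketch: check that html2text is installed, assuming Python 3.8+.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "html2text") -> Optional[str]:
    """Return the installed version string of `dist`, or None if it is missing.

    `installed_version` is a hypothetical helper, not part of html2text.
    """
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print("html2text is not installed" if found is None else f"html2text {found}")
```

If this prints "not installed" right after `pip install`, the pip you ran most likely belongs to a different Python than the one running the script.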

Usage

import html2text

html = "<p><strong>hello</strong> https://xxx.com</p>"
md = html2text.html2text(html)
print(md)

Output

**hello** https://xxx.com

Advanced usage

Ignoring links (i.e. <a> tags)

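import html2text

html = '<p><strong>hello</strong> https://xxx.com</p><p><a href="https://xxx.com">link</a></p>'

text_maker = html2text.HTML2Text()
text_maker.ignore_links = True    # keep only the link text, drop the URL
text_maker.bypass_tables = False  # convert tables to Markdown (the default)
text = text_maker.handle(html)
print(text)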

Output

**hello** https://xxx.com

link

With ignore_links = False instead, the output is:

**hello** https://xxx.com

[link](https://xxx.com)

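As you can see, with ignore_links enabled only the link text is kept, while with it disabled the link is emitted in Markdown link syntax.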

Other options

UNICODE_SNOB for using unicode

ESCAPE_SNOB for escaping every special character

LINKS_EACH_PARAGRAPH for putting links after every paragraph

BODY_WIDTH for wrapping long lines

SKIP_INTERNAL_LINKS to skip #local-anchor things

INLINE_LINKS for formatting images and links inline or reference style

PROTECT_LINKS protect from line breaks

GOOGLE_LIST_INDENT number of pixels to indent nested lists

IGNORE_ANCHORS

IGNORE_IMAGES

IMAGES_AS_HTML always generate HTML tags for images; preserves height, width, alt if possible.

IMAGES_TO_ALT

IMAGES_WITH_SIZE

IGNORE_EMPHASIS

BYPASS_TABLES format tables in HTML rather than Markdown

IGNORE_TABLES ignore table-related tags (table, th, td, tr) while keeping rows

SINGLE_LINE_BREAK to use a single line break rather than two

UNIFIABLE is a dictionary which maps unicode abbreviations to ASCII values

RE_SPACE for finding space-only lines

RE_ORDERED_LIST_MATCHER for matching ordered lists in MD

RE_UNORDERED_LIST_MATCHER for matching unordered lists in MD

RE_MD_CHARS_MATCHER for matching Md \,[,],( and )

RE_MD_CHARS_MATCHER_ALL for matching `,*,_,{,},[,],(,),#,!

RE_MD_DOT_MATCHER for matching lines starting with <space>1.

RE_MD_PLUS_MATCHER for matching lines starting with +

RE_MD_DASH_MATCHER for matching lines starting with (-)

RE_SLASH_CHARS a string of slash escapeable characters

RE_MD_BACKSLASH_MATCHER to match \char

USE_AUTOMATIC_LINKS to convert <a href='http://xyz'>http://xyz</a> to <http://xyz>

MARK_CODE to wrap ‘pre’ blocks with [code]…[/code] tags

WRAP_LINKS to decide if links have to be wrapped during text wrapping (implies INLINE_LINKS = False)

WRAP_LIST_ITEMS to decide if list items have to be wrapped during text wrapping

DECODE_ERRORS to handle decoding errors. ‘strict’, ‘ignore’, ‘replace’ are the acceptable values.

DEFAULT_IMAGE_ALT takes a string as value and is used whenever an image tag is missing an alt value. The default for this is an empty string '' to avoid backward breakage

OPEN_QUOTE is the character used to open a quote when replacing the <q> tag. It defaults to ".

CLOSE_QUOTE is the character used to close a quote when replacing the <q> tag. It defaults to ".

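This article was collected from the internet for study and research purposes; copyright belongs to the original author. If it infringes your rights, please contact us to have it removed.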

markdown @tsingchan

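Some quote-formatted passages are the collector's annotations (this sentence, for example) rather than the author's original text.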
