Txtmark: a Markdown parser implemented in Java that generates HTML documents

危砚

2023-12-01

Txtmark - Java markdown processor

See LICENSE.txt for licensing information.

Txtmark is yet another markdown processor for the JVM.

It is easy to use:

String result = txtmark.Processor.process("This is ***TXTMARK***");

It is fast (see below)

... well, it is the fastest markdown processor on the JVM right now.

(This might be outdated, but txtmark is still flippin' fast.)

It does not depend on other libraries, so putting txtmark.jar on the classpath is sufficient to use Txtmark in your project.

For an in-depth explanation of markdown have a look at the original Markdown Syntax.

Maven repository

Txtmark is available on maven central.

Txtmark extensions

To enable Txtmark's extended markdown parsing you can use the $PROFILE$ mechanism:

[$PROFILE$]: extended

This seemed to me the easiest and safest way to enable different behaviours.

Just put this line into your Txtmark file, the same way you would add a reference link definition.
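As a rough illustration of how such a profile line could be recognised, here is a minimal sketch. It is not Txtmark's own code: the class `ProfileSketch`, the method `detectProfile`, the fallback name `"default"` and the tolerance for up to three leading spaces are all assumptions made for this example.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProfileSketch {
    // Matches a reference-style line such as: [$PROFILE$]: extended
    // (allowing up to three leading spaces is an assumption of this sketch)
    private static final Pattern PROFILE_LINE =
            Pattern.compile("^\\s{0,3}\\[\\$PROFILE\\$\\]:\\s*(\\S+)\\s*$");

    /** Returns the declared profile name, or the hypothetical "default" if none is found. */
    static String detectProfile(List<String> lines) {
        for (String line : lines) {
            Matcher m = PROFILE_LINE.matcher(line);
            if (m.matches()) {
                return m.group(1);
            }
        }
        return "default";
    }

    public static void main(String[] args) {
        // prints: extended
        System.out.println(detectProfile(List.of("Some text", "[$PROFILE$]: extended")));
    }
}
```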

Behavior changes when using [$PROFILE$]: extended

Lists and code blocks end a paragraph

In normal markdown the following:

    This is a paragraph
    * and this is not a list

Will produce:

    <p>This is a paragraph
    * and this is not a list</p>

When using Txtmark extensions this changes to:

    <p>This is a paragraph</p>
    <ul>
    <li>and this is not a list</li>
    </ul>

Text anchors

Headlines and list items may receive an ID which you can refer to using links.

    ## Headline with ID ## {#headid}

    Another headline with ID {#headid2}
    ------------------------

    * List with ID {#listid}

    Links: [Foo] (#headid)

this will produce:

    <h2 id="headid">Headline with ID</h2>
    <h2 id="headid2">Another headline with ID</h2>
    <ul>
    <li id="listid">List with ID</li>
    </ul>
    <p>Links: <a href="#headid">Foo</a></p>

The ID must be the last thing on the first line. All spaces before {# get removed, so you can't use an ID and a manual line break in the same line.
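The rule above (ID last on the line, spaces before `{#` dropped) can be sketched with a single regular expression. This is only an illustration under stated assumptions, not Txtmark's parser: `AnchorIdSketch` and `splitId` are hypothetical names, and the sketch assumes an ID contains no whitespace or braces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorIdSketch {
    // Text, any spaces, then {#id} as the very last thing on the line.
    private static final Pattern TRAILING_ID =
            Pattern.compile("^(.*?)\\s*\\{#([^\\s{}]+)\\}$");

    /** Returns {text, id}; id is null when the line carries no trailing {#id}. */
    static String[] splitId(String line) {
        Matcher m = TRAILING_ID.matcher(line);
        if (m.matches()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return new String[] { line, null };
    }

    public static void main(String[] args) {
        String[] parts = splitId("* List with ID {#listid}");
        // prints: [* List with ID] id=listid
        System.out.println("[" + parts[0] + "] id=" + parts[1]);
    }
}
```

Because the spaces in front of `{#` are consumed, two trailing spaces meant as a manual line break would never survive on the same line, which matches the restriction described above.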

Auto HTML entities

`(R)` becomes `&reg;` - ®

`(TM)` becomes `&trade;` - ™

`--` becomes `&ndash;` - –

`---` becomes `&mdash;` - —

`...` becomes `&hellip;` - …

`<<` becomes `&laquo;` - «

`>>` becomes `&raquo;` - »

`"Hello"` becomes `&ldquo;Hello&rdquo;` - “Hello”

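A minimal sketch of this kind of substitution, assuming the standard HTML entity names for the characters listed above. It is not how Txtmark implements the feature; `AutoEntitySketch` and `apply` are hypothetical names, and the sketch ignores code spans and escapes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AutoEntitySketch {
    // Insertion order matters: "---" has to be replaced before "--".
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("(R)", "&reg;");
        REPLACEMENTS.put("(TM)", "&trade;");
        REPLACEMENTS.put("---", "&mdash;");
        REPLACEMENTS.put("--", "&ndash;");
        REPLACEMENTS.put("...", "&hellip;");
        REPLACEMENTS.put("<<", "&laquo;");
        REPLACEMENTS.put(">>", "&raquo;");
    }

    static String apply(String text) {
        String out = text;
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        // A pair of straight double quotes becomes a pair of curly-quote entities.
        return out.replaceAll("\"([^\"]*)\"", "&ldquo;$1&rdquo;");
    }

    public static void main(String[] args) {
        // prints: Txtmark&trade; &ndash; fast &mdash; &ldquo;Hello&rdquo;&hellip;
        System.out.println(apply("Txtmark(TM) -- fast --- \"Hello\"..."));
    }
}
```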
Underscores (Emphasis)

Underscores in the middle of a word don't result in emphasis.

    Con_cat_this

normally produces this:

    Con<em>cat</em>this
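One way to express "no emphasis for underscores inside a word" is to require that the opening and closing underscore do not touch a letter or digit on the outside. The following is a sketch of that idea only, with hypothetical names (`UnderscoreSketch`, `emphasize`); Txtmark's actual emphasis handling may differ.

```java
import java.util.regex.Pattern;

final class UnderscoreSketch {
    // "_text_" counts as emphasis only if no letter or digit sits directly
    // before the opening or directly after the closing underscore.
    private static final Pattern EMPHASIS = Pattern.compile(
            "(?<!\\p{Alnum})_([^_\\s](?:[^_]*[^_\\s])?)_(?!\\p{Alnum})");

    static String emphasize(String text) {
        return EMPHASIS.matcher(text).replaceAll("<em>$1</em>");
    }

    public static void main(String[] args) {
        System.out.println(emphasize("Con_cat_this"));         // prints: Con_cat_this
        System.out.println(emphasize("an _emphasized_ word")); // prints: an <em>emphasized</em> word
    }
}
```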

Superscript

You can use ^ to mark a span as superscript.

    2^2^ = 4

turns into

    2<sup>2</sup> = 4
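The transformation can be sketched as a single regex replacement. This is an illustration, not Txtmark's implementation: `SuperscriptSketch` and `superscript` are hypothetical names, the `<sup>` element is assumed as the output markup, and the sketch assumes a superscript span contains no whitespace.

```java
final class SuperscriptSketch {
    /** Wraps every ^span^ (no whitespace, no nested ^) in a <sup> element. */
    static String superscript(String text) {
        return text.replaceAll("\\^([^^\\s]+)\\^", "<sup>$1</sup>");
    }

    public static void main(String[] args) {
        // prints: 2<sup>2</sup> = 4
        System.out.println(superscript("2^2^ = 4"));
    }
}
```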

Abbreviations

Abbreviations are defined like reference links, but with a * in place of the link target, and the definition must fit on a single line.

[Git]: * "Fast distributed revision control system"

and used like this

    This is [Git]!

which will produce

    This is <abbr title="Fast distributed revision control system">Git</abbr>!
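To make the two steps (collect definitions, then expand uses) concrete, here is a small two-pass sketch. It is an assumption-laden illustration rather than Txtmark's code: `AbbreviationSketch` and `render` are hypothetical names, the `<abbr title="...">` output is assumed, and the title is not HTML-escaped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AbbreviationSketch {
    // Definition line: [Name]: * "Title"
    private static final Pattern DEFINITION =
            Pattern.compile("^\\[([^\\]]+)\\]:\\s*\\*\\s*\"(.*)\"\\s*$");
    // Use inside text: [Name]
    private static final Pattern USAGE = Pattern.compile("\\[([^\\]]+)\\]");

    static String render(List<String> lines) {
        Map<String, String> titles = new HashMap<>();
        StringBuilder body = new StringBuilder();
        // Pass 1: definitions are collected and produce no output of their own.
        for (String line : lines) {
            Matcher def = DEFINITION.matcher(line);
            if (def.matches()) {
                titles.put(def.group(1), def.group(2));
            } else {
                body.append(line).append('\n');
            }
        }
        // Pass 2: every [Name] with a known definition becomes an <abbr> element.
        Matcher use = USAGE.matcher(body);
        StringBuilder out = new StringBuilder();
        while (use.find()) {
            String name = use.group(1);
            String title = titles.get(name);
            String replacement = title == null
                    ? use.group()
                    : "<abbr title=\"" + title + "\">" + name + "</abbr>";
            use.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        use.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(
                "[Git]: * \"Fast distributed revision control system\"",
                "This is [Git]!")));
    }
}
```

Collecting definitions first means a definition may appear after its first use, just as with reference links.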

Fenced code blocks

    ```
    This is code!
    ```

    ~~~
    Another code block
    ~~~

    ```
    You can also mix flavours
    ~~~

Fenced code block delimiter lines start with at least three ` or ~ characters.

It is possible to add meta data to the beginning line. Everything trailing after the ` or ~ characters is then considered meta data. These are all valid meta lines:

    ```python
    ~ ~ ~ ~ ~java
    ``` ``` ``` this is even more meta
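The delimiter rule and the three meta examples can be captured in a few lines. This is a sketch inferred from the examples above, not Txtmark's own line classifier: `FenceLineSketch` and `fenceMeta` are hypothetical names, and counting a mix of backticks and tildes on one line towards the minimum of three is an assumption.

```java
final class FenceLineSketch {
    /**
     * Returns the meta text of a fence delimiter line, "" if the fence has
     * no meta, or null if the line is not a fence delimiter at all.
     */
    static String fenceMeta(String line) {
        int count = 0;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '`' || c == '~') {
                count++;
            } else if (c != ' ' || count == 0) {
                break; // leading space, or first character that starts the meta text
            }
            i++;
        }
        return count >= 3 ? line.substring(i).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(fenceMeta("```python"));                          // prints: python
        System.out.println(fenceMeta("~ ~ ~ ~ ~java"));                      // prints: java
        System.out.println(fenceMeta("``` ``` ``` this is even more meta")); // prints: this is even more meta
    }
}
```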

The meta information that you provide here can be used with a BlockEmitter to include e.g. syntax highlighted code blocks. Here's an example:

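    import java.io.IOException;
    import java.util.List;

    import com.github.rjeschke.txtmark.BlockEmitter;

    // Strings and Utils are helper classes assumed to exist in the surrounding project.
    public class CodeBlockEmitter implements BlockEmitter
    {
        // Emits the lines as a plain, escaped code block without highlighting.
        private static void append(StringBuilder out, List<String> lines)
        {
            // The wrapper tags are an assumption; use whatever markup your pages need.
            out.append("<pre><code>");
            for (final String l : lines)
            {
                Utils.escapedAdd(out, l);
                out.append('\n');
            }
            out.append("</code></pre>\n");
        }

        @Override
        public void emitBlock(StringBuilder out, List<String> lines, String meta)
        {
            if (Strings.isEmpty(meta))
            {
                append(out, lines);
            }
            else
            {
                try
                {
                    // Utils#highlight(...) is not included with txtmark, its sole purpose
                    // is to show what the meta can be used for
                    out.append(Utils.highlight(lines, meta));
                    out.append('\n');
                }
                catch (final IOException e)
                {
                    // Ignore or do something, still, pump out the lines
                    append(out, lines);
                }
            }
        }
    }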

You can then set the BlockEmitter in the txtmark Configuration using Configuration.Builder#setCodeBlockEmitter(BlockEmitter emitter).

Markdown conformity

Txtmark passes all tests inside MarkdownTest_1.0_2007-05-09 except for two:

Images.text

Fails because Txtmark doesn't produce empty 'title' image attributes.

(IMHO: Images ... OK)

Literal quotes in titles.text

What the frell ... this test will continue to FAIL.

Sorry, but using unescaped " in a title which should be surrounded

by " is unacceptable for me ;)

Change:

Foo [bar](/url/ "Title with "quotes" inside").

[bar]: /url/ "Title with "quotes" inside"

to:

Foo [bar](/url/ "Title with \"quotes\" inside").

[bar]: /url/ "Title with \"quotes\" inside"

and Txtmark will produce the correct result.

(IMHO: Literal quotes in titles ... OK)

Where Txtmark is not like Markdown

Txtmark does not produce empty title attributes in link and image tags.

Unescaped " in link titles starting with " are not recognized and result

in unexpected behaviour.

Due to a different list parsing approach some things get interpreted differently:

* List

> Quote

will produce when processed with Markdown:

List

Quote

and this when produced with Txtmark:

List

Quote

Another one:

    * List
    ====

will produce when processed with Markdown:

    <h1>* List</h1>

and this when produced with Txtmark:

    <ul>
    <li><h1>List</h1></li>
    </ul>

List of escapable characters:

\ [ ] ( ) { } #

" ' . < > + - _

! ` ^

Performance comparison of markdown processors for the JVM

Remarks: These benchmarks are too old to be of any value. I leave them here as a reference, though.

Excerpt from the original post concerning this benchmark suite:

Most of these tests are of course unrealistic: Who would write a text where each word is a link? Yet they serve an important use: It makes it possible for the developer to pinpoint the parts of the parser where there is most room for improvement. Also, it explains why certain texts might render much faster in one Processor than in another.

Benchmark system:

Ubuntu Linux 10.04 32 Bit

Intel(R) Core(TM) 2 Duo T7500 @ 2.2GHz

Java(TM) SE Runtime Environment (build 1.6.0_24-b07)

Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)

All times are in ms. Each row lists, in order: Actuarius 1st run, 2nd run; PegDown 1st run, 2nd run; Knockoff 1st run, 2nd run; Txtmark 1st run, 2nd run. Rows with fewer than eight values are incomplete, and their remaining values are given in their original order.

Plain Paragraphs: 1127, 577, 1273, 1037, 740, 400, 157

Every Word Emphasized: 1562, 1001, 1523, 1513, 13982, 13221

Every Word Strong: 1125, 997, 1115, 1114, 9543, 9647

Every Word Inline Code: 382, 277, 1058, 1052, 9116, 9074

Every Word a Fast Link: 2257, 1600, 537, 531, 3980, 3410, 109

Every Word Consisting of Special XML Chars: 4045, 4270, 2985, 3044, 312, 377, 778, 775

Every Word wrapped in manual HTML tags: 3334, 2919, 901, 896, 3863, 3736

Every Line with a manual line break: 510, 588, 1445, 1440, 1527, 1130

Every word with a full link: 452, 246, 1045, 996, 1884, 1819

Every word with a full image: 268, 150, 1140, 1132, 1985, 1908

Every word with a reference link: 9847, 9082, 18956, 18719, 121136, 115416, 1525, 1380

Every block a quote: 445, 206, 1312, 1301, 478, 457

Every block a codeblock: 373, 376, 161, 175

Every block a list: 920, 912, 1720, 1725, 622, 651

All tests together: 3281, 2885, 5184, 5196, 10130, 10460, 206, 196

Benchmarked versions:

Actuarius version: 0.2

PegDown version: 0.8.5.4

Knockoff version: 0.7.3-15
