
html tidy api,HTML Tidy

宋臻
2023-12-01

Crossposted to

[1]: https://lists.w3.org/Archives/Public/public-htacg/

[2]: https://sourceforge.net/p/tidy/mailman/tidy-develop

[3]: https://lists.w3.org/Archives/Public/html-tidy/

This message is addressed to HTML Tidy users, developers, maintainers, and other interested parties in an effort to spur discussion regarding the present and future of HTML Tidy, including a proposal for the continued maintenance and development of HTML Tidy.

Simply put, my proposal is that responsibility for the current SourceForge repository be turned over to HTACG.

The preceding simple statement necessarily involves a large amount of discussion. This is a big discussion with a lot of text; some of it will surely please each of you, and some of it will certainly infuriate some of you. I hope that the "big picture" of what I'm presenting will encourage you to support the HTACG project and the opportunities it offers.

(I apologize for the Markdown-like format, but it's very legible and minimizes the risk of reference mistakes.)

## What is HTACG

On 2015-January-15 I created the HTML Tidy Advocacy Community Group ([HTACG][4]), a [W3C Community Group][5], of which I am currently serving as Chair. It "is dedicated to the continued support, development, and evolution of the HTML Tidy command line application and library."

More specifically, it "aims to become the canonical release group for HTML Tidy, which has been without a stable, public release since 2008. The Community aspires to achieve the agreement and support of the original and current developers to this end."

Certainly the above goals cannot be achieved without the cooperation of the subscribers to this list.

(The above quotes are from our [official description][5]. Although the current SourceForge repository is regarded as stable by the developers, the _intention_ of the statement is to indicate that there have been no _newer_ releases or bug fixes.)

Although HTACG is affiliated with the W3C, it's important to note that W3C does not provide direction over HTACG. The community group belongs to the community.

For additional information please see our [HTACG Project Charter][6].

## Meaning of "turned over to HTACG"

The simple proposal "responsibility for the current SourceForge repository be turned over to HTACG" means that the current maintainers grant access to the repository to individuals as specified by HTACG. Certainly the current maintainers are encouraged to affiliate with [HTACG][5] and take part in this decision process.

The result, publicly, is HTML Tidy becoming a community-driven, community-led project. It's even possible that the current maintainers dominate HTACG, and should this happen then at least:

- it's a community decision
- it happens under the auspices of a public-facing organization rather than individuals.

Although the decision process for granting access has yet to be [formally defined][6], it's a high priority for HTACG. In general HTACG members will reach consensus based on public discussion. This discussion should consider past and present contributions to HTACG and the HTML Tidy project. Strong regard should be given to the input of the current Chair or Chairs.

## HTACG Leadership and Succession

As mentioned above, I am the current Chair. This was done for the sake of expediency in kicking off HTACG. I do not imagine myself to be the "owner" of HTACG, and the position of Chair is always available to other HTACG members via the [Community Group Page][5].

The community should expect and desire turnover in the position of Chair. As such, another work in progress is a formal [succession document][6], which will make provisions for turning over access to repository membership/ownership, domain names, and other assets of HTACG.

A stable organization should be able to tolerate 100% turnover while remaining functional.

## Current State of Tidy

HTACG was formed specifically to fill the need of an interested steward for HTML Tidy. There have been no bug fixes or improvements to the SourceForge repository in several years and issues go unresolved. Popular operating systems ship with a `tidy` that's not capable of working with HTML5, and popular software repositories ship with less than capable versions of `tidy`, too.

Additionally, a prominent fork of HTML Tidy hosted by W3C featuring support for HTML5 had grown stagnant, too, with no commits or addressing of issues for some years.

In many corners of the Internet there are claims that "Tidy is dead," or "Tidy is outdated," or "Tidy isn't maintained." These are fair assessments and HTACG hopes to change both the facts and the perception.

HTACG has successfully [taken responsibility][7] for this aforementioned prominent W3C fork. Due to a _perceived_ endorsement from [Dave Raggett][8], HTACG had understood that this fork was the approved, natural successor of the SourceForge project, and has taken steps with this thought in mind.

Due to incomplete knowledge of some details of HTML Tidy's history we were unaware of a fracture between the W3C fork and the current SourceForge home. I sincerely hope that our actions are seen as a sign of motivation and enthusiasm towards HTML Tidy rather than any attempt to usurp the current project. Indeed the future depends on the current project.

## Why not fork?

Open source encourages forking, and there are successful forks of many popular pieces of software. MariaDB (né MySQL) is a good example of this. Both MariaDB and MySQL have large installed user bases and a large developer community. Smaller projects, such as HTML Tidy, aren't as successful at this.

Although HTML Tidy is pervasive, the current developer community is small and, due to lack of maintenance, has fractured into scores of personal, private forks. A lot of these forkers have made improvements (most good, some bad) with high value for sharing, but without a leader (a known group or organization) these changes offer value to no one.

Tidy's past reputation is the best reason not to fork. HTACG intends to see _Tidy_ thrive, not some offshoot that lacks its history. As distasteful as the word "branding" is to many of us, Tidy is a brand, and it's a brand that shouldn't be tarnished by withering away and dying.

## HTACG Actions to Date

To date HTACG has achieved the following:

- Formed on 2015-January-15 ([initial announcement][10]).
- Assumed control of the W3C fork. (Yes, we now better understand some of the circumstances behind the origin of this fork, and are striving to undo the damage that resulted.)
- Have set up a draft Project Charter.
- Have set up the framework for a self-running, community workgroup (WIP).
- Have reached out with our desire to work with the original maintainers and to ask them (you) to support and join our cause.
- Have closed all but one current pull request in our working branch.
- Have closed approximately 30 issues in our working branch.
- Have moved to a modern semantic versioning system.
- Have begun a new branding initiative.
- Have promoted the HTML5 capabilities added by Björn.
- Have put together an HTACG [filler website][4].
- Have made steps towards a proper [HTML Tidy website][11].

## HTACG Tentative Plans

The several subsections below provide high-level details of what HTACG proposes to do. Our goal is to be community-driven, so some or many of these are likely to change based on what we collectively decide.

### Branding

"Branding" sounds like MBA nonsense in some people's ears, but branding and

positioning a project are important in order to attract new members to the team and attract the interest of new developers. Tidy's early reputation was largely gained through network effects, and while it's possible to leverage a network effect in the future, Tidy requires a relaunch, and a relaunch requires some branding.

- Tidy itself is a brand. It has significant name recognition and is regarded as the de facto HTML cleaning tool by a significant userbase even today.
- W3C is a brand. HTACG's affiliation with W3C as a Community Group lends significant credibility to the project without any of the dangers in the past. We are now completely aware of the on-again, off-again relationship with W3C. As a Community Group there is no danger of that happening again, as the primary affiliation is HTACG. HTACG can exist without the W3C if the community decides such.
- HTACG itself is capable of becoming a brand. "Who writes Tidy these days?"
- Modernized websites and graphics. If we don't want to be perceived as an artifact from 2002, we can't present the image of an artifact from 2002. Certainly this is superficial, but the population at large is superficial and we can't ignore image these days. It's no longer good enough to say, "If what we provide is good, then people will come."
- Modernized communications channels. Similar to the above, there's a large element of the population that expects to subscribe to a Twitter feed.

In short, a project that _looks_ alive will attract the attention and support that Tidy needs in order to _stay_ alive.

### Community Resources

#### Repositories

The true HTML Tidy is currently hosted at [SourceForge][9], while the branch inherited by HTACG from the W3C is working out of [GitHub][7].

While CVS and git both have their advantages and disadvantages, I propose that in the interest of community development, combined with responsible maintainers, we adopt GitHub as the official working repository.

If desired, we should consider maintaining a mirror of the repository on SourceForge. Although this subjects us to additional administrative burden, HTML Tidy has a long history on SourceForge and for many users it is still the go-to destination for anything Tidy-related.

A mirror also affords an opportunity for the original maintainers to separate from HTACG if they should determine that they are not satisfied with the progress that HTACG is promising.

#### Issues Trackers

With the assumption that we work from GitHub, we should close the issues tracker at SourceForge after migrating the issues to GitHub.

#### Websites

We should combine the existing websites. I have procured the domains htacg.org and html-tidy.org, and they can be pointed to any arbitrary host. (Please note that these domains will be surrendered to an appropriate, proper person in line with our work-in-progress [succession plan][6].)

In consideration of the "branding" issues already described, the cohesive, single website will be in need of an upgrade.

My proposal includes using GitHub hosting for these websites. Just as for software projects, this provides the ability for HTACG members and the general public to issue pull requests and post issues.

#### Mailing Lists

GitHub does not offer mailing list support. This still leaves us with three main mailing systems to support ([W3 HTACG][1], [SourceForge][2], and [W3 Tidy][3]), which will be burdensome to monitor and support.

I will make the suggestion that we move to the set of HTACG mailing lists.

- As my suggestion is to move towards GitHub and add distance from SourceForge, it is natural not to favor SourceForge's mailing list.
- The original W3 mailing list has a long history; however, since some members have expressed disappointment in W3C's previous behaviors, perhaps it is good to distance ourselves.
- The HTACG list is _also_ hosted at W3C; however, we have more control over it, and it provides relevancy to HTACG as an organization.

Clearly we as members must be prepared to monitor all of the existing mailing lists during a transition period.

### Transparency and Working Documents

While debate about specific issues and implementations is suitable for issue tracker threads, broader discussion towards strategy, leadership, working documents, standards, etc. should be relegated to the appropriate public mailing list, which provides HTACG members and non-members the ability to provide feedback.

HTACG currently supports a set of working documents, many of which are generously called "work in progress", in our [community repository][6]. As a GitHub repository these very same working documents are subject to community comment and modification via pull requests.

It is HTACG's intention (abusing the oft-repeated ISO phrase) "to say what we do and do what we say."

Current (generously-called) works-in-progress include:

- Project Charter (the high-level principles for HTACG)
- Contributor agreement (so we aren't burdened by proprietary licenses)
- Chair succession plan (so no one person can hold HTACG hostage)
- Guidelines for providing commit access (whom do we trust?)
- Guidelines for design criteria (code style, compiler specifications, etc.)
- Guidelines for release criteria (when do we roll to "master"?)
- Guidelines and instructions for regression testing
- Policy for accepting pull requests (for contributors and maintainers)
- Roadmap, including a description of Tidy's versioning (where do we go?)

### Relaunch Branch

A lot of development has been based on the branch derived from Björn Höhrmann's original patch for HTML5 and then taken by W3C. Although there may be some design decisions that the current maintainers disagree with, the code is much more up to date and several important contributions have been added based upon Björn's work.

Therefore I suggest:

- We start with the current HTACG develop-500 branch.
- We run regression tests for all of the pre-HTML5 test cases. Successful tests (or bug fixes) should satisfy everyone that HTACG Tidy is nominally at the same level as SourceForge Tidy.
- All HTACG members are requested to review the code and test cases for the new HTML5 functionality. Issues can be posted to the issue tracker if they are technical in nature, or posted to the mailing list if they are more strategic or fundamental in nature.
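The regression step suggested above could be automated along the following lines. This is only a minimal sketch in Python, not the project's actual test harness: the directory layout (one stored expected-output file per test case, with the same file name) and the function name `run_regression` are hypothetical, and the default command assumes a `tidy` binary on the PATH invoked with its `-q` (quiet) option.

```python
import subprocess
from pathlib import Path


def run_regression(cases_dir, expected_dir, command=("tidy", "-q")):
    """Return the names of test cases whose output differs from the stored expectation.

    Hypothetical layout: every *.html file in cases_dir has a file of the
    same name in expected_dir holding the output we expect on stdout.
    """
    failures = []
    for case in sorted(Path(cases_dir).glob("*.html")):
        expected = Path(expected_dir) / case.name
        # The exit status is deliberately ignored here; only the cleaned
        # output written to stdout is compared.
        result = subprocess.run(
            [*command, str(case)], capture_output=True, text=True
        )
        if not expected.exists() or result.stdout != expected.read_text():
            failures.append(case.name)
    return failures
```

Under these assumptions, an empty list returned for the pre-HTML5 cases is the evidence that the relaunch branch is nominally at the same level as SourceForge Tidy, and any names in the list are candidates for the issue tracker.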

### Revision Control History

Contributor history is an important aspect of FOSS development, and every effort to recognize contributors should be made.

GitHub offers an automatic version control history that records the individual who made a push, who accepted a pull request, and who originated a pull request.

The current development branch at GitHub did not adequately record the commit history when it was first forked from SourceForge. However, due to the nature of git, it seems that it might be possible to pull the SourceForge source while maintaining its history, and then merge the current branch atop it while maintaining the entire release history.
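A rough sketch of that idea in Python follows. It is untested against the real repositories and makes several assumptions: the SourceForge history has already been imported into a local git repository (the CVS-to-git import itself is not shown), the function names and the remote name `htacg` are hypothetical, and the installed git is recent enough to support `--allow-unrelated-histories`.

```python
import subprocess


def graft_commands(current_repo_url, current_branch, remote_name="htacg"):
    """Build the git commands that merge the current branch atop imported history.

    The commands are meant to run inside a repository that already holds
    the imported SourceForge history on its checked-out branch.
    """
    return [
        ["git", "remote", "add", remote_name, current_repo_url],
        ["git", "fetch", remote_name],
        # The two lines of history share no common commit, so git needs
        # explicit permission to merge them.
        ["git", "merge", "--no-edit", "--allow-unrelated-histories",
         f"{remote_name}/{current_branch}"],
    ]


def graft(history_repo_dir, commands):
    """Run each command in the imported-history repository, stopping on the first error."""
    for argv in commands:
        subprocess.run(argv, cwd=history_repo_dir, check=True)
```

Building the command list separately from running it lets the plan be reviewed on the mailing list first, for example with `graft_commands("https://github.com/htacg/tidy-html5", "develop-500")`. Conflicts from the merge would still need to be resolved by hand.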

### Tidy History

The purpose of HTACG is, among other things, to keep HTML Tidy alive and well, and that includes honoring its past. HTACG will ensure that all previous contributors, maintainers, and participants are prominently recognized on its websites using material sourced from SourceForge and Dave Raggett's W3C page.

## Summary

As you can see, in the 22 days since establishing HTACG, a lot of thought and effort have been put into promoting and maintaining HTML Tidy. While it's true that there is still a lot of work to be done, the framework for good governance and stewardship has been put into place.

I hope that subscribers to this list can recognize that Tidy needs help in order to remain relevant, and can grant support for this proposal or a modified form of this proposal.

Thank you for the significant amount of time you have invested in reading this.

* * *

References:

[4]: http://www.htacg.org/

[5]: http://www.w3.org/community/htacg/

[6]: https://github.com/htacg/community/tree/master

[7]: https://github.com/htacg/tidy-html5

[8]: http://www.w3.org/People/Raggett/tidy/

[9]: http://tidy.sourceforge.net

[10]: https://github.com/htacg/tidy-html5/issues/137

[11]: http://www.html-tidy.org/

--

---

Jim Derry

Clinton Township, MI, USA

Nanjing, Jiangsu, China PRC
