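However, in the result, lines 3 and 5 both start with a punctuation mark, as shown in the image of the PDF output.
I could simply add line breaks in the appropriate places to make it look right, but then my fix might stop working if the text is re-translated internally. Does anyone know how to make sure iText never starts a line with these punctuation marks?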
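For line breaking in Asian languages, you need to write your own SplitCharacter implementation. A good reference for line breaking is Unicode® Standard Annex #14, the Unicode Line Breaking Algorithm. Another is https://msdn.microsoft.com/en-us/library/cc194864.aspx.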
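Before hand-writing the rules, it can help to see where a general-purpose line breaker would allow a break in a given string. The following is a minimal sketch, assuming the JDK's `java.text.BreakIterator` is an acceptable stand-in for experimentation. It is not part of iText, the class and method names (`LineBreakOpportunities`, `breakOffsets`) are made up for illustration, and I am not claiming its rules match UAX #14 exactly.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for experimenting with line-break opportunities; not part of iText.
class LineBreakOpportunities {

    // Returns every offset at which the JDK's line BreakIterator allows a break,
    // excluding offset 0. The last entry is always the end of the text.
    static List<Integer> breakOffsets(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getLineInstance(locale);
        iterator.setText(text);
        List<Integer> offsets = new ArrayList<>();
        iterator.first();
        for (int boundary = iterator.next(); boundary != BreakIterator.DONE; boundary = iterator.next()) {
            offsets.add(boundary);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Sample Japanese sentence ("kore wa, tesuto desu."), written with escapes to keep the source ASCII.
        String text = "\u3053\u308c\u306f\u3001\u30c6\u30b9\u30c8\u3067\u3059\u3002";
        System.out.println(breakOffsets(text, Locale.JAPANESE));
    }
}
```

Comparing the printed offsets with the positions of the punctuation in your own text is a quick way to check which breaks a custom SplitCharacter should forbid.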
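Having suffered through implementing this for Japanese, I have put together the sample code I wrote for mixed Japanese and English text. This code can easily be adapted for Chinese using the references above.
Here is a snippet showing JapaneseSplitCharacter in use: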
Chunk chunk = new Chunk(<asian text>,<asian font>);
chunk.setSplitCharacter(JapaneseSplitCharacter.SplitCharacter);
Paragraph paragraph = new Paragraph(chunk);
And here is the JapaneseSplitCharacter class itself:
import com.itextpdf.text.SplitCharacter;
import com.itextpdf.text.pdf.DefaultSplitCharacter;
import com.itextpdf.text.pdf.PdfChunk;
/**
* <p/>
* For Basic Latin characters, spaces, periods, commas, etc. are split characters. For Japanese characters, lines can break
* anywhere unless prohibited. This class applies the kinsoku rule for Japanese non-starting and non-ending characters
* and delegates to the DefaultSplitCharacter class for Basic Latin characters while writing free-flowing text to a PDF.
* <p/>
*/
public class JapaneseSplitCharacter implements SplitCharacter {
// a line of text cannot start or end with this character
static final char u2060 = '\u2060'; // (invisible) - WORD JOINER
// a line of text cannot start with any following characters in NOT_BEGIN_CHARACTERS[]
static final char u30fb = '\u30fb'; // ・ - KATAKANA MIDDLE DOT
static final char u2022 = '\u2022'; // • - BLACK SMALL CIRCLE (BULLET)
static final char uff65 = '\uff65'; // ・ - HALFWIDTH KATAKANA MIDDLE DOT
static final char u300d = '\u300d'; // 」 - RIGHT CORNER BRACKET
static final char uff09 = '\uff09'; // ) - FULLWIDTH RIGHT PARENTHESIS
static final char u0021 = '\u0021'; // ! - EXCLAMATION MARK
static final char u0025 = '\u0025'; // % - PERCENT SIGN
static final char u0029 = '\u0029'; // ) - RIGHT PARENTHESIS
static final char u002c = '\u002c'; // , - COMMA
static final char u002e = '\u002e'; // . - FULL STOP
static final char u003f = '\u003f'; // ? - QUESTION MARK
static final char u005d = '\u005d'; // ] - RIGHT SQUARE BRACKET
static final char u007d = '\u007d'; // } - RIGHT CURLY BRACKET
static final char uff61 = '\uff61'; // 。 - HALFWIDTH IDEOGRAPHIC FULL STOP
static final char uff63 = '\uff63'; // 」 - HALFWIDTH RIGHT CORNER BRACKET
static final char uff64 = '\uff64'; // 、 - HALFWIDTH IDEOGRAPHIC COMMA
static final char uff67 = '\uff67'; // ァ - HALFWIDTH KATAKANA LETTER SMALL A
static final char uff68 = '\uff68'; // ィ - HALFWIDTH KATAKANA LETTER SMALL I
static final char uff69 = '\uff69'; // ゥ - HALFWIDTH KATAKANA LETTER SMALL U
static final char uff6a = '\uff6a'; // ェ - HALFWIDTH KATAKANA LETTER SMALL E
static final char uff6b = '\uff6b'; // ォ - HALFWIDTH KATAKANA LETTER SMALL O
static final char uff6c = '\uff6c'; // ャ - HALFWIDTH KATAKANA LETTER SMALL YA
static final char uff6d = '\uff6d'; // ュ - HALFWIDTH KATAKANA LETTER SMALL YU
static final char uff6e = '\uff6e'; // ョ - HALFWIDTH KATAKANA LETTER SMALL YO
static final char uff6f = '\uff6f'; // ッ - HALFWIDTH KATAKANA LETTER SMALL TU
static final char uff70 = '\uff70'; // ー - HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
static final char uff9e = '\uff9e'; // ゙ - HALFWIDTH KATAKANA VOICED SOUND MARK
static final char uff9f = '\uff9f'; // ゚ - HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
static final char u3001 = '\u3001'; // 、 - IDEOGRAPHIC COMMA
static final char u3002 = '\u3002'; // 。 - IDEOGRAPHIC FULL STOP
static final char uff0c = '\uff0c'; // , - FULLWIDTH COMMA
static final char uff0e = '\uff0e'; // . - FULLWIDTH FULL STOP
static final char uff1a = '\uff1a'; // : - FULLWIDTH COLON
static final char uff1b = '\uff1b'; // ; - FULLWIDTH SEMICOLON
static final char uff1f = '\uff1f'; // ? - FULLWIDTH QUESTION MARK
static final char uff01 = '\uff01'; // ! - FULLWIDTH EXCLAMATION MARK
static final char u309b = '\u309b'; // ゛ - KATAKANA-HIRAGANA VOICED SOUND MARK
static final char u309c = '\u309c'; // ゜ - KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
static final char u30fd = '\u30fd'; // ヽ - KATAKANA ITERATION MARK
static final char u30fe = '\u30fe'; // ヾ - KATAKANA VOICED ITERATION MARK
static final char u309d = '\u309d'; // ゝ - HIRAGANA ITERATION MARK
static final char u309e = '\u309e'; // ゞ - HIRAGANA VOICED ITERATION MARK
static final char u3005 = '\u3005'; // 々 - IDEOGRAPHIC ITERATION MARK
static final char u30fc = '\u30fc'; // ー - KATAKANA-HIRAGANA PROLONGED SOUND MARK
static final char u2019 = '\u2019'; // ’ - RIGHT SINGLE QUOTATION MARK
static final char u201d = '\u201d'; // ” - RIGHT DOUBLE QUOTATION MARK
static final char u3015 = '\u3015'; // 〕 - RIGHT TORTOISE SHELL BRACKET
static final char uff3d = '\uff3d'; // ] - FULLWIDTH RIGHT SQUARE BRACKET
static final char uff5d = '\uff5d'; // } - FULLWIDTH RIGHT CURLY BRACKET
static final char u3009 = '\u3009'; // 〉 - RIGHT ANGLE BRACKET
static final char u300b = '\u300b'; // 》 - RIGHT DOUBLE ANGLE BRACKET
static final char u300f = '\u300f'; // 』 - RIGHT WHITE CORNER BRACKET
static final char u3011 = '\u3011'; // 】 - RIGHT BLACK LENTICULAR BRACKET
static final char u00b0 = '\u00b0'; // ° - DEGREE SIGN
static final char u2032 = '\u2032'; // ′ - PRIME
static final char u2033 = '\u2033'; // ″ - DOUBLE PRIME
static final char u2103 = '\u2103'; // ℃ - DEGREE CELSIUS
static final char u00a2 = '\u00a2'; // ¢ - CENT SIGN
static final char uff05 = '\uff05'; // % - FULLWIDTH PERCENT SIGN
static final char u2030 = '\u2030'; // ‰ - PER MILLE SIGN
static final char u3041 = '\u3041'; // ぁ - HIRAGANA LETTER SMALL A
static final char u3043 = '\u3043'; // ぃ - HIRAGANA LETTER SMALL I
static final char u3045 = '\u3045'; // ぅ - HIRAGANA LETTER SMALL U
static final char u3047 = '\u3047'; // ぇ - HIRAGANA LETTER SMALL E
static final char u3049 = '\u3049'; // ぉ - HIRAGANA LETTER SMALL O
static final char u3063 = '\u3063'; // っ - HIRAGANA LETTER SMALL TU
static final char u3083 = '\u3083'; // ゃ - HIRAGANA LETTER SMALL YA
static final char u3085 = '\u3085'; // ゅ - HIRAGANA LETTER SMALL YU
static final char u3087 = '\u3087'; // ょ - HIRAGANA LETTER SMALL YO
static final char u308e = '\u308e'; // ゎ - HIRAGANA LETTER SMALL WA
static final char u30a1 = '\u30a1'; // ァ - KATAKANA LETTER SMALL A
static final char u30a3 = '\u30a3'; // ィ - KATAKANA LETTER SMALL I
static final char u30a5 = '\u30a5'; // ゥ - KATAKANA LETTER SMALL U
static final char u30a7 = '\u30a7'; // ェ - KATAKANA LETTER SMALL E
static final char u30a9 = '\u30a9'; // ォ - KATAKANA LETTER SMALL O
static final char u30c3 = '\u30c3'; // ッ - KATAKANA LETTER SMALL TU
static final char u30e3 = '\u30e3'; // ャ - KATAKANA LETTER SMALL YA
static final char u30e5 = '\u30e5'; // ュ - KATAKANA LETTER SMALL YU
static final char u30e7 = '\u30e7'; // ョ - KATAKANA LETTER SMALL YO
static final char u30ee = '\u30ee'; // ヮ - KATAKANA LETTER SMALL WA
static final char u30f5 = '\u30f5'; // ヵ - KATAKANA LETTER SMALL KA
static final char u30f6 = '\u30f6'; // ヶ - KATAKANA LETTER SMALL KE
static final char[] NOT_BEGIN_CHARACTERS = new char[]{u30fb, u2022, uff65, u300d, uff09, u0021, u0025, u0029, u002c,
u002e, u003f, u005d, u007d, uff61, uff63, uff64, uff67, uff68, uff69, uff6a, uff6b, uff6c, uff6d, uff6e,
uff6f, uff70, uff9e, uff9f, u3001, u3002, uff0c, uff0e, uff1a, uff1b, uff1f, uff01, u309b, u309c, u30fd,
u30fe, u309d, u309e, u3005, u30fc, u2019, u201d, u3015, uff3d, uff5d, u3009, u300b, u300f, u3011, u00b0,
u2032, u2033, u2103, u00a2, uff05, u2030, u3041, u3043, u3045, u3047, u3049, u3063, u3083, u3085, u3087,
u308e, u30a1, u30a3, u30a5, u30a7, u30a9, u30c3, u30e3, u30e5, u30e7, u30ee, u30f5, u30f6, u2060};
// a line of text cannot end with any following characters in NOT_ENDING_CHARACTERS[]
static final char u0024 = '\u0024'; // $ - DOLLAR SIGN
static final char u0028 = '\u0028'; // ( - LEFT PARENTHESIS
static final char u005b = '\u005b'; // [ - LEFT SQUARE BRACKET
static final char u007b = '\u007b'; // { - LEFT CURLY BRACKET
static final char u00a3 = '\u00a3'; // £ - POUND SIGN
static final char u00a5 = '\u00a5'; // ¥ - YEN SIGN
static final char u201c = '\u201c'; // “ - LEFT DOUBLE QUOTATION MARK
static final char u2018 = '\u2018'; // ‘ - LEFT SINGLE QUOTATION MARK
static final char u300a = '\u300a'; // 《 - LEFT DOUBLE ANGLE BRACKET
static final char u3008 = '\u3008'; // 〈 - LEFT ANGLE BRACKET
static final char u300c = '\u300c'; // 「 - LEFT CORNER BRACKET
static final char u300e = '\u300e'; // 『 - LEFT WHITE CORNER BRACKET
static final char u3010 = '\u3010'; // 【 - LEFT BLACK LENTICULAR BRACKET
static final char u3014 = '\u3014'; // 〔 - LEFT TORTOISE SHELL BRACKET
static final char uff62 = '\uff62'; // 「 - HALFWIDTH LEFT CORNER BRACKET
static final char uff08 = '\uff08'; // ( - FULLWIDTH LEFT PARENTHESIS
static final char uff3b = '\uff3b'; // [ - FULLWIDTH LEFT SQUARE BRACKET
static final char uff5b = '\uff5b'; // { - FULLWIDTH LEFT CURLY BRACKET
static final char uffe5 = '\uffe5'; // ¥ - FULLWIDTH YEN SIGN
static final char uff04 = '\uff04'; // $ - FULLWIDTH DOLLAR SIGN
static final char[] NOT_ENDING_CHARACTERS = new char[]{u0024, u0028, u005b, u007b, u00a3, u00a5, u201c, u2018, u3008,
u300a, u300c, u300e, u3010, u3014, uff62, uff08, uff3b, uff5b, uffe5, uff04, u2060};
/**
* A shared instance of JapaneseSplitCharacter.
*/
public static final JapaneseSplitCharacter SplitCharacter = new JapaneseSplitCharacter();
/**
* An instance of DefaultSplitCharacter, used for Basic Latin characters.
*/
private static final SplitCharacter defaultSplitCharacter = new DefaultSplitCharacter();
public JapaneseSplitCharacter() { }
/**
* Custom SplitCharacter method that handles Japanese characters.
* Returns <CODE>true</CODE> if the character can split a line. The splitting implementation
* is free to look ahead or look behind characters to make a decision.
*
* @param start the lower limit of <CODE>cc</CODE> inclusive
* @param current the pointer to the character in <CODE>cc</CODE>
* @param end the upper limit of <CODE>cc</CODE> exclusive
* @param cc an array of characters at least <CODE>end</CODE> sized
* @param ck an array of <CODE>PdfChunk</CODE>. The main use is to be able to call
* {@link PdfChunk#getUnicodeEquivalent(int)}. It may be <CODE>null</CODE>
* or shorter than <CODE>end</CODE>. If <CODE>null</CODE> no conversion takes place.
* If shorter than <CODE>end</CODE> the last element is used
* @return <CODE>true</CODE> if the character(s) can split a line
*/
@Override
public boolean isSplitCharacter(int start, int current, int end, char[] cc, PdfChunk[] ck) {
    // Note: without this try/catch, iText fails silently if isSplitCharacter() throws, and
    // you have no idea there was a problem.
    try {
        char charCurrent = getCharacter(current, cc, ck);
        int next = current + 1;
        // Look ahead only within the valid range: end is the exclusive upper limit of cc.
        if (next < end) {
            char charNext = getCharacter(next, cc, ck);
            // Never break before a character that must not start a line.
            for (char notBeginCharacter : NOT_BEGIN_CHARACTERS) {
                if (charNext == notBeginCharacter) {
                    return false;
                }
            }
        }
        // Never break after a character that must not end a line.
        for (char notEndingCharacter : NOT_ENDING_CHARACTERS) {
            if (charCurrent == notEndingCharacter) {
                return false;
            }
        }
        boolean isBasicLatin = Character.UnicodeBlock.of(charCurrent) == Character.UnicodeBlock.BASIC_LATIN;
        if (isBasicLatin) {
            return defaultSplitCharacter.isSplitCharacter(start, current, end, cc, ck);
        }
        return true;
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return true;
}
/**
* Returns a character in the array (Note: modified from the iText default version with the addition of the null
* check '|| ck[Math.min(position, ck.length - 1)] == null').
*
* @param position position in the array
* @param ck chunk array
* @param cc the character array that has to be checked
* @return the character
*/
protected char getCharacter(int position, char[] cc, PdfChunk[] ck) {
    if (ck == null || ck[Math.min(position, ck.length - 1)] == null) {
        return cc[position];
    }
    return (char) ck[Math.min(position, ck.length - 1)].getUnicodeEquivalent(cc[position]);
}
}
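To illustrate how the same two-list approach could carry over to Chinese, here is a minimal sketch that runs without iText. Everything in it is an assumption made for illustration: the class `ChineseLineBreakSketch`, its methods, and the fixed characters-per-line wrapping are hypothetical, and the two character sets are only a subset of NOT_BEGIN_CHARACTERS and NOT_ENDING_CHARACTERS from the class above. A real list should be checked against the references mentioned earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, iText-free illustration of the "cannot start / cannot end a line" rule.
class ChineseLineBreakSketch {

    // Characters that must not start a line (assumed subset, for illustration only):
    // ideographic comma and full stop, fullwidth comma, full stop, colon, semicolon,
    // question mark and exclamation mark, then closing brackets and closing quotes.
    static final String NOT_BEGIN =
            "\u3001\u3002\uff0c\uff0e\uff1a\uff1b\uff1f\uff01"
            + "\uff09\u300d\u300f\u3011\u300b\u3009\u3015\u201d\u2019";

    // Characters that must not end a line (assumed subset): opening brackets and opening quotes.
    static final String NOT_END = "\uff08\u300c\u300e\u3010\u300a\u3008\u3014\u201c\u2018";

    // True if a line break is allowed between text[index] and text[index + 1].
    static boolean canBreakAfter(String text, int index) {
        if (NOT_END.indexOf(text.charAt(index)) >= 0) {
            return false;
        }
        int next = index + 1;
        return next >= text.length() || NOT_BEGIN.indexOf(text.charAt(next)) < 0;
    }

    // Greedy wrap at maxChars characters per line, backing up to the nearest allowed break.
    static List<String> wrap(String text, int maxChars) {
        if (maxChars < 1) {
            throw new IllegalArgumentException("maxChars must be >= 1");
        }
        List<String> lines = new ArrayList<>();
        int lineStart = 0;
        while (text.length() - lineStart > maxChars) {
            int breakAt = lineStart + maxChars; // exclusive end of the current line
            while (breakAt > lineStart + 1 && !canBreakAfter(text, breakAt - 1)) {
                breakAt--;
            }
            if (!canBreakAfter(text, breakAt - 1)) {
                breakAt = lineStart + maxChars; // no allowed break found: fall back to a hard break
            }
            lines.add(text.substring(lineStart, breakAt));
            lineStart = breakAt;
        }
        lines.add(text.substring(lineStart));
        return lines;
    }

    public static void main(String[] args) {
        // Sample Chinese sentence with fullwidth punctuation, written with escapes to keep the source ASCII.
        String text = "\u4f60\u597d\uff0c\u4e16\u754c\u3002\u4eca\u5929\u5929\u6c14\u5f88\u597d\uff01";
        for (String line : wrap(text, 5)) {
            System.out.println(line);
        }
    }
}
```

In the sample, a break after the fifth character would push the ideographic full stop to the start of the next line, so the sketch ends the first line one character earlier instead. In iText itself the layout engine decides where lines end; only the `canBreakAfter`-style decision belongs in `isSplitCharacter`.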