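This is my first time using SAXParser (I'm using it on Android, but I don't think that matters for this particular problem), and I'm trying to read data from an RSS feed. So far it has mostly worked well for me, but I run into trouble when it reaches a tag containing HTML-encoded text (e.g. &lt;a href="http://...). The characters() method reads only the &lt; as <, then treats the next run of characters as a separate entity instead of picking up the whole content at once. I'd like it to read the text as-is, without actually translating the HTML. The code I use in my document handler (shortened) looks like this: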
@Override
public void startElement(String uri, String localName, String qName, Attributes attrs) throws SAXException {
    if (localName.equalsIgnoreCase("channel")) {
        inChannel = true;
    }
    if (inChannel) {
        if (newFeed == null) newFeed = new Feed();
        if (localName.equalsIgnoreCase("image")) {
            if (feedImage == null) feedImage = new Image();
            inImage = true;
        }
        if (localName.equalsIgnoreCase("item")) {
            if (newItem == null) newItem = new Item();
            if (itemList == null) itemList = new ArrayList<Item>();
            inItem = true;
        }
    }
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    if(!inItem) {
        if(!inImage) {
            if(inChannel) {
                //Reached end of feed
                if(localName.equalsIgnoreCase("channel")) {
                    newFeed.setItems((ArrayList<Item>)itemList);
                    finalFeed = newFeed;
                    newFeed = null;
                    inChannel = false;
                    return;
                } else if(localName.equalsIgnoreCase("title")) {
                    newFeed.setTitle(currentValue); return;
                } else if(localName.equalsIgnoreCase("link")) {
                    newFeed.setLink(currentValue); return;
                } else if(localName.equalsIgnoreCase("description")) {
                    newFeed.setDescription(currentValue); return;
                } else if(localName.equalsIgnoreCase("language")) {
                    newFeed.setLanguage(currentValue); return;
                } else if(localName.equalsIgnoreCase("copyright")) {
                    newFeed.setCopyright(currentValue); return;
                } else if(localName.equalsIgnoreCase("category")) {
                    newFeed.addCategory(currentValue); return;
                }
            }
        }
        else { //is inImage
            //finished with feed image
            if(localName.equalsIgnoreCase("image")) {
                newFeed.setImage(feedImage);
                feedImage = null;
                inImage = false;
                return;
            } else if (localName.equalsIgnoreCase("url")) {
                feedImage.setUrl(currentValue); return;
            } else if (localName.equalsIgnoreCase("title")) {
                feedImage.setTitle(currentValue); return;
            } else if (localName.equalsIgnoreCase("link")) {
                feedImage.setLink(currentValue); return;
            }
        }
    }
    else { //is inItem
        //finished with news item
        if (localName.equalsIgnoreCase("item")) {
            itemList.add(newItem);
            newItem = null;
            inItem = false;
            return;
        } else if (localName.equalsIgnoreCase("title")) {
            newItem.setTitle(currentValue); return;
        } else if (localName.equalsIgnoreCase("link")) {
            newItem.setLink(currentValue); return;
        } else if (localName.equalsIgnoreCase("description")) {
            newItem.setDescription(currentValue); return;
        } else if (localName.equalsIgnoreCase("author")) {
            newItem.setAuthor(currentValue); return;
        } else if (localName.equalsIgnoreCase("category")) {
            newItem.addCategory(currentValue); return;
        } else if (localName.equalsIgnoreCase("comments")) {
            newItem.setComments(currentValue); return;
        } /*else if (localName.equalsIgnoreCase("enclosure")) {
            To be implemented later
        }*/ else if (localName.equalsIgnoreCase("guid")) {
            newItem.setGuid(currentValue); return;
        } else if (localName.equalsIgnoreCase("pubDate")) {
            newItem.setPubDate(currentValue); return;
        }
    }
}
@Override
public void characters(char[] ch, int start, int length) {
    currentValue = new String(ch, start, length);
}
An example of the kind of RSS feed I'm trying to parse is this one.
Any ideas?
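The behaviour described in the question can be reproduced outside Android with a small, self-contained sketch. It assumes a stock JAXP parser; the class name ChunkDemo, the helper collectChunks and the example.com URL are made up for illustration. SAX allows a parser to deliver one run of text through several characters() calls, and many parsers start a new call at each entity reference such as &lt;, so a handler that keeps only the most recent call sees just the tail of the text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the handler shown in the question.
class ChunkDemo {

    /** Parses the XML and returns every chunk passed to characters(), in order. */
    static List<String> collectChunks(String xml) {
        final List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Record each call separately instead of overwriting a field.
                chunks.add(new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so the sketch can be called without checked exceptions.
            throw new RuntimeException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String xml = "<description>&lt;a href=\"http://example.com\"&gt;link&lt;/a&gt;</description>";
        List<String> chunks = collectChunks(xml);
        // How many chunks arrive is parser-specific; only the joined text is guaranteed.
        System.out.println("characters() calls: " + chunks.size());
        System.out.println("last chunk only:    " + chunks.get(chunks.size() - 1));
        System.out.println("all chunks joined:  " + String.join("", chunks));
    }
}
```

The number of calls is deliberately not asserted here, because it depends on the parser. The only thing a handler can rely on is that the chunks, concatenated in order, make up the element's text.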
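In case it helps anyone, I solved this by using a boolean for each field whose data I'm interested in. I then keep appending to a StringBuilder until the end tag is reached; at that point I take the StringBuilder's value, empty it, and set the boolean back to false.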
@Override
public void startElement(String uri, String localName, String qName, Attributes attrs) throws SAXException {
    sb.delete(0, sb.length());
    if (localName.equalsIgnoreCase("channel")) {
        inChannel = true;
        newFeed = new Feed();
        itemList = new ArrayList<Item>();
    }
    if (inChannel) {
        if (localName.equalsIgnoreCase("image")) {
            feedImage = new Image();
            inImage = true;
            return;
        }
        else if (localName.equalsIgnoreCase("item")) {
            newItem = new Item();
            inItem = true;
            return;
        }
        if(inImage) { //set booleans for image elements
            if (localName.equalsIgnoreCase("title")) imgTitle = true;
            else if (localName.equalsIgnoreCase("link")) imgLink = true;
            else if (localName.equalsIgnoreCase("url")) imgURL = true;
            return;
        }
        else if(inItem) { //set booleans for item elements
            if (localName.equalsIgnoreCase("title")) iTitle = true;
            else if (localName.equalsIgnoreCase("link")) iLink = true;
            else if (localName.equalsIgnoreCase("description")) iDescription = true;
            else if (localName.equalsIgnoreCase("author")) iAuthor = true;
            else if (localName.equalsIgnoreCase("category")) iCategory = true;
            else if (localName.equalsIgnoreCase("comments")) iComments = true;
            else if (localName.equalsIgnoreCase("guid")) iGuid = true;
            else if (localName.equalsIgnoreCase("pubdate")) iPubDate= true;
            else if (localName.equalsIgnoreCase("source")) iSource = true;
            return;
        } else { //set booleans for channel elements
            if (localName.equalsIgnoreCase("title")) fTitle = true;
            else if (localName.equalsIgnoreCase("link")) fLink = true;
            else if (localName.equalsIgnoreCase("description")) fDescription = true;
            else if (localName.equalsIgnoreCase("language")) fLanguage= true;
            else if (localName.equalsIgnoreCase("copyright")) fCopyright = true;
            else if (localName.equalsIgnoreCase("category")) fCategory = true;
            return;
        }
    }
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
    if(inChannel) {
        if(inImage) {
            if (localName.equalsIgnoreCase("title")) {
                feedImage.setTitle(sb.toString());
                sb.delete(0, sb.length());
                imgTitle = false;
                return;
            }
            else if (localName.equalsIgnoreCase("link")) {
                feedImage.setLink(sb.toString());
                sb.delete(0, sb.length());
                imgLink = false;
                return;
            }
            else if (localName.equalsIgnoreCase("url")) {
                feedImage.setUrl(sb.toString());
                sb.delete(0, sb.length());
                imgURL = false;
                return;
            }
            else return;
        }
        else if(inItem) {
            if (localName.equalsIgnoreCase("item")) {
                itemList.add(newItem);
                newItem = null;
                inItem = false;
                return;
            } else if (localName.equalsIgnoreCase("title")) {
                newItem.setTitle(sb.toString());
                sb.delete(0, sb.length());
                iTitle = false;
                return;
            } else if (localName.equalsIgnoreCase("link")) {
                newItem.setLink(sb.toString());
                sb.delete(0, sb.length());
                iLink = false;
                return;
            } else if (localName.equalsIgnoreCase("description")) {
                newItem.setDescription(sb.toString());
                sb.delete(0, sb.length());
                iDescription = false;
                return;
            } else if (localName.equalsIgnoreCase("author")) {
                newItem.setAuthor(sb.toString());
                sb.delete(0, sb.length());
                iAuthor = false;
                return;
            } else if (localName.equalsIgnoreCase("category")) {
                newItem.addCategory(sb.toString());
                sb.delete(0, sb.length());
                iCategory = false;
                return;
            } else if (localName.equalsIgnoreCase("comments")) {
                newItem.setComments(sb.toString());
                sb.delete(0, sb.length());
                iComments = false;
                return;
            } /*else if (localName.equalsIgnoreCase("enclosure")) {
                To be implemented later
            }*/ else if (localName.equalsIgnoreCase("guid")) {
                newItem.setGuid(sb.toString());
                sb.delete(0, sb.length());
                iGuid = false;
                return;
            } else if (localName.equalsIgnoreCase("pubDate")) {
                newItem.setPubDate(sb.toString());
                sb.delete(0, sb.length());
                iPubDate = false;
                return;
            }
        }
        else {
            // Channel-level fields: read from sb here too, since characters()
            // no longer assigns currentValue.
            if(localName.equalsIgnoreCase("channel")) {
                newFeed.setItems((ArrayList<Item>)itemList);
                finalFeed = newFeed;
                newFeed = null;
                inChannel = false;
                return;
            } else if(localName.equalsIgnoreCase("title")) {
                newFeed.setTitle(sb.toString());
                sb.delete(0, sb.length());
                fTitle = false;
                return;
            } else if(localName.equalsIgnoreCase("link")) {
                newFeed.setLink(sb.toString());
                sb.delete(0, sb.length());
                fLink = false;
                return;
            } else if(localName.equalsIgnoreCase("description")) {
                newFeed.setDescription(sb.toString());
                sb.delete(0, sb.length());
                fDescription = false;
                return;
            } else if(localName.equalsIgnoreCase("language")) {
                newFeed.setLanguage(sb.toString());
                sb.delete(0, sb.length());
                fLanguage = false;
                return;
            } else if(localName.equalsIgnoreCase("copyright")) {
                newFeed.setCopyright(sb.toString());
                sb.delete(0, sb.length());
                fCopyright = false;
                return;
            } else if(localName.equalsIgnoreCase("category")) {
                newFeed.addCategory(sb.toString());
                sb.delete(0, sb.length());
                fCategory = false;
                return;
            }
        }
    }
}
@Override
public void characters(char[] ch, int start, int length) {
    sb.append(ch, start, length);
}
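The core of the approach above, stripped of the per-field booleans, can be sketched as a standalone handler. This is only an illustration under a few assumptions: the class name DescriptionHandler and the helper parseDescriptions are hypothetical, only item descriptions are collected, and elements are matched on qName because localName may be empty when the parser is not namespace-aware (the default for SAXParserFactory).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, stripped-down handler; names are illustrative only.
class DescriptionHandler extends DefaultHandler {
    private final StringBuilder sb = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        sb.setLength(0); // every element starts with an empty buffer
        if (qName.equalsIgnoreCase("item")) inItem = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        sb.append(ch, start, length); // append each chunk; never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("item")) {
            inItem = false;
        } else if (inItem && qName.equalsIgnoreCase("description")) {
            descriptions.add(sb.toString()); // full text, entities already decoded
        }
        sb.setLength(0);
    }

    /** Parses the feed text and returns the description of every item. */
    static List<String> parseDescriptions(String xml) {
        DescriptionHandler handler = new DescriptionHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so the sketch can be called without checked exceptions.
            throw new RuntimeException(e);
        }
        return handler.descriptions;
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><item><description>"
                + "See &lt;a href=\"http://example.com\"&gt;this&lt;/a&gt;"
                + "</description></item></channel></rss>";
        System.out.println(parseDescriptions(xml));
    }
}
```

Because the buffer is cleared in startElement and read in endElement, the per-field flags are not strictly needed for leaf elements like these. They remain useful if the handler must ignore text that belongs to elements it does not care about.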