Say I have this list of words:
String[] stopWords = new String[]{"i","a","and","about","an","are","as","at","be","by","com","for","from","how","in","is","it","not","of","on","or","that","the","this","to","was","what","when","where","who","will","with","the","www"};
And I have this text:
String text = "I would like to do a nice novel about nature AND people";
Is there a method that matches the stopWords, ignoring case, and removes them from the text? Something like this:
String noStopWordsText = remove(text, stopWords);
Result:
" would like do nice novel nature people"
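If you know a regex that does this, great, but I would really prefer something like a Commons solution that is more performance-oriented.
By the way, right now I am using this Commons method, which lacks proper case-insensitive handling: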
private static final String[] stopWords = new String[]{"i", "a", "and", "about", "an", "are", "as", "at", "be", "by", "com", "for", "from", "how", "in", "is", "it", "not", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "the", "www", "I", "A", "AND", "ABOUT", "AN", "ARE", "AS", "AT", "BE", "BY", "COM", "FOR", "FROM", "HOW", "IN", "IS", "IT", "NOT", "OF", "ON", "OR", "THAT", "THE", "THIS", "TO", "WAS", "WHAT", "WHEN", "WHERE", "WHO", "WILL", "WITH", "THE", "WWW"};
private static final String[] blanksForStopWords = new String[]{"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""};
noStopWordsText = StringUtils.replaceEach(text, stopWords, blanksForStopWords);
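Here is a solution that does not use regular expressions. I think it is inferior to my other answer because it is longer and less clear, but if performance really matters, this one is O(n), where n is the length of the text.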
import java.util.HashSet;
import java.util.Set;

// stop words are stored in lower case; lookups lower-case each word first
Set<String> stopWords = new HashSet<String>();
stopWords.add("a");
stopWords.add("and");
// and so on ...
String sampleText = "I would like to do a nice novel about nature AND people";
StringBuilder clean = new StringBuilder();
int index = 0;
while (index < sampleText.length()) {
    // the only word delimiter supported is space, if you want other
    // delimiters you have to do a series of indexOf calls and see which
    // one gives the smallest index, or use regex
    int nextIndex = sampleText.indexOf(" ", index);
    if (nextIndex == -1) {
        // no more spaces: the last word runs to the end of the text
        nextIndex = sampleText.length();
    }
    String word = sampleText.substring(index, nextIndex);
    if (!stopWords.contains(word.toLowerCase())) {
        clean.append(word);
        if (nextIndex < sampleText.length()) {
            // this adds the word delimiter, e.g. the following space
            clean.append(sampleText.substring(nextIndex, nextIndex + 1));
        }
    }
    index = nextIndex + 1;
}
System.out.println("Stop words removed: " + clean.toString());
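The same idea can be wrapped in a helper with the remove(text, stopWords) shape asked for in the question. This is only a minimal sketch: the class name StopWordRemover is made up for illustration, the only delimiter it understands is a single space, and unlike the loop above it collapses the spaces left behind by removed words, so its output has no leading or doubled spaces.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class StopWordRemover {

    // Hypothetical helper: removes whole words that match a stop word, ignoring case.
    public static String remove(String text, String[] stopWords) {
        // Store the stop words in lower case so each lookup is a single hash probe.
        Set<String> stopSet = new HashSet<>();
        for (String stopWord : stopWords) {
            stopSet.add(stopWord.toLowerCase(Locale.ROOT));
        }

        StringBuilder clean = new StringBuilder();
        int start = 0;
        while (start < text.length()) {
            // Find the end of the current word; the last word runs to the end of the text.
            int end = text.indexOf(' ', start);
            if (end == -1) {
                end = text.length();
            }
            String word = text.substring(start, end);
            // Skip empty tokens (consecutive spaces) and stop words.
            if (!word.isEmpty() && !stopSet.contains(word.toLowerCase(Locale.ROOT))) {
                if (clean.length() > 0) {
                    clean.append(' ');
                }
                clean.append(word);
            }
            start = end + 1;
        }
        return clean.toString();
    }

    public static void main(String[] args) {
        String[] stopWords = {"i", "a", "and", "about", "to"};
        String text = "I would like to do a nice novel about nature AND people";
        // prints: would like do nice novel nature people
        System.out.println(remove(text, stopWords));
    }
}
```

Because it compares whole words rather than substrings, a stop word such as "a" does not touch the letters inside "nature", and the stop word array does not need separate upper-case entries.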