Removing strings from another string in Java

邴姚石
2023-03-14
Question

Let's say I have this list of words:

 String[] stopWords = new String[]{"i","a","and","about","an","are","as","at","be","by","com","for","from","how","in","is","it","not","of","on","or","that","the","this","to","was","what","when","where","who","will","with","the","www"};

Then I have this text:

 String text = "I would like to do a nice novel about nature AND people";

Is there a method that matches the stopWords and removes them while ignoring case? Something like this:

 String noStopWordsText = remove(text, stopWords);

Result:

 " would like do nice novel nature people"

A regex answer is fine if you know one, but I would really prefer something like a Commons solution, which is more performance-oriented.
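For reference, the regex route mentioned above could look roughly like the sketch below. It assumes a "word" is delimited by regex word boundaries (`\b`); the class name `RegexStopWords` and its helpers are hypothetical, chosen to mirror the `remove(text, stopWords)` call in the question. The whitespace after each removed word is consumed too, so, unlike the expected output in the question, the result has no leading space.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexStopWords {
    // Builds \b(?:w1|w2|...)\b\s* : a whole stop word plus any whitespace after it.
    // Pattern.quote keeps each word literal even if it contained regex metacharacters.
    static Pattern stopWordPattern(String[] stopWords) {
        String alternation = Arrays.stream(stopWords)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b\\s*", Pattern.CASE_INSENSITIVE);
    }

    // Hypothetical helper with the signature the question asks for.
    // It compiles the pattern on every call; for repeated use, compile once and reuse it.
    static String remove(String text, String[] stopWords) {
        return stopWordPattern(stopWords).matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String[] stopWords = {"i", "a", "and", "about", "to", "the"};
        String text = "I would like to do a nice novel about nature AND people";
        System.out.println(remove(text, stopWords));
    }
}
```

Because `\b` is required on both sides, "a" is not cut out of "nature" and "i" is not cut out of "like". If the last word of the text is a stop word, the space before it survives as trailing whitespace; call `trim()` on the result if that matters.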

By the way, right now I'm using this Commons method, which lacks proper case-insensitive handling:

 private static final String[] stopWords = new String[]{"i", "a", "and", "about", "an", "are", "as", "at", "be", "by", "com", "for", "from", "how", "in", "is", "it", "not", "of", "on", "or", "that", "the", "this", "to", "was", "what", "when", "where", "who", "will", "with", "the", "www", "I", "A", "AND", "ABOUT", "AN", "ARE", "AS", "AT", "BE", "BY", "COM", "FOR", "FROM", "HOW", "IN", "IS", "IT", "NOT", "OF", "ON", "OR", "THAT", "THE", "THIS", "TO", "WAS", "WHAT", "WHEN", "WHERE", "WHO", "WILL", "WITH", "THE", "WWW"};
 private static final String[] blanksForStopWords = new String[]{"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""};

 noStopWordsText = StringUtils.replaceEach(text, stopWords, blanksForStopWords);
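`replaceEach` matches raw substrings rather than whole words, so short stop words such as "i" or "a" can also be cut out of the middle of other words, and any casing not listed explicitly (for example "And") is missed. A minimal regex-free alternative, assuming words are separated by single spaces, is to lower-case the stop words once into a `HashSet` and filter the space-separated tokens. `SplitStopWords` and its methods are hypothetical names for this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class SplitStopWords {
    // Store the stop words lower-cased once, so "AND", "And" and "and" all hit
    // the same entry without listing every casing by hand.
    static Set<String> toLowerCaseSet(String[] words) {
        Set<String> set = new HashSet<>();
        for (String w : words) {
            set.add(w.toLowerCase(Locale.ROOT));
        }
        return set;
    }

    // Hypothetical helper: splits on single spaces and keeps only tokens whose
    // lower-cased form is not a stop word, so only whole words are removed.
    static String remove(String text, Set<String> lowerCaseStopWords) {
        return Arrays.stream(text.split(" "))
                .filter(word -> !lowerCaseStopWords.contains(word.toLowerCase(Locale.ROOT)))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Set<String> stopWords = toLowerCaseSet(new String[]{"i", "a", "and", "about", "to", "the"});
        System.out.println(remove("I would like to do a nice novel about nature AND people", stopWords));
    }
}
```

`String.split` with a single ordinary character such as `" "` takes a fast path in the JDK that avoids the regex engine, so this stays close to the cost of a hand-written loop.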

Answer:

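Here is a solution that doesn't use regular expressions. I think it is worse than my other answer because it is longer and less clear, but if performance really matters, it runs in O(n) time, where n is the length of the text (expected time, since each HashSet lookup takes constant time on average).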

import java.util.HashSet;
import java.util.Set;

public class StopWordRemover {
  public static void main(String[] args) {
    // Stop words must be stored in lower case, because each word is
    // lower-cased before the lookup.
    Set<String> stopWords = new HashSet<String>();
    stopWords.add("a");
    stopWords.add("and");
    // and so on ...

    String sampleText = "I would like to do a nice novel about nature AND people";
    StringBuilder clean = new StringBuilder();
    int index = 0;

    while (index < sampleText.length()) {
      // the only word delimiter supported is space; if you want other
      // delimiters you have to do a series of indexOf calls and see which
      // one gives the smallest index, or use regex
      int nextIndex = sampleText.indexOf(' ', index);
      if (nextIndex == -1) {
        // no more spaces: the last word runs to the end of the text
        nextIndex = sampleText.length();
      }
      String word = sampleText.substring(index, nextIndex);
      if (!stopWords.contains(word.toLowerCase())) {
        clean.append(word);
        if (nextIndex < sampleText.length()) {
          // this adds the word delimiter, i.e. the following space
          clean.append(sampleText.charAt(nextIndex));
        }
      }
      index = nextIndex + 1;
    }

    System.out.println("Stop words removed: " + clean.toString());
  }
}
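The comment in the loop above notes that only spaces are supported as delimiters. One way to lift that limit while keeping a single O(n) pass is sketched below. It swaps in a different technique from the "series of indexOf calls" the comment suggests: each character is classified with `Character.isLetterOrDigit`. The assumptions are that a word is a maximal run of letters or digits (so apostrophes and hyphens split words, and supplementary characters are not treated as letters) and that the single space after a removed word should be dropped to avoid double spaces. `ScanStopWords` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class ScanStopWords {
    // One left-to-right pass: letter/digit runs are words, every other char is
    // copied through unchanged (spaces, commas, periods, ...).
    static String remove(String text, Set<String> lowerCaseStopWords) {
        StringBuilder out = new StringBuilder(text.length());
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                out.append(c); // delimiter: keep as-is
                i++;
                continue;
            }
            int start = i;
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String word = text.substring(start, i);
            if (!lowerCaseStopWords.contains(word.toLowerCase(Locale.ROOT))) {
                out.append(word);
            } else if (i < n && text.charAt(i) == ' ') {
                i++; // drop the single space after a removed word
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("i", "a", "and", "about", "to", "the"));
        System.out.println(remove("I would like to do a nice novel about nature AND people", stopWords));
        System.out.println(remove("Hello, and goodbye.", stopWords));
    }
}
```

With punctuation as a delimiter, "Hello, and goodbye." becomes "Hello, goodbye." instead of keeping "and" glued to a comma-delimited token.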


 Similar questions:
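  • Question: What is the fastest way to remove the last character from a string? I have a string like … and I want to remove the last ',' and get the rest of the string back: … What is the fastest way? Answer: First I tried it without a space and got a wrong result. Then I added a space and got a good result: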

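  • Using a Scanner, I want to read the index of a char and then remove it from the string. The only problem: if the char occurs more than once in the string, .replace() removes all of them. For example, I want to get the index of the first 't' in the string "texture text" and remove only that 't'. Then I want to get the index of the second 't' and remove it.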

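  • Question: To access the individual characters of a String in Java, we have … . Is there a built-in function to remove an individual character of a String in Java? Something like this: … Answer: You can also use the mutable … class. It has the method deleteCharAt(), along with many other mutator methods. Just delete the characters you need to remove and then get the result as follows: … This avoids creating unnecessary String objects.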

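  • Question: I have a URI that contains non-ASCII characters, such as: http://www.abc.de/qq/qq.ww?MIval=typo3_bsl_int_Smtliste&p_smtbez=Schmalbl -ttrigeSomerzischeruchtanb How can I remove the "…" from this URI? Answer: I would guess the source of the URL is more at fault. Perhaps you are fixing the wrong problem? Removing "strange" characters from a URI could give it a completely…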

  • Question: What is the most efficient way to remove the first three characters of a string? For example: … Answer: Just use substring: … will return …

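  • Question: I have a string like … and I want to remove … and … from it. I want the result to be … . How can I do this? Answer: Use a regex with replaceAll. If you only want to remove \r\n when the two occur as a pair (the code above removes any \r or \n), do this: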