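I have two questions. First, how do I match words that have dots between their letters, such as "C.J.Johnson"? Second, is it possible to build a list of words containing dots that my regex will then include? Basically, I want to search a text file for a set of words and list every sentence that contains any of them. My code: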
public void search_sentences() throws FileNotFoundException, IOException {
    //FileReader fr1 = new FileReader(get_File());
    BufferedReader br1 = new BufferedReader(new InputStreamReader(new FileInputStream(get_File()), "UTF-8"));
    ArrayList<String> words = new ArrayList();
    PrintWriter writer = new PrintWriter("rivit.txt", "UTF-8");
    String str="";
    //String [] words = {};
    String sanat = get_Text();
    for(String w: sanat.split(", ")){
        words.add(w);
    }
    String word_re = words.get(0);
    for (int i = 1; i < words.size(); i++)
        word_re += "|" + words.get(i);
    word_re = "[^.!?]*\\b(" + word_re + ")\\b[^.!?]*[.!?]";
    while(br1.ready()) { str += br1.readLine(); }
    Pattern re = Pattern.compile(word_re,
            Pattern.MULTILINE | Pattern.COMMENTS |
            Pattern.CASE_INSENSITIVE);
    Matcher match = re.matcher(str);
    String sentenceString="";
    while (match.find()) {
        sentenceString = match.group(0);
        if(!txtFile.isSelected()){
            tekstiAlue.append(sentenceString);
        } else {
            writer.println(sentenceString);
        }
    }
    writer.close();
}
I think the first one should be doable. I've tried adding //s to
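As Jeff Holt said in a comment, "look at Pattern.quote()":
[...] produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.
Metacharacters or escape sequences in the input sequence will be given no special meaning.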
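To make the effect of quoting concrete, here is a minimal sketch. The class name QuoteDemo and the helper quotedAlternation are made up for illustration and are not part of the original answer; only Pattern.quote, Pattern.matches and StringJoiner come from the JDK. It shows that an unquoted dot acts as a wildcard, and how the comma-separated word list from the question could be joined into an alternation with every word quoted.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: joins comma-separated search words into one
    // alternation, quoting each word so '.' and other metacharacters are literal.
    static String quotedAlternation(String commaSeparatedWords) {
        StringJoiner alternation = new StringJoiner("|");
        for (String w : commaSeparatedWords.split(",\\s*")) {
            alternation.add(Pattern.quote(w));
        }
        return alternation.toString();
    }

    public static void main(String[] args) {
        String name = "C.J.Johnson";
        // Unquoted: each '.' matches any character, so this prints true.
        System.out.println(Pattern.matches(name, "CxJyJohnson"));
        // Quoted: the dots are literal, so this prints false.
        System.out.println(Pattern.matches(Pattern.quote(name), "CxJyJohnson"));
        // Quoted: the exact name still matches, so this prints true.
        System.out.println(Pattern.matches(Pattern.quote(name), name));
        // Prints \QC.J.Johnson\E|\Qetc.\E
        System.out.println(quotedAlternation("C.J.Johnson, etc."));
    }
}
```

Quoting alone makes the dots literal inside the search word, but the sentence part of the pattern ([^.!?]*) still stops at every dot, which is what the code in the rest of this answer deals with.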
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> findSentencesContaining(String fullText, String word, String[] specials) {
    Pattern p = buildRegexToFindSentencesContaining(word, specials);
    List<String> sentences = new ArrayList<>();
    for (Matcher m = p.matcher(fullText); m.find(); )
        sentences.add(m.group().replaceAll("\\s+", " ").trim()); // normalize group of whitespace into a single space
    return sentences;
}

public static Pattern buildRegexToFindSentencesContaining(String word, String[] specials) {
    // Filler around the word: any special word, or any character that does not end a sentence
    StringJoiner regexText = new StringJoiner("|", "(?:", "|[^.!?])*").setEmptyValue("[^.!?]*");
    for (String s : specials)
        regexText.add(toWordRegex(s));
    String regex = regexText + toWordRegex(word) + regexText + "[.!?]";
    return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
}

private static String toWordRegex(String word) {
    String regex = Pattern.quote(word); // match the word literally, dots included
    if (word.matches("\\b.*")) // word starts with a word character
        regex = "\\b" + regex;
    if (word.matches(".*\\b")) // word ends with a word character
        regex = regex + "\\b";
    return regex;
}
String fullText = "This is a test. We're testing that sentences\n" +
                  "can span multiple lines, i.e. that line" +
                  "terminators can appear in a sentence. We're\n" +
                  "also testing that sentences can contain\n" +
                  "special words containing sentence-ending\n" +
                  "\"words\", e.g. \"i.e.\" and \"etc.\". In\n" +
                  "addition, (special) word matching is\n" +
                  "case-insensitive.";
String[] specials = { "i.e.", "e.g.", "etc." };
for (String word : new String[] { "test", "also", "we're", "is", "happy" }) {
    System.out.println("Sentences containing word \"" + word + "\":");
    List<String> sentences = findSentencesContaining(fullText, word, specials);
    if (sentences.isEmpty())
        System.out.println(" ** NOT FOUND");
    else {
        for (String sentence : sentences)
            System.out.println(" " + sentence);
    }
}
Output
Sentences containing word "test":
This is a test.
Sentences containing word "also":
We're also testing that sentences can contain special words containing sentence-ending "words", e.g. "i.e." and "etc.".
Sentences containing word "we're":
We're testing that sentences can span multiple lines, i.e. that lineterminators can appear in a sentence.
We're also testing that sentences can contain special words containing sentence-ending "words", e.g. "i.e." and "etc.".
Sentences containing word "is":
This is a test.
In addition, (special) word matching is case-insensitive.
Sentences containing word "happy":
** NOT FOUND
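The question asks for a list of search words rather than a single one, so here is one possible way to extend the code above. This is a sketch under assumptions, not a tested drop-in: the class name MultiWordSentenceSearch and the method names build and find are invented for illustration, and the choice to treat every search word that contains '.', '!' or '?' as a special word as well (so that "C.J.Johnson" does not end the sentence when another word is the one that matches) is my own addition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiWordSentenceSearch {
    // Quote the word, adding \b only on sides that are word characters.
    static String toWordRegex(String word) {
        String regex = Pattern.quote(word);
        if (word.matches("\\b.*")) regex = "\\b" + regex;
        if (word.matches(".*\\b")) regex = regex + "\\b";
        return regex;
    }

    // Hypothetical builder: matches a sentence containing ANY of the words.
    static Pattern build(List<String> words, List<String> specials) {
        StringJoiner filler = new StringJoiner("|", "(?:", "|[^.!?])*");
        for (String s : specials) filler.add(toWordRegex(s));
        StringJoiner anyWord = new StringJoiner("|", "(?:", ")");
        for (String w : words) {
            anyWord.add(toWordRegex(w));
            // Assumption: dotted search words must not terminate a sentence either.
            if (w.matches(".*[.!?].*")) filler.add(toWordRegex(w));
        }
        String regex = filler.toString() + anyWord + filler + "[.!?]";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    static List<String> find(String text, List<String> words, List<String> specials) {
        List<String> sentences = new ArrayList<>();
        for (Matcher m = build(words, specials).matcher(text); m.find(); )
            sentences.add(m.group().replaceAll("\\s+", " ").trim());
        return sentences;
    }

    public static void main(String[] args) {
        String text = "I met C.J.Johnson today. Nothing here. He said hi, i.e. he waved!";
        List<String> words = new ArrayList<>();
        words.add("C.J.Johnson");
        words.add("waved");
        List<String> specials = new ArrayList<>();
        specials.add("i.e.");
        for (String sentence : find(text, words, specials))
            System.out.println(sentence);
    }
}
```

With the sample text in main, this should print the first and third sentences and skip "Nothing here.". Only words that contain a sentence terminator are added to the filler; adding every search word there would give the regex engine many equivalent ways to consume the same text and could slow down failing matches.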