Question:

Apache POI does not read the values "#name#" and "#surename#"

楚鸿波
2023-03-14

My goal is to read a docx file, find the placeholder texts "#name#" and "#surename#", and replace them with some other arbitrary text.

This is what I do:

XWPFDocument docx = new XWPFDocument(OPCPackage.open("..."));

// paragraphs directly in the document body
for (XWPFParagraph p : docx.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            if (text != null && text.startsWith("#") && text.endsWith("#")) {
                text = text.replace("#", "new ");
                r.setText(text, 0);
            }
        }
    }
}

// paragraphs inside table cells
for (XWPFTable tbl : docx.getTables()) {
    for (XWPFTableRow row : tbl.getRows()) {
        for (XWPFTableCell cell : row.getTableCells()) {
            for (XWPFParagraph p : cell.getParagraphs()) {
                for (XWPFRun r : p.getRuns()) {
                    String text = r.getText(0);
                    if (text != null && text.startsWith("#") && text.endsWith("#")) {
                        text = text.replace("#", "new ");
                        r.setText(text, 0);
                    }
                }
            }
        }
    }
}


The problem is that my code reads all the other tags in the docx file, but not the tags "#name#" and "#surename#". Can anyone help me?

1 answer

梁华清
2023-03-14

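Judging from the screenshot, "#name#" and "#surename#" are not directly in the document body but inside drawings (for example text boxes or shapes). Neither XWPFDocument.getParagraphs nor XWPFDocument.getTables nor any other high-level method in Apache POI covers these elements. So your main problem is that the paragraphs containing that text are never traversed by your code.

The only way to really get all paragraphs out of the document body is to use an XmlCursor that selects all w:p elements directly from the XML.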
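To illustrate why a direct XML selection reaches more paragraphs than a walk over the body's own children, here is a minimal sketch that uses only the JDK's XPath support, not Apache POI. The XML string is a hand-written, heavily simplified stand-in for word/document.xml (real text boxes are nested more deeply), and the class name NestedParagraphDemo and the placeholder #city# are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class NestedParagraphDemo {

    // Simplified sample (an assumption, not real Word output): one plain body
    // paragraph and one body paragraph whose run holds a text box paragraph.
    static final String SAMPLE =
        "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
      + "<w:body>"
      + "<w:p><w:r><w:t>#city#</w:t></w:r></w:p>"
      + "<w:p><w:r><w:drawing><w:txbxContent>"
      + "<w:p><w:r><w:t>#name#</w:t></w:r></w:p>"
      + "</w:txbxContent></w:drawing></w:r></w:p>"
      + "</w:body></w:document>";

    // Counts the nodes matched by the given XPath expression in SAMPLE.
    static int count(String xpathExpression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so local-name() yields "p" for w:p
            Document dom = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(xpathExpression, dom, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + xpathExpression, e);
        }
    }

    public static void main(String[] args) {
        // Only paragraphs that are direct children of the body.
        String bodyOnly = "/*[local-name()='document']/*[local-name()='body']/*[local-name()='p']";
        // Every paragraph element at any depth, including the one in the text box.
        String everywhere = "//*[local-name()='p']";
        System.out.println("direct children of body: " + count(bodyOnly));
        System.out.println("paragraphs at any depth: " + count(everywhere));
    }
}
```

On this sample the first expression matches 2 paragraphs and the second matches 3; the extra one is the text box paragraph holding "#name#", which is the kind of paragraph a body-level traversal never visits.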

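The complete program below (class WordReplaceTextSegment) shows this. It uses an XmlCursor to traverse all paragraphs (XWPFParagraph) in the document body and replaces the text wherever it is found.

For the replacement itself I prefer the TextSegment approach shown in "Apache POI: ${my_placeholder} is treated as three different runs". This is necessary because, even once the containing paragraph is traversed, the text may be split across several text runs because of formatting, spell checking, or any other odd reason. Microsoft Word has a nearly unlimited number of reasons for splitting text into separate runs in strange ways.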
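The effect of split runs can be shown without Apache POI. The following is a minimal sketch that models a paragraph as a plain list of run strings; RunSplitDemo and both of its methods are hypothetical names for this illustration, not POI API. It mirrors the idea of the segment replacement: find the begin run and end run of the match, keep the text before and after, and drop the runs in between. It replaces only the first occurrence and assumes a non-empty search string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunSplitDemo {

    // The per-run test from the question: a run only counts if it starts
    // and ends with '#' on its own.
    static boolean anyRunIsWholePlaceholder(List<String> runs) {
        for (String text : runs) {
            if (text.startsWith("#") && text.endsWith("#")) {
                return true;
            }
        }
        return false;
    }

    // Replaces the first occurrence of 'find', even if it spans several runs.
    static List<String> replaceAcrossRuns(List<String> runs, String find, String replacement) {
        List<String> result = new ArrayList<>(runs);
        int begin = String.join("", result).indexOf(find);
        if (begin < 0) {
            return result; // nothing to replace
        }
        int last = begin + find.length() - 1; // index of the last matched character

        // Map the two character positions back to (run index, offset in run).
        int beginRun = -1, beginChar = 0, endRun = -1, endChar = 0, offset = 0;
        for (int i = 0; i < result.size(); i++) {
            int len = result.get(i).length();
            if (beginRun < 0 && begin < offset + len) {
                beginRun = i;
                beginChar = begin - offset;
            }
            if (last < offset + len) {
                endRun = i;
                endChar = last - offset;
                break;
            }
            offset += len;
        }

        String textBefore = result.get(beginRun).substring(0, beginChar);
        String textAfter = result.get(endRun).substring(endChar + 1);
        if (beginRun == endRun) {
            result.set(beginRun, textBefore + replacement + textAfter);
        } else {
            result.set(beginRun, textBefore + replacement);
            result.set(endRun, textAfter);
            // Runs strictly between begin run and end run are no longer needed.
            for (int i = endRun - 1; i > beginRun; i--) {
                result.remove(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Dear #", "name", "#, welcome");
        System.out.println(anyRunIsWholePlaceholder(runs));
        System.out.println(String.join("", replaceAcrossRuns(runs, "#name#", "Axel")));
    }
}
```

With the three runs above, the per-run check finds nothing because no single run starts and ends with "#", while the cross-run replacement yields "Dear Axel, welcome" in two remaining runs.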

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;

import java.util.Map;
import java.util.HashMap;
import java.util.List;
import java.util.ArrayList;

public class WordReplaceTextSegment {
    
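    /**
     * Parses the paragraph and searches for the given string, which may be
     * spread over several runs.
     *
     * @param paragraph the paragraph to search in
     * @param searched  the text to search for
     * @param startPos  the position (run, text, char) at which the search starts
     * @return a TextSegment describing where the match begins and ends,
     *         or null if the string was not found
     */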
    static TextSegment searchText(XWPFParagraph paragraph, String searched, PositionInParagraph startPos) {
        int startRun = startPos.getRun(),
            startText = startPos.getText(),
            startChar = startPos.getChar();
        int beginRunPos = 0, candCharPos = 0;
        boolean newList = false;

        //CTR[] rArray = paragraph.getRArray(); //This does not contain all runs. It lacks hyperlink runs for ex.
        java.util.List<XWPFRun> runs = paragraph.getRuns(); 
        
        int beginTextPos = 0, beginCharPos = 0; //must be outside the for loop
        
        //for (int runPos = startRun; runPos < rArray.length; runPos++) {
        for (int runPos = startRun; runPos < runs.size(); runPos++) {
            //int beginTextPos = 0, beginCharPos = 0, textPos = 0, charPos; //int beginTextPos = 0, beginCharPos = 0 must be outside the for loop
            int textPos = 0, charPos;
            //CTR ctRun = rArray[runPos];
            CTR ctRun = runs.get(runPos).getCTR();
            XmlCursor c = ctRun.newCursor();
            c.selectPath("./*");
            try {
                while (c.toNextSelection()) {
                    XmlObject o = c.getObject();
                    if (o instanceof CTText) {
                        if (textPos >= startText) {
                            String candidate = ((CTText) o).getStringValue();
                            if (runPos == startRun) {
                                charPos = startChar;
                            } else {
                                charPos = 0;
                            }

                            for (; charPos < candidate.length(); charPos++) {
                                if ((candidate.charAt(charPos) == searched.charAt(0)) && (candCharPos == 0)) {
                                    beginTextPos = textPos;
                                    beginCharPos = charPos;
                                    beginRunPos = runPos;
                                    newList = true;
                                }
                                if (candidate.charAt(charPos) == searched.charAt(candCharPos)) {
                                    if (candCharPos + 1 < searched.length()) {
                                        candCharPos++;
                                    } else if (newList) {
                                        TextSegment segment = new TextSegment();
                                        segment.setBeginRun(beginRunPos);
                                        segment.setBeginText(beginTextPos);
                                        segment.setBeginChar(beginCharPos);
                                        segment.setEndRun(runPos);
                                        segment.setEndText(textPos);
                                        segment.setEndChar(charPos);
                                        return segment;
                                    }
                                } else {
                                    candCharPos = 0;
                                }
                            }
                        }
                        textPos++;
                    } else if (o instanceof CTProofErr) {
                        c.removeXml();
                    } else if (o instanceof CTRPr) {
                        //do nothing
                    } else {
                        candCharPos = 0;
                    }
                }
            } finally {
                c.dispose();
            }
        }
        return null;
    }

 static void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
  TextSegment foundTextSegment = null;
  PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
  //while((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find
  while((foundTextSegment = searchText(paragraph, textToFind, startPos)) != null) { // search all text segments having text to find

   System.out.println(foundTextSegment.getBeginRun()+":"+foundTextSegment.getBeginText()+":"+foundTextSegment.getBeginChar());
   System.out.println(foundTextSegment.getEndRun()+":"+foundTextSegment.getEndText()+":"+foundTextSegment.getEndChar());

   // maybe there is text before textToFind in begin run
   XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
   String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
   String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before

   // maybe there is text after textToFind in end run
   XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
   String textInEndRun = endRun.getText(foundTextSegment.getEndText());
   String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after

   if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) { 
    textInBeginRun = textBefore + replacement + textAfter; // if we have only one run, we need the text before, then the replacement, then the text after in that run
   } else {
    textInBeginRun = textBefore + replacement; // else we need the text before followed by the replacement in begin run
    endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
   }

   beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());

   // runs between begin run and end run needs to be removed
   for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
    paragraph.removeRun(runBetween); // remove not needed runs
   }

  }
 }
 
 static List<XmlObject> getCTPObjects(XWPFDocument doc) {
  List<XmlObject> result = new ArrayList<XmlObject>();
  //create cursor selecting all paragraph elements  
  XmlCursor cursor = doc.getDocument().newCursor();
  cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:p");  
  while(cursor.hasNextSelection()) {
   cursor.toNextSelection();
   XmlObject obj = cursor.getObject();    
   // add only if the paragraph contains at least a run containing text
   if (obj.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' ./w:r/w:t").length > 0) {
    result.add(obj);   
   }
  }
  return result;
 }
 
 static void traverseAllParagraphsAndReplace(XWPFDocument doc, Map<String, String> replacements) throws Exception { 
  //This gets all XWPFParagraphs out of the stored XML and does the replacing
  //first get all CTP objects
  List<XmlObject> allCTPObjects = getCTPObjects(doc);
  //then traverse them and create XWPFParagraphs from them and do the replacing
  for (XmlObject obj : allCTPObjects) {
   XWPFParagraph paragraph = null;
   if (obj instanceof CTP) {
    CTP p = (CTP)obj;
    paragraph = new XWPFParagraph(p, doc);
   } else {
    CTP p = CTP.Factory.parse(obj.xmlText());  
    paragraph = new XWPFParagraph(p, doc);
   }
   if (paragraph != null) {
    for (String textToFind : replacements.keySet()) {
     String replacement = replacements.get(textToFind);
     if (paragraph.getText().contains(textToFind)) replaceTextSegment(paragraph, textToFind, replacement);
    }
   }
   obj.set(paragraph.getCTP());
  }   
 }

 public static void main(String[] args) throws Exception {

  XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));
  
  Map<String, String> replacements;
  replacements = new HashMap<String, String>();
  replacements.put("#name#", "Axel");
  replacements.put("#surename#", "Richter");

  traverseAllParagraphsAndReplace(doc, replacements);

  FileOutputStream out = new FileOutputStream("result.docx");
  doc.write(out);
  out.close();
  doc.close();

 }
}