Question:

Is there a way to accurately get the speaker notes from a given PowerPoint file with Apache POI?

仲孙兴旺

2023-03-14

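I am trying to transfer speaker notes from one PowerPoint to another using Apache POI, but I cannot get an accurate transfer.

Looking around, I could not find many resources. I did find this link: "How to get pptx slide notes text using apache poi", which works in most cases. But when the original pptx uses certain features (such as a slide master), some text that is not part of the speaker notes gets interpreted as speaker notes.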

XSLFNotes notes_src = slides_src[i].getNotes();
XSLFNotes notes_dst = ppt_dst.getNotesSlide(slides_dst[i]);

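These lines are inside a for loop, where i is the iteration index. Here I get slide i from the source file and the corresponding slide i from the destination file.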

for (XSLFShape shape_src : notes_src) {
    if (shape_src instanceof XSLFTextShape) {
        XSLFTextShape txShape = (XSLFTextShape) shape_src;
        for (XSLFTextParagraph xslfParagraph : txShape.getTextParagraphs()) {

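Here I get the text from the notes slide. The if statement below is where I had to start filtering out some "speaker notes" that are not actually speaker notes (for example, the slide number is somehow interpreted as a note, and so is a printed copyright line).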

    if (!(xslfParagraph.getText().startsWith("" + (i + 1)) & xslfParagraph.getText().length() < 3) & !(xslfParagraph.getText().startsWith("Copyright ©"))) {
        for (XSLFTextShape shape_dst : notes_dst.getPlaceholders()) {
            if (shape_dst.getTextType() == Placeholder.BODY) {
                shape_dst.setText(shape_dst.getText() + xslfParagraph.getText() + "\n");

The statement below is another filter: when slide master features are involved, an odd "Click to edit Master text styles…" text is also interpreted as speaker notes.

    shape_dst.setText(shape_dst.getText().replace("Click to edit Master text styles", "").replace("Second level", "").replace("Third level", "").replace("Fourth level", "").replace("Fifth level", ""));
}}}}}}

In short, things that are not speaker notes show up as "notes". There are not many resources online on this topic; can anyone help?

徐阳炎

2023-03-14

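What getNotes() returns is the notes slide. A notes slide may contain not only the body text shape that holds the notes, but also text shapes filled through other placeholders, such as header, footer, date-time and slide number. To determine which kind of text shape you have, get the placeholder type from the shape. This is done with: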

CTShape cTShape = (CTShape)shape.getXmlObject(); 
STPlaceholderType.Enum type = cTShape.getNvSpPr().getNvPr().getPh().getType();

Then one can pick only the text shapes of type STPlaceholderType.BODY.

Example:

import java.io.FileInputStream;

import org.apache.poi.xslf.usermodel.*;

import org.openxmlformats.schemas.presentationml.x2006.main.CTShape;
import org.openxmlformats.schemas.presentationml.x2006.main.STPlaceholderType;

import java.util.List;

public class PowerPointReadNotes {

 public static void main(String[] args) throws Exception {

  XMLSlideShow slideShow = new XMLSlideShow(new FileInputStream("PowerPointHavingNotes.pptx"));

  List<XSLFSlide> slides = slideShow.getSlides();
  for (XSLFSlide slide : slides) {
   XSLFNotes notes = slide.getNotes();
   if (notes == null) continue; // this slide has no notes slide
   for (XSLFShape shape : notes) {
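    if (!(shape.getXmlObject() instanceof CTShape)) continue; // e.g. pictures or group shapes
    CTShape cTShape = (CTShape)shape.getXmlObject();
    if (!cTShape.getNvSpPr().getNvPr().isSetPh()) continue; // shape is not a placeholder
    STPlaceholderType.Enum type = cTShape.getNvSpPr().getNvPr().getPh().getType();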
    System.out.println("type: " + type); 
    if (type == STPlaceholderType.BODY) { // get only shapes of type BODY
     if (shape instanceof XSLFTextShape) {
      XSLFTextShape textShape = (XSLFTextShape) shape;
      for (XSLFTextParagraph paragraph : textShape) {
       System.out.println(paragraph.getText());
      }
     }
    }
   }
  }
 }
}

Possible types are BODY, CHART, CLIP_ART, CTR_TITLE, DGM, DT, FTR, HDR, MEDIA, OBJ, PIC, SLD_IMG, SLD_NUM, SUB_TITLE, TBL, TITLE.

Unfortunately there is no publicly available API documentation for ooxml-schemas. So we need to download the sources of ooxml-schemas and run javadoc over them to get API documentation describing the classes and methods.

There we find the org.openxmlformats.schemas.presentationml.x2006.main.* classes, which are the classes for the presentation part of Office Open XML. One can look at /org/openxmlformats/schemas/presentationml/x2006/main/CTShape.html in the API documentation created by javadoc, and then follow getNvSpPr() - getNvPr() - getPh() - getType().
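The getter chain getNvSpPr() - getNvPr() - getPh() - getType() follows the element nesting in the notes slide XML itself: p:sp / p:nvSpPr / p:nvPr / p:ph with its type attribute. The following is only a rough sketch of that nesting, using the JDK's XML parser instead of Apache POI. The class name NotesXmlSketch, the method bodyParagraphs and the SAMPLE markup are made up for illustration; a real notes slide carries many more elements and attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NotesXmlSketch {

    static final String P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main";
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Hypothetical, heavily trimmed notes slide: slide image, notes body, slide number.
    static final String SAMPLE =
        "<p:notes xmlns:p='" + P_NS + "' xmlns:a='" + A_NS + "'><p:cSld><p:spTree>"
        + "<p:sp><p:nvSpPr><p:cNvPr id='2' name='Slide Image'/><p:cNvSpPr/>"
        + "<p:nvPr><p:ph type='sldImg'/></p:nvPr></p:nvSpPr><p:spPr/></p:sp>"
        + "<p:sp><p:nvSpPr><p:cNvPr id='3' name='Notes'/><p:cNvSpPr/>"
        + "<p:nvPr><p:ph type='body' idx='1'/></p:nvPr></p:nvSpPr><p:spPr/>"
        + "<p:txBody><a:bodyPr/><a:p><a:r><a:t>My speaker note</a:t></a:r></a:p></p:txBody></p:sp>"
        + "<p:sp><p:nvSpPr><p:cNvPr id='4' name='Slide Number'/><p:cNvSpPr/>"
        + "<p:nvPr><p:ph type='sldNum' idx='5'/></p:nvPr></p:nvSpPr><p:spPr/>"
        + "<p:txBody><a:bodyPr/><a:p><a:fld id='{0}' type='slidenum'><a:t>1</a:t></a:fld></a:p></p:txBody></p:sp>"
        + "</p:spTree></p:cSld></p:notes>";

    /** Returns one string per paragraph of every shape whose placeholder type is "body". */
    static List<String> bodyParagraphs(String notesSlideXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                         .parse(new InputSource(new StringReader(notesSlideXml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse notes slide XML", e);
        }

        List<String> result = new ArrayList<>();
        NodeList shapes = doc.getElementsByTagNameNS(P_NS, "sp");
        for (int i = 0; i < shapes.getLength(); i++) {
            Element sp = (Element) shapes.item(i);

            // p:sp / p:nvSpPr / p:nvPr / p:ph corresponds to getNvSpPr().getNvPr().getPh()
            NodeList placeholders = sp.getElementsByTagNameNS(P_NS, "ph");
            if (placeholders.getLength() == 0) continue; // not a placeholder shape
            String type = ((Element) placeholders.item(0)).getAttribute("type");
            if (!"body".equals(type)) continue;          // skip sldImg, sldNum, hdr, ftr, dt ...

            // Concatenate the a:t text nodes of each a:p paragraph.
            NodeList paragraphs = sp.getElementsByTagNameNS(A_NS, "p");
            for (int j = 0; j < paragraphs.getLength(); j++) {
                NodeList texts = ((Element) paragraphs.item(j)).getElementsByTagNameNS(A_NS, "t");
                StringBuilder paragraphText = new StringBuilder();
                for (int k = 0; k < texts.getLength(); k++) {
                    paragraphText.append(texts.item(k).getTextContent());
                }
                result.add(paragraphText.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bodyParagraphs(SAMPLE));
    }
}
```

In the sample, the slide number shape also contains text ("1"), but it is skipped because its placeholder type is sldNum rather than body. This is the same distinction the STPlaceholderType check makes in the POI code.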

With the current apache poi 4.1.0 there is also an enum Placeholder in the high-level API that can be used.

Example:

import java.io.FileInputStream;

import org.apache.poi.xslf.usermodel.*;
import org.apache.poi.sl.usermodel.Placeholder;

import java.util.List;

public class PowerPointReadNotesHL {

 public static void main(String[] args) throws Exception {

  XMLSlideShow slideShow = new XMLSlideShow(new FileInputStream("PowerPointHavingNotes.pptx"));

  List<XSLFSlide> slides = slideShow.getSlides();
  for (XSLFSlide slide : slides) {
   XSLFNotes notes = slide.getNotes();
   if (notes == null) continue; // this slide has no notes slide
   for (XSLFShape shape : notes) {
    Placeholder placeholder = shape.getPlaceholder();
    System.out.println("placeholder: " + placeholder); 
    if (placeholder == Placeholder.BODY) { // get only shapes of type BODY
     if (shape instanceof XSLFTextShape) {
      XSLFTextShape textShape = (XSLFTextShape) shape;
      for (XSLFTextParagraph paragraph : textShape) {
       System.out.println(paragraph.getText());
      }
     }
    }
   }
  }
 }
}

Then there is no need to use the low-level ooxml-schemas classes directly.
