Question:

Apache Beam Maven dependency error

堵存
2023-03-14

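I am trying to use Apache Beam from Java as a data pipeline of sorts. I wrote a simple class that reads from Google Pub/Sub and sinks to Google BigQuery, but I cannot get it to build for the life of me. I am building with Maven and have added every Beam package I could find, but I still get "class file not found" errors.

Specifically: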

[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[28,16] cannot access org.apache.beam.sdk.options.GcpOptions
  class file for org.apache.beam.sdk.options.GcpOptions not found
[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[29,16] cannot access org.apache.beam.sdk.options.BigQueryOptions
  class file for org.apache.beam.sdk.options.BigQueryOptions not found
[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[31,16] cannot access org.apache.beam.sdk.options.GcsOptions
  class file for org.apache.beam.sdk.options.GcsOptions not found

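Does anyone know which packages I need to add to resolve these errors? Unfortunately, searching Google has not helped.

My POM file is based on the example POM that Apache provides for WordCount, with extra dependencies added. Below are the dependencies I put into it. I can provide the full file if needed, but it is rather monolithic.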

<dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-apex</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
                <!--
                  Apex depends on httpclient version 4.3.5, project has a transitive dependency to httpclient 4.0.1 from
                  google-http-client. Apex dependency version being specified explicitly so that it gets picked up. This
                  can be removed when the project no longer has a dependency on a different httpclient version.
                -->
                <dependency>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpclient</artifactId>
                    <version>4.3.5</version>
                    <scope>runtime</scope>
                    <exclusions>
                        <exclusion>
                            <groupId>commons-codec</groupId>
                            <artifactId>commons-codec</artifactId>
                        </exclusion>
                    </exclusions>
                </dependency>
            </dependencies>
        </profile>

        <profile>
            <id>dataflow-runner</id>
            <!-- Makes the DataflowRunner available when running a pipeline. -->
            <dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
            </dependencies>
        </profile>

        <profile>
            <id>flink-runner</id>
            <!-- Makes the FlinkRunner available when running a pipeline. -->
            <dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-flink_2.10</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
            </dependencies>
        </profile>

        <profile>
            <id>spark-runner</id>
            <!-- Makes the SparkRunner available when running a pipeline. Additionally,
                 overrides some Spark dependencies to Beam-compatible versions. -->
            <dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-spark</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-sdks-java-io-hadoop-file-system</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-streaming_2.10</artifactId>
                    <version>${spark.version}</version>
                    <scope>runtime</scope>
                    <exclusions>
                        <exclusion>
                            <groupId>org.slf4j</groupId>
                            <artifactId>jul-to-slf4j</artifactId>
                        </exclusion>
                    </exclusions>
                </dependency>
                <dependency>
                    <groupId>com.fasterxml.jackson.module</groupId>
                    <artifactId>jackson-module-scala_2.10</artifactId>
                    <version>${jackson.version}</version>
                    <scope>runtime</scope>
                </dependency>
            </dependencies>
        </profile>
    </profiles>

    <dependencies>
        <!-- Adds a dependency on the Beam SDK. -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-core</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-google-cloud-platform -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-fn-api -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-common-fn-api</artifactId>
            <version>2.2.0</version>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-google-cloud-platform-core -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-common -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-common</artifactId>
            <version>2.2.0</version>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-gcp-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-gcp-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-extensions-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-common-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.google.cloud.dataflow/google-cloud-dataflow-java-sdk-parent -->
        <dependency>
            <groupId>com.google.cloud.dataflow</groupId>
            <artifactId>google-cloud-dataflow-java-sdk-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>


        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-reference -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-reference</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-build-tools -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-build-tools</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-direct-java -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-direct-java</artifactId>
            <version>2.2.0</version>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-core-construction-java -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-core-construction-java</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>com.google.cloud.dataflow</groupId>
            <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
            <version>[2.1.0, 2.99)</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-runner-api -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-common-runner-api</artifactId>
            <version>2.2.0</version>
        </dependency>


        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
            <version>0.4.0</version>
        </dependency>

        <dependency>
            <groupId>com.google.api-client</groupId>
            <artifactId>google-api-client</artifactId>
            <version>${google-clients.version}</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.google.apis</groupId>
            <artifactId>google-api-services-bigquery</artifactId>
            <version>${bigquery.version}</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.google.http-client</groupId>
            <artifactId>google-http-client</artifactId>
            <version>${google-clients.version}</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.google.apis</groupId>
            <artifactId>google-api-services-pubsub</artifactId>
            <version>${pubsub.version}</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>${joda.version}</version>
        </dependency>

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>${guava.version}</version>
        </dependency>

        <!-- Add slf4j API frontend binding with JUL backend -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-jdk14</artifactId>
            <version>${slf4j.version}</version>
            <!-- When loaded at runtime this will wire up slf4j to the JUL backend -->
            <scope>runtime</scope>
        </dependency>

        <!-- Hamcrest and JUnit are required dependencies of PAssert,
             which is used in the main code of DebuggingWordCount example. -->
        <dependency>
            <groupId>org.hamcrest</groupId>
            <artifactId>hamcrest-all</artifactId>
            <version>${hamcrest.version}</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit.version}</version>
        </dependency>

    </dependencies>

1 answer

沈俊美
2023-03-14

These classes:

org.apache.beam.sdk.options.GcpOptions
org.apache.beam.sdk.options.GcsOptions
org.apache.beam.sdk.options.BigQueryOptions

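...all lived in those packages in earlier versions of Apache Beam.

Given the dependencies in the pom.xml (specifically, the ones on Apache Beam v2.2.0), the correct imports are: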

org.apache.beam.sdk.extensions.gcp.options.GcpOptions 
org.apache.beam.sdk.extensions.gcp.options.GcsOptions 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions
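
As a rough sketch of how the POM might be trimmed to match those imports, the fragment below keeps only the Beam 2.2.0 artifacts that a Pub/Sub-to-BigQuery pipeline would plausibly need. It assumes the pipeline uses only the core SDK, the GCP IO connectors, and the direct runner for local runs. The choice of runner and the `beam.version` property are assumptions, not something the question confirms. It also leaves out the `pom`-type parent artifacts and the 0.4.0 Dataflow runner from the original list, on the guess that mixing a 0.4.0 artifact with 2.2.0 ones may be what drags in references to the old `org.apache.beam.sdk.options` classes. That guess has not been verified against the asker's build.

```xml
<properties>
    <!-- Assumed property name; keeps every Beam artifact on one version -->
    <beam.version>2.2.0</beam.version>
</properties>

<dependencies>
    <!-- Core Beam SDK (Pipeline, PipelineOptions, transforms) -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-core</artifactId>
        <version>${beam.version}</version>
    </dependency>

    <!-- Pub/Sub and BigQuery connectors, including
         org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
        <version>${beam.version}</version>
    </dependency>

    <!-- GcpOptions and GcsOptions under
         org.apache.beam.sdk.extensions.gcp.options -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
        <version>${beam.version}</version>
    </dependency>

    <!-- Assumption: direct runner for local execution; swap in another
         runner at the same version if needed -->
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-direct-java</artifactId>
        <version>${beam.version}</version>
        <scope>runtime</scope>
    </dependency>
</dependencies>
```

If the Dataflow runner is needed, declaring `beam-runners-google-cloud-dataflow-java` at the same `${beam.version}` instead of 0.4.0 would keep all Beam artifacts on one release.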