Full load of historical backup data: roughly 100 million records in total, occupying about 450 GB of disk once fully exported.
The following error was raised during the export.
The error log reads:
2023-03-15 21:25:39.715 [288235-0-0-writer] INFO OdpsWriterProxy - write block 1584 ok.
2023-03-15 21:25:42.374 [288235-0-0-reader] ERROR MongoDBReader$Task - operation exceeded time limit
com.mongodb.MongoExecutionTimeoutException: operation exceeded time limit
at com.mongodb.internal.connection.ProtocolHelper.createSpecialException(ProtocolHelper.java:243) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:175) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:293) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:255) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:99) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:444) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:72) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:200) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:269) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:131) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:123) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:222) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:115) ~[mongodb-driver-core-3.9.0.jar:na]
at com.mongodb.client.internal.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:54) ~[mongodb-driver-sync-3.9.0.jar:na]
at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Task.startRead(MongoDBReader.java:186) ~[mongodbreader-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:116) [datax-core-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:834) [na:1.8.0_112]
2023-03-15 21:25:42.376 [288235-0-0-reader] ERROR ReaderRunner - Reader runner Received Exceptions:
com.alibaba.datax.common.exception.DataXException: operation exceeded time limit
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Task.startRead(MongoDBReader.java:364) ~[mongodbreader-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:116) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:834) [na:1.8.0_112]
Exception in thread "taskGroup-0" com.alibaba.datax.common.exception.DataXException: operation exceeded time limit
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30)
at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Task.startRead(MongoDBReader.java:364)
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:116)
at java.lang.Thread.run(Thread.java:834)
2023-03-15 21:25:49.519 [job-288235] INFO MetricReportUtil - reportJobMetric is turn off
2023-03-15 21:25:49.519 [job-288235] INFO LocalJobContainerCommunicator - Total 20456992 records, 93828811021 bytes | Speed 25.31MB/s, 7500 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 171.858s | All Task WaitReaderTime 4,009.398s | Percentage 0.00%
2023-03-15 21:25:49.525 [job-288235] ERROR JobContainer - 运行scheduler 模式[local]出错.
2023-03-15 21:25:49.526 [job-288235] ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: operation exceeded time limit
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Task.startRead(MongoDBReader.java:364) ~[na:na]
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:116) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:834) ~[na:1.8.0_112]
2023-03-15 21:25:49.534 [job-288235] INFO MetricReportUtil - reportJobMetric is turn off
2023-03-15 21:25:49.534 [job-288235] INFO LocalJobContainerCommunicator - Total 20456992 records, 93828811021 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 171.858s | All Task WaitReaderTime 4,009.398s | Percentage 0.00%
2023-03-15 21:25:49.534 [job-288235] INFO JobContainer - jobContainer starts to do destroy ...
2023-03-15 21:25:49.534 [job-288235] INFO JobContainer - DataX Writer.Job [odpswriter] do destroy work.
2023-03-15 21:25:49.534 [job-288235] INFO JobContainer - DataX Reader.Job [mongodbreader] do destroy work.
2023-03-15 21:25:49.535 [job-288235] ERROR Engine -
Through the intelligent analysis by DataX, the most likely error reason of this task is:
com.alibaba.datax.common.exception.DataXException: operation exceeded time limit
at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30)
at com.alibaba.datax.plugin.reader.mongodbreader.MongoDBReader$Task.startRead(MongoDBReader.java:364)
at com.alibaba.datax.core.taskgroup.runner.ReaderRunner.run(ReaderRunner.java:116)
at java.lang.Thread.run(Thread.java:834)
2023-03-15 21:25:49 INFO =================================================================
2023-03-15 21:25:49 INFO Exit code of the Shell command 1
2023-03-15 21:25:49 INFO --- Invocation of Shell command completed ---
2023-03-15 21:25:49 ERROR Shell run failed!
2023-03-15 21:25:49 ERROR Current task status: ERROR
2023-03-15 21:25:49 INFO Cost time is: 4883.857s
/home/admin/alisatasknode/taskinfo//20230315/diide/20/04/21/uait7kghrsy0gg1w8qksa6ay/T3_0000910947.log-END-EOF
2023-03-15 21:25:55 : Detail log url: http://cdp.res.tkbjdmp.yun/outter/pipeline/basecommon_sys_default/job/288235/log?requestor=999999999
Return with failed!!
2023-03-15 21:25:55 [INFO] Sandbox context cleanup temp file success.
2023-03-15 21:25:55 [INFO] Data synchronization ended with return code: [1].
2023-03-15 21:25:55 INFO =================================================================
2023-03-15 21:25:55 INFO Exit code of the Shell command 1
2023-03-15 21:25:55 INFO --- Invocation of Shell command completed ---
2023-03-15 21:25:55 ERROR Shell run failed!
2023-03-15 21:25:55 ERROR Current task status: ERROR
2023-03-15 21:25:55 INFO Cost time is: 4894.629s
/home/admin/alisatasknode/taskinfo//20230315/phoenix/20/04/17/sqbh1leuhspi41gdw8bv6by4/T3_0000910946.log-END-EOF
https://developer.aliyun.com/ask/473431
1. Given the current error, operation exceeded time limit, the suggestion is to increase cursorTimeoutInMs to 3600000.
2. The task has already processed 2 billion records, while the user's dataset should be a little over 1 billion, so it is worth double-checking the actual data volume.
3. Keep batchSize at 1000 for now; if the error goes away after increasing cursorTimeoutInMs, you can then try raising batchSize.
I applied suggestion 1 on my side, and the problem was resolved.
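For the record, the fix amounts to raising cursorTimeoutInMs in the mongodbreader parameter block of the DataX job JSON. Below is a minimal sketch under stated assumptions: the datasource and collection names are placeholders I made up, and other required settings (the column list, the writer block, channel settings, and so on) are omitted for brevity.

```json
{
    "name": "mongodbreader",
    "parameter": {
        "datasource": "my_mongodb_source",
        "collectionName": "backup_history",
        "batchSize": 1000,
        "cursorTimeoutInMs": 3600000
    }
}
```

Keeping batchSize at 1000 keeps each getMore round trip small, while the larger cursor timeout gives the server-side cursor more headroom on a long scan of a large collection.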
Next, let's go over some of the MongoDB Reader parameters:
| Parameter | Description |
| --- | --- |
| datasource | Name of the data source. Script mode supports adding data sources; the value configured here must be identical to the name of the added data source. |
| collectionName | Name of the MongoDB collection. |
| hint | MongoDB supports the hint parameter, which makes the query optimizer use a specific index to complete the query and can improve query performance in some cases. For details, see the documentation on the hint parameter. A usage example appears in the sketch after this table. |
| column | Column (field) names of the MongoDB document; configure this as an array to read multiple columns. name: the column name. type supports the following: string (character string), long (integer), double (floating-point number), date (date), bool (boolean), bytes (binary sequence), arrays (read out as a JSON string, e.g. ["a","b","c"]), array (read out as values joined by the splitter delimiter, e.g. a,b,c; the arrays format is recommended). combine: when reading data with the MongoDB Reader plugin, multiple fields of a MongoDB document can be merged into a single JSON string. splitter: MongoDB supports array types, but the Data Integration framework itself does not, so array values read from MongoDB are joined into a string with this delimiter. |
| batchSize | Number of records fetched per batch. Optional; defaults to 1,000. |
| cursorTimeoutInMs | Cursor timeout, in milliseconds. This parameter is optional. |
| query | Limits the range of MongoDB data returned. Only the supported time formats may be used; raw timestamp values are not supported. Note that query does not support JS syntax. A common query example appears in the sketch after this table. For more MongoDB query syntax, see the official MongoDB documentation. |
| splitFactor | If data skew is severe, consider increasing splitFactor to split the data at a finer granularity without raising the concurrency. |
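Tying hint and query together, here is a hedged sketch of a reader parameter block. Every name and value in it (the datasource, collection, field names, index key, and the ISODate literal) is a placeholder assumption for illustration, not a value from the job above, and the exact string syntax accepted for hint and query may vary by version, so verify it against the official documentation.

```json
{
    "name": "mongodbreader",
    "parameter": {
        "datasource": "my_mongodb_source",
        "collectionName": "backup_history",
        "column": [
            {"name": "_id", "type": "string"},
            {"name": "operationTime", "type": "date"}
        ],
        "hint": "{'operationTime': 1}",
        "query": "{'operationTime': {'$gte': ISODate('2023-03-01T00:00:00.000+0800')}}"
    }
}
```

The query above restricts the export to documents whose operationTime is on or after 2023-03-01, and the hint steers the query plan toward an index on that same field.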
For more information, see the official Alibaba Cloud documentation.