【spark】REFRESH TABLE tableName

田昊天
2023-12-01

REFRESH TABLE tableName

java.io.FileNotFoundException: File does not exist: hdfs://service/user/hive/warehouse/bdc_dws.db/dws_day_org_pro_size_sal_ds/partition_day=2022-03-08/dc4619ee5be097cd-bd68b0eb00000073_1137555274_data.0.parq
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.

22/05/19 11:40:43 INFO scheduler.TaskSetManager: Finished task 305.0 in stage 18.0 (TID 95273) in 2187 ms on cdh051.prd.bjds..lan (executor 12) (34/534)
22/05/19 11:40:43 INFO scheduler.TaskSetManager: Starting task 603.0 in stage 24.0 (TID 95329, bjm8-bdc-cdh-prd-10-251-35-215..lan, executor 7, partition 603, RACK_LOCAL, 5079 bytes)
22/05/19 11:40:43 INFO scheduler.TaskSetManager: Finished task 602.0 in stage 24.0 (TID 95326) in 317 ms on bjm8-bdc-cdh-prd-10-251-35-215..lan (executor 7) (600/246600)
22/05/19 11:40:43 INFO scheduler.TaskSetManager: Starting task 39.0 in stage 19.0 (TID 95330, bjm8-bdc-cdh-prd-10-251-35-216..lan, executor 15, partition 39, RACK_LOCAL, 6568 bytes)
22/05/19 11:40:43 WARN scheduler.TaskSetManager: Lost task 15.2 in stage 19.0 (TID 95256, bjm8-bdc-cdh-prd-10-251-35-216..lan, executor 15): java.io.FileNotFoundException: File does not exist: hdfs://service/user/hive/warehouse/bdc_dws.db/dws_day_org_pro_size_sal_ds/partition_day=2022-03-08/dc4619ee5be097cd-bd68b0eb00000073_1137555274_data.0.parq
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
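What the log above shows: Spark caches the file listing of a Hive table when it first plans a query against it. If another job (for example an Impala or Hive INSERT OVERWRITE, which the `.parq` file name in the log suggests) rewrites the partition's files while that cached listing is still in use, the running tasks try to read the old file paths and fail with the FileNotFoundException seen here. The remedy the error message itself proposes is to invalidate that cached metadata before querying again. A minimal sketch, assuming a live SparkSession named `spark` and using the table name taken from the HDFS path in the log:

```scala
// Option 1: invalidate the cached file listing via SQL,
// exactly as the error message suggests.
spark.sql("REFRESH TABLE bdc_dws.dws_day_org_pro_size_sal_ds")

// Option 2: the equivalent Catalog API call (Spark 2.0+).
spark.catalog.refreshTable("bdc_dws.dws_day_org_pro_size_sal_ds")

// Option 3: recreate the Dataset/DataFrame so it re-lists the
// underlying files instead of reusing stale cached paths.
val df = spark.table("bdc_dws.dws_day_org_pro_size_sal_ds")
```

Note that refreshing only helps if it runs before the query plans its file scan; if an upstream job overwrites the partition while this job is mid-execution, the task can still fail. In that case the practical fix is to rerun the Spark job after the upstream write finishes, or to schedule the two jobs so they do not overwrite and read the same partition concurrently.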