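I'm using PySpark's withColumn() to perform some basic transformations on a DataFrame, namely updating the values of a column. I'm looking for some debugging help while I also work on the problem myself.
PySpark is throwing an AnalysisException.
Here col('EVENT_NARRATIVE') is meant to refer to _c49='EVENT_NARRATIVE', a data element inside the Spark df (DataFrame).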
from pyspark.sql.functions import *
from pyspark.sql.types import *
df = df.withColumn('EVENT_NARRATIVE', lower(col('EVENT_NARRATIVE')))
Py4JJavaError: An error occurred while calling o100.withColumn.
: org.apache.spark.sql.AnalysisException: cannot resolve '`EVENT_NARRATIVE`' given input columns: [_c3, _c17, _c40, _c21, _c48, _c12, _c39, _c18, _c31, _c10, _c45, _c26, _c5, _c43, _c24, _c33, _c9, _c14, _c1, _c16, _c47, _c20, _c46, _c32, _c22, _c7, _c2, _c42, _c37, _c36, _c30, _c8, _c38, _c23, _c25, _c13, _c29, _c41, _c19, _c44, _c11, _c28, _c6, _c50, _c49, _c0, _c15, _c4, _c34, _c27, _c35];;
'Project [_c0#604, _c1#605, _c2#606, _c3#607, _c4#608, _c5#609, _c6#610, _c7#611, _c8#612, _c9#613, _c10#614, _c11#615, _c12#616, _c13#617, _c14#618, _c15#619, _c16#620, _c17#621, _c18#622, _c19#623, _c20#624, _c21#625, _c22#626, _c23#627, ... 28 more fields]
+- Relation[_c0#604,_c1#605,_c2#606,_c3#607,_c4#608,_c5#609,_c6#610,_c7#611,_c8#612,_c9#613,_c10#614,_c11#615,_c12#616,_c13#617,_c14#618,_c15#619,_c16#620,_c17#621,_c18#622,_c19#623,_c20#624,_c21#625,_c22#626,_c23#627,... 27 more fields] csv
Sample data from df.head():
[Row(_c0='BEGIN_YEARMONTH', _c1='BEGIN_DAY', _c2='BEGIN_TIME', _c3='END_YEARMONTH', _c4='END_DAY', _c5='END_TIME', _c6='EPISODE_ID', _c7='EVENT_ID', _c8='STATE', _c9='STATE_FIPS', _c10='YEAR', _c11='MONTH_NAME', _c12='EVENT_TYPE', _c13='CZ_TYPE', _c14='CZ_FIPS', _c15='CZ_NAME', _c16='WFO', _c17='BEGIN_DATE_TIME', _c18='CZ_TIMEZONE', _c19='END_DATE_TIME', _c20='INJURIES_DIRECT', _c21='INJURIES_INDIRECT', _c22='DEATHS_DIRECT', _c23='DEATHS_INDIRECT', _c24='DAMAGE_PROPERTY', _c25='DAMAGE_CROPS', _c26='SOURCE', _c27='MAGNITUDE', _c28='MAGNITUDE_TYPE', _c29='FLOOD_CAUSE', _c30='CATEGORY', _c31='TOR_F_SCALE', _c32='TOR_LENGTH', _c33='TOR_WIDTH', _c34='TOR_OTHER_WFO', _c35='TOR_OTHER_CZ_STATE', _c36='TOR_OTHER_CZ_FIPS', _c37='TOR_OTHER_CZ_NAME', _c38='BEGIN_RANGE', _c39='BEGIN_AZIMUTH', _c40='BEGIN_LOCATION', _c41='END_RANGE', _c42='END_AZIMUTH', _c43='END_LOCATION', _c44='BEGIN_LAT', _c45='BEGIN_LON', _c46='END_LAT', _c47='END_LON', _c48='EPISODE_NARRATIVE', _c49='EVENT_NARRATIVE', _c50='DATA_SOURCE'),
Row(_c0='201210', _c1='29', _c2='1600', _c3='201210', _c4='29', _c5='1922', _c6='68680', _c7='416744', _c8='NEW HAMPSHIRE', _c9='33', _c10='2012', _c11='October', _c12='High Wind', _c13='Z', _c14='12', _c15='EASTERN HILLSBOROUGH', _c16='BOX', _c17='29-OCT-12 16:00:00', _c18='EST-5', _c19='29-OCT-12 19:22:00', _c20='0', _c21='0', _c22='0', _c23='0', _c24='109.60K', _c25='0.00K', _c26='ASOS', _c27='55.00', _c28='MG', _c29=None, _c30=None, _c31=None, _c32=None, _c33=None, _c34=None, _c35=None, _c36=None, _c37=None, _c38=None, _c39=None, _c40=None, _c41=None, _c42=None, _c43=None, _c44=None, _c45=None, _c46=None, _c47=None, _c48='Sandy, a hybrid storm with both tropical and extra-tropical characteristics, brought high winds and coastal flooding to southern New England. Easterly winds gusted to 50 to 60 mph for interior southern New England; 55 to 65 mph along the eastern Massachusetts coast and along the I-95 corridor in southeast Massachusetts and Rhode Island; and 70 to 80 mph along the southeast Massachusetts and Rhode Island coasts. A few higher higher gusts occurred along the Rhode Island coast. A severe thunderstorm embedded in an outer band associated with Sandy produced wind gusts to 90 mph and concentrated damage in Wareham early Tuesday evening, |a day after the center of Sandy had moved into New Jersey. In general, moderate coastal flooding occurred along the Massachusetts coastline, and major coastal flooding impacted the Rhode Island coastline. The storm surge was generally 2.5 to 4.5 feet along the east coast of Massachusetts, but peaked late Monday afternoon in between high tide cycles. Seas built to between 20 and 25 feet Monday afternoon and evening just off the Massachusetts east coast. Along the south coast, the storm surge was 4 to 6 feet and seas from 30 to a little over 35 feet were observed in the outer coastal waters. 
The very large waves on top of the storm surge caused destructive coastal flooding along stretches of the Rhode Island exposed south coast. ||Sandy grew into a hurricane over the southwest Caribbean and then headed north across Jamaica, Cuba, and the Bahamas. As Sandy headed north of the Bahamas, the storm interacted with a vigorous weather system moving west to east across the United States and began to take on a hybrid structure. Strong high pressure over southeast Canada helped with the expansion of the strong winds well north of the center of Sandy. In essence, Sandy retained the structure of a hurricane near its center (until shortly before landfall) while taking on more of an extra-tropical cyclone configuration well away from the center. Sandy’s track was unusual. The storm headed northeast and then north across the western Atlantic and then sharply turned to the west to make landfall near Atlantic City, NJ during Monday evening. Sandy subsequently weakened and moved west across southern Pennsylvania on Tuesday before turning north and heading across western New York state into Quebec during Tuesday night and Wednesday.', _c49='The Automated Surface Observing System at Manchester-Boston Regional Airport (KMHT) recorded sustained wind speeds of 38 mph and gusts to 63 mph. In Manchester, a tree was downed on Harrison Street. In Hudson, a tree was downed on Lawrence Road, bringing down wires that sparked a fire that damaged a house. In Merrimack, a tree was downed, taking down wires and closing Amherst Road from Meetinghouse Road to Riverside Drive. In Nashua, a tree was downed onto a house on Broad Street, near the Hollils line. No structural damage was found. Numerous trees were downed, blocking roads.', _c50='CSV')
The column names have the form _c followed by a number because you probably did not specify header=True when reading the input file. You can do:
df = spark.read.csv('filepath', header=True)
The column names will then be BEGIN_YEARMONTH, BEGIN_DAY, etc., instead of _c0, _c1, ..., and your withColumn code should work.
You may also want to add inferSchema=True so that the columns get appropriate data types instead of all being read as strings.
Of course, you can also stick with your current code and do:
df2 = df.withColumn('_c49', lower(col('_c49')))
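But that is not a good long-term solution. Column names should be meaningful, and you also don't want the header to be one of the rows in the DataFrame.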