
Pandas Cookbook - Chapter 3

小牛编辑
2023-12-01
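    # The usual preamble
    import pandas as pd
    import matplotlib.pyplot as plt

    # Make the graphs a bit bigger
    # (display.mpl_style is no longer a pandas option and figsize() only
    # exists in pylab mode, so matplotlib's rcParams is used instead)
    plt.rcParams['figure.figsize'] = (15, 5)

    # Always display all the columns
    # (display.width replaces the old display.line_width option)
    pd.set_option('display.width', 5000)
    pd.set_option('display.max_columns', 60)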

Let's continue with our NYC 311 service requests example.

    complaints = pd.read_csv('../data/311-service-requests.csv')

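3.1 Selecting only noise complaints

I'd like to know which borough has the most noise complaints. First, let's take a look at the data to see what it looks like: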

    complaints[:5]

| | Unique Key | Created Date | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | Street Name | Cross Street 1 | Cross Street 2 | Intersection Street 1 | Intersection Street 2 | Address Type | City | Landmark | Facility Type | Status | Due Date | Resolution Action Updated Date | Community Board | Borough | X Coordinate (State Plane) | Y Coordinate (State Plane) | Park Facility Name | Park Borough | School Name | School Number | School Region | School Code | School Phone Number | School Address | School City | School State | School Zip | School Not Found | School or Citywide Complaint | Vehicle Type | Taxi Company Borough | Taxi Pick Up Location | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Garage Lot Name | Ferry Direction | Ferry Terminal Name | Latitude | Longitude | Location |
| 0 | 26589651 | 10/31/2013 02:08:41 AM | NaN | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Talking | Street/Sidewalk | 11432 | 90-03 169 STREET | 169 STREET | 90 AVENUE | 91 AVENUE | NaN | NaN | ADDRESS | JAMAICA | NaN | Precinct | Assigned | 10/31/2013 10:08:41 AM | 10/31/2013 02:35:17 AM | 12 QUEENS | QUEENS | 1042027 | 197389 | Unspecified | QUEENS | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.708275 | -73.791604 | (40.70827532593202, -73.79160395779721) |
| 1 | 26593698 | 10/31/2013 02:01:04 AM | NaN | NYPD | New York City Police Department | Illegal Parking | Commercial Overnight Parking | Street/Sidewalk | 11378 | 58 AVENUE | 58 AVENUE | 58 PLACE | 59 STREET | NaN | NaN | BLOCKFACE | MASPETH | NaN | Precinct | Open | 10/31/2013 10:01:04 AM | NaN | 05 QUEENS | QUEENS | 1009349 | 201984 | Unspecified | QUEENS | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.721041 | -73.909453 | (40.721040535628305, -73.90945306791765) |
| 2 | 26594139 | 10/31/2013 02:00:24 AM | 10/31/2013 02:40:32 AM | NYPD | New York City Police Department | Noise - Commercial | Loud Music/Party | Club/Bar/Restaurant | 10032 | 4060 BROADWAY | BROADWAY | WEST 171 STREET | WEST 172 STREET | NaN | NaN | ADDRESS | NEW YORK | NaN | Precinct | Closed | 10/31/2013 10:00:24 AM | 10/31/2013 02:39:42 AM | 12 MANHATTAN | MANHATTAN | 1001088 | 246531 | Unspecified | MANHATTAN | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.843330 | -73.939144 | (40.84332975466513, -73.93914371913482) |
| 3 | 26595721 | 10/31/2013 01:56:23 AM | 10/31/2013 02:21:48 AM | NYPD | New York City Police Department | Noise - Vehicle | Car/Truck Horn | Street/Sidewalk | 10023 | WEST 72 STREET | WEST 72 STREET | COLUMBUS AVENUE | AMSTERDAM AVENUE | NaN | NaN | BLOCKFACE | NEW YORK | NaN | Precinct | Closed | 10/31/2013 09:56:23 AM | 10/31/2013 02:21:10 AM | 07 MANHATTAN | MANHATTAN | 989730 | 222727 | Unspecified | MANHATTAN | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.778009 | -73.980213 | (40.7780087446372, -73.98021349023975) |
| 4 | 26590930 | 10/31/2013 01:53:44 AM | NaN | DOHMH | Department of Health and Mental Hygiene | Rodent | Condition Attracting Rodents | Vacant Lot | 10027 | WEST 124 STREET | WEST 124 STREET | LENOX AVENUE | ADAM CLAYTON POWELL JR BOULEVARD | NaN | NaN | BLOCKFACE | NEW YORK | NaN | N/A | Pending | 11/30/2013 01:53:44 AM | 10/31/2013 01:59:54 AM | 10 MANHATTAN | MANHATTAN | 998815 | 233545 | Unspecified | MANHATTAN | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.807691 | -73.947387 | (40.80769092704951, -73.94738703491433) |

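To get the noise complaints, we need to find the rows where the Complaint Type column is Noise - Street/Sidewalk. I'll show you how to do that, and then explain what's going on.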

    noise_complaints = complaints[complaints['Complaint Type'] == "Noise - Street/Sidewalk"]
    noise_complaints[:3]

| | Unique Key | Created Date | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | Street Name | Cross Street 1 | Cross Street 2 | Intersection Street 1 | Intersection Street 2 | Address Type | City | Landmark | Facility Type | Status | Due Date | Resolution Action Updated Date | Community Board | Borough | X Coordinate (State Plane) | Y Coordinate (State Plane) | Park Facility Name | Park Borough | School Name | School Number | School Region | School Code | School Phone Number | School Address | School City | School State | School Zip | School Not Found | School or Citywide Complaint | Vehicle Type | Taxi Company Borough | Taxi Pick Up Location | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Garage Lot Name | Ferry Direction | Ferry Terminal Name | Latitude | Longitude | Location |
| 0 | 26589651 | 10/31/2013 02:08:41 AM | NaN | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Talking | Street/Sidewalk | 11432 | 90-03 169 STREET | 169 STREET | 90 AVENUE | 91 AVENUE | NaN | NaN | ADDRESS | JAMAICA | NaN | Precinct | Assigned | 10/31/2013 10:08:41 AM | 10/31/2013 02:35:17 AM | 12 QUEENS | QUEENS | 1042027 | 197389 | Unspecified | QUEENS | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.708275 | -73.791604 | (40.70827532593202, -73.79160395779721) |
| 16 | 26594086 | 10/31/2013 12:54:03 AM | 10/31/2013 02:16:39 AM | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Music/Party | Street/Sidewalk | 10310 | 173 CAMPBELL AVENUE | CAMPBELL AVENUE | HENDERSON AVENUE | WINEGAR LANE | NaN | NaN | ADDRESS | STATEN ISLAND | NaN | Precinct | Closed | 10/31/2013 08:54:03 AM | 10/31/2013 02:07:14 AM | 01 STATEN ISLAND | STATEN ISLAND | 952013 | 171076 | Unspecified | STATEN ISLAND | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.636182 | -74.116150 | (40.63618202176914, -74.1161500428337) |
| 25 | 26591573 | 10/31/2013 12:35:18 AM | 10/31/2013 02:41:35 AM | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Talking | Street/Sidewalk | 10312 | 24 PRINCETON LANE | PRINCETON LANE | HAMPTON GREEN | DEAD END | NaN | NaN | ADDRESS | STATEN ISLAND | NaN | Precinct | Closed | 10/31/2013 08:35:18 AM | 10/31/2013 01:45:17 AM | 03 STATEN ISLAND | STATEN ISLAND | 929577 | 140964 | Unspecified | STATEN ISLAND | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.553421 | -74.196743 | (40.55342078716953, -74.19674315017886) |

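If you look at noise_complaints, you'll see that this worked: it only contains complaints with the right complaint type. But how does this work? Let's deconstruct it into two pieces.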

    complaints['Complaint Type'] == "Noise - Street/Sidewalk"

0          True
1         False
2         False
3         False
4         False
5         False
6         False
7         False
8         False
9         False
10        False
11        False
12        False
13        False
14        False
...
111054     True
111055    False
111056    False
111057    False
111058    False
111059     True
111060    False
111061    False
111062    False
111063    False
111064    False
111065    False
111066     True
111067    False
111068    False
Name: Complaint Type, Length: 111069, dtype: bool

This is a big array of True and False values, one for each row in our DataFrame. When we index our DataFrame with this array, we get just the rows where the array is True.
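To make the mechanics concrete without loading the full CSV, here is a minimal sketch on a tiny made-up table. The `toy` DataFrame and its rows are assumptions for illustration only, not part of the 311 data.

```python
import pandas as pd

# Hypothetical stand-in for the complaints table (made-up rows)
toy = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Rodent"],
    "Borough": ["QUEENS", "QUEENS", "BROOKLYN", "MANHATTAN"],
})

# Comparing a column with a value yields one boolean per row
mask = toy["Complaint Type"] == "Noise - Street/Sidewalk"

# Indexing with the mask keeps only the rows where it is True;
# the original index labels (0 and 2 here) are preserved
selected = toy[mask]

# True counts as 1, so summing the mask counts the matching rows
n_matches = int(mask.sum())
```

Note that the selected rows keep their original index labels, which is why the filtered tables in this chapter show labels such as 0, 16 and 25 rather than 0, 1 and 2.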

You can also combine more than one condition with the & operator, like this:

    is_noise = complaints['Complaint Type'] == "Noise - Street/Sidewalk"
    in_brooklyn = complaints['Borough'] == "BROOKLYN"
    complaints[is_noise & in_brooklyn][:5]

| | Unique Key | Created Date | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | Street Name | Cross Street 1 | Cross Street 2 | Intersection Street 1 | Intersection Street 2 | Address Type | City | Landmark | Facility Type | Status | Due Date | Resolution Action Updated Date | Community Board | Borough | X Coordinate (State Plane) | Y Coordinate (State Plane) | Park Facility Name | Park Borough | School Name | School Number | School Region | School Code | School Phone Number | School Address | School City | School State | School Zip | School Not Found | School or Citywide Complaint | Vehicle Type | Taxi Company Borough | Taxi Pick Up Location | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Garage Lot Name | Ferry Direction | Ferry Terminal Name | Latitude | Longitude | Location |
| 31 | 26595564 | 10/31/2013 12:30:36 AM | NaN | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Music/Party | Street/Sidewalk | 11236 | AVENUE J | AVENUE J | EAST 80 STREET | EAST 81 STREET | NaN | NaN | BLOCKFACE | BROOKLYN | NaN | Precinct | Open | 10/31/2013 08:30:36 AM | NaN | 18 BROOKLYN | BROOKLYN | 1008937 | 170310 | Unspecified | BROOKLYN | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.634104 | -73.911055 | (40.634103775951736, -73.91105541883589) |
| 49 | 26595553 | 10/31/2013 12:05:10 AM | 10/31/2013 02:43:43 AM | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Talking | Street/Sidewalk | 11225 | 25 LEFFERTS AVENUE | LEFFERTS AVENUE | WASHINGTON AVENUE | BEDFORD AVENUE | NaN | NaN | ADDRESS | BROOKLYN | NaN | Precinct | Closed | 10/31/2013 08:05:10 AM | 10/31/2013 01:29:29 AM | 09 BROOKLYN | BROOKLYN | 995366 | 180388 | Unspecified | BROOKLYN | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.661793 | -73.959934 | (40.6617931276793, -73.95993363978067) |
| 109 | 26594653 | 10/30/2013 11:26:32 PM | 10/31/2013 12:18:54 AM | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Music/Party | Street/Sidewalk | 11222 | NaN | NaN | NaN | NaN | DOBBIN STREET | NORMAN STREET | INTERSECTION | BROOKLYN | NaN | Precinct | Closed | 10/31/2013 07:26:32 AM | 10/31/2013 12:18:54 AM | 01 BROOKLYN | BROOKLYN | 996925 | 203271 | Unspecified | BROOKLYN | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.724600 | -73.954271 | (40.724599563793525, -73.95427134534344) |
| 236 | 26591992 | 10/30/2013 10:02:58 PM | 10/30/2013 10:23:20 PM | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Talking | Street/Sidewalk | 11218 | DITMAS AVENUE | DITMAS AVENUE | NaN | NaN | NaN | NaN | LATLONG | BROOKLYN | NaN | Precinct | Closed | 10/31/2013 06:02:58 AM | 10/30/2013 10:23:20 PM | 01 BROOKLYN | BROOKLYN | 991895 | 171051 | Unspecified | BROOKLYN | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.636169 | -73.972455 | (40.63616876563881, -73.97245504682485) |
| 370 | 26594167 | 10/30/2013 08:38:25 PM | 10/30/2013 10:26:28 PM | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Music/Party | Street/Sidewalk | 11218 | 126 BEVERLY ROAD | BEVERLY ROAD | CHURCH AVENUE | EAST 2 STREET | NaN | NaN | ADDRESS | BROOKLYN | NaN | Precinct | Closed | 10/31/2013 04:38:25 AM | 10/30/2013 10:26:28 PM | 12 BROOKLYN | BROOKLYN | 990144 | 173511 | Unspecified | BROOKLYN | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.642922 | -73.978762 | (40.6429222774404, -73.97876175474585) |

Or if we just wanted a few columns:

    complaints[is_noise & in_brooklyn][['Complaint Type', 'Borough', 'Created Date', 'Descriptor']][:10]

| | Complaint Type | Borough | Created Date | Descriptor |
| 31 | Noise - Street/Sidewalk | BROOKLYN | 10/31/2013 12:30:36 AM | Loud Music/Party |
| 49 | Noise - Street/Sidewalk | BROOKLYN | 10/31/2013 12:05:10 AM | Loud Talking |
| 109 | Noise - Street/Sidewalk | BROOKLYN | 10/30/2013 11:26:32 PM | Loud Music/Party |
| 236 | Noise - Street/Sidewalk | BROOKLYN | 10/30/2013 10:02:58 PM | Loud Talking |
| 370 | Noise - Street/Sidewalk | BROOKLYN | 10/30/2013 08:38:25 PM | Loud Music/Party |
| 378 | Noise - Street/Sidewalk | BROOKLYN | 10/30/2013 08:32:13 PM | Loud Talking |
| 656 | Noise - Street/Sidewalk | BROOKLYN | 10/30/2013 06:07:39 PM | Loud Music/Party |
| 1251 | Noise - Street/Sidewalk | BROOKLYN | 10/30/2013 03:04:51 PM | Loud Talking |
| 5416 | Noise - Street/Sidewalk | BROOKLYN | 10/29/2013 10:07:02 PM | Loud Talking |
| 5584 | Noise - Street/Sidewalk | BROOKLYN | 10/29/2013 08:15:59 PM | Loud Music/Party |
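The row filter and the column selection can also be written as one `.loc` call. The sketch below uses a small made-up `toy` table (an assumption for illustration) and also shows the `|` and `~` operators, which the chapter does not cover. One thing to watch for: when the comparisons are written inline rather than stored in variables, each needs its own parentheses, because `&` binds more tightly than `==` in Python.

```python
import pandas as pd

# Hypothetical stand-in for the complaints table (made-up rows)
toy = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Noise - Street/Sidewalk",
                       "Illegal Parking", "Noise - Street/Sidewalk"],
    "Borough": ["BROOKLYN", "QUEENS", "BROOKLYN", "BROOKLYN"],
    "Descriptor": ["Loud Talking", "Loud Music/Party",
                   "Blocked Hydrant", "Loud Music/Party"],
})

# Inline conditions: parenthesize each comparison before combining with &
both = (toy["Complaint Type"] == "Noise - Street/Sidewalk") & (toy["Borough"] == "BROOKLYN")

# .loc takes the row mask and the list of columns in a single step
subset = toy.loc[both, ["Complaint Type", "Descriptor"]]

# | means "or" and ~ means "not"
noise_or_brooklyn = toy[(toy["Complaint Type"] == "Noise - Street/Sidewalk")
                        | (toy["Borough"] == "BROOKLYN")]
not_brooklyn = toy[~(toy["Borough"] == "BROOKLYN")]
```

Whether to chain `[...][...]` as in the text or use `.loc` is largely a matter of taste for reading data; `.loc` tends to be the safer habit once you start assigning to the selection.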

3.2 A digression about numpy arrays

On the inside, the type of a column is pd.Series:

    pd.Series([1,2,3])

0    1
1    2
2    3
dtype: int64

And a pandas Series is internally a numpy array. If you add .values to the end of any Series, you'll get its internal numpy array.

    # numpy is needed for the array examples in this section
    import numpy as np

    np.array([1,2,3])

array([1, 2, 3])

    pd.Series([1,2,3]).values

array([1, 2, 3])

So this boolean-array selection business is actually something that works with any NumPy array:

    arr = np.array([1,2,3])
    arr != 2

array([ True, False, True], dtype=bool)

    arr[arr != 2]

array([1, 3])
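As a small side-by-side sketch (the Series `s` and its labels are made up for illustration): newer pandas versions also provide `Series.to_numpy()`, which as far as I know is the accessor the pandas documentation now suggests in place of `.values`. The main difference between filtering the array and filtering the Series is that the Series result keeps its index labels.

```python
import numpy as np
import pandas as pd

# A made-up Series with string labels so the index is easy to spot
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# .to_numpy() hands back the data as a plain numpy array, like .values
arr = s.to_numpy()

# The comparison is elementwise for both the array and the Series
arr_mask = arr > 15
series_mask = s > 15

# Filtering the array gives a bare ndarray;
# filtering the Series keeps the matching labels
picked_arr = arr[arr_mask]
picked_series = s[series_mask]
```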

3.3 So, which borough has the most noise complaints?

    is_noise = complaints['Complaint Type'] == "Noise - Street/Sidewalk"
    noise_complaints = complaints[is_noise]
    noise_complaints['Borough'].value_counts()

MANHATTAN        917
BROOKLYN         456
BRONX            292
QUEENS           226
STATEN ISLAND     36
Unspecified        1
dtype: int64

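It's Manhattan! But what if we wanted to divide by the total number of complaints in each borough, so the comparison is a bit more meaningful? That's easy too: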

    noise_complaint_counts = noise_complaints['Borough'].value_counts()
    complaint_counts = complaints['Borough'].value_counts()
    noise_complaint_counts / complaint_counts

BRONX            0
BROOKLYN         0
MANHATTAN        0
QUEENS           0
STATEN ISLAND    0
Unspecified      0
dtype: int64

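Oops, why is it all zeros? This is because of integer division in Python 2: dividing one integer by another discards the fractional part. Let's fix it by converting complaint_counts into an array of floats. (In Python 3, / performs true division, so the ratios come out as floats even without the conversion.)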

    noise_complaint_counts / complaint_counts.astype(float)

BRONX            0.014833
BROOKLYN         0.013864
MANHATTAN        0.037755
QUEENS           0.010143
STATEN ISLAND    0.007474
Unspecified      0.000141
dtype: float64
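For readers on Python 3, the all-zero result can be reproduced with floor division. This is a minimal sketch with made-up counts (the numbers in `noise` and `total` are assumptions, not the real 311 totals). It also shows that pandas lines the two Series up by borough label rather than by position, which is why the two `value_counts()` results can be divided even though they are sorted differently.

```python
import pandas as pd

# Made-up counts, deliberately listed in different orders
noise = pd.Series({"BRONX": 3, "BROOKLYN": 5, "MANHATTAN": 9})
total = pd.Series({"MANHATTAN": 150, "BRONX": 100, "BROOKLYN": 200})

# Floor division discards the fractional part, which is what
# Python 2's / did for integers: every ratio below 1 becomes 0
floored = noise // total

# In Python 3, / is true division and returns floats;
# values are matched by label, not by position
ratio = noise / total
```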
    (noise_complaint_counts / complaint_counts.astype(float)).plot(kind='bar')

<matplotlib.axes.AxesSubplot at 0x75b7890>

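Chapter 3 - Figure 1

So Manhattan really does have more noise complaints than the other boroughs, even as a share of all its complaints.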
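The same per-borough share can be computed in one step, since the mean of a boolean column is the fraction of True values. This is only a sketch of an alternative, not the approach the chapter takes, and the `toy` rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the complaints table (made-up rows)
toy = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "Rodent", "Illegal Parking",
                       "Noise - Street/Sidewalk", "Rodent"],
    "Borough": ["MANHATTAN", "MANHATTAN", "BROOKLYN", "BROOKLYN", "BROOKLYN"],
})

is_noise = toy["Complaint Type"] == "Noise - Street/Sidewalk"

# Group the boolean mask by borough and average it:
# each group's mean is (noise complaints) / (all complaints)
noise_share = is_noise.groupby(toy["Borough"]).mean()
```

On the real data, the equivalent call would be `is_noise.groupby(complaints['Borough']).mean()`, which should give the same ratios as dividing the two `value_counts()` results.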