Data.csv file (sample data)
Taluka Crop Village Area
T1 C1 V1 11
T1 C1 V2 15
T1 C1 V3 3
T1 C1 V4 1
T1 C1 V5 2
T1 C2 V1 12
T1 C2 V2 16
T1 C2 V3 4
T1 C2 V4 100
T1 C2 V5 52
T1 C3 V1 47
T1 C3 V2 15
T1 C3 V3 21
T1 C3 V4 5
T1 C3 V5 7
T1 C4 V1 20
T1 C4 V2 14
T1 C4 V3 18
T1 C4 V4 5
T1 C4 V5 24
T2 C1 V1 21
T2 C1 V2 20
T2 C1 V3 14
T2 C1 V4 7
T2 C1 V5 8
T2 C2 V1 18
T2 C2 V2 3
T2 C2 V3 12
T2 C2 V4 78
T2 C2 V5 56
T2 C3 V1 16
T2 C3 V2 11
T2 C3 V3 15
T2 C3 V2 45
T2 C3 V3 2
T2 C4 V1 3
T2 C4 V2 12
T2 C4 V3 12
T2 C4 V4 44
T2 C4 V5 10
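I want to know which villages fall into high-, medium-, and low-risk areas for a particular crop in a particular taluka.
I have 500 talukas in total; each taluka has 10 to 14 crops and 100 to 200 villages.
So, for Taluka-1 (i.e. Thane) and Crop-1 (i.e. paddy), I want to find out which villages are at high, medium, and low risk, using the percentile method.
I have done some work, but the problem is that my code is not dynamic. I have to type out every taluka-crop pair by hand, and there are far too many combinations. I need to do this dynamically with some kind of loop (e.g. a for loop with if statements), but I am stuck on this part.
Please see my code.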
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_csv("/home/desktop/Data.csv")
df.head()
##part-1 Partition taluka's
T1= df[df['Taluka'] == 'T1']
T2= df[df['Taluka'] == 'T2']
##Part-2 Partition crop wise in each taluka's
T1_C1= T1[T1['Crop'] == 'C1']
T1_C2= T1[T1['Crop'] == 'C2']
T1_C3= T1[T1['Crop'] == 'C3']
T1_C4= T1[T1['Crop'] == 'C4']
T2_C1= T2[T2['Crop'] == 'C1']
T2_C2= T2[T2['Crop'] == 'C2']
T2_C3= T2[T2['Crop'] == 'C3']
T2_C4= T2[T2['Crop'] == 'C4']
##Descending order (DataFrame.sort was removed from pandas; use sort_values)
T1_C1 = T1_C1.sort_values('Area', ascending=False)
T1_C2 = T1_C2.sort_values('Area', ascending=False)
T1_C3 = T1_C3.sort_values('Area', ascending=False)
T1_C4 = T1_C4.sort_values('Area', ascending=False)
T2_C1 = T2_C1.sort_values('Area', ascending=False)
T2_C2 = T2_C2.sort_values('Area', ascending=False)
T2_C3 = T2_C3.sort_values('Area', ascending=False)
T2_C4 = T2_C4.sort_values('Area', ascending=False)
#####Add levels for each crop in each taluka
T1_C1['Level'] = pd.qcut(T1_C1['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C2['Level'] = pd.qcut(T1_C2['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C3['Level'] = pd.qcut(T1_C3['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C4['Level'] = pd.qcut(T1_C4['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C1['Level'] = pd.qcut(T2_C1['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C2['Level'] = pd.qcut(T2_C2['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C3['Level'] = pd.qcut(T2_C3['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C4['Level'] = pd.qcut(T2_C4['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
print(T1_C1)
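So here I get, for crop C1 in taluka T1, which villages are in the high-risk area, the low-risk area, and so on.
How can I put this into a loop so that the code gets shorter? The code has to work for 500 talukas.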
I think you need groupby and apply with a custom function:
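def f(x):
    labels = ['Low Risk','Medium Risk','High Risk']
    x['Level'] = pd.qcut(x['Area'].sort_values(ascending=False), 3, labels = labels)
    return x
# group_keys=False keeps the original flat row index on recent pandas versions,
# so the result matches the output shown below
df1 = df.groupby(['Taluka','Crop'], group_keys=False).apply(f)
print (df1)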
Taluka Crop Village Area Level
0 T1 C1 V1 11 High Risk
1 T1 C1 V2 15 High Risk
2 T1 C1 V3 3 Medium Risk
3 T1 C1 V4 1 Low Risk
4 T1 C1 V5 2 Low Risk
5 T1 C2 V1 12 Low Risk
6 T1 C2 V2 16 Medium Risk
7 T1 C2 V3 4 Low Risk
8 T1 C2 V4 100 High Risk
9 T1 C2 V5 52 High Risk
10 T1 C3 V1 47 High Risk
11 T1 C3 V2 15 Medium Risk
12 T1 C3 V3 21 High Risk
13 T1 C3 V4 5 Low Risk
14 T1 C3 V5 7 Low Risk
15 T1 C4 V1 20 High Risk
16 T1 C4 V2 14 Low Risk
17 T1 C4 V3 18 Medium Risk
18 T1 C4 V4 5 Low Risk
19 T1 C4 V5 24 High Risk
20 T2 C1 V1 21 High Risk
21 T2 C1 V2 20 High Risk
22 T2 C1 V3 14 Medium Risk
23 T2 C1 V4 7 Low Risk
24 T2 C1 V5 8 Low Risk
25 T2 C2 V1 18 Medium Risk
26 T2 C2 V2 3 Low Risk
27 T2 C2 V3 12 Low Risk
28 T2 C2 V4 78 High Risk
29 T2 C2 V5 56 High Risk
30 T2 C3 V1 16 High Risk
31 T2 C3 V2 11 Low Risk
32 T2 C3 V3 15 Medium Risk
33 T2 C3 V2 45 High Risk
34 T2 C3 V3 2 Low Risk
35 T2 C4 V1 3 Low Risk
36 T2 C4 V2 12 Medium Risk
37 T2 C4 V3 12 Medium Risk
38 T2 C4 V4 44 High Risk
39 T2 C4 V5 10 Low Risk
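To see where these labels come from, here is a rough sketch of the tercile arithmetic for a single group (T1 / C1), using NumPy only. It is meant as an illustration of the percentile method under the assumption of default linear interpolation and right-closed bins, not as pandas' actual implementation; the variable names are made up for this example.

```python
import numpy as np

# Areas of villages V1..V5 for taluka T1, crop C1 (from the sample data).
areas = np.array([11, 15, 3, 1, 2])

# The two cut points that split the group into three equal-probability bins.
edges = np.quantile(areas, [1 / 3, 2 / 3])
print(edges)  # roughly 2.33 and 8.33

# side="left" gives right-closed bins: value <= first edge -> 0,
# value <= second edge -> 1, anything larger -> 2.
bins = np.searchsorted(edges, areas, side="left")

labels = np.array(["Low Risk", "Medium Risk", "High Risk"])
levels = labels[bins]
print(levels.tolist())
```

Areas 1 and 2 fall at or below the first cut point (Low Risk), 3 lies between the two (Medium Risk), and 11 and 15 are above the second (High Risk), which agrees with the first five rows of the output.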
Edit: you can also add sort_values at the end:
df1 = df1.sort_values(['Taluka','Crop', 'Area'], ascending=[True, True, False])
print (df1)
Taluka Crop Village Area Level
1 T1 C1 V2 15 High Risk
0 T1 C1 V1 11 High Risk
2 T1 C1 V3 3 Medium Risk
4 T1 C1 V5 2 Low Risk
3 T1 C1 V4 1 Low Risk
8 T1 C2 V4 100 High Risk
9 T1 C2 V5 52 High Risk
6 T1 C2 V2 16 Medium Risk
5 T1 C2 V1 12 Low Risk
7 T1 C2 V3 4 Low Risk
10 T1 C3 V1 47 High Risk
12 T1 C3 V3 21 High Risk
11 T1 C3 V2 15 Medium Risk
14 T1 C3 V5 7 Low Risk
13 T1 C3 V4 5 Low Risk
19 T1 C4 V5 24 High Risk
15 T1 C4 V1 20 High Risk
17 T1 C4 V3 18 Medium Risk
16 T1 C4 V2 14 Low Risk
18 T1 C4 V4 5 Low Risk
20 T2 C1 V1 21 High Risk
21 T2 C1 V2 20 High Risk
22 T2 C1 V3 14 Medium Risk
24 T2 C1 V5 8 Low Risk
23 T2 C1 V4 7 Low Risk
28 T2 C2 V4 78 High Risk
29 T2 C2 V5 56 High Risk
25 T2 C2 V1 18 Medium Risk
27 T2 C2 V3 12 Low Risk
26 T2 C2 V2 3 Low Risk
33 T2 C3 V2 45 High Risk
30 T2 C3 V1 16 High Risk
32 T2 C3 V3 15 Medium Risk
31 T2 C3 V2 11 Low Risk
34 T2 C3 V3 2 Low Risk
38 T2 C4 V4 44 High Risk
36 T2 C4 V2 12 Medium Risk
37 T2 C4 V3 12 Medium Risk
39 T2 C4 V5 10 Low Risk
35 T2 C4 V1 3 Low Risk
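Once every row carries a Level, answering the original question for one taluka/crop pair is just a filter. The sketch below also shows a possible alternative to apply: groupby with transform, which only touches the Area column. It uses a ten-row excerpt of the sample data, and the names codes and high_t1_c1 are invented for this example.

```python
import pandas as pd

# Excerpt of the sample data: taluka T1, crops C1 and C2.
df = pd.DataFrame({
    "Taluka": ["T1"] * 10,
    "Crop": ["C1"] * 5 + ["C2"] * 5,
    "Village": ["V1", "V2", "V3", "V4", "V5"] * 2,
    "Area": [11, 15, 3, 1, 2, 12, 16, 4, 100, 52],
})

labels = ["Low Risk", "Medium Risk", "High Risk"]

# labels=False makes qcut return the bin number (0, 1 or 2) for each row,
# computed separately inside every (Taluka, Crop) group.
codes = df.groupby(["Taluka", "Crop"])["Area"].transform(
    lambda s: pd.qcut(s, 3, labels=False)
)
df["Level"] = codes.map(dict(enumerate(labels)))

# High-risk villages for taluka T1 and crop C1.
mask = (df["Taluka"] == "T1") & (df["Crop"] == "C1") & (df["Level"] == "High Risk")
high_t1_c1 = df.loc[mask, "Village"].tolist()
print(high_t1_c1)  # ['V1', 'V2']
```

Because transform returns a result aligned with the original rows, the Level column can be assigned directly without reshuffling the index.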
Alternatively (slower), sort inside the custom function for each group:
def f(x):
    labels = ['Low Risk','Medium Risk','High Risk']
    x = x.sort_values('Area', ascending=False)
    x['Level'] = pd.qcut(x['Area'], 3, labels = labels)
    return x
df1 = df.groupby(['Taluka','Crop']).apply(f).reset_index(drop=True)
print (df1)
Taluka Crop Village Area Level
0 T1 C1 V2 15 High Risk
1 T1 C1 V1 11 High Risk
2 T1 C1 V3 3 Medium Risk
3 T1 C1 V5 2 Low Risk
4 T1 C1 V4 1 Low Risk
5 T1 C2 V4 100 High Risk
6 T1 C2 V5 52 High Risk
7 T1 C2 V2 16 Medium Risk
8 T1 C2 V1 12 Low Risk
9 T1 C2 V3 4 Low Risk
10 T1 C3 V1 47 High Risk
11 T1 C3 V3 21 High Risk
12 T1 C3 V2 15 Medium Risk
13 T1 C3 V5 7 Low Risk
14 T1 C3 V4 5 Low Risk
15 T1 C4 V5 24 High Risk
16 T1 C4 V1 20 High Risk
17 T1 C4 V3 18 Medium Risk
18 T1 C4 V2 14 Low Risk
19 T1 C4 V4 5 Low Risk
20 T2 C1 V1 21 High Risk
21 T2 C1 V2 20 High Risk
22 T2 C1 V3 14 Medium Risk
23 T2 C1 V5 8 Low Risk
24 T2 C1 V4 7 Low Risk
25 T2 C2 V4 78 High Risk
26 T2 C2 V5 56 High Risk
27 T2 C2 V1 18 Medium Risk
28 T2 C2 V3 12 Low Risk
29 T2 C2 V2 3 Low Risk
30 T2 C3 V2 45 High Risk
31 T2 C3 V1 16 High Risk
32 T2 C3 V3 15 Medium Risk
33 T2 C3 V2 11 Low Risk
34 T2 C3 V3 2 Low Risk
35 T2 C4 V4 44 High Risk
36 T2 C4 V2 12 Medium Risk
37 T2 C4 V3 12 Medium Risk
38 T2 C4 V5 10 Low Risk
39 T2 C4 V1 3 Low Risk
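Since the question asked specifically for a loop, the same idea can also be written as an explicit for loop over the groups. This is a minimal sketch on a ten-row excerpt of the sample data; label_risk is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def label_risk(df):
    """Add a 'Level' column by splitting each (Taluka, Crop) group into terciles."""
    labels = ["Low Risk", "Medium Risk", "High Risk"]
    pieces = []
    # Iterating over a groupby yields one sub-frame per taluka/crop pair,
    # so no combination has to be written out by hand.
    for (taluka, crop), group in df.groupby(["Taluka", "Crop"]):
        group = group.copy()  # work on a copy, not on a view of df
        group["Level"] = pd.qcut(group["Area"], len(labels), labels=labels)
        pieces.append(group)
    return pd.concat(pieces)

# Excerpt of the sample data: T1/C1 and T2/C4.
df = pd.DataFrame({
    "Taluka": ["T1"] * 5 + ["T2"] * 5,
    "Crop": ["C1"] * 5 + ["C4"] * 5,
    "Village": ["V1", "V2", "V3", "V4", "V5"] * 2,
    "Area": [11, 15, 3, 1, 2, 3, 12, 12, 44, 10],
})

out = label_risk(df)
print(out)
```

One caveat: with its default settings, pd.qcut raises a ValueError when a group has so many repeated Area values that the bin edges are not unique, so real data with 500 talukas may need a fallback for such groups.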