Question:

Logical operators for boolean indexing in Pandas

戚修雅

2023-03-14

I'm working with boolean indexing in Pandas.

The question is why the statement:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

exits with an error?

Example:

a = pd.DataFrame({'x':[1,1],'y':[10,20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

3 answers

边永贞

2023-03-14

 
  
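Logical operators for boolean indexing in Pandas
It's important to realize that you cannot use any of the Python logical operators (and, or, or not) on pandas.Series or pandas.DataFrames (similarly, you cannot use them on numpy.arrays with more than one element). The reason is that they implicitly call bool on their operands, which throws an exception because these data structures decided that the boolean value of an array is ambiguous: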
>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
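To see that `and` really asks its left operand for a truth value, here is a small sketch. The class `Noisy` is a hypothetical helper (not part of NumPy or pandas) that records every `__bool__` call; the last part shows an ndarray refusing to answer that question.

```python
import numpy as np

calls = []  # labels of objects whose __bool__ was invoked


class Noisy:
    """Hypothetical helper that logs calls to __bool__."""

    def __init__(self, label, value):
        self.label = label
        self.value = value

    def __bool__(self):
        calls.append(self.label)
        return self.value


left, right = Noisy("left", True), Noisy("right", True)
# `and` calls bool(left); since it is truthy, `right` is returned unchanged
# (no element-wise work happens and bool(right) is never called).
result = left and right

# An ndarray with several elements raises from __bool__ instead.
try:
    np.array([1, 2, 3]) and np.array([1, 0, 1])
    raised = False
except ValueError:
    raised = True
```

Note that `and` returns one of its operands as a whole, which is exactly why it cannot produce an element-wise mask.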

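  I cover this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q&A.
  However, NumPy provides element-wise equivalents of these operators as functions that can be used on numpy.array, pandas.Series, pandas.DataFrame, or any other (conforming) numpy.array subclass:

- and has np.logical_and
- or has np.logical_or
- not has np.logical_not
- np.logical_xor has no Python equivalent, but is a logical "exclusive or" operation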
  
  So, essentially, one should use (assuming df1 and df2 are pandas DataFrames):
np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)
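A minimal sketch of the four functions applied to boolean Series; the masks `m1` and `m2` are made up for illustration. Each call returns a new boolean Series computed element by element.

```python
import numpy as np
import pandas as pd

# Two illustrative boolean masks sharing the default RangeIndex
m1 = pd.Series([True, True, False, False])
m2 = pd.Series([True, False, True, False])

and_ = np.logical_and(m1, m2)
or_ = np.logical_or(m1, m2)
not_ = np.logical_not(m1)
xor_ = np.logical_xor(m1, m2)  # no Python keyword equivalent

print(pd.DataFrame({"m1": m1, "m2": m2, "and": and_, "or": or_,
                    "not m1": not_, "xor": xor_}))
```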

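  However, if you have boolean NumPy arrays, pandas Series, or pandas DataFrames, you can also use the element-wise bitwise functions (for booleans they are, or at least should be, indistinguishable from the logical functions):

- bitwise and: np.bitwise_and or the & operator
- bitwise or: np.bitwise_or or the | operator
- bitwise not: np.invert (or its alias np.bitwise_not) or the ~ operator
- bitwise xor: np.bitwise_xor or the ^ operator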
  
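  Typically the operators are used. However, when combined with comparison operators, you have to remember to wrap the comparisons in parentheses, because the bitwise operators have higher precedence than the comparison operators:
(df1 < 10) | (df2 > 10)  # instead of the wrong df1 < 10 | df2 > 10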

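  This can be irritating, because the Python logical operators have lower precedence than the comparison operators, so you normally write a < 10 and b > 10 (where a and b are, for example, plain integers) without parentheses.
  It is really important to stress that bitwise and logical operations are only equivalent for boolean NumPy arrays (and boolean Series and DataFrames). If the data are not boolean, the operations give different results:
>>> import numpy as np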
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])

>>> np.logical_and(a1, a2)
array([False, False, False,  True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)

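  And since NumPy (and similarly pandas) does different things for boolean indices (boolean or "mask" index arrays) and integer indices (index arrays), the results of indexing also differ:
>>> a3 = np.array([1, 2, 3, 4])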

>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])
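If your arrays hold 0/1 integers rather than booleans, one way around the difference is to cast them to `bool` first (here with `astype(bool)`); after that, the bitwise and logical results coincide and the result behaves as a mask. A short sketch reusing the arrays above:

```python
import numpy as np

a1 = np.array([0, 0, 1, 1])
a2 = np.array([0, 1, 0, 1])
a3 = np.array([1, 2, 3, 4])

# Convert the 0/1 integers to booleans before combining them
b1, b2 = a1.astype(bool), a2.astype(bool)

logical = np.logical_and(b1, b2)
bitwise = b1 & b2  # same as np.bitwise_and(b1, b2) for booleans

same = np.array_equal(logical, bitwise)
selected = a3[bitwise]  # boolean (mask) indexing keeps only the last element
```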

Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
-------------------------------------------------------------------------------------
       and       |  np.logical_and        | np.bitwise_and         |        &
-------------------------------------------------------------------------------------
       or        |  np.logical_or         | np.bitwise_or          |        |
-------------------------------------------------------------------------------------
                 |  np.logical_xor        | np.bitwise_xor         |        ^
-------------------------------------------------------------------------------------
       not       |  np.logical_not        | np.invert              |        ~

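  The Python logical operators (first column) do not work on NumPy arrays, pandas Series, or pandas DataFrames. The others work on these data structures (and on plain Python objects) and operate element-wise. However, be careful with the bitwise invert on plain Python bools, because the bool is interpreted as an integer in this context (for example, ~False returns -1 and ~True returns -2).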
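A short sketch of that last caveat: on a plain Python `bool`, `~` is integer bitwise inversion, while on NumPy booleans it is logical negation. The values are only examples; `int(True)` is used so the snippet avoids the deprecation warning that Python 3.12+ emits for `~` applied directly to a bool.

```python
import numpy as np

# Plain Python: bool is a subclass of int, so ~ works on the integer value.
plain = ~int(True)          # -2, not False
negated = not True          # use `not` for a scalar Python bool

# NumPy booleans: ~ is element-wise logical NOT
arr = np.array([True, False])
inverted = ~arr             # array([False,  True])
scalar = ~np.bool_(True)    # a NumPy False
```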

上官树

2023-03-14

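Python's and, or and not logical operators are designed to work with scalars. So pandas had to do one better and override the bitwise operators to provide a vectorized (element-wise) version of this functionality.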

So the following in Python (where exp1 and exp2 are expressions that evaluate to a boolean result)...

exp1 and exp2              # Logical AND
exp1 or exp2               # Logical OR
not exp1                   # Logical NOT

...translates to...

exp1 & exp2                # Element-wise logical AND
exp1 | exp2                # Element-wise logical OR
~exp1                      # Element-wise logical NOT
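One way to convince yourself of this translation is to apply the scalar and/or/not row by row in plain Python and compare with the vectorized operators. The tiny `df` below is made up for the example (it mirrors the question's data plus one extra row).

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2], "y": [10, 20, 10]})
exp1 = df["x"] == 1   # boolean Series
exp2 = df["y"] == 10  # boolean Series

# Scalar logic applied element by element (slow, for comparison only)
loop_and = [a and b for a, b in zip(exp1, exp2)]
loop_or = [a or b for a, b in zip(exp1, exp2)]
loop_not = [not a for a in exp1]

# Vectorized equivalents using the overloaded bitwise operators
vec_and = exp1 & exp2
vec_or = exp1 | exp2
vec_not = ~exp1
```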

...for pandas.

If you get a ValueError while performing a logical operation, you need to use parentheses for grouping:

(exp1) op (exp2)

For example,

(df['col1'] == x) & (df['col2'] == y)

and so on.

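Boolean indexing: a common operation is to compute boolean masks through logical conditions to filter the data. Pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.

Consider the following setup: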

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

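Logical AND: for the df above, say you'd like to return all rows where A < 5 and B > 5.

Overloaded bitwise &

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not.

So, with this in mind, element-wise logical AND can be implemented with the bitwise operator &: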

df['A'] < 5

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

The subsequent filtering step is simply,

df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

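The parentheses are used to override the default precedence order of the bitwise operators, which have higher precedence than the comparison operators < and >.

If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as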

df['A'] < 5 & df['B'] > 5

It is parsed as

df['A'] < (5 & df['B']) > 5

Which becomes,

df['A'] < something_you_dont_want > 5

Which becomes (see the Python docs on chained comparisons),

(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)

Which becomes,

# Both operands are Series...
something_else_you_dont_want1 and something_else_you_dont_want2

Which throws

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

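So, don't make that mistake!¹

Avoiding parentheses grouping: the fix is actually quite simple. Most operators have a corresponding bound method for DataFrames and Series. If the individual masks are built up using these methods instead of comparison operators, you no longer need parentheses to specify evaluation order: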
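The failure can be reproduced directly. This sketch rebuilds the same seeded `df` as in the setup, catches the ValueError raised by the unparenthesized expression, and then runs the parenthesized version.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list("ABC"))

try:
    # Parsed as df['A'] < (5 & df['B']) > 5, a chained comparison that
    # ends up calling bool() on a Series
    df["A"] < 5 & df["B"] > 5
    error = None
except ValueError as exc:
    error = str(exc)

good = df[(df["A"] < 5) & (df["B"] > 5)]  # rows 1 and 3
```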

df['A'].lt(5)

0     True
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

df['A'].lt(5) & df['B'].gt(5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

See the section on Flexible Comparisons in the pandas documentation. To summarize, we have

╒════╤════════════╤════════════╕
│    │ Operator   │ Function   │
╞════╪════════════╪════════════╡
│  0 │ >          │ gt         │
├────┼────────────┼────────────┤
│  1 │ >=         │ ge         │
├────┼────────────┼────────────┤
│  2 │ <          │ lt         │
├────┼────────────┼────────────┤
│  3 │ <=         │ le         │
├────┼────────────┼────────────┤
│  4 │ ==         │ eq         │
├────┼────────────┼────────────┤
│  5 │ !=         │ ne         │
╘════╧════════════╧════════════╛
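A quick sketch checking that the method form and the parenthesized operator form produce the same mask (same seeded `df` as in the setup); `ge` and `ne` are used only to exercise two more rows of the table.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list("ABC"))

# No parentheses needed: the method calls are evaluated before &
with_methods = df["A"].lt(5) & df["B"].gt(5)
with_ops = (df["A"] < 5) & (df["B"] > 5)

# Other rows of the table work the same way: A >= 4 and C != 1
other = df[df["A"].ge(4) & df["C"].ne(1)]
```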

Another option for avoiding parentheses is to use DataFrame.query (or eval):

df.query('A < 5 and B > 5')

   A  B  C
1  3  7  9
3  4  7  6

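I have documented query and eval extensively in Dynamic Expression Evaluation in pandas using pd.eval().

operator.and_ allows you to perform this operation in a functional manner. Internally, it calls Series.__and__, which corresponds to the bitwise operator.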
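Inside a query string, the bitwise spellings are accepted as well and are given the precedence of `and`/`or`, so `&` and `and` (or `|` and `or`) are interchangeable there without extra parentheses. A sketch on the same seeded `df`:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list("ABC"))

q_words = df.query("A < 5 and B > 5")
q_symbols = df.query("A < 5 & B > 5")  # inside query, & binds like `and`
q_or = df.query("A == 3 | B == 7")     # likewise, | binds like `or`
```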

import operator 

operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5) 

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

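You usually won't need this, but it is useful to know.

Generalizing: np.logical_and (and logical_and.reduce). Another alternative is np.logical_and, which also does not need parentheses grouping: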
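Because `operator.and_` is an ordinary two-argument function, it composes with `functools.reduce` to AND together any number of masks, and the result stays a pandas Series with the frame's index. A minimal sketch; the `masks` list (including the extra `C > 6` condition) is illustrative:

```python
import functools
import operator

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list("ABC"))

# Any number of conditions, combined pairwise with &
masks = [df["A"] < 5, df["B"] > 5, df["C"] > 6]
combined = functools.reduce(operator.and_, masks)

subset = df[combined]  # only row 1 satisfies all three
```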

np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

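np.logical_and is a ufunc (universal function), and most ufuncs have a reduce method. This means it is easier to generalize with logical_and if you have multiple masks to AND. For example, to AND masks m1, m2 and m3 with &, you would have to do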

m1 & m2 & m3

However, an easier option is

np.logical_and.reduce([m1, m2, m3])

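This is powerful, because it lets you build more complex logic on top of it (for example, dynamically generating masks in a list comprehension and combining all of them):

import numpy as np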

cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m 
# array([False,  True, False,  True, False])

df[m]
   A  B  C
1  3  7  9
3  4  7  6
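As an alternative (a different technique from the one above, not a replacement for it), the same masks can be stacked column-wise with `pd.concat` and reduced with `DataFrame.all(axis=1)`. Unlike `np.logical_and.reduce`, which returns a plain ndarray, this yields a Series aligned on `df.index`. A sketch:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list("ABC"))

cols = ["A", "B"]
ops = [np.less, np.greater]
values = [5, 5]

masks = [op(df[c], v) for op, c, v in zip(ops, cols, values)]

as_array = np.logical_and.reduce(masks)           # plain ndarray
as_series = pd.concat(masks, axis=1).all(axis=1)  # Series on df.index
```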

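¹ I know I'm harping on this point, but please bear with me. This is a very, very common beginner's mistake and must be explained very thoroughly.

Logical OR: for the df above, say you'd like to return all rows where A == 3 or B == 7.

Overloaded bitwise |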

df['A'] == 3

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

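If you haven't yet, please also read the section on Logical AND above; all caveats apply here.

Alternatively, this operation can be specified with the bound methods: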

df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

operator.or_ calls Series.__or__ under the hood.

operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

np.logical_or: for two conditions, use logical_or:

np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

For multiple masks, use logical_or.reduce:

np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6
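One practical difference: `np.logical_or.reduce` returns a plain NumPy array, which is applied positionally, while `|` keeps the Series index. Both select the same rows when the masks come from the frame itself, as in this sketch where the non-default labels `v` to `z` are made up for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list("ABC"),
                  index=list("vwxyz"))  # illustrative non-default labels

m1, m2 = df["A"] == 3, df["B"] == 7

via_reduce = np.logical_or.reduce([m1, m2])  # ndarray, no index
via_pipe = m1 | m2                           # Series indexed by v..z

rows_reduce = df[via_reduce].index.tolist()
rows_pipe = df[via_pipe].index.tolist()
```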

Logical NOT: given a mask, such as

mask = pd.Series([True, True, False])

If you need to invert every boolean value (so that the end result is [False, False, True]), you can use any of the methods below.

Bitwise ~

~mask

0    False
1    False
2     True
dtype: bool

Again, expressions need to be parenthesized.

~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool

Internally, this calls

mask.__invert__()

0    False
1    False
2     True
dtype: bool

But don't call it directly.

operator.inv internally calls __invert__ on the Series.

operator.inv(mask)

0    False
1    False
2     True
dtype: bool

np.logical_not: this is the NumPy variant.

np.logical_not(mask)

0    False
1    False
2     True
dtype: bool

Note that, for boolean masks, np.bitwise_and can be used in place of np.logical_and, np.bitwise_or in place of np.logical_or, and np.invert in place of np.logical_not.
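For boolean masks, the NOT spellings are interchangeable, and they combine with & and | according to De Morgan's laws. A small sketch with made-up masks:

```python
import numpy as np
import pandas as pd

m1 = pd.Series([True, True, False, False])
m2 = pd.Series([True, False, True, False])

# The three NOT spellings agree on a boolean Series
same_not = (~m1).equals(np.logical_not(m1)) and (~m1).equals(np.invert(m1))

# De Morgan: not (m1 or m2) == (not m1) and (not m2)
lhs = ~(m1 | m2)
rhs = ~m1 & ~m2  # unary ~ binds tighter than &, so no extra parentheses
```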

石正信

2023-03-14

When you say

(a['x']==1) and (a['y']==10)

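you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to boolean values.

NumPy arrays (of length greater than 1) and pandas objects such as Series do not have a boolean value; in other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a boolean value. That's because it's unclear when it should be True or False. Some users might assume an object is True if it has non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Others might want it to be True if any of its elements is True.

Because there are so many conflicting expectations, the designers of NumPy and pandas refuse to guess and raise a ValueError instead.

Instead, you must be explicit about which behavior you want, by checking the empty attribute or calling the any() or all() method.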
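A sketch of what being explicit looks like; the Series `s` is made up. Each call answers one unambiguous question and produces a single scalar that is safe to use in an `if`.

```python
import pandas as pd

s = pd.Series([True, False, True])

any_true = s.any()               # at least one True?
all_true = s.all()               # every element True?
is_empty = s.empty               # attribute, not a method: no elements?
one = pd.Series([False]).item()  # only valid for length-1 objects

if s.any():                      # fine: s.any() is a scalar boolean
    message = "some rows match"
```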

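In this case, however, it looks like you don't want boolean evaluation at all; you want element-wise logical AND. That is what the & binary operator performs: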

(a['x']==1) & (a['y']==10)

returns a boolean Series.

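By the way, as alexpmil noted, the parentheses are mandatory because & has a higher operator precedence than ==.

Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which is in turn equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series, which triggers the same ValueError as above.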
