Logical OR and logical AND over NaN in pandas

Published: 2022/7/12


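In pandas, missing values in numeric data are represented by NaN. Because NaN marks a missing value, I wanted logical OR and logical AND to treat it sensibly. These are my notes from looking into how to do that.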

Suppose we are given the following two Series and want the element-wise logical OR and logical AND of them.

import pandas as pd
import numpy as np

s1 = pd.Series([np.nan, 2, 3])
s2 = pd.Series([4, np.nan, 6])

# Value of s1:
# 0    NaN
# 1    2.0
# 2    3.0
# dtype: float64

# Value of s2:
# 0    4.0
# 1    NaN
# 2    6.0
# dtype: float64

Incidentally, logical operators such as | raise an error when applied to numeric (float) Series like these.

s1 | s2
# Result:
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# File /opt/conda/lib/python3.10/site-packages/pandas/core/ops/array_ops.py:301, in na_logical_op(x, y, op)
#     292 try:
#     293     # For exposition, write:
#     294     #  yarr = isinstance(y, np.ndarray)
#    (...)
#     299     # Then Cases where this goes through without raising include:
#     300     #  (xint or xbool) and (yint or bool)
# --> 301     result = op(x, y)
#     302 except TypeError:
#
# TypeError: unsupported operand type(s) for |: 'float' and 'float'
#
# During handling of the above exception, another exception occurred:
#
# TypeError                                 Traceback (most recent call last)
# Input In [25], in <cell line: 1>()
# ----> 1 s1 | s2
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/ops/common.py:70, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
#      66             return NotImplemented
#      68 other = item_from_zerodim(other)
# ---> 70 return method(self, other)
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/arraylike.py:78, in OpsMixin.__or__(self, other)
#      76 @unpack_zerodim_and_defer("__or__")
#      77 def __or__(self, other):
# ---> 78     return self._logical_method(other, operator.or_)
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/series.py:5634, in Series._logical_method(self, other, op)
#    5631 lvalues = self._values
#    5632 rvalues = extract_array(other, extract_numpy=True, extract_range=True)
# -> 5634 res_values = ops.logical_op(lvalues, rvalues, op)
#    5635 return self._construct_result(res_values, name=res_name)
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/ops/array_ops.py:391, in logical_op(left, right, op)
#     387 # For int vs int `^`, `|`, `&` are bitwise operators and return
#     388 #   integer dtypes.  Otherwise these are boolean ops
#     389 filler = fill_int if is_self_int_dtype and is_other_int_dtype else fill_bool
# --> 391 res_values = na_logical_op(lvalues, rvalues, op)
#     392 # error: Cannot call function of unknown type
#     393 res_values = filler(res_values)  # type: ignore[operator]
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/ops/array_ops.py:308, in na_logical_op(x, y, op)
#     306     x = ensure_object(x)
#     307     y = ensure_object(y)
# --> 308     result = libops.vec_binop(x.ravel(), y.ravel(), op)
#     309 else:
#     310     # let null fall thru
#     311     assert lib.is_scalar(y)
#
# File /opt/conda/lib/python3.10/site-packages/pandas/_libs/ops.pyx:252, in pandas._libs.ops.vec_binop()
#
# File /opt/conda/lib/python3.10/site-packages/pandas/_libs/ops.pyx:245, in pandas._libs.ops.vec_binop()
#
# TypeError: unsupported operand type(s) for |: 'float' and 'bool'

Python's and / or keywords are not suitable either, because they try to evaluate the truth value of each Series as a whole rather than element by element.

s1 or s2
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)
# Input In [26], in <cell line: 1>()
# ----> 1 s1 or s2
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/generic.py:1527, in NDFrame.__nonzero__(self)
#    1525 @final
#    1526 def __nonzero__(self):
# -> 1527     raise ValueError(
#    1528         f"The truth value of a {type(self).__name__} is ambiguous. "
#    1529         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
#    1530     )
#
# ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
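To pin down the semantics the rest of this note is after, here is a minimal pure-Python sketch that treats NaN as the "falsy" value. The helper names `nan_or` and `nan_and` are made up for illustration and are not part of pandas.

```python
import math


def nan_or(a, b):
    """Return a unless it is NaN, in which case return b."""
    return b if math.isnan(a) else a


def nan_and(a, b):
    """Return NaN if a is NaN; otherwise return b (the last operand)."""
    return a if math.isnan(a) else b


left = [float("nan"), 2.0, 3.0]
right = [4.0, float("nan"), 6.0]

# Apply the helpers element by element.
or_result = [nan_or(a, b) for a, b in zip(left, right)]
and_result = [nan_and(a, b) for a, b in zip(left, right)]

print(or_result)   # [4.0, 2.0, 3.0]
print(and_result)  # [nan, nan, 6.0]
```

Note that Python's built-in `or` would not give this result on scalars, since `float("nan")` is truthy.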

Logical AND

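NaN propagates through arithmetic, and a logical AND should return the last operand when every operand is present. For finite values, s1 * 0 is 0 where s1 has a value and NaN where it is missing, so adding s2 gives s2's value only where s1 is present.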

s1 * 0 + s2

# 0    NaN
# 1    NaN
# 2    6.0
# dtype: float64
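As an alternative sketch (not the method used above), the same result can be written with `Series.where` and a `notna()` mask. This assumes both Series share the same index. One difference worth knowing: `np.inf * 0` is NaN, so the arithmetic trick would treat an infinite value in s1 as missing, while the mask-based version does not.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 2, 3])
s2 = pd.Series([4, np.nan, 6])

# Keep s2 where s1 is present; everywhere else the result is NaN.
and_result = s2.where(s1.notna())
print(and_result)

# With an infinite value, the two approaches diverge.
s_inf = pd.Series([np.inf, 2, 3])
arith = s_inf * 0 + s2               # position 0 becomes NaN (inf * 0)
masked = s2.where(s_inf.notna())     # position 0 keeps s2's value, 4.0
```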

Logical OR

Either combine_first or fillna can be used.

s1.combine_first(s2)

# 0    4.0
# 1    2.0
# 2    3.0
# dtype: float64
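For completeness, a short sketch of the `fillna` variant. When both Series share the same index, the two calls give the same result. They differ when the indexes differ: `combine_first` returns the union of both indexes, while `fillna` keeps only the caller's index. The Series `s3` below is a made-up example to show that difference.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 2, 3])
s2 = pd.Series([4, np.nan, 6])

# Same index: fillna fills the gaps in s1 with the aligned values of s2.
or_fillna = s1.fillna(s2)
or_combine = s1.combine_first(s2)
print(or_fillna.equals(or_combine))  # True

# Different index: combine_first extends the index, fillna does not.
s3 = pd.Series([np.nan, 20.0], index=[2, 3])
print(len(s1.combine_first(s3)))  # 4 (index 0, 1, 2, 3)
print(len(s1.fillna(s3)))         # 3 (index 0, 1, 2)
```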

References

https://pandas.pydata.org/docs/user_guide/missing_data.html


Tags: pandas