Logical OR and AND on NaN in pandas
Published: 2022/7/12
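In pandas, missing values in numeric data are represented by NaN.
Since NaN stands for a missing value, I wanted logical OR (or) and logical AND (and) to treat it sensibly: OR should return the first value that is present, and AND should return a value only when both operands are present. These are my notes from looking into how to do that.
Given the following two Series, suppose we want the element-wise logical OR and logical AND.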
import pandas as pd
import numpy as np
s1 = pd.Series([np.nan, 2, 3])
s2 = pd.Series([4, np.nan, 6])
# Value of s1:
# 0 NaN
# 1 2.0
# 2 3.0
# dtype: float64
# Value of s2:
# 0 4.0
# 1 NaN
# 2 6.0
# dtype: float64
Note that logical operators such as | do not work on numeric (float) Series like these; trying one raises a TypeError.
s1 | s2
# Result:
# ---------------------------------------------------------------------------
# TypeError Traceback (most recent call last)
# File /opt/conda/lib/python3.10/site-packages/pandas/core/ops/array_ops.py:301, in na_logical_op(x, y, op)
# 292 try:
# 293 # For exposition, write:
# 294 # yarr = isinstance(y, np.ndarray)
# (...)
# 299 # Then Cases where this goes through without raising include:
# 300 # (xint or xbool) and (yint or bool)
# --> 301 result = op(x, y)
# 302 except TypeError:
#
# TypeError: unsupported operand type(s) for |: 'float' and 'float'
#
# During handling of the above exception, another exception occurred:
#
# TypeError Traceback (most recent call last)
# Input In [25], in <cell line: 1>()
# ----> 1 s1 | s2
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/ops/common.py:70, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
# 66 return NotImplemented
# 68 other = item_from_zerodim(other)
# ---> 70 return method(self, other)
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/arraylike.py:78, in OpsMixin.__or__(self, other)
# 76 @unpack_zerodim_and_defer("__or__")
# 77 def __or__(self, other):
# ---> 78 return self._logical_method(other, operator.or_)
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/series.py:5634, in Series._logical_method(self, other, op)
# 5631 lvalues = self._values
# 5632 rvalues = extract_array(other, extract_numpy=True, extract_range=True)
# -> 5634 res_values = ops.logical_op(lvalues, rvalues, op)
# 5635 return self._construct_result(res_values, name=res_name)
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/ops/array_ops.py:391, in logical_op(left, right, op)
# 387 # For int vs int `^`, `|`, `&` are bitwise operators and return
# 388 # integer dtypes. Otherwise these are boolean ops
# 389 filler = fill_int if is_self_int_dtype and is_other_int_dtype else fill_bool
# --> 391 res_values = na_logical_op(lvalues, rvalues, op)
# 392 # error: Cannot call function of unknown type
# 393 res_values = filler(res_values) # type: ignore[operator]
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/ops/array_ops.py:308, in na_logical_op(x, y, op)
# 306 x = ensure_object(x)
# 307 y = ensure_object(y)
# --> 308 result = libops.vec_binop(x.ravel(), y.ravel(), op)
# 309 else:
# 310 # let null fall thru
# 311 assert lib.is_scalar(y)
#
# File /opt/conda/lib/python3.10/site-packages/pandas/_libs/ops.pyx:252, in pandas._libs.ops.vec_binop()
#
# File /opt/conda/lib/python3.10/site-packages/pandas/_libs/ops.pyx:245, in pandas._libs.ops.vec_binop()
#
# TypeError: unsupported operand type(s) for |: 'float' and 'bool'
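The | and & operators do work once each Series has been reduced to a boolean mask. As a minimal sketch (the names either_present and both_present are mine, not part of the original notes), notna() answers only the question of whether each value is present:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 2, 3])
s2 = pd.Series([4, np.nan, 6])

# notna() gives boolean Series, on which | and & are element-wise logical ops.
either_present = s1.notna() | s2.notna()
both_present = s1.notna() & s2.notna()

print(either_present.tolist())  # [True, True, True]
print(both_present.tolist())    # [False, False, True]
```

This only tells us where values exist; it does not return the values themselves, which is what the sections below are about.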
Python's and / or keywords are not suitable either: they try to evaluate the truth value of each Series as a whole rather than element by element.
s1 or s2
# ---------------------------------------------------------------------------
# ValueError Traceback (most recent call last)
# Input In [26], in <cell line: 1>()
# ----> 1 s1 or s2
#
# File /opt/conda/lib/python3.10/site-packages/pandas/core/generic.py:1527, in NDFrame.__nonzero__(self)
# 1525 @final
# 1526 def __nonzero__(self):
# -> 1527 raise ValueError(
# 1528 f"The truth value of a {type(self).__name__} is ambiguous. "
# 1529 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
# 1530 )
#
# ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
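As a small illustration of why the keyword form fails: or has to call bool() on the left-hand Series, and that is where the ValueError comes from. Applying or to individual floats would not help either, because NaN is truthy in Python. The variable names below are illustrative only.

```python
import math

import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 2, 3])
s2 = pd.Series([4, np.nan, 6])

# `s1 or s2` first evaluates bool(s1), which pandas refuses to do.
try:
    s1 or s2
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# On plain floats, NaN counts as truthy, so `or` keeps the NaN
# instead of falling through to the right-hand value.
scalar_result = np.nan or 4
print(math.isnan(scalar_result))  # True
```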
Logical AND (and)
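NaN propagates through arithmetic, and a logical AND should return the last operand when both are present. Multiplying s1 by 0 gives 0 where s1 has a value and NaN where it is missing, so adding s2 yields s2 where both are present and NaN otherwise.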
s1 * 0 + s2
# 0 NaN
# 1 NaN
# 2 6.0
# dtype: float64
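The same result can also be written without arithmetic. The sketch below is an alternative I am suggesting, not the method from the original notes: Series.where keeps the values of s2 only where s1 is present. One difference worth knowing is that the arithmetic form turns an infinite value in s1 into NaN (inf * 0 is NaN), while the mask form does not. The names and_arith, and_where and s_inf are illustrative.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 2, 3])
s2 = pd.Series([4, np.nan, 6])

# Arithmetic form from the text above.
and_arith = s1 * 0 + s2

# Mask form: keep s2 where s1 is not NaN, otherwise NaN.
and_where = s2.where(s1.notna())

print(and_arith.equals(and_where))  # True (NaN positions match)

# The two forms differ when s1 contains inf, because inf * 0 is NaN.
s_inf = pd.Series([np.inf, 2, 3])
arith_inf = s_inf * 0 + s2
where_inf = s2.where(s_inf.notna())
print(arith_inf.isna().tolist())  # [True, True, False]
print(where_inf.isna().tolist())  # [False, True, False]
```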
Logical OR (or)
combine_first or fillna can be used: both take the value from s1 where it is present and fall back to s2 where s1 is NaN.
s1.combine_first(s2)
# 0 4.0
# 1 2.0
# 2 3.0
# dtype: float64
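For completeness, here is a short sketch of the fillna variant next to combine_first. With identical indexes the two agree; they differ when the indexes do not match, since combine_first returns the union of both indexes while fillna keeps only the index of s1. The Series s3 is a made-up example to show that difference.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 2, 3])
s2 = pd.Series([4, np.nan, 6])

or_combine = s1.combine_first(s2)
or_fillna = s1.fillna(s2)

print(or_combine.tolist())  # [4.0, 2.0, 3.0]
print(or_fillna.tolist())   # [4.0, 2.0, 3.0]

# Hypothetical Series with an index label (5) that s1 does not have.
s3 = pd.Series([9.0], index=[5])
print(len(s1.combine_first(s3)))  # 4 (index is the union: 0, 1, 2, 5)
print(len(s1.fillna(s3)))         # 3 (index of s1 is kept)
```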
References
How to remove nan value while combining two column in Panda Data frame?
stackoverflow.com
