larry-discuss team mailing list archive
Message #00165
Outer joins on int dtype in pandas and la
Neither pandas nor la have missing value markers for int dtype. So
what should an outer join on two int data objects return?
If neither data object contains labels that are not in the other data
object, then an int data object can be returned. But if one data
object has a label that is not in the other data object, what then?
One option is to raise a TypeError; another option is to cast to
float.
The same issue applies to bool dtype. But casting to float doesn't
sound like a good option there.
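The two options above can be sketched in plain Python. This is only an illustration under stated assumptions. Labeled data is modelled as a dict from label to value, and the "dtype" is inferred from the values. `outer_join` and its `policy` parameter ('raise' or 'cast') are hypothetical names, not part of the pandas or la API. Int data keeps its dtype when the labels match. Otherwise it either raises TypeError or is cast to float with NaN as the missing marker. Bool data with unmatched labels always raises, since casting it to float is not a good option.

```python
import math


def _kind(d):
    # Crude "dtype" inference; assumes a non-empty dict with homogeneous values.
    # bool is checked first because isinstance(True, int) is True.
    if all(isinstance(v, bool) for v in d.values()):
        return "bool"
    if all(isinstance(v, int) for v in d.values()):
        return "int"
    return "float"


def _fill(d, labels, policy):
    missing = [lab for lab in labels if lab not in d]
    if not missing:
        # Labels fully overlap: the original dtype can be returned unchanged.
        return [d[lab] for lab in labels]
    kind = _kind(d)
    if kind == "bool":
        # No missing-value marker for bool, and float is a poor substitute.
        raise TypeError("cannot outer join bool data with unmatched labels")
    if kind == "int" and policy == "raise":
        raise TypeError("cannot outer join int data with unmatched labels")
    # Float data, or int data with policy="cast": use NaN as the marker.
    return [float(d[lab]) if lab in d else math.nan for lab in labels]


def outer_join(a, b, policy="raise"):
    """Outer-join two label->value dicts (hypothetical helper)."""
    labels = sorted(set(a) | set(b))
    return labels, _fill(a, labels, policy), _fill(b, labels, policy)
```

With matching labels, `outer_join({"a": 1, "b": 2}, {"a": 3, "b": 4})` keeps plain ints. With `{"b": 3, "c": 4}` as the second argument it raises TypeError under the default policy. Under `policy="cast"` both sides come back as floats with NaN in the gaps.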
Casting int to float might be dangerous. Code that a user tested with
aligned data and int dtype might stop working (perhaps it contains ==)
when unaligned data is used. So, when in doubt, raise instead of
guessing? At least until that proves too painful?
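Two concrete ways an int-to-float cast can change what == returns are shown below in plain Python. First, NaN, the natural fill value once int data becomes float, is not equal to anything, including itself. Second, ints above 2**53 are not all exactly representable as 64-bit floats. `equal_after_cast` is a hypothetical name used only for illustration. The example assumes Python floats are IEEE 754 doubles, as on all common platforms.

```python
import math


def equal_after_cast(x, y):
    """Compare two ints the way code would see them after an int->float cast."""
    return float(x) == float(y)


# NaN is not equal to itself, so any == test on a filled slot is False.
print(math.nan == math.nan)  # False

# Large ints can collapse onto the same float value.
big = 2**53 + 1
print(big == 2**53)                  # False: exact int comparison
print(equal_after_cast(big, 2**53))  # True: both round to 2**53 as floats
```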
The issue gets more complicated (for the developer, maybe not the
user) when there is mixed dtype. For example, an outer join on an int
dtype with a float dtype is no problem if you are doing a binary
operation like add: the result will get cast to float anyway. But if
you are just aligning data, then there is a problem if the float data
object contains labels that are not in the int data object, since the
int data object would need a missing value at those labels.
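The difference between a binary operation and plain alignment can be sketched in plain Python. This is an illustration only, assuming labeled data is modelled as a dict from label to value. `outer_add` and `outer_align` are hypothetical helpers, not pandas or la functions. For add, an int plus a float is a float, so filling unmatched labels with NaN costs nothing. For alignment, each object keeps its own dtype. The float side can take NaN, but an int object that has to be filled at labels it lacks has no marker, so this sketch raises.

```python
import math


def outer_add(a, b):
    """Add two label->value dicts over the union of labels (hypothetical)."""
    labels = sorted(set(a) | set(b))
    # int + float (or + NaN) is a float, so the NaN fill changes nothing.
    return labels, [a.get(lab, math.nan) + b.get(lab, math.nan) for lab in labels]


def outer_align(a, b):
    """Align two label->value dicts without combining them (hypothetical)."""
    labels = sorted(set(a) | set(b))
    aligned = []
    for d in (a, b):
        missing = [lab for lab in labels if lab not in d]
        # bool values also count as int here, and they lack a marker too.
        if missing and all(isinstance(v, int) for v in d.values()):
            raise TypeError(f"int data has no marker for missing labels {missing}")
        aligned.append([d.get(lab, math.nan) for lab in labels])
    return labels, aligned[0], aligned[1]
```

For example, `outer_add({"x": 1, "y": 2}, {"y": 0.5, "z": 1.5})` returns floats with NaN at "x" and "z". `outer_align` on the same inputs raises, because the int object lacks "z". If only the float object is missing labels, alignment succeeds and the int data stays int.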
Adding a mask, for example by using numpy.ma, is not an option for me
in the short term.