larry-discuss team mailing list archive

Re: groupmean reduce

 

On Tue, May 4, 2010 at 2:43 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> On Mon, May 3, 2010 at 8:30 PM,  <josef.pktd@xxxxxxxxx> wrote:
>> On Mon, May 3, 2010 at 8:01 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
>>> On Mon, May 3, 2010 at 1:13 PM,  <josef.pktd@xxxxxxxxx> wrote:
>>>> Here is a simple implementation of a reduce option in groupmean,
>>>> essentially it is two functions in one.
>>>>
>>>> see https://blueprints.launchpad.net/larry/+spec/group-method-design
>>>> as a standalone function it could also be plugged into other larry
>>>> methods, e.g. larry.mean
>>>>
>>>> Only tested on the example in the file.
>>>>
>>>> Josef
>>>
>>> A reduce option would be very handy. And it's very handy to have your
>>> implementation to get a feel for how it would work. Thank you.
>>>
>>> BTW, what do you think of a weight input to the group-like functions?
>>> It could be used, for example, to calculate a weighted group mean.
>>> The weight could be 1d or have the same number of dimensions as the
>>> input array.
>>
>> Just to clarify: how would you interpret and use the weights?
>> For example, if the weights are firm sizes, would you want
>> size-weighted averages for each sector?
>>
>> It would also need a weights option in nanmean.
>>
>> group_mean and nanmean would be useful with weights, but I don't know
>> what a weighted group_ranking would mean. group_median: would it be
>> the 50th percentile (in terms of weights or like a distribution)?
>
> Good point.
>
> I'm not sure what to do with the group methods. At the moment
> group_mean does not reduce, which I think would be surprising to most
> people. So I guess one way to go would be to make a break in la 0.3
> and set reduce to True by default. That would work for reduce type
> functions like mean, sum, max. But non-reducing functions like zscore,
> ranking, demean do not fit the pattern. So that puts me back to
> setting reduce to False by default.
>
> reduce=True and reduce=False are two very different ideas. reduce=True
> returns a larry with group labels; reduce=False returns a larry with
> whatever labels it originally had.

That's why I initially thought of having two different functions, or three:
mean, demean
group_mean, group_demean, group_meanfilter

group_meanfilter would be what is now called group_mean.
I think the name meanfilter would be analogous to the terminology in
signal processing.
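
To make the distinction concrete, here is a minimal sketch of the two
behaviours on plain NumPy arrays. It is not larry's implementation: the
function bodies and signatures are assumptions for illustration, and group
membership is passed as a simple 1d array of labels rather than through
larry's own labels.

```python
import numpy as np

def group_mean(x, groups):
    """Reducing version (hypothetical): one mean per group label."""
    labels, inverse = np.unique(groups, return_inverse=True)
    sums = np.bincount(inverse, weights=x, minlength=len(labels))
    counts = np.bincount(inverse, minlength=len(labels))
    return labels, sums / counts

def group_meanfilter(x, groups):
    """Non-reducing version (hypothetical): each element is replaced by
    the mean of its group, so the output has the same shape as x."""
    _, inverse = np.unique(groups, return_inverse=True)
    _, means = group_mean(x, groups)
    return means[inverse]

x = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
groups = np.array(["a", "b", "a", "b", "b"])

labels, means = group_mean(x, groups)    # one value per group
filtered = group_meanfilter(x, groups)   # one value per original element
```

The reducing call returns results indexed by group label, while the filter
call keeps the original index, which is the difference between reduce=True
and reduce=False discussed above.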

To save on duplicate calculations, I have written some versions of
group_stats that calculate several things at the same time (although
only non-nan versions).
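
A rough sketch of what such a combined function could look like follows. It
is not the actual group_stats code: the signature and the returned dict are
assumptions, it works on plain 1d NumPy arrays, and like the versions
mentioned above it does no nan handling.

```python
import numpy as np

def group_stats(x, groups):
    """Hypothetical sketch: count, sum, mean and variance per group,
    sharing the label lookup and the per-group sums between statistics."""
    labels, inverse = np.unique(groups, return_inverse=True)
    n = len(labels)
    counts = np.bincount(inverse, minlength=n)
    sums = np.bincount(inverse, weights=x, minlength=n)
    sumsq = np.bincount(inverse, weights=x * x, minlength=n)
    means = sums / counts
    # Population variance (ddof=0) from the sum of squares; this one-pass
    # formula can lose precision when the mean is large relative to the spread.
    variances = sumsq / counts - means ** 2
    return {"labels": labels, "count": counts, "sum": sums,
            "mean": means, "var": variances}

x = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
groups = np.array(["a", "b", "a", "b", "b"])
stats = group_stats(x, groups)
```

The point is only that the group lookup and the sums are computed once and
then reused for every statistic, instead of being repeated in separate
group_mean, group_sum, etc. calls.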

Josef




>
>> There is also an attachment to a scipy.stats trac ticket that does
>> descriptive statistics with weights and nan-handling.
>>
>> Josef
>>
>


