larry-discuss team mailing list archive

Re: Bootstrap and cross validation iterators

 

On Sat, May 22, 2010 at 8:21 PM, Keith Goodman <kwgoodman@xxxxxxxxx> wrote:
> I added a resample module to the la package (la.util.resample). It
> currently contains bootstrap and k-fold cross validation iterators.
> The index iterators are not specific to larrys; they return lists of
> indices. Just thought I'd mention it since they can be used most
> anywhere. Lots of unit tests are in la.util.tests.resample_test.py.
>
> You can optionally set the state of your random number generator
> outside of the index iterators and pass in shuffle to cv and randint
> to boot.
>
> K-fold cross validation indices for 5 elements and 2 folds:
>
>    >>> from la.util.resample import cv
>    >>> for train, test in cv(5,2):
>    ...     print
>    ...     print 'train: ', train
>    ...     print 'test:  ', test
>    ...
>
>    train:  [4, 3, 1]
>    test:   [0, 2]
>
>    train:  [0, 2]
>    test:   [4, 3, 1]
>
> Three bootstrap samples taken with replacement from four elements:
>
>    >>> from la.util.resample import boot
>    >>> for train, test in boot(4, 3):
>    ...     print
>    ...     print 'train: ', train
>    ...     print 'test:  ', test
>    ...
>
>    train:  [2 1 3 1]
>    test:   [0]
>
>    train:  [1 1 2 1]
>    test:   [0, 3]
>
>    train:  [1 3 0 0]
>    test:   [2]
>
> http://bazaar.launchpad.net/~kwgoodman/larry/trunk/annotate/head:/la/util/resample.py
> http://bazaar.launchpad.net/~kwgoodman/larry/trunk/annotate/head:/la/util/tests/resample_test.py

Two design questions:

Why did you choose to use a fixed seed by default? (I'm not completely
sure how using RandomState directly works; I usually just use
random.seed.)
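
For reference, a minimal sketch of the difference, assuming NumPy's
np.random.RandomState and np.random.seed are what is meant here. A
RandomState instance carries its own state, so seeding the global
generator does not disturb it. Whether cv and boot take bound methods
such as rs.shuffle or rs.randint is an assumption based on the
description above, not something checked against the la source.

```python
import numpy as np

# Two independent generators with the same seed produce the same draws.
rs1 = np.random.RandomState(0)
rs2 = np.random.RandomState(0)
a = rs1.randint(0, 4, size=4)

# Re-seeding the global generator does not touch rs2's private state.
np.random.seed(123)
b = rs2.randint(0, 4, size=4)
print("local generators agree:", bool((a == b).all()))

# The global generator is reproducible only by re-seeding it each time.
np.random.seed(123)
c = np.random.randint(0, 4, size=4)
np.random.seed(123)
d = np.random.randint(0, 4, size=4)
print("global generator agrees after re-seeding:", bool((c == d).all()))

# Assumption: bound methods like these are what would be handed to the
# index iterators as the shuffle / randint arguments.
shuffle = rs1.shuffle
randint = rs1.randint
```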

In some early leave-one-out loops, I also used integer indices to
select. The scikits.learn cross_val iterators use boolean index arrays.
Do you have any idea whether integer or boolean indices are faster?
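
One way to check is to time both on the same selection. The sketch
below assumes NumPy arrays and a leave-one-out style selection; the
array size, the dropped position, and the repeat count are arbitrary
choices for illustration, and the timings will depend on the array
size and the NumPy version.

```python
import timeit

import numpy as np

n = 1000
x = np.arange(n, dtype=float)

# Leave-one-out selection: keep everything except element 10.
int_idx = np.array([i for i in range(n) if i != 10])
bool_idx = np.ones(n, dtype=bool)
bool_idx[10] = False

# Both index styles select the same elements in the same order.
same = np.array_equal(x[int_idx], x[bool_idx])
print("same selection:", same)

# Time each indexing style over many repetitions.
t_int = timeit.timeit(lambda: x[int_idx], number=1000)
t_bool = timeit.timeit(lambda: x[bool_idx], number=1000)
print("integer index: %.4f s, boolean mask: %.4f s" % (t_int, t_bool))
```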

Does boot work if nboot=n (no test sample)?
I find the function names, especially cv (crossval_random_kfold?), a
bit too short and unspecific.
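
To make the empty test sample case concrete, here is a rough sketch of
a bootstrap index iterator. It is not the la implementation: the name
boot_indices and its rng argument are hypothetical, and it uses the
standard library random module rather than NumPy. The test indices are
whatever was never drawn into train, so the test list can come out
empty.

```python
import random


def boot_indices(n, nboot, rng=None):
    """Yield (train, test) index lists for nboot bootstrap samples.

    Hypothetical helper for illustration; not the la.util.resample API.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(nboot):
        # Draw n indices with replacement.
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        # Test indices are the ones that were never drawn.
        test = [i for i in range(n) if i not in drawn]
        yield train, test


# With a single element, train is always [0] and test is always empty.
degenerate = list(boot_indices(1, 2))
print(degenerate)  # [([0], []), ([0], [])]

# General case: train and test together cover every index.
samples = list(boot_indices(4, 3, rng=random.Random(0)))
```

A caller that scores on the test indices would need to handle an empty
list, for example by skipping that sample.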

I think we will have more design questions when we start to use this
(or something similar) more systematically than the few eclectic
bootstrap examples we have had until now.

Josef





