Keeping Track of Bootstrapping in Python

I am trying to implement a bootstrapping-based Lasso algorithm in Python. I draw B bootstrap samples from my data, each using a random subset of the features, and fit Lasso on each one. At the end I want to average the Lasso coefficients. I want to keep track of how many times each feature is selected across the bootstraps, so that each feature's coefficients are averaged over the number of times that feature was selected, not over all B bootstraps. However, I cannot work out how to do this in Python:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LassoCV
    from tqdm import tqdm

    # Method body: self.bootstraps (B), self.q1 (features per bootstrap) and
    # self.options (extra LassoCV arguments) belong to the enclosing class.
    X = pd.DataFrame(X)           # convert X to a DataFrame for easier bootstrapping
    y = pd.Series(np.ravel(y))    # 1-D target, as LassoCV expects
    n, p = X.shape                # dimensions for the beta matrix
    # B x p matrix pre-filled with NaN: np.empty would leave arbitrary values
    # that np.nanmean cannot skip; NaN marks "feature not drawn in this bootstrap"
    beta = pd.DataFrame(np.full((self.bootstraps, p), np.nan))

    for i in tqdm(range(self.bootstraps)):          # one iteration per bootstrap
        features = np.random.choice(p, self.q1,
                                    replace=False)  # random feature (column) indices
        samples = np.random.choice(n, n,
                                   replace=True)    # bootstrap sample (row) indices

        X1 = X.iloc[samples, features]  # bootstrapped X
        Y1 = y.iloc[samples]            # bootstrapped y

        # X1 = preprocessing.StandardScaler().fit_transform(X1)  # optional scaling

        lasso_cv = LassoCV(n_jobs=-1, **self.options)
        lasso_cv.fit(X1, Y1)
        beta.iloc[i, features] = lasso_cv.coef_  # store this bootstrap's coefficients

    beta = beta.to_numpy()
    probs = np.nanmean(np.abs(beta), axis=0)  # slight deviation from Random Lasso, abs taken inside
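A note on the last line: `np.nanmean` skips only entries that are actually NaN. If `beta` is pre-filled with NaN (instead of `np.empty`, whose contents are arbitrary), then each column's NaN-aware mean is already the sum over the bootstraps that drew that feature, divided by how many bootstraps drew it. That count can be read back from the NaN pattern. The toy matrix below is made up for illustration (4 bootstraps, 4 features, 2 features drawn per bootstrap). It also shows that a drawn feature with a Lasso coefficient of exactly 0.0 still counts as drawn:

```python
import numpy as np

# Made-up coefficient matrix: rows = bootstraps, columns = features.
# NaN means "feature not drawn in this bootstrap".
beta = np.array([
    [0.5, np.nan, -1.0, np.nan],
    [1.5, 0.0, np.nan, np.nan],
    [np.nan, 4.0, -5.0, np.nan],
    [1.0, np.nan, -3.0, np.nan],
])

K = np.sum(~np.isnan(beta), axis=0)       # times each feature was drawn
sums = np.nansum(np.abs(beta), axis=0)    # sum of |coef| over those draws
with np.errstate(invalid="ignore", divide="ignore"):
    per_feature_mean = sums / K           # 0/0 -> NaN for never-drawn features
```

Here `K` is `[3, 2, 3, 0]`, and `per_feature_mean` agrees with `np.nanmean(np.abs(beta), axis=0)` on every drawn feature.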

Inside the for loop I want to create an array K that keeps track of how many times each feature is selected, and then use K in the last line to compute the average instead of np.nanmean.
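An explicit counter can also be kept inside the loop. The following is a minimal NumPy-only sketch of the loop above, with some assumptions. `bootstrap_feature_means` and its `fit_coef` argument are hypothetical names, and `fit_coef` stands in for the LassoCV fit so the example runs without scikit-learn. `K` counts how often each feature is drawn into a bootstrap. Because "selected" could also mean "given a non-zero Lasso coefficient", `S` counts that separately:

```python
import numpy as np


def bootstrap_feature_means(X, y, n_boot, q1, fit_coef, seed=None):
    """Bootstrap rows, draw q1 random columns per round, and average |coef|
    per feature over only the rounds in which that feature was drawn.

    fit_coef is a hypothetical hook: any callable (X_sub, y_sub) -> 1-D array
    of length X_sub.shape[1], standing in for a LassoCV fit.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    coef_sum = np.zeros(p)           # running sum of |coef| per feature
    K = np.zeros(p, dtype=int)       # times each feature was drawn
    S = np.zeros(p, dtype=int)       # times its coefficient was non-zero
    for _ in range(n_boot):
        features = rng.choice(p, size=q1, replace=False)  # column subset
        samples = rng.choice(n, size=n, replace=True)     # bootstrap rows
        coef = np.asarray(fit_coef(X[np.ix_(samples, features)], y[samples]))
        # `features` has no duplicates (replace=False), so fancy-index += is
        # safe here; with repeated indices np.add.at would be needed instead
        coef_sum[features] += np.abs(coef)
        K[features] += 1
        S[features] += (coef != 0).astype(int)
    means = np.full(p, np.nan)       # never-drawn features stay NaN
    drawn = K > 0
    means[drawn] = coef_sum[drawn] / K[drawn]
    return means, K, S
```

With scikit-learn installed, something like `fit_coef=lambda Xs, ys: LassoCV(**self.options).fit(Xs, ys).coef_` would plug the original model back in. Dividing `coef_sum` by `K` gives the per-feature average the question asks for. Zero coefficients add nothing to `coef_sum`, so dividing by `S` instead would average only over the bootstraps where Lasso kept the feature.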

Thank you!



Read more here: https://stackoverflow.com/questions/66330805/keeping-track-of-bootstraping-in-python

Content Attribution

This content was originally published by user13201583 at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
