API Reference

Groupyr contains estimator classes that are fully compliant with the scikit-learn ecosystem. Consequently, their initialization, fit, predict, transform, and score methods will be familiar to sklearn users.

Sparse Group Lasso Estimators

These are groupyr’s canonical estimators. SGL is intended for regression problems while LogisticSGL is intended for classification problems.

class groupyr.SGL(l1_ratio=1.0, alpha=0.0, groups=None, scale_l2_by='group_length', fit_intercept=True, max_iter=1000, tol=1e-07, warm_start=False, verbose=0, suppress_solver_warnings=True, include_solver_trace=False)

An sklearn compatible sparse group lasso regressor.

This solves the sparse group lasso [1] problem for a feature matrix partitioned into groups using the proximal gradient descent (PGD) algorithm.

Parameters
l1_ratio : float, default=1.0

Hyperparameter controlling the combination of the group lasso and lasso penalties. l1_ratio=0 gives the group lasso and l1_ratio=1 gives the lasso.

alpha : float, default=0.0

Hyperparameter controlling the overall regularization strength.

groups : list of numpy.ndarray

List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.

scale_l2_by : ["group_length", None], default="group_length"

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features; scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

fit_intercept : bool, default=True

Specifies whether a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).

max_iter : int, default=1000

Maximum number of iterations for the PGD solver.

tol : float, default=1e-7

Stopping criterion: convergence tolerance for the copt proximal gradient solver.

warm_start : bool, default=False

If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_.

verbose : int, default=0

Verbosity flag for the PGD solver. Any positive integer will produce verbose output.

suppress_solver_warnings : bool, default=True

If True, suppress convergence warnings from the PGD solver. This is useful for hyperparameter tuning when some combinations of hyperparameters may not converge.

References

[1] Noah Simon, Jerome Friedman, Trevor Hastie & Robert Tibshirani, "A Sparse-Group Lasso," Journal of Computational and Graphical Statistics, 22(2):231-245, 2013. DOI: 10.1080/10618600.2012.681250

Attributes
coef_ : array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_).

intercept_ : float

Intercept (a.k.a. bias) added to the linear predictor.

n_iter_ : int

Actual number of iterations used in the solver.
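
Examples

A minimal usage sketch on synthetic grouped data from groupyr.datasets; the hyperparameter values are illustrative rather than recommended.

>>> from groupyr import SGL
>>> from groupyr.datasets import make_group_regression
>>> X, y, groups = make_group_regression(random_state=42)
>>> model = SGL(groups=groups, l1_ratio=0.5, alpha=1.0).fit(X, y)
>>> y_pred = model.predict(X)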

class groupyr.LogisticSGL(l1_ratio=1.0, alpha=0.0, groups=None, scale_l2_by='group_length', fit_intercept=True, max_iter=1000, tol=1e-07, warm_start=False, verbose=0, suppress_solver_warnings=True, include_solver_trace=False)

An sklearn compatible sparse group lasso classifier.

This solves the sparse group lasso [1] problem for a feature matrix partitioned into groups using the proximal gradient descent (PGD) algorithm.

Parameters
l1_ratio : float, default=1.0

Hyperparameter controlling the combination of the group lasso and lasso penalties. l1_ratio=0 gives the group lasso and l1_ratio=1 gives the lasso.

alpha : float, default=0.0

Hyperparameter controlling the overall regularization strength.

groups : list of numpy.ndarray

List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.

scale_l2_by : ["group_length", None], default="group_length"

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features; scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

fit_intercept : bool, default=True

Specifies whether a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).

max_iter : int, default=1000

Maximum number of iterations for the PGD solver.

tol : float, default=1e-7

Stopping criterion: convergence tolerance for the copt proximal gradient solver.

warm_start : bool, default=False

If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_.

verbose : int, default=0

Verbosity flag for the PGD solver. Any positive integer will produce verbose output.

suppress_solver_warnings : bool, default=True

If True, suppress convergence warnings from the PGD solver. This is useful for hyperparameter tuning when some combinations of hyperparameters may not converge.

References

[1] Noah Simon, Jerome Friedman, Trevor Hastie & Robert Tibshirani, "A Sparse-Group Lasso," Journal of Computational and Graphical Statistics, 22(2):231-245, 2013. DOI: 10.1080/10618600.2012.681250

Attributes
classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

coef_ : array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_).

intercept_ : float

Intercept (a.k.a. bias) added to the linear predictor.

n_iter_ : int

Actual number of iterations used in the solver.
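
Examples

A minimal usage sketch for classification; alpha and l1_ratio are illustrative and would normally be tuned, e.g. with LogisticSGLCV.

>>> from groupyr import LogisticSGL
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups = make_group_classification(random_state=42)
>>> model = LogisticSGL(groups=groups, l1_ratio=0.5, alpha=0.1).fit(X, y)
>>> y_pred = model.predict(X)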

Cross-validation Estimators

These estimators have built-in cross-validation capabilities to find the best values of the hyperparameters alpha and l1_ratio. These are more efficient than using the canonical estimators with grid search because they make use of warm-starting. Alternatively, you can specify tuning_strategy="bayes" to use Bayesian optimization over the hyperparameters instead of a grid search.

class groupyr.SGLCV(l1_ratio=1.0, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, max_iter=1000, tol=1e-07, copy_X=True, cv=None, verbose=False, n_jobs=None, tuning_strategy='grid', n_bayes_iter=50, n_bayes_points=1, random_state=None, suppress_solver_warnings=True)

Iterative SGL model fitting along a regularization path.

See the scikit-learn glossary entry for cross-validation estimator.

Parameters
l1_ratio : float or list of float, default=1.0

Float between 0 and 1 passed to SGL (scaling between the group lasso and lasso penalties). For l1_ratio = 0 the penalty is the group lasso penalty. For l1_ratio = 1 it is the lasso penalty. For 0 < l1_ratio < 1, the penalty is a combination of group lasso and lasso. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values will depend on the problem. For problems where we expect strong overall sparsity and would like to encourage grouping, put more values close to 1 (i.e. lasso). In contrast, if we expect strong group-wise sparsity but only mild sparsity within groups, put more values close to 0 (i.e. group lasso).

groups : list of numpy.ndarray

List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.

scale_l2_by : ["group_length", None], default="group_length"

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features; scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

eps : float, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphas : int, default=100

Number of alphas along the regularization path, used for each l1_ratio.

alphas : ndarray, default=None

List of alphas at which to compute the models. If None, alphas are set automatically.

fit_intercept : bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalize : bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the L2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

max_iter : int, default=1000

The maximum number of iterations.

tol : float, default=1e-7

Stopping criterion: convergence tolerance for the copt proximal gradient solver.

cv : int, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds,

  • an sklearn CV splitter,

  • an iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, sklearn.model_selection.KFold is used.

Refer to the scikit-learn User Guide for the various cross-validation strategies that can be used here.

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

verbose : bool or int, default=False

Amount of verbosity.

n_jobs : int, default=None

Number of CPUs to use during the cross-validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

tuning_strategy : ["grid", "bayes"], default="grid"

Hyperparameter tuning strategy to use. If tuning_strategy == "grid", then evaluate all parameter points on the l1_ratio and alphas grid, using warm start to evaluate different alpha values along the regularization path. If tuning_strategy == "bayes", then a fixed number of parameter settings is sampled using skopt.BayesSearchCV. The fixed number of settings is set by n_bayes_iter. The l1_ratio setting is sampled uniformly between the minimum and maximum of the input l1_ratio parameter. The alpha setting is sampled log-uniformly either between the minimum and maximum of the input alphas parameter, if provided, or from eps * max_alpha to max_alpha, where max_alpha is a conservative estimate of the maximum alpha for which the solution coefficients are non-trivial.

n_bayes_iter : int, default=50

Number of parameter settings that are sampled if using Bayes search for hyperparameter optimization. n_bayes_iter trades off runtime vs. quality of the solution. Consider increasing n_bayes_points if you want to try more parameter settings in parallel.

n_bayes_points : int, default=1

Number of parameter settings to sample in parallel if using Bayes search for hyperparameter optimization. If this does not align with n_bayes_iter, the last iteration will sample fewer points.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

suppress_solver_warnings : bool, default=True

If True, suppress warnings from BayesSearchCV when the objective is evaluated at the same point multiple times. Setting this to False may be useful for debugging.

See also

sgl_path
SGL
Attributes
alpha_ : float

The amount of penalization chosen by cross-validation.

l1_ratio_ : float

The compromise between L1 and L2 penalization chosen by cross-validation.

coef_ : ndarray of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the cost function formula).

intercept_ : float or ndarray of shape (n_targets, n_features)

Independent term in the decision function.

scoring_path_ : ndarray of shape (n_l1_ratio, n_alpha, n_folds)

Mean squared error for the test set on each fold, varying l1_ratio and alpha.

alphas_ : ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)

The grid of alphas used for fitting, for each l1_ratio.

n_iter_ : int

Number of iterations run by the proximal gradient descent solver to reach the specified tolerance for the optimal alpha.

bayes_optimizer_ : skopt.BayesSearchCV instance or None

The BayesSearchCV instance used for hyperparameter optimization if tuning_strategy == "bayes". If tuning_strategy == "grid", then this attribute is None.
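
Examples

A minimal sketch of grid-based cross-validation; the l1_ratio candidates and n_alphas are illustrative. The fitted attributes alpha_ and l1_ratio_ hold the selected hyperparameters.

>>> from groupyr import SGLCV
>>> from groupyr.datasets import make_group_regression
>>> X, y, groups = make_group_regression(random_state=42)
>>> cv_model = SGLCV(
...     groups=groups, l1_ratio=[0.1, 0.5, 0.9], n_alphas=30, cv=3
... ).fit(X, y)
>>> chosen = (cv_model.alpha_, cv_model.l1_ratio_)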

class groupyr.LogisticSGLCV(l1_ratio=1.0, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, max_iter=1000, tol=1e-07, scoring=None, cv=None, copy_X=True, verbose=False, n_jobs=None, tuning_strategy='grid', n_bayes_iter=50, n_bayes_points=1, random_state=None, suppress_solver_warnings=True)

Iterative Logistic SGL model fitting along a regularization path.

See the scikit-learn glossary entry for cross-validation estimator.

Parameters
l1_ratio : float or list of float, default=1.0

Float between 0 and 1 passed to SGL (scaling between the group lasso and lasso penalties). For l1_ratio = 0 the penalty is the group lasso penalty. For l1_ratio = 1 it is the lasso penalty. For 0 < l1_ratio < 1, the penalty is a combination of group lasso and lasso. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values will depend on the problem. For problems where we expect strong overall sparsity and would like to encourage grouping, put more values close to 1 (i.e. lasso). In contrast, if we expect strong group-wise sparsity but only mild sparsity within groups, put more values close to 0 (i.e. group lasso).

groups : list of numpy.ndarray

List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.

scale_l2_by : ["group_length", None], default="group_length"

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features; scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

eps : float, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphas : int, default=100

Number of alphas along the regularization path, used for each l1_ratio.

alphas : ndarray, default=None

List of alphas at which to compute the models. If None, alphas are set automatically.

fit_intercept : bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalize : bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the L2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

max_iter : int, default=1000

The maximum number of iterations.

tol : float, default=1e-7

Stopping criterion: convergence tolerance for the copt proximal gradient solver.

scoring : str or callable, default=None

A string (see the sklearn model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). For a list of scoring functions that can be used, look at sklearn.metrics. The default scoring option used is accuracy_score.

cv : int, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds,

  • an sklearn CV splitter,

  • an iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, sklearn.model_selection.StratifiedKFold is used.

Refer to the scikit-learn User Guide for the various cross-validation strategies that can be used here.

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

verbose : bool or int, default=False

Amount of verbosity.

n_jobs : int, default=None

Number of CPUs to use during the cross-validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

tuning_strategy : ["grid", "bayes"], default="grid"

Hyperparameter tuning strategy to use. If tuning_strategy == "grid", then evaluate all parameter points on the l1_ratio and alphas grid, using warm start to evaluate different alpha values along the regularization path. If tuning_strategy == "bayes", then a fixed number of parameter settings is sampled using skopt.BayesSearchCV. The fixed number of settings is set by n_bayes_iter. The l1_ratio setting is sampled uniformly between the minimum and maximum of the input l1_ratio parameter. The alpha setting is sampled log-uniformly either between the minimum and maximum of the input alphas parameter, if provided, or from eps * max_alpha to max_alpha, where max_alpha is a conservative estimate of the maximum alpha for which the solution coefficients are non-trivial.

n_bayes_iter : int, default=50

Number of parameter settings that are sampled if using Bayes search for hyperparameter optimization. n_bayes_iter trades off runtime vs. quality of the solution. Consider increasing n_bayes_points if you want to try more parameter settings in parallel.

n_bayes_points : int, default=1

Number of parameter settings to sample in parallel if using Bayes search for hyperparameter optimization. If this does not align with n_bayes_iter, the last iteration will sample fewer points.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

suppress_solver_warnings : bool, default=True

If True, suppress warnings from BayesSearchCV when the objective is evaluated at the same point multiple times. Setting this to False may be useful for debugging.

See also

logistic_sgl_path
LogisticSGL
Attributes
alpha_ : float

The amount of penalization chosen by cross-validation.

l1_ratio_ : float

The compromise between L1 and L2 penalization chosen by cross-validation.

classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

coef_ : array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_).

intercept_ : float

Intercept (a.k.a. bias) added to the linear predictor.

scoring_path_ : ndarray of shape (n_l1_ratio, n_alpha, n_folds)

Classification score for the test set on each fold, varying l1_ratio and alpha.

alphas_ : ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)

The grid of alphas used for fitting, for each l1_ratio.

n_iter_ : int

Number of iterations run by the proximal gradient descent solver to reach the specified tolerance for the optimal alpha.

bayes_optimizer_ : skopt.BayesSearchCV instance or None

The BayesSearchCV instance used for hyperparameter optimization if tuning_strategy == "bayes". If tuning_strategy == "grid", then this attribute is None.
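
Examples

A minimal sketch of Bayesian hyperparameter search, assuming scikit-optimize (skopt) is installed. With tuning_strategy="bayes", l1_ratio is sampled between the minimum and maximum of the supplied list, and n_bayes_iter sets the number of sampled settings.

>>> from groupyr import LogisticSGLCV
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups = make_group_classification(random_state=42)
>>> cv_model = LogisticSGLCV(
...     groups=groups,
...     l1_ratio=[0.0, 1.0],
...     tuning_strategy="bayes",
...     n_bayes_iter=20,
...     cv=3,
...     random_state=0,
... ).fit(X, y)
>>> optimizer = cv_model.bayes_optimizer_  # a fitted skopt.BayesSearchCV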

Dataset Generation

Use these functions to generate synthetic sparse grouped data.

groupyr.datasets.make_group_classification(n_samples=100, n_groups=20, n_informative_groups=2, n_features_per_group=20, n_informative_per_group=2, n_redundant_per_group=2, n_repeated_per_group=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, useful_indices=False, random_state=None)

Generate a random n-class sparse group classification problem.

This initially creates clusters of points normally distributed (std=1) about the vertices of an n_informative-dimensional hypercube with sides of length 2*class_sep and assigns an equal number of clusters to each class. It introduces interdependence between these features and adds various types of further noise to the data.

Prior to shuffling, X stacks a number of these primary "informative" features, "redundant" linear combinations of these, "repeated" duplicates of sampled features, and arbitrary noise for any remaining features. This method uses sklearn.datasets.make_classification to construct a giant unshuffled classification problem of size n_groups * n_features_per_group and then distributes the returned features to each group. It then optionally shuffles each group.

Parameters
n_samples : int, optional (default=100)

The number of samples.

n_groups : int, optional (default=20)

The number of feature groups.

n_informative_groups : int, optional (default=2)

The total number of informative groups. All other groups will be just noise.

n_features_per_group : int, optional (default=20)

The total number of features per group. These comprise n_informative informative features, n_redundant redundant features, n_repeated duplicated features, and n_features - n_informative - n_redundant - n_repeated useless features drawn at random.

n_informative_per_group : int, optional (default=2)

The number of informative features per group. Each class is composed of a number of gaussian clusters, each located around the vertices of a hypercube in a subspace of dimension n_informative_per_group. For each cluster, informative features are drawn independently from N(0, 1) and then randomly linearly combined within each cluster in order to add covariance. The clusters are then placed on the vertices of the hypercube.

n_redundant_per_group : int, optional (default=2)

The number of redundant features per group. These features are generated as random linear combinations of the informative features.

n_repeated_per_group : int, optional (default=0)

The number of duplicated features per group, drawn randomly from the informative and the redundant features.

n_classes : int, optional (default=2)

The number of classes (or labels) of the classification problem.

n_clusters_per_class : int, optional (default=2)

The number of clusters per class.

weights : list of floats or None, optional (default=None)

The proportions of samples assigned to each class. If None, then classes are balanced. Note that if len(weights) == n_classes - 1, then the last class weight is automatically inferred. More than n_samples samples may be returned if the sum of weights exceeds 1.

flip_y : float, optional (default=0.01)

The fraction of samples whose class is randomly exchanged. Larger values introduce noise in the labels and make the classification task harder.

class_sep : float, optional (default=1.0)

The factor multiplying the hypercube size. Larger values spread out the clusters/classes and make the classification task easier.

hypercube : boolean, optional (default=True)

If True, the clusters are put on the vertices of a hypercube. If False, the clusters are put on the vertices of a random polytope.

shift : float, array of shape [n_features] or None, optional (default=0.0)

Shift features by the specified value. If None, then features are shifted by a random value drawn in [-class_sep, class_sep].

scale : float, array of shape [n_features] or None, optional (default=1.0)

Multiply features by the specified value. If None, then features are scaled by a random value drawn in [1, 100]. Note that scaling happens after shifting.

shuffle : boolean, optional (default=True)

Shuffle the samples and the features.

useful_indices : boolean, optional (default=False)

If True, a boolean array indicating useful features is returned.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Returns
X : array of shape [n_samples, n_features]

The generated samples.

y : array of shape [n_samples]

The integer labels for class membership of each sample.

groups : list of arrays

Each element is an array of feature indices that belong to that group.

indices : array of shape [n_features]

A boolean array indicating which features are useful. Returned only if useful_indices is True.

See also

sklearn.datasets.make_classification

non-group-sparse version

sklearn.datasets.make_blobs

simplified variant

sklearn.datasets.make_multilabel_classification

unrelated generator for multilabel tasks

Notes

The algorithm is adapted from Guyon [1] and was designed to generate the “Madelon” dataset.

References

[1] I. Guyon, "Design of experiments for the NIPS 2003 variable selection benchmark," 2003.
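
Examples

A minimal sketch; with useful_indices=True, a boolean feature mask is also returned. The shapes below follow from the parameters shown (10 groups of 20 features each).

>>> from groupyr.datasets import make_group_classification
>>> X, y, groups, idx = make_group_classification(
...     n_samples=200, n_groups=10, n_informative_groups=3,
...     useful_indices=True, random_state=42
... )
>>> X.shape
(200, 200)
>>> len(groups)
10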

groupyr.datasets.make_group_regression(n_samples=100, n_groups=20, n_informative_groups=5, n_features_per_group=20, n_informative_per_group=5, effective_rank=None, noise=0.0, shift=0.0, scale=1.0, shuffle=False, coef=False, random_state=None)

Generate a sparse group regression problem.

Prior to shuffling, X stacks a number of primary "informative" features and arbitrary noise for any remaining features. This method uses sklearn.datasets.make_regression to construct a giant unshuffled regression problem of size n_groups * n_features_per_group and then distributes the returned features to each group. It then optionally shuffles each group.

Parameters
n_samples : int, optional (default=100)

The number of samples.

n_groups : int, optional (default=20)

The number of feature groups.

n_informative_groups : int, optional (default=5)

The total number of informative groups. All other groups will be just noise.

n_features_per_group : int, optional (default=20)

The total number of features per group. These comprise n_informative informative features and n_features - n_informative useless features drawn at random.

n_informative_per_group : int, optional (default=5)

The number of informative features per group that have a non-zero regression coefficient.

effective_rank : int or None, optional (default=None)

If not None, the approximate number of singular vectors required to explain most of the input data.

noise : float, optional (default=0.0)

The standard deviation of the gaussian noise applied to the output.

shuffle : boolean, optional (default=False)

Shuffle the samples and the features.

coef : boolean, optional (default=False)

If True, returns the coefficient values used to generate the samples via sklearn.datasets.make_regression.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Returns
X : array of shape [n_samples, n_features]

The generated samples.

y : array of shape [n_samples]

The regression target values for each sample.

groups : list of arrays

Each element is an array of feature indices that belong to that group.

coef : array of shape [n_features]

A numpy array containing the true regression coefficient values. Returned only if coef is True.

See also

sklearn.datasets.make_regression

non-group-sparse version
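
Examples

A minimal sketch; with coef=True, the true regression coefficients are also returned. The shapes below follow from the parameters shown (10 groups of 20 features each).

>>> from groupyr.datasets import make_group_regression
>>> X, y, groups, coef = make_group_regression(
...     n_samples=200, n_groups=10, noise=5.0, coef=True, random_state=42
... )
>>> X.shape
(200, 200)
>>> coef.shape
(200,)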