The identify_zero_importance function in the feature_selector package raises an error on continuous variables

夏宪
2023-12-01

Package source: GitHub - WillKoehrsen/feature-selector: Feature selector is a tool for dimensionality reduction of machine learning datasets
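
For context, the fs object in the failing call below is assumed to have been set up roughly as in the project README; the synthetic train / train_labels here are placeholders for the real feature DataFrame and the continuous target (values between -3 and +3):

import numpy as np
import pandas as pd
from feature_selector import FeatureSelector

# Stand-in data: a feature DataFrame and a continuous target in [-3, 3]
train = pd.DataFrame(np.random.rand(200, 5), columns = ['f0', 'f1', 'f2', 'f3', 'f4'])
train_labels = pd.Series(np.random.uniform(-3, 3, size = 200))

fs = FeatureSelector(data = train, labels = train_labels)

The call that triggers the error is: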

fs.identify_zero_importance(task = 'regression', eval_metric = 'l2', 
                            n_iterations = 10, early_stopping = True)

Running the above code, the program returns:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

For a regression task the dependent variable is continuous (in this case a continuous value between -3 and +3), so there should never be a "class" with only one member. Inspecting the function's source code, one line reads:

train_features, valid_features, train_labels, valid_labels = train_test_split(features, labels, test_size = 0.15, stratify=labels)

Here stratify=labels is hard-coded into the call, which clearly cannot work with a continuous target; removing the argument makes the function run normally.
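
The behaviour is easy to reproduce outside feature_selector; the following minimal sketch uses synthetic data of arbitrary shape:

import numpy as np
from sklearn.model_selection import train_test_split

features = np.random.rand(100, 5)
labels = np.random.uniform(-3, 3, size = 100)   # continuous target: nearly every value is unique

# stratify treats each distinct value of labels as a class, and with a
# continuous target almost every "class" has a single member -> ValueError
try:
    train_test_split(features, labels, test_size = 0.15, stratify = labels)
except ValueError as e:
    print(e)

# Without stratify (the appropriate setting for regression) the split succeeds
train_features, valid_features, train_labels, valid_labels = train_test_split(
    features, labels, test_size = 0.15)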

In addition, running the code also produces the warning:

UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.

Both issues can be fixed by modifying line 299 of the source as follows:

if _early_stopping:
    # stratify=labels removed so that the split also works for a continuous target
    train_features, valid_features, train_labels, valid_labels = train_test_split(
        features, labels, test_size = 0.15)

    # Train the model with early stopping passed as a callback
    # (lgb refers to lightgbm, which feature_selector.py imports at module level)
    model.fit(train_features, train_labels, eval_metric = eval_metric,
              eval_set = [(valid_features, valid_labels)],
              callbacks = [lgb.log_evaluation(period = 100),
                           lgb.early_stopping(stopping_rounds = 30)])

Note that the early_stopping parameter also needs to be renamed to _early_stopping to match the modified branch above; otherwise the change raises an error as well.
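
As a sanity check, the callbacks-based API can be exercised outside feature_selector with a self-contained sketch; the synthetic data, model and parameter values below are purely illustrative, assuming a LightGBM version >= 3.3 where lgb.early_stopping and lgb.log_evaluation are available:

import lightgbm as lgb
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 5)
y = np.random.uniform(-3, 3, size = 500)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size = 0.15)

model = lgb.LGBMRegressor(n_estimators = 1000)
model.fit(X_train, y_train, eval_metric = 'l2',
          eval_set = [(X_valid, y_valid)],
          callbacks = [lgb.log_evaluation(period = 100),
                       lgb.early_stopping(stopping_rounds = 30)])

print(model.best_iteration_)   # iteration at which early stopping kicked in

If this runs without the UserWarning above, the same callback pair can safely replace the deprecated early_stopping_rounds argument inside identify_zero_importance.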
