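WOE Encoding
WOE (Weight of Evidence) maps each category of a categorical variable, or each bin of a binned continuous variable, to a numeric value on the log-odds scale, which suits the logistic regression behind a scorecard. It is standard practice in scorecard modeling.
The WOE subpackage of SuperModelingFactory provides a master class, a monotone binner, transformers, a plotter, and a binning-engine adapter.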
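As a concrete illustration of the encoding itself, here is a minimal sketch of how WOE and IV can be computed from per-bin good/bad counts. It assumes the common convention WOE = ln(good share / bad share); sign conventions differ between tools, and nothing here is taken from SuperModelingFactory's implementation. The function name `woe_iv_from_counts` and the sample counts are hypothetical.

```python
import math


def woe_iv_from_counts(goods, bads):
    """Return (woe_per_bin, total_iv) from per-bin good/bad counts.

    Assumed convention: WOE_i = ln(good_share_i / bad_share_i).
    Bins with a zero count are not handled here; real tools usually
    smooth the counts or merge such bins first.
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    woes = []
    iv = 0.0
    for g, b in zip(goods, bads):
        good_share = g / total_good
        bad_share = b / total_bad
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        # Each bin contributes (good_share - bad_share) * WOE to the IV.
        iv += (good_share - bad_share) * woe
    return woes, iv


# Hypothetical counts for three bins of one variable.
goods = [400, 300, 300]
bads = [10, 30, 60]
woes, iv = woe_iv_from_counts(goods, bads)
for i, w in enumerate(woes):
    print(f"bin {i}: WOE = {w:.4f}")
print(f"IV = {iv:.4f}")
```

Under this convention a bin with a lower bad rate than the population gets a positive WOE, and a bin whose good and bad shares are equal gets a WOE of zero.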
1. Master class: WOE_Master
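from Modeling_Tool import WOE_Master

# train_df, test_df, oot_df and features are prepared by the caller.
woe = WOE_Master(
    train_data=train_df,
    varlist=features,
    dep="bad_flag",
    missing_ref_value=-999999,
)
# Fit the bins on the training set only.
woe.fit(nbins=10, equal_freq=True)
# Apply the same fitted mapping to every sample.
train_woe = woe.transform(train_df)
test_woe = woe.transform(test_df)
oot_woe = woe.transform(oot_df)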
Persisting the mapping table
woe.save_mapping_table("./output/woe_mapping.csv")
from Modeling_Tool import load_mapping_table
varlist, woe_dict = load_mapping_table("./output/woe_mapping.csv")
2. Greedy monotone binner: MonotoneWOEBinner
If the scorecard needs a stricter monotonicity constraint, MonotoneWOEBinner is the recommended choice.
from Modeling_Tool.WOE.WOE_Monotone_Binner import MonotoneWOEBinner

binner = MonotoneWOEBinner(
    feature_cols=features,
    target_col="bad_flag",
    n_init_bins=20,
    min_bin_size=0.03,
    special_values=[-1, -100, -999999],
    cate_feats=["city_grade"],
)
binner.fit(train_df, chi2_binning=True, chi2_p=0.95)
binner.refine_cate(max_bins=5)  # merge categories of categorical features
train_woe = binner.apply_woe(train_df)
bins = binner.get_final_bins()  # binning result, including WOE/IV
edges = binner.get_bin_edges()  # list of bin edges
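Method list
| Method | Description |
|---|---|
| fit(df, chi2_binning, chi2_p, n_jobs) | Fit the binner |
| refine_cate(max_bins) | Merge categories of categorical features by clustering on bad rate |
| apply_woe(df) | Apply the WOE transformation |
| get_final_bins() | Export the binning result (including WOE/IV) |
| load_woe_bins(bins_dict) | Load existing bins |
| get_bin_edges() | Get the list of bin edges |
| export_woe_report(path) | Export an Excel report |
| plot_woe_graph(dir, group_name=) | Write WOE plots as PNG files |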
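To give a feel for what greedy monotone binning means, the sketch below merges adjacent bins until the bad rate is monotone, in the style of the pool-adjacent-violators procedure. This section does not describe how MonotoneWOEBinner works internally (for example, how it uses chi2_binning or min_bin_size), so this is only an illustration of the general idea under that assumption. The names `merge_until_monotone` and `_violates` and the sample counts are hypothetical.

```python
def _violates(left, right, increasing):
    """True if two adjacent (n_total, n_bad) bins break the requested direction."""
    left_rate = left[1] / left[0]
    right_rate = right[1] / right[0]
    return left_rate > right_rate if increasing else left_rate < right_rate


def merge_until_monotone(bins, increasing=True):
    """Greedily pool adjacent bins until bad rates are monotone.

    bins: list of (n_total, n_bad) tuples ordered by feature value,
          each with n_total > 0.
    Returns a new list of merged (n_total, n_bad) tuples.
    """
    merged = []
    for n, bad in bins:
        merged.append((n, bad))
        # Pool the last two bins for as long as they violate the direction.
        while len(merged) >= 2 and _violates(merged[-2], merged[-1], increasing):
            n2, b2 = merged.pop()
            n1, b1 = merged.pop()
            merged.append((n1 + n2, b1 + b2))
    return merged


# Hypothetical fine-grained bins: (sample count, bad count).
raw_bins = [(100, 2), (100, 5), (100, 4), (100, 8), (100, 7)]
merged_bins = merge_until_monotone(raw_bins, increasing=True)
for n, bad in merged_bins:
    print(f"n={n}, bad_rate={bad / n:.3f}")
```

Because WOE is a monotone function of the bin's bad rate once the population totals are fixed, bins with monotone bad rates also have monotone WOE.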
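3. Unified binning engine: as_woe_engine
WOE_Master and MonotoneWOEBinner produce internal artifacts in different formats. as_woe_engine() wraps either one in a unified interface that PSI, IV, and correlation screening can reuse.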
from Modeling_Tool import as_woe_engine
engine = as_woe_engine(binner)  # a WOE_Master instance can be passed as well
woe_table = engine.get_woe_table(features)
train_woe = engine.transform(train_df, features)
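For more details, see WOE Binning Engine.
4. Integration with feature selection
Fit the binner once during training, then reuse the same object for screening, monitoring, and modeling: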
from Modeling_Tool import PSICalculator, VarExtractionInsights, CorrelationFilter

psi = PSICalculator(binning_engine=binner).calculate(train_df, oot_df, features)
iv_report = VarExtractionInsights(
    train_df, "bad_flag", "./iv_plots/",
    woe_engine="monotone", woe_binner=binner,
).get_var_analysis_report(train_df, features)
keep_vars = CorrelationFilter(
    train_df, "bad_flag", corr_cutpoint=0.7,
    woe_engine="monotone", woe_binner=binner,
).remove_highly_correlated(features)
train_woe = binner.apply_woe(train_df)
5. Monotonicity check
from Modeling_Tool import is_monotonic, get_overall_woe_table

# woe is the fitted WOE_Master instance from section 1.
for var in features:
    woe_table = get_overall_woe_table(woe, train_df, [var])
    mono, direction = is_monotonic(woe_table, "WOE", direction="auto")
    print(var, mono, direction)
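For readers who want to see what such a check involves, here is a minimal standalone sketch that tests whether a sequence of per-bin WOE values, ordered by bin, is monotone and reports the direction. It is not the library's is_monotonic; the helper name `check_monotonic`, its `tol` parameter, and the sample values are hypothetical, and only the (flag, direction) return shape mirrors the usage shown above.

```python
def check_monotonic(values, tol=0.0):
    """Return (is_mono, direction) for WOE values ordered by bin.

    direction is "increasing", "decreasing", or None when not monotone.
    A constant or single-element sequence is reported as "increasing".
    tol allows small reversals up to the given size to be ignored.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d >= -tol for d in diffs):
        return True, "increasing"
    if all(d <= tol for d in diffs):
        return True, "decreasing"
    return False, None


# Hypothetical WOE sequences for three variables.
samples = {
    "age": [-0.8, -0.1, 0.3, 0.9],
    "utilization": [0.5, 0.2, -0.4],
    "tenure": [0.1, 0.5, 0.2],
}
for name, woe_values in samples.items():
    print(name, check_monotonic(woe_values))
```

Special-value bins (such as a missing-value bin) are usually excluded before a check like this, since they have no natural position in the ordering.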
6. Standalone WOE transformation
from Modeling_Tool import woe_transform, woe_transformation
single_df, single_map = woe_transform(train_df, var="age", dep="bad_flag", nbins=10)
batch_result = woe_transformation(train_df, varlist=features, dep="bad_flag", nbins=10)
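FAQ
When should I choose MonotoneWOEBinner?
Prefer MonotoneWOEBinner when a variable will enter the scorecard and needs stronger interpretability and a monotonicity constraint.
Why pass binner in at the screening stage?
Because PSI, IV, and KS should be computed on the same set of bins that finally goes into production; otherwise the screening metrics and the model inputs may be inconsistent.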
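The effect of the bin definition on a screening metric can be seen in a small PSI sketch. It uses the standard formula PSI = sum((a - e) * ln(a / e)) over bins and shows that the same two samples yield different PSI values under different cut points. This is not PSICalculator's implementation; the helper names `bin_shares` and `psi`, the `eps` guard, and the sample data are hypothetical.

```python
import bisect
import math


def bin_shares(values, edges):
    """Share of values per bin; edges are interior cut points, sorted ascending."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # A value equal to an edge falls into the bin on its right.
        counts[bisect.bisect_right(edges, v)] += 1
    total = len(values)
    return [c / total for c in counts]


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index of actual vs expected on shared bin edges."""
    e_shares = bin_shares(expected, edges)
    a_shares = bin_shares(actual, edges)
    total = 0.0
    for e, a in zip(e_shares, a_shares):
        e = max(e, eps)  # guard against empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical ages in a training sample and an out-of-time sample.
train_ages = [22, 25, 35, 41, 48, 55, 63, 70]
oot_ages = [21, 23, 24, 28, 36, 44, 52, 66]

psi_final_bins = psi(train_ages, oot_ages, edges=[30, 50])
psi_other_bins = psi(train_ages, oot_ages, edges=[40])
print(f"PSI on final bins: {psi_final_bins:.4f}")
print(f"PSI on other bins: {psi_other_bins:.4f}")
```

Since the two values differ, a variable could pass a PSI threshold under one binning and fail it under another, which is why the screening step should see the bins the model will actually use.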