Granular Monotonic Binning in SAS
This article is originally published at https://statcompute.wordpress.com
In the post (https://statcompute.wordpress.com/2017/06/15/finer-monotonic-binning-based-on-isotonic-regression), it is shown how to do a finer monotonic binning with isotonic regression in R.
Below is a SAS macro implementing the monotonic binning with the same idea of isotonic regression. This macro is more efficient than the one shown in (https://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) without iterative binning and is also able to significantly increase the binning granularity.
%macro monobin(data = , y = , x = ); options mprint mlogic; data _data_ (keep = _x _y); set &data; where &y in (0, 1) and &x ~= .; _y = &y; _x = &x; run; proc transreg data = _last_ noprint; model identity(_y) = monotone(_x); output out = _tmp1 tip = _t; run; proc summary data = _last_ nway; class _t_x; output out = _data_ (drop = _freq_ _type_) mean(_y) = _rate; run; proc sort data = _last_; by _rate; run; data _tmp2; set _last_; by _rate; _idx = _n_; if _rate = 0 then _idx = _idx + 1; if _rate = 1 then _idx = _idx - 1; run; proc sql noprint; create table _tmp3 as select a.*, b._idx from _tmp1 as a inner join _tmp2 as b on a._t_x = b._t_x; create table _tmp4 as select a._idx, min(a._x) as _min_x, max(a._x) as _max_x, sum(a._y) as _bads, count(a._y) as _freq, mean(a._y) as _rate, sum(a._y) / b.bads as _bpct, sum(1 - a._y) / (b.freq - b.bads) as _gpct, log(calculated _bpct / calculated _gpct) as _woe, (calculated _bpct - calculated _gpct) * calculated _woe as _iv from _tmp3 as a, (select count(*) as freq, sum(_y) as bads from _tmp3) as b group by a._idx; quit; title "Monotonic WoE Binning for %upcase(%trim(&x))"; proc print data = _last_ label noobs; var _min_x _max_x _bads _freq _rate _woe _iv; label _min_x = "Lower" _max_x = "Upper" _bads = "#Bads" _freq = "#Freq" _rate = "BadRate" _woe = "WoE" _iv = "IV"; sum _bads _freq _iv; run; title; %mend monobin;
Below is the sample output for LTV, showing an identical binning scheme to the one generated by the R isobin() function.
Please visit source website for post related comments.