R News

A Comparison of Several qeML Predictive Methods

by matloff · December 4, 2023

This article is originally published at https://matloff.wordpress.com

Is machine learning overrated, with traditional methods being underrated these days? Yes, ML has had some celebrated successes, but these have come after huge amounts of effort, and it’s possible that similar effort with traditional methods may have produced similar results.

A related issue concerns the type of data. Hard core MLers tend to divide applications into tabular and nontabular data. The former consists of the classical observations in rows, variables in columns format, while the latter means image processing, NLP and the like. The MLers’ prescription: use XGBoost for tabular data, deep learning (in some form) for the nontabular apps.

In this post, I’ll compare the performance of four predictive methods on a year 2000 census dataset svcensus, included with the qeML package. The comparison is just for illustration, not comprehensive by any means. For each method, I vary only one hyperparameter: k for qeKNN; minimum leaf size for qeRFranger; polynomial degree for qePolyLin; and max tree depth for qeXGBoost.

I used only the first 5000 rows of the dataset. There were 5 predictors, becoming 10 after categoricals were converted (internally) to dummies. The variable being predicted is wage income. For each hyperparameter value, 100 runs were done. (Recall: by default, qeML predictive functions form a random holdout set.)

15 hyperparameter values were tried for each method. In the case of qeKNN and qeRFranger, these were seq(5,75,5). For the other two methods, the values were 1:15.

Here are the results:

The winner here turned out to be good ol’ polynomial regression. Obviously overfitting occurs, but somewhat surprisingly, only after degree 6 or so. Random forests seems to have leveled off at 60 or so. All might do better by tweaking other default hyperparameters, especially KNN.

Of course, all the methods here could be considered traditional statistical methods, as all had significant statistician contribution. But only polynomial regression is truly traditional, and it’s interesting to see that it prevailed in this case.

From now on, I plan to make code for my blog posts available in my qeML repo, https://github.com/matloff/qeML in the inst/blogs subdirectory. So, readers can conveniently try their own experiments.

Thanks for visiting r-craft.org
This article is originally published at https://matloff.wordpress.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

A Comparison of Several qeML Predictive Methods

You may also like...

Categories

A Comparison of Several qeML Predictive Methods

You may also like...

Applied Machine Learning Workshop

reticulate 1.14

Six easy ways to run your Jupyter Notebook in the cloud

Categories