data.table User Survey
The data.table 2023 user community survey is here, open until December 1st.
In writing an R package, it is often useful to build up some function call in string form, then “execute” the string. To give a really simple example: Quite a...
What about variable selection? Which predictor variables/features should we use? No matter what anyone tells you, this is an unsolved problem. But there are lots of useful methods. See the...
Sorry I haven’t been very active on this blog lately, but now that I have more time, that will change. I’ve got myriad things to say. To begin with, then,...
I’ve recently completed fastStat, https://github.com/matloff/fastStat, a quick introduction to statistics for those who’ve had a calculus-based probability course. Many such people later need to do statistics, and this will give them...
Many of you may have heard of ChatGPT, a dazzling new AI tool. We are hearing lots of gushing praise for the tool. Well, how well does it do in...
The field of data privacy has long been of broad interest. In a medical database, for instance, how can administrators enable statistical analysis by medical researchers, while at the same...
I have a new short writeup, showing common R design patterns, implemented side-by-side in base-R and Tidy. As readers of this blog know, I strongly believe that Tidy is a...
During the last year or so, I’ve been quite interested in the issue of fairness in machine learning. This area is more personal for me, as it is the confluence...
George Ostrouchov, one of R’s top parallel computing experts, will run a workshop on cluster parallel computing in R next week. BTW, even a multicore laptop is a “cluster,” so...
As many readers of this blog know, I strongly believe that R learners should be taught base-R, not the tidyverse. Eventually the students may settle on using a mix of...
Object-Oriented Programming (OOP) is more than just a programming style; it’s a philosophy. R has offered various forms of OOP, starting with S3, then (among others) S4, reference classes, and...
Prominent statistician Frank Harrell has come out with a radically new R tutorial, rflow. The name is short for “R workflow,” but I call it “R in a box” – everything...
As a longtime R user and someone with a passionate interest in how people learn, I continue to be greatly concerned about the use of the Tidyverse in teaching noncoder...
Differential privacy (DP) is an approach to maintaining the privacy of individual records in a database, while still allowing statistical analysis. It is now perceived as the go-to method in...
I take my title here from the “too clever by half” paper, “What’s Not What with Statistics” of many years ago. Or I just as appropriately could have borrowed the...
I tend to be blasé about breathless claims of “new” methods and concepts in statistics and machine learning. Most are “variations on a theme.” However, the notion of double descent,...
The June 2020 issue of JASA features a highly insightful essay by Brad Efron, dean of the world’s statisticians. The article is accompanied by commentary by a number of statistical...