stringdist 0.9.6 on CRAN: new features
This article is originally published at http://www.markvanderloo.eu
stringdist version 0.9.6 arrived on CRAN on 16 July 2020.
This release brings a few new features.
Fuzzy text search
Search text for approximate matches of a search string using any stringdist distance. The new functions allow you to:
- detect whether there is a match within a certain maximum distance
- return the position of the first best match
- return the best match.
There are several interfaces for this. Functions grab and grabl work like base grep and grepl. The function extract has output similar to stringr::str_extract. The workhorse function is called afind (approximate find), which returns all results for multiple search patterns.
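As a minimal sketch of how these interfaces might be called: the example strings and the maximum distance of 1 are made up for illustration, and the default distance method is assumed to be "osa" (optimal string alignment).

```r
library(stringdist)

# Made-up example texts and a misspelled search pattern
texts   <- c("The quick brown fox", "jumps over the lazy dog")
pattern <- "quikc"

# grabl: like grepl, one logical per text
# (is there a match within distance 1?)
hits <- grabl(texts, pattern, maxDist = 1)

# grab: like grep, indices of the texts that contain a match
idx <- grab(texts, pattern, maxDist = 1)

# extract: the best-matching substring per text (NA when no match
# is found within maxDist), similar in spirit to stringr::str_extract
found <- extract(texts, pattern, maxDist = 1)

# afind: the workhorse; returns a list of matrices
# (location, distance, match) with one row per text
# and one column per pattern
res <- afind(texts, c("quikc", "lazzy"))
res$location
res$distance
res$match
```

With afind, the search window defaults to the length of each pattern, so every column of the result refers to the best-matching window of that size in each text.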
There is also a new implementation of the popular 'cosine' distance that I developed especially for this purpose. It is called 'running_cosine', and it avoids the double work otherwise done by the standard 'cosine' method. The result is a much faster implementation (up to about 100 times faster).
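The two methods can be compared with a small sketch like the one below. The text, the pattern and the choice of q = 3 are assumptions made for illustration; any speed difference would only show up on much longer texts.

```r
library(stringdist)

# Made-up text and pattern; q sets the q-gram size used by the cosine distance
text    <- "the stringdist package computes string distances"
pattern <- "distance"

# Standard cosine distance, recomputed for every window
res_cos <- afind(text, pattern, method = "cosine", q = 3)

# Running cosine: intended to give the same answer while reusing
# work between consecutive windows
res_run <- afind(text, pattern, method = "running_cosine", q = 3)

# Both should point at the same best-matching window
res_cos$match
res_run$match
```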
String similarity matrices
Thanks to a PR by Johannes Gruber, stringdist now has a function to compute string similarity matrices: stringsimmatrix.
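A minimal usage sketch, assuming the Levenshtein method ("lv") and a made-up word list; similarities are scaled between 0 and 1, with 1 meaning identical strings.

```r
library(stringdist)

# Made-up example words
words <- c("hello", "hallo", "world")

# Pairwise similarities between all words; rows correspond to the
# first argument and columns to the second
sim <- stringsimmatrix(words, words, method = "lv")
sim
```

Here "hello" and "hallo" differ by one substitution over five characters, so their Levenshtein similarity is 1 - 1/5 = 0.8.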