R News

Understanding Overhead Issues in Parallel Computation

by matloff · July 30, 2017

This article is originally published at https://matloff.wordpress.com

In my talk at useR! earlier this month, I emphasized the fact that a major impediment to obtaining good speed from parallelizing an algorithm is systems overhead of various kinds, including:

Contention for memory/network.
Bandwidth limits — CPU/memory, CPU/network, CPU/GPU.
Cache coherency problems.
Contention for I/O ports.
OS and/or R limits on number of sockets (network connections).
Serialization.

During the Q&A at the end, one person in the audience asked how R programmers without a computer science background might acquire this information. A similar question was posed here today by a reader on this blog, to which I replied,

That question was asked in my talk. I answered by saying that I have an introduction to such things in my book, but that this is not enough. One builds this knowledge in haphazard ways, e.g. by search terms like “cache miss” and “network latency” on the Web, and above all, by giving it careful thought and reasoning things out. (When Nobel laureate Richard Feynman was a kid, someone said in awe, “He fixes radios by thinking!”)
Join an R Users Group, if there is one in your area. (And if not, then start one!) Talk about these things with them (though if you follow my above advice, you may find you soon know more than they do).

The book I was referring to was Parallel Computing for Data Science: With Examples in R, C++ and CUDA (Chapman & Hall/CRC, The R Series, Jun 4, 2015.

I have decided that the topic of system overhead issues in parallel computation is important enough for me to place Chapter 2 on the Web, which I have now done. Enjoy. I’d be happy to answer your questions (of a general nature, not on your specific code).

We are continuing to add more features to our R parallel computation package, partools. Watch this space for news!

By the way, the useR! 2017 videos are now on the Web, including my talk on parallel computing.

Thanks for visiting r-craft.org
This article is originally published at https://matloff.wordpress.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Understanding Overhead Issues in Parallel Computation

You may also like...

Categories

Understanding Overhead Issues in Parallel Computation

You may also like...

Ensemble learning for time series forecasting in R

R Weekly 2022-W45, Level Up Your Plots, GoogleSheets as a Database in R and Time-boxed Yak Shaving

R Weekly 2018-20 Conferences, Excel, Interpretable

Categories