
Manifold Visualization: Second Example

by matloff · October 2, 2018

This article is originally published at https://matloff.wordpress.com

In last night’s post, I introduced prVis(), a new visualization tool which we have invented, available in our polyreg package. Recall that prVis() is intended as a simpler alternative to recent visualization tools like t-SNE and UMAP. Here I will post another example.

The dataset is prgeng, included in the package. It consists of wage income, age, gender, and so on, of Silicon Valley programmers and engineers, from the 2000 Census. We first load the data and then choose some of the variables (age, gender, education and occupation):

getPE()
pe1 <- pe[,c(1,2,6:7,12:16)]

So, let’s plot the graph:

The graph consists of streaks, about a dozen of them. What do they represent? To investigate that question, we call another polyreg function:

addRowNums(16,z)

This will write the row numbers of 16 random points from the dataset onto the graph that I just plotted, which now looks like this:

Due to overplotting, the numbers are difficult to read on the graph, but they are also output to the R console:

[1] "highlighted rows:"
[1] 2847
[1] 5016
[1] 5569
[1] 6568
[1] 6915
[1] 8604
[1] 9967
[1] 10113
[1] 10666
[1] 10744
[1] 11383
[1] 11404
[1] 11725
[1] 13335
[1] 14521
[1] 15462
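
The labeling idea itself is easy to mimic in base R. Here is a minimal sketch, assuming nothing about how addRowNums() is actually implemented: labelRandomRows() is a hypothetical helper of my own naming, and the coordinates are synthetic stand-ins for the plotted points.

```r
# Hypothetical helper, NOT polyreg's addRowNums(): write the row numbers
# of n randomly chosen points onto an existing 2-D scatter plot.
labelRandomRows <- function(coords, n) {
  rows <- sort(sample(nrow(coords), n))   # n distinct row numbers, ascending
  text(coords[rows, 1], coords[rows, 2], labels = rows, cex = 0.8)
  print("highlighted rows:")              # echo to the console as well
  for (r in rows) print(r)
  invisible(rows)
}

set.seed(1)
coords <- cbind(rnorm(200), rnorm(200))   # synthetic points, not the prgeng data
if (!interactive()) pdf(NULL)             # null device when run as a script
plot(coords, pch = 20, col = "grey")
picked <- labelRandomRows(coords, 16)
```

Printing the row numbers as well as drawing them is what makes this usable when the labels overlap on the plot.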

Rows 2847 and 10666 seem to be on the same streak, so they must have something in common. Let’s take a look.

> pe1[2847,]
         age sex ms phd occ1 occ2 occ3 occ4 occ5
2847 32.3253   1  1   0    0    0    0    0    0
> pe1[10666,]
          age sex ms phd occ1 occ2 occ3 occ4 occ5
10666 45.36755  1  1   0    0    0    0    0    0

Aha! Except for age, these two workers are identical: same gender (male), same education (Master’s) and same occupation (all five occ dummies are 0, which means occupation category 6). Now those streaks make sense; each one represents a certain combination of the categorical variables.
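
To see why one continuous variable plus a few dummies produces streaks, here is a small sketch on synthetic data. It uses plain linear PCA via prcomp() as a stand-in, not prVis() itself, and the data frame d is made up for illustration (it only borrows the column names age, sex and ms).

```r
set.seed(42)
n <- 300
# Synthetic stand-in for pe1: one continuous column plus two 0/1 dummies.
d <- data.frame(
  age = runif(n, 25, 65),
  sex = rbinom(n, 1, 0.5),
  ms  = rbinom(n, 1, 0.3)
)

# One key per combination of the categorical columns.
combo <- interaction(d$sex, d$ms, drop = TRUE)

# Plain PCA, used here only as a simple stand-in for prVis().
pc <- prcomp(d, scale. = TRUE)$x[, 1:2]

# Within a combination only age varies, so the projected points should
# fall on a straight line. Check: the centered 2-D cloud of each group
# has (numerically) rank 1.
isStreak <- sapply(split(as.data.frame(pc), combo), function(g) {
  s <- svd(scale(as.matrix(g), scale = FALSE))$d
  s[2] < 1e-8 * s[1]
})
print(isStreak)

if (interactive()) plot(pc, col = as.integer(combo), pch = 20)
```

Under a linear projection, every row in a given combination gets the same offset from its dummies, and age moves it along a fixed direction. So each combination traces out its own line, which is the streak pattern in miniature.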

Well, then, let’s see what UMAP does:

plot(umap(pe1))

The result is

The pattern here, if any, is not clear.

So in both examples, last night’s and tonight’s, prVis() was not only simpler but also much more visually interpretable than UMAP.

In fairness, I must point out:

I just used the default values of umap() in these examples. It would be interesting to explore other values. On the other hand, it may be that UMAP simply is not suitable for partially categorical data, as we have in this second example.
For most other datasets I’ve tried, prVis() and UMAP give similar results.

Even so, these two points show the virtues of using prVis(). We are getting equal or better quality while not having to worry about settings for various hyperparameters.
