R / R News

Add Dressbarn to the Continued Retailpocalypse

by hrbrmstr · May 22, 2019

This article is originally published at https://rud.is/b

I’ve talked about the retailpocalypse before and this morning I was greeted with the news about Dressbarn closing all 650 stores as I fired up a browser.

I tweeted some pix and data but not everyone is on Twitter so I’m just posting a blog-blurb here with the code and data links.

Code is below and at https://paste.sr.ht/~hrbrmstr/af6da1af0314426255c65bc2fc254e0abb2190c3.

Data is at https://rud.is/dl/dressbarn-locations.json.gz.

Images are in a gallery below the code.

library(rvest)
library(stringi)
library(urltools)
library(worldtilegrid) # install from sh/gl/gh or just remove the theme_enhange_wtg() calls
library(statebins)
library(tidyverse)

# this is the dressbarn locations directory page
pg <- read_html("https://locations.dressbarn.com/")

# this is the selector to get the main links
html_nodes(pg, "a.Directory-listLink") %>% 
  html_attr("href") -> locs

# PRE-NOTE
# No sleep() code (I looked at the web site, saw how many self-requests it makes for all DB
# resources and concluded that link scrapes + full page captures would not be burdensome
# plus they're going out of business)

# basic idea here is to get all the main state location pages
# some states only have one store so the link goes right to that so handle that condition
# for ones with multiple stores get all the links on the state index page
# for links on state index page that have multiple stores in one area,
# grab all those; then, concatenate all the final target store links into one 
# character vector.

keep(locs, ~nchar(.x) == 2) %>% 
  sprintf("https://locations.dressbarn.com/%s", .) %>% # state has multiple listings
  map(
    ~read_html(.x) %>% 
      html_nodes("a.Directory-listLink") %>% 
      html_attr("href") %>% 
      sprintf("https://locations.dressbarn.com/%s", .)
  ) %>% 
  append(
    keep(locs, ~nchar(.x) > 2) %>% sprintf("https://locations.dressbarn.com/%s", .) # state has one store
  ) %>% 
  flatten_chr() %>% 
  map_if(
    ~stri_count_fixed(.x, "/") == 4, # 4 URL parts == there's another listing page layer
    ~read_html(.x) %>% 
      html_nodes("a.Teaser-titleLink") %>% 
      html_attr("href") %>% 
      stri_replace_first_fixed("../", "") %>% 
      sprintf("https://locations.dressbarn.com/%s", .)
  )  %>% 
  flatten_chr() -> listings

# make a tibble with the HTML source for the final store location pages
# so we don't end up doing multiple retrievals

tibble(
  listing = listings,
  html_src = map_chr(listings, ~httr::GET(.x) %>% httr::content(as = "text"))
) -> dress_barn

# save off our work in the event we have a (non-R-crashing) issue
tf <- tempfile(fileext = ".rds")
print(tf)
saveRDS(dress_barn, tf) 

# now, get data from the pages
#
# first, turn all the character vectors into something we can get HTML nodes from
#
# dressbarn web folks handliy put an "uber" link on each page so we get lon/lat for free in that URL
# they also handily used an <address> semantic tag in the proper PostalAddress schema format
# so we can get locality and actual address, too
mutate(
  dress_barn,
  parsed = map(html_src, read_html),
  uber_link = 
    map_chr(
      parsed, ~html_nodes(.x, xpath=".//a[contains(@href, 'uber')]") %>% 
        html_attr("href") 
    ), 
  locality = map_chr(
    parsed, ~html_node(.x, xpath=".//address/meta[@itemprop = 'addressLocality']") %>% 
      html_attr("content")
  ),
  address = map_chr(
    parsed, ~html_node(.x, xpath=".//address/meta[@itemprop = 'streetAddress']") %>% 
      html_attr("content")
  ),
  state = stri_match_first_regex(
    dress_barn$listing, 
    "https://locations.dressbarn.com/([[:alpha:]]+)/.*$"
  )[,2]
) %>% 
  bind_cols(
    param_get(.$uber_link, c("dropoff%5Blatitude%5D", "dropoff%5Blongitude%5D")) %>% 
      as_tibble() %>% 
      set_names(c("lat", "lon")) %>%
      mutate_all(as.double)
  ) -> dress_barn

# save off our hard work with the HTML source so we can do more later if need be
select(dress_barn, -parsed) %>% 
  saveRDS("~/Data/dressbarn-with-src.rds")

# save off something others will want
select(dress_barn, -parsed, -html_src, -listing) %>% 
  jsonlite::toJSON() %>% 
  write_lines("~/Data/dressbarn-locations.json.gz")

# simple map
ggplot(dress_barn, aes(lon, lat)) + 
  geom_jitter(size = 0.25, color = ft_cols$yellow, alpha = 1/2) +
  coord_map("polyconic") +
  labs(
    title = "Locations of U.S. Dressbarn Stores",
    subtitle = "All 650 locations closing",
    caption = "Source: Dressbarn HTML store listings;\nData: <https://rud.is/dl/dressbarn-locations.json.gz> via @hrbrmstr"
  ) +
  theme_ft_rc(grid="") +
  theme_enhance_wtg()

unlink(tf) # cleanup 

count(dress_barn, state) %>% 
  left_join(tibble(name = state.name, state = tolower(state.abb))) %>% 
  left_join(usmap::statepop, by = c("name"="full")) %>% 
  mutate(per_capita = (n/pop_2015) * 1000000) %>% 
  arrange(desc(per_capita)) %>% 
  select(name, n, per_capita) %>% 
  arrange(desc(per_capita)) %>% 
  complete(name = state.name) %>% 
  statebins(state_col = "name", value_col = "per_capita", ) +
  scale_fill_r7c("Closing\nper-capita") +
  labs(title = "Dressbarn State per-capita closings") +
  theme_ipsum_rc(grid="") +
  theme_enhance_wtg()

Thanks for visiting r-craft.org
This article is originally published at https://rud.is/b
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Add Dressbarn to the Continued Retailpocalypse

You may also like...

Categories

Add Dressbarn to the Continued Retailpocalypse

You may also like...

R Weekly 2023-W27 Targets validation workflow, %dofuture%, gmailr 2.0.0

Use of Differential Privacy in the US Census–All for Nothing?

Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types & Mounds of New Metadata

Categories