In what is rapidly becoming a series (cool things you can do with R in a tweet), Julia Silge demonstrates scraping the list of members of the US House of Representatives from Wikipedia in just five R statements:

Since Twitter munges the URL in the third line when you cut-and-paste, here’s a plain-text version of Julia’s code:

library(rvest)
library(tidyverse)

h <- read_html("https://en.wikipedia.org/wiki/Current__of_the_United_States_House_of_Representatives")

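# Pick the table at this CSS path and convert it to a data frame
reps <- h %>%
  html_node("#mw-content-text > div > table:nth-child(18)") %>%
  html_table()

# Keep columns 1-2 and 4-9, and store the result as a tibble
reps <- reps[, c(1:2, 4:9)] %>% as_tibble()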
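
If you want to see what each step does without depending on the live Wikipedia page, here is a minimal sketch that runs the same html_node() and html_table() calls on a small inline HTML fragment. The fragment, its selector, and the names in it are made up for illustration; they are not from Julia's tweet.

```r
library(rvest)

# Hypothetical stand-in for a web page: one table inside a content div.
page <- read_html('
<html><body>
<div id="content">
  <table>
    <tr><th>District</th><th>Name</th><th>Party</th></tr>
    <tr><td>Example 1</td><td>Jane Doe</td><td>Independent</td></tr>
    <tr><td>Example 2</td><td>John Roe</td><td>Independent</td></tr>
  </table>
</div>
</body></html>')

demo <- page %>%
  html_node("#content > table") %>%  # CSS selector picks a single node
  html_table()                        # convert that <table> to a data frame

demo <- demo[, c(1, 3)]  # keep columns by position, as in the tweet
print(demo)
```

The pattern is the same as in the tweet: read the page, select one table with a CSS selector, convert it, then keep the columns you want by position.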

And sure enough, here’s what the reps object looks like in the RStudio viewer:

[Image: the reps object in the RStudio viewer]

As Julia notes, it’s not perfect, but you’re still 95% of the way to gathering data from a page intended for human rather than computer consumption. Impressive!
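
One source of fragility in any scrape like this is a positional selector such as table:nth-child(18), which depends on the page layout at the time. As a hedged alternative, and not something from Julia's tweet, the sketch below parses every table on a page and keeps the first one that has an expected column name. The inline HTML and the "District" column are assumptions made up for the example.

```r
library(rvest)

# Hypothetical page with two tables; only the second holds the data we want.
page <- read_html('
<html><body>
  <table><tr><th>Note</th></tr><tr><td>navigation box</td></tr></table>
  <table>
    <tr><th>District</th><th>Name</th></tr>
    <tr><td>Example 1</td><td>Jane Doe</td></tr>
  </table>
</body></html>')

# Parse every table into a list of data frames.
tables <- page %>% html_nodes("table") %>% html_table()

# Keep the first table that has a "District" column (assumed column name).
has_district <- vapply(tables, function(t) "District" %in% names(t), logical(1))
picked <- tables[[which(has_district)[1]]]
print(picked)
```

Matching on a column name trades one assumption (the table's position) for another (its header text), so it is worth checking which of the two is more stable for the page you are scraping.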




