This week, in the spirit of the premiere of Star Wars: The Last Jedi, we devoted our weekly wrangling challenge to all things Star Wars. Could we figure out which species preferred to live on swamp planets? Or how fast spaceships fly? We tasked our team to find out. In this post, I’ll share a bit of that behind-the-scenes wrangling, and a few pictures of our post-challenge celebration.
The Wrangling Challenge: May the Force Be With You
If you’re not familiar with Trifacta, we make a product that helps data analysts of all stripes clean and prepare data for analysis. Trifacta was built for all users, from non-technical analysts who have typically used Excel to SQL and Python pros who would prefer a fast, visual way to transform data so it can be analyzed, whether that’s reformatting timestamps or aggregating and averaging sales records.
Our customers wrangle some incredibly complex datasets every day. To step into their shoes, we run bi-weekly “Wrangle U” challenges at Trifacta using our product, Wrangler Enterprise. We challenge ourselves with the same kinds of data cleansing and data prep problems our customers face, such as standardizing and structuring tissue sample manifests (like our biotech customers might have to manage), un-pivoting weekly sales data, or making sense of sensor data from IoT systems.
This last week, however, we did something a little different: we wrangled Star Wars data from a public dataset, https://swapi.co. For this challenge, we compiled the data from the SWAPI database, which consists of datasets covering Star Wars planets, characters, spacecraft, and more. After pairing up Trifacta employees across the world, we challenged each duo to wrangle the dataset to uncover answers to questions such as:
- What green-skinned character has the greatest mass, and what is that mass?
- What are the spaceships that fly at 1000 km in a planet’s atmosphere?
- What are the species that prefer to live on swamp planets?
Download Wrangler and create a recipe to answer any one of the questions above, tweet your answer to @trifacta with a screenshot of your recipe, and we’ll send you a special edition Trifacta t-shirt. If you need help with the dataset or have wrangling questions you can email me at jsilvers at trifacta.
While the dataset wasn’t Hadoop-sized, the wrangling was real. People had to extract data from JSON files and make sense of the alien lifeforms and fantastic spacecraft to answer both qualitative and quantitative questions.
To wrangle the data, the teams needed to join, or blend, two or more datasets, aggregate and standardize data, derive new values such as averages, and more. To make this challenge more fun, we offered prizes for the funniest and most creative answers as well as for the most correct responses in the fewest steps. One Trifacta employee even solved the Porg mystery.
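For readers who think in code, here is a rough Python sketch of the same kind of steps (parse JSON, join two datasets, standardize a messy field, derive an average). It is not the Trifacta recipe any team built. The records are invented placeholders, the field names (`terrain`, `homeworld`, `average_height`, `url`) are assumptions about how SWAPI-style JSON is shaped, and treating a species' homeworld as where it "prefers to live" is a simplification.

```python
import json

# Invented placeholder records; field names are assumed, not taken from the post.
PLANETS_JSON = """[
  {"name": "Planet A", "terrain": "swamp, jungles", "url": "planets/1"},
  {"name": "Planet B", "terrain": "desert", "url": "planets/2"},
  {"name": "Planet C", "terrain": "Swamps, lakes", "url": "planets/3"}
]"""

SPECIES_JSON = """[
  {"name": "Species X", "homeworld": "planets/1", "average_height": "100"},
  {"name": "Species Y", "homeworld": "planets/2", "average_height": "180"},
  {"name": "Species Z", "homeworld": "planets/3", "average_height": "unknown"},
  {"name": "Species W", "homeworld": null, "average_height": "150"}
]"""


def species_on_terrain(planets, species, keyword):
    """Join species to planets on homeworld; keep pairs whose terrain mentions keyword."""
    planets_by_url = {p["url"]: p for p in planets}
    matches = []
    for s in species:
        planet = planets_by_url.get(s.get("homeworld"))
        if planet is None:
            continue  # no homeworld recorded, so nothing to join against
        # Standardize the free-text terrain field before matching.
        terrains = [t.strip().lower() for t in planet["terrain"].split(",")]
        if any(keyword in t for t in terrains):
            matches.append((s["name"], planet["name"]))
    return matches


def average_height(species):
    """Average the numeric heights, skipping values such as 'unknown'."""
    values = []
    for s in species:
        try:
            values.append(float(s["average_height"]))
        except (TypeError, ValueError):
            continue
    return sum(values) / len(values) if values else None


planets = json.loads(PLANETS_JSON)
species = json.loads(SPECIES_JSON)
swamp_species = species_on_terrain(planets, species, "swamp")
print(swamp_species)
print(average_height(species))
```

Matching on a lowercased substring means "swamp" and "Swamps" both count, which is the sort of standardization the challenge called for.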
Off to the Movies
The Star Wars challenge we laid out for ourselves was part of a whole week of company festivities for the 2017 end-of-year holidays. We ended the week at the premiere of Star Wars: The Last Jedi, and from there went on to a holiday celebration to announce the winners of the Wrangle U challenge and have some fun.
If that sounds and looks like fun, we’re hiring!