JSON is a popular file format used to store unstructured content. Many popular databases use JSON to, for example, store media files. While it’s very easy for machines to parse JSON, JSON presents some challenges for analysts because is nested inside of the document.

For our example, we’ll look at Twitter data. Opening our 59MB file in a text editor reveals the data.

{“twitter_info1″:”{”hashtags”:[],”created_date”:”2015-04-16T19:38:58”,”mention”:[”USTELCOCares”],”twitter_text”:”@USTELCOcares i haven’t been an USTELCO customer in 10+ years, but suddenly received bill for svcs never used. customer svc ppl can’t find record”}”,”user_info1″:”{”user_name”:”Jason Oh”,”user_id”:”jasonoh”,”screen_name”:[”USTELCOCares”],”location”:”ManhUSTELCOan”,”friends_count”:”79”}”,”created_date”:”2015-04-16″,”created_time”:”19:38:58″,”twitter_text”:”@USTELCOcares i haven’t been an USTELCO customer in 10+ years, but suddenly received bill for svcs never used. customer svc ppl can’t find record”,”twitter_text2″:”@USTELCOcares”,”Handle”:”@USTELCOcares”,”friends_count2″:”79″, “location2″:”ManhUSTELCOan”,”screen_name2″:”[”USTELCOCares”]”,”user_id2″:”jasonoh”,”user_name2″:”Jason Oh”,”friends_count1″:”79″,”location1″:”ManhUSTELCOan”,”screen_name1″:”[”USTELCOCares”]”,”user_id1″:”jasonoh”,”user_name1″:”Jason Oh”,”friends_count”:”79″,”location”:”ManhUSTELCOan”,”user_id”:”jasonoh”,”user_name”:”Jason Oh”} {“twitter_info1″:”{”hashtags”:[”USTELCO”,”cisco”,”network”],”created_date”:”2015-04-16T19:39:45”,”mention”:[],”twitter_text”:”#USTELCO taps Ciena, #Cisco to build its 100G U.S. metro #network http://t.co/SZLdecjWdC”}”,”user_info1″:”{”user_name”:”ExterNetworks Inc”,”user_id”:”ExterNetworks”,”screen_name”:[],”location”:”Piscataway, NJ 08854 ”,”friends_count”:”1229”}”,”created_date”:”2015-04-16″,”created_time”:”19:39:45″,”twitter_text”:”#USTELCO taps Ciena, #Cisco to build its 100G U.S. metro #network http://t.co/SZLdecjWdC”,”twitter_text2″:null,”Handle”:null,”friends_count2″:”1229″,”location2″:”Piscataway, NJ 08854 “,”screen_name2″:”[]”,”user_id2″:”ExterNetworks”,”user_name2″:”ExterNetworks Inc”,”friends_count1″:”1229″,”location1″:”Piscataway, NJ 08854 “,”screen_name1″:”[]”,”user_id1″:”ExterNetworks”,”user_name1″:”ExterNetworks Inc”,”friends_count”:”1229″,”location”:”Piscataway, NJ 08854 “,”user_id”:”ExterNetworks”,”user_name”:”ExterNetworks Inc”}

For the purpose of demonstration, we can format one record a little differently by hand so we can more clearly see the content of the tweet and the associated :

{  

   “twitter_info1”:“{”hashtags”:[],”created_date”:”2015-04-16T19:38:58”,”mention”:[”USTELCOCares”],“twitter_text”:”@USTELCOcares i haven’t been an USTELCO customer in 10+ years, but suddenly received bill for svcs never used. customer svc ppl can’t find record”}”,

   “user_info1”:“{”user_name“:”Jason Oh”,”user_id”:”jasonoh”,”screen_name”:[”USTELCOCares”],”location”:“ManhUSTELCOan”,”friends_count”:”79”}”,

   “created_date”:“2015-04-16”,

   “created_time”:19:38:58,

   “twitter_text”:“@USTELCOcares i haven’t been an USTELCO customer in 10+ years, but suddenly received bill for svcs never used. customer svc ppl can’t find record”,

   “twitter_text2”:“@USTELCOcares”,

   “Handle”:“@USTELCOcares”,

   “friends_count2”:“79”,

   “location2”:“ManhUSTELCOan”,

   “screen_name2”:“[”USTELCOCares”]”,

   “user_id2”:“jasonoh”,

   “user_name2”:“Jason Oh”,

   “friends_count1”:“79”,

   “location1”:“ManhUSTELCOan”,

   “screen_name1”:“[”USTELCOCares”]”,

   “user_id1”:“jasonoh”,

   “user_name1”:“Jason Oh”,

   “friends_count”:“79”,

   “location”:“ManhUSTELCOan”,

   “user_id”:“jasonoh”,

   “user_name”:“Jason Oh”

}

The JSON document consists of name/value pairs, where values can also be arrays. In the above example, on a user record, one attribute is named `friends_count1` and the value is `79`. As an analyst, you may have thousands or millions of JSON documents to analyze. Pulling out the values and ordering them can be extremely complex. Fortunately, with Trifacta, there are some easy shortcuts that make working with JSON data a snap.

Un-nest Transformation

When you import your JSON file, Trifacta automatically begins to organize your data and provide column headers.

 

Built into every edition of Trifacta is an unnest transformation. The Unnest Transform expands the contents of an array or map and creates new columns in your dataset for each element of the array or map. For maps, unnest takes each key-value pair and creates a new column with the key as the column header and the values as the columns rows. Simply selecting all the data in your column, Trifacta will recognize each name/value pair in the JSON map and split them out into separate columns with the appropriate headings.

More Trifacta users can also write a recipe using the Transform Builder to achieve the same results.

Your Turn

And that’s it! As always, if you have any questions, you can contact our support team for help. For more information on our unnest function, you can watch our tutorial or read the documentation, all on our Trifacta Support Portal.



Source link
Bigdata and data center

LEAVE A REPLY

Please enter your comment!
Please enter your name here