An Amazon Web Services (AWS) S3 cloud storage bucket containing information from data analytics firm Alteryx has been found publicly exposed, compromising the personal information of 123 million US households.
According to UpGuard, the repository exposed datasets belonging to Alteryx partner Experian, a consumer credit reporting agency, and to the US Census Bureau.
Full datasets for both Experian’s ConsumerView marketing database and the 2010 US Census were available.
The 36 GB data file titled “ConsumerView_10_2013” contained over 123 million rows, each one signifying a different American household. A similar file was seen by UpGuard when the personal details of 198 million American voters, compiled in a dataset by a data firm used by the Republican National Committee, were exposed.
To highlight the breadth of the issue, UpGuard said the exposed data reveals over 3.5 billion fields of personally identifying details and data points about virtually every American household, including racial and ethnic information.
The spreadsheet uses anonymised identifiers, but the information in the remaining few billion fields is very detailed, UpGuard said.
Home addresses, contact information, mortgage status, financial histories, and very specific analysis of purchasing behaviour (such as domestic travel habits, whether someone is a cat enthusiast, and their sporting interests) are up for grabs in the exposed data.
Default security settings for S3 buckets usually allow only authorised users to access the contents; however, UpGuard reports the bucket's permission settings allowed any member of the AWS “Authenticated Users” group to download its stored data.
An authenticated user, in this sense, is anyone who has an AWS account.
“Simply put, one dummy sign-up for an AWS account, using a freshly created email address, is all that was necessary to gain access to this bucket’s contents,” UpGuard wrote in its report.
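For readers who want to check their own buckets for this kind of setting, the following is a minimal sketch of the idea, not UpGuard's or AWS's tooling. It assumes an access control list (ACL) expressed as a dictionary in the shape that the boto3 `get_bucket_acl` call returns; the function name `risky_grants` and the sample ACL are hypothetical and exist only for illustration.

```python
# Group URIs that S3 uses for its predefined grantee groups.
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
BROAD_GROUPS = (AUTHENTICATED_USERS, ALL_USERS)


def risky_grants(acl):
    """Return (group name, permission) pairs for grants made to broad groups.

    `acl` is assumed to be a dict shaped like a boto3 get_bucket_acl response.
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in BROAD_GROUPS:
            # Keep only the trailing group name, e.g. "AuthenticatedUsers".
            group_name = grantee["URI"].rsplit("/", 1)[-1]
            findings.append((group_name, grant.get("Permission")))
    return findings


# Hypothetical ACL: the owner has full control, and every AWS account can read.
example_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
            "Permission": "READ",
        },
    ]
}

print(risky_grants(example_acl))
```

In practice the dictionary would come from a live call against a bucket the reader owns; bucket policies and object-level ACLs can also grant access, so a clean result here does not by itself prove a bucket is private.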
Alteryx took ownership of the bucket after securing it, UpGuard said, and an Alteryx spokesperson played down the leak to Forbes.
“Specifically, this file held marketing data, including aggregated and de-identified information based on models and estimations provided by a third-party content provider, and was made available to our customers who purchased and used this data for analytic purposes,” the spokesperson is quoted by Forbes as saying. “The information in the file does not pose a risk of identity theft to any consumers.”