Channel: Learning the code way

AWS Athena

I have been going through use cases where some basic analytics needed to be run on structured logs generated by our system. Until now, the way I did it was to spin up an EMR cluster, load my logs onto it, and execute Hive queries.
Then I found Athena.
Amazon Athena is an interactive query service that makes it easy to analyze
data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL.
We already have most of our relevant data in S3, which means we could use Athena directly against it. Other benefits:
Athena is serverless, so there is no infrastructure to set up or manage,
and you pay only for the queries you run.
Athena scales automatically, executing queries in parallel, so results
are fast, even with large datasets and complex queries.
I wrote some dummy code that creates files of user data.
I created a few files and then set up a directory structure in S3:
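The original listing isn't reproduced here; judging by the partition path I add later (s3://athene-test-dump/2020/05/28), the layout was a date-based folder hierarchy roughly like this (file names are placeholders):

```
s3://athene-test-dump/2020/05/28/users_001.txt
s3://athene-test-dump/2020/05/28/users_002.txt
s3://athene-test-dump/2020/05/29/users_003.txt
```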
The next step was to query this data in Athena. For this we need to make a database and table in Athena.
For each dataset, a table needs to exist in Athena. The metadata in the
table tells Athena where the data is located in Amazon S3, and
specifies the structure of the data, for example, column names, data
types, and the name of the table. Databases are a logical grouping of
tables, and also hold only metadata and schema information for a dataset.
The table creation process registers the dataset with Athena. This
registration occurs in the AWS Glue Data Catalog and enables Athena to
run queries on the data.

Step 1: Create the database:
CREATE DATABASE users_db;
Step 2: Create the users table:
CREATE EXTERNAL TABLE IF NOT EXISTS userRecs (
  user_id int,
  name string,
  phone_no string,
  age int,
  hobbies array<string>,
  state string,
  country string)
COMMENT 'User details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://athene-test-dump/';
The table here is created over the base S3 bucket. I did not apply any partitioning; I simply provided the base folder and let Athena detect the files under the folder hierarchy.
When you query an existing table, under the hood, Amazon Athena uses Presto,
a distributed SQL engine.
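Because the engine is Presto, Presto SQL constructs are available too. For example, the hobbies array column can be expanded into one row per hobby with UNNEST (an illustrative query, not one from the original run):

```sql
SELECT u.name, hobby
FROM userRecs u
CROSS JOIN UNNEST(u.hobbies) AS t (hobby);
```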
I executed a simple query:
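The query itself was lost with the screenshot; a simple query against this table would look something like the following (the filter values are placeholders):

```sql
SELECT name, phone_no, state
FROM userRecs
WHERE age > 25
LIMIT 10;
```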
The query execution history is as below:
I also decided to set up a partitioned table for the same data:

CREATE EXTERNAL TABLE IF NOT EXISTS users_partitioned (
  user_id int,
  name string,
  phone_no string,
  age int,
  hobbies array<string>,
  state string,
  country string)
COMMENT 'User details'
PARTITIONED BY (dataset_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://athene-test-dump/';
The query executed successfully and Athena UI gave the below comment:
Query successful. If your table has partitions, you need to load these partitions
to be able to query data. You can either load all partitions or load them
individually. If you use the load all partitions (MSCK REPAIR TABLE) command,
partitions must be in a format understood by Hive. Learn more.
My data is not partitioned in the Hive format. So I will have to manually load the partitions.
ALTER TABLE users_partitioned ADD PARTITION (dataset_date='2020-05-28')
LOCATION 's3://athene-test-dump/2020/05/28';
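Once a partition is loaded, queries can filter on the partition column, and Athena prunes the scan down to that day's folder instead of reading the whole bucket. A sketch:

```sql
SELECT count(*)
FROM users_partitioned
WHERE dataset_date = '2020-05-28';
```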


If, instead of the date being spread across folder levels (i.e. s3://athene-test-dump/2020/05/28), I had it as 's3://athene-test-dump/dataset_date=2020-05-28' (the key must match the partition column name), then I could have loaded the partitions using the MSCK REPAIR TABLE command.
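With such a Hive-style key=value layout, loading all partitions becomes a single statement that scans the table location and registers every partition folder it finds:

```sql
MSCK REPAIR TABLE users_partitioned;
```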
