AWS Athena: Pricing Basics And How To Lower Costs

Athena is Amazon’s serverless service for querying data hosted in S3 with SQL. It was released in late 2016, and it offers several useful capabilities while charging a modest price per query. Often compared to Amazon’s Redshift, Athena offers more specific capabilities at a much lower price for many workloads.

Understanding the pricing structure

Athena charges you $5.00 per terabyte of data scanned, with a minimum charge of 10 MB per query. This means that any query that scans less than 10 MB will cost you the same as one that scans exactly 10 MB.

There are no charges for failed queries. Canceled queries are charged based on the amount of data scanned up to that point. Simple DDL statements are also free of charge.

Breaking down those costs, $5 per terabyte equals $0.0048828125 per gigabyte, or about $0.000004768 per megabyte of data scanned. Since the minimum charge is for 10 MB, the cheapest query you can run on Athena costs about $0.00004768. In other words, even running thousands of tiny Athena queries will cost you no more than a cup of coffee a month.
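The arithmetic above can be sketched as a small Python helper. This is only an illustration of the pricing rule described in this article, not an official AWS calculator: the function and constant names are made up for the example, it uses the same binary units as the figures above (1 TB = 1024 GB), and it ignores any rounding the actual bill may apply.

```python
# Illustrative names; the price and minimum come from the article text.
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4
PRICE_PER_TB_USD = 5.00
MIN_BILLED_BYTES = 10 * BYTES_PER_MB  # 10 MB minimum per query


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one successful query that scans data."""
    # Anything below the minimum is billed as if 10 MB were scanned.
    billed_bytes = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    tiny = athena_query_cost(1 * BYTES_PER_MB)   # billed as 10 MB
    full_tb = athena_query_cost(BYTES_PER_TB)    # exactly 1 TB
    print(f"Tiny query:          ${tiny:.10f}")
    print(f"1 TB query:          ${full_tb:.2f}")
    print(f"10,000 tiny queries: ${10_000 * tiny:.4f}")
```

Under these assumptions, ten thousand minimum-size queries add up to less than half a dollar.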

1 - File compression

The price of a query is calculated not by the amount of information or the number of lines contained in the data, but by the size of the files Athena scans. That makes compression a powerful tool in reducing Athena costs.

The AWS website suggests Gzip as a compression format, but other formats that Athena can read work as well. Anything that reduces the size of the files stored in S3 helps.

Size reduction has a linear effect on costs. A query that is priced at $30 will cost $10 if the file being queried is reduced to a third of its original size.
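As a rough sketch of that linear relationship, the Python snippet below compresses some made-up CSV rows with the standard library's gzip module and scales a hypothetical $30 query by the resulting size ratio. The sample data, the function name and the $30 figure are assumptions for illustration; real compression ratios depend heavily on the data, and the 10 MB minimum per query still applies.

```python
import gzip


def cost_after_compression(original_cost: float,
                           original_size: int,
                           compressed_size: int) -> float:
    """Scale a query cost linearly with the size of the data scanned."""
    return original_cost * compressed_size / original_size


# Made-up, repetitive CSV rows standing in for a file stored in S3.
rows = "".join(
    f"{i},user_{i % 100},click\n" for i in range(100_000)
).encode("utf-8")
compressed = gzip.compress(rows)

if __name__ == "__main__":
    print(f"Raw size:        {len(rows)} bytes")
    print(f"Compressed size: {len(compressed)} bytes")
    # What a query priced at $30 on the raw data would cost at this ratio.
    estimate = cost_after_compression(30.0, len(rows), len(compressed))
    print(f"Estimated cost:  ${estimate:.2f} instead of $30.00")
```

A file that shrinks to a third of its size brings the $30 query down to $10, which is what `cost_after_compression(30.0, 3, 1)` returns.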

2 - Columnar storage

By storing your data in a columnar format — AWS suggests Apache Parquet — you allow Athena to only read the columns relevant to the query, instead of reading the whole file.

For example, a large table stored in S3 with a file size of 10 terabytes can be scanned in full for $50. But if your query only requires one of the ten columns in that file to be read, the cost of this query will come down to just $5, assuming the columns are of equal size.
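The same example can be written out as a short Python sketch. It only models the cost arithmetic, not how Parquet or Athena work internally; the column names and the helper function are hypothetical, and it assumes the cost is simply the combined size of the columns read multiplied by the $5 per terabyte price.

```python
PRICE_PER_TB_USD = 5.00


def columnar_scan_cost(column_sizes_tb: dict, columns_read) -> float:
    """Estimate query cost when only the listed columns are scanned."""
    scanned_tb = sum(column_sizes_tb[name] for name in columns_read)
    return scanned_tb * PRICE_PER_TB_USD


# Hypothetical 10 TB table: ten columns of 1 TB each.
table = {f"col_{i}": 1.0 for i in range(10)}

full_scan = columnar_scan_cost(table, table.keys())  # every column
one_column = columnar_scan_cost(table, ["col_0"])    # a single column

if __name__ == "__main__":
    print(f"Full scan:  ${full_scan:.2f}")
    print(f"One column: ${one_column:.2f}")
```

When the columns differ in size, the saving depends on which ones the query touches, so the per-column sizes matter more than the column count.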

Conclusion

There is much that can be achieved with Athena. It’s a potent serverless computing solution, and unfortunately, many people still default to using Redshift when Athena would suffice. 

AWS Athena is a powerful tool for pulling either structured or unstructured data using queries. By querying the data directly, you can avoid writing interfaces or programs to push data into Redshift or other databases. Analyzing and visualizing nested JSON data with Amazon Athena is also a straightforward process. "There were a lot of use cases like this for our clients where we avoided using Redshift and used AWS Athena," says Ismail Shaik, CEO of Ktree.com.

There is plenty to learn and explore in the world of low-cost serverless computing solutions, especially within the AWS suite.

Disclaimer: This is a promotional article. The New Indian Express does not take any editorial responsibility for it.

The New Indian Express
www.newindianexpress.com