
Your organization has just started using Amazon Redshift for its data warehousing needs and, per the current business plan, will continue using it for the next 3 years. As a solutions architect, which of the following solutions would you suggest for saving on Redshift costs?

Explanation: If you intend to keep your Amazon Redshift cluster running continuously for a prolonged period, you should consider purchasing reserved node offerings. These offerings provide significant savings over on-demand pricing.

You are working for an organization that uses a Redshift cluster with 3 ds2.xlarge nodes to store about 5 TB of raw data. The data is expected to grow further at a rate of 2 TB every month. A BI dashboard makes use of all the data present on the Redshift cluster.

They are looking for an optimal solution to meet their increased data growth and storage requirements. Which solution will be the most efficient and cost-optimised?

Explanation: RA3 nodes enable you to scale compute and storage independently to achieve the best price and performance. Managed storage grows with the data, so you do not need to add compute nodes just to hold more of it.

You are tuning the performance of a SQL query on a Redshift cluster. You generated an EXPLAIN plan for this query and see the following output:


explain
select eventid, eventname, event.venueid, venuename
from event, venue
where event.venueid = venue.venueid;


XN Hash Join DS_DIST_OUTER  (cost=2.52..58653620.93 rows=8712 width=43)
  Hash Cond: ("outer".venueid = "inner".venueid)
  ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=23)
  ->  XN Hash  (cost=2.02..2.02 rows=202 width=22)
        ->  XN Seq Scan on venue  (cost=0.00..2.02 rows=202 width=22)
(5 rows)

How would you tune this query?

Explanation: DS_DIST_OUTER indicates that a join table is being redistributed across the compute nodes at query time, which is expensive. Since venue is a small dimension table, changing its distribution style to ALL places a full copy of it on every compute node, so the join can run without redistributing any rows.
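Assuming the TICKIT-style table names from the plan above, the change could be sketched as follows (the CREATE TABLE column list follows the TICKIT sample schema and is illustrative):

```sql
-- Copy the small dimension table to every compute node so the hash
-- join no longer needs to redistribute rows at query time.
ALTER TABLE venue ALTER DISTSTYLE ALL;

-- Or declare the distribution style up front when creating the table
-- (columns here are illustrative):
CREATE TABLE venue (
    venueid    SMALLINT NOT NULL,
    venuename  VARCHAR(100),
    venuecity  VARCHAR(30),
    venuestate CHAR(2),
    venueseats INTEGER
) DISTSTYLE ALL;
```

Re-running EXPLAIN after the change should show a DS_DIST_ALL_NONE join step, indicating that no redistribution is required.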

You are working for an insurance company that uses a 3-node ds2.xlarge Redshift cluster to store claims data.

There are some BI dashboards that query this data and show key metrics such as total claim value and the number of claims. These dashboards are updated every hour through SQL queries. There is also a group of data scientists who query the database intermittently to analyse the risk of certain claims. Recently, the data scientists have complained of slow queries.

What will be the most cost-effective solution to increase the performance of your Redshift cluster?

Explanation: Creating separate WLM queues for the data scientists and the BI dashboards ensures that the two workloads do not contend for the same resources, improving the overall performance of your Redshift cluster.
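Once the queues exist (they are defined in the cluster's WLM configuration within the parameter group), queries can be routed to them by query group. A minimal sketch, with the query-group, table, and column names all illustrative:

```sql
-- Route this session's queries to the WLM queue whose configuration
-- lists the 'data_scientists' query group (name is illustrative).
SET query_group TO 'data_scientists';

-- Long-running risk-analysis queries now run in their own queue and
-- no longer contend with the hourly BI dashboard refresh.
SELECT claim_id, claim_value
FROM claims
WHERE claim_value > 100000;

-- Return subsequent queries to the default queue.
RESET query_group;
```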

You are managing the Redshift database for a company, and you have a requirement to set up monitoring on the Redshift cluster and send email notifications whenever the CPU utilization of your cluster crosses 90%. Which of the following services would you use to meet this requirement?

Explanation: Amazon CloudWatch monitors metrics from the Redshift cluster and triggers an alarm when a threshold is exceeded. You can configure a CloudWatch alarm on the CPUUtilization metric that publishes to an Amazon SNS topic, which sends an email whenever the metric crosses the threshold.

You are working for a pharma company that receives data from multiple external vendors for processing. You are receiving the data in large files (several GBs) stored on S3. Each night, a COPY command runs to load the data into a 5-node dc2.large Redshift cluster. You found that the COPY command takes a long time to complete. How would you optimise the data load process?

Explanation: Each node in a Redshift cluster has multiple slices, and it is recommended to split the input into a number of files that is a multiple of the number of slices in the cluster, so that all slices can load the split files in parallel. Further, GZIP compression reduces the amount of data to be transferred and loaded, increasing the performance of the COPY command.
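For a 5-node dc2.large cluster (2 slices per node, so 10 slices), the load could be sketched as below. The bucket, table, IAM role, and delimiter are illustrative:

```sql
-- Load gzip-compressed split files. With 10 slices, splitting the
-- input into a multiple of 10 files of roughly equal size lets every
-- slice load in parallel. The common prefix matches part-00, part-01, ...
COPY claims
FROM 's3://vendor-data/claims/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
GZIP
DELIMITER '|';
```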
