Apache Spark MCQ

Data Preparation for K-Means Clustering: Suppose you have the following information available about different companies in your city.

Company    Feature 1 (Revenue in rupees)    Feature 2 (Amount towards the NGO in rupees)
C1         25,00,000                        10,000
C2         50,00,000                        5,000
C3         1,00,00,000                      20,000
C4         10,00,000                        13,000
C5         1,50,00,000                      20,000

Note: This data does not have any missing values.

Now, as part of organising an NGO event, you decide to approach these companies to raise funds. Since it would take significant time and effort to approach all the companies on your list, you decide to form clusters of these companies.

You are using the K-means algorithm to form clusters, with Feature 1 and Feature 2 as the features of each company. Before applying the clustering algorithm, which of the following transformers must be applied?
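The answer options are not reproduced here, so the following is only a sketch of one consideration that commonly matters for distance-based clustering: the two features sit on very different scales. It uses plain Python (not Spark) and the table values above to show how much of the squared Euclidean distance between C1 and C2 comes from revenue, before and after standardizing each feature. The helper names `standardize` and `sq_dist` are hypothetical; in Spark MLlib, a scaling transformer such as `StandardScaler` would play a comparable role, which is an assumption about the intended answer rather than something the question states.

```python
from statistics import mean, pstdev

# Values from the table in the question, in rupees (C1..C5).
revenue = [2_500_000, 5_000_000, 10_000_000, 1_000_000, 15_000_000]
donation = [10_000, 5_000, 20_000, 13_000, 20_000]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

raw = list(zip(revenue, donation))
scaled = list(zip(standardize(revenue), standardize(donation)))

# Fraction of the squared C1-C2 distance that is due to revenue alone.
raw_share = (raw[0][0] - raw[1][0]) ** 2 / sq_dist(raw[0], raw[1])
scaled_share = (scaled[0][0] - scaled[1][0]) ** 2 / sq_dist(scaled[0], scaled[1])

print(f"revenue share of distance, raw:    {raw_share:.6f}")
print(f"revenue share of distance, scaled: {scaled_share:.6f}")
```

On the raw values, revenue accounts for almost the entire distance, so the NGO amount would barely influence the clusters; after standardization both features contribute.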

Spark Streaming Record Count: Consider the image below.

[Image: spark image]

The timestamps inside the boxes given above are event times. Find the record count for the windows 9:55–10:05, 10:00–10:10, 10:05–10:15 and 10:10–10:20, given the following details:

Output mode: Complete mode

Batch time = 5 minutes, Window duration = 10 minutes, Sliding interval = 5 minutes

Watermark = 5 minutes

The record at 10:03 will not be counted.
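The image is not available here, so the counts themselves cannot be reproduced. As a minimal sketch of the window arithmetic only, the plain-Python snippet below lists which sliding windows an event time falls into for a 10-minute window with a 5-minute slide. It assumes windows are start-inclusive and end-exclusive and are aligned to the Unix epoch, and it does not model the watermark or late-data handling. The function names and the calendar date are hypothetical; only the 10:03 time comes from the question.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # window duration from the question
SLIDE = timedelta(minutes=5)    # sliding interval from the question
EPOCH = datetime(1970, 1, 1)    # assumed alignment origin for window starts

def windows_for(event_time):
    """Return every [start, end) window that contains event_time."""
    # Latest slide-aligned window start at or before the event.
    start = event_time - ((event_time - EPOCH) % SLIDE)
    result = []
    # Walk backwards while the window still covers the event.
    while start > event_time - WINDOW:
        result.append((start, start + WINDOW))
        start -= SLIDE
    return sorted(result)

def label(window):
    """Format a window as HH:MM-HH:MM."""
    return f"{window[0]:%H:%M}-{window[1]:%H:%M}"

# Hypothetical date; the 10:03 event time is the one mentioned above.
event = datetime(2024, 1, 1, 10, 3)
labels = [label(w) for w in windows_for(event)]
print(labels)  # ['09:55-10:05', '10:00-10:10']
```

Each event therefore belongs to two overlapping windows. Whether a given record is actually counted also depends on when it arrives relative to the watermark, which this sketch leaves out.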
