A Machine Learning Specialist needs to be able to ingest streaming data and store it in Apache Parquet files for
exploration and analysis. Which of the following services would both ingest and store this data in the correct format?
B. Amazon Kinesis Data Streams
C. Amazon Kinesis Data Firehose
D. Amazon Kinesis Data Analytics
Correct Answer: C


An agency collects census information within a country to determine healthcare and social program needs by province
and city. The census form collects responses for approximately 500 questions from each citizen. Which combination of
algorithms would provide the appropriate insights? (Select TWO.)
A. The factorization machines (FM) algorithm
B. The Latent Dirichlet Allocation (LDA) algorithm
C. The principal component analysis (PCA) algorithm
D. The k-means algorithm
E. The Random Cut Forest (RCF) algorithm
Correct Answer: CD
PCA reduces the roughly 500 question responses to a smaller set of components, and k-means then groups citizens with similar responses into clusters.


A retail chain has been ingesting purchasing records from its network of 20,000 stores to Amazon S3 using Amazon
Kinesis Data Firehose. To support training an improved machine learning model, training records will require new but
simple transformations, and some attributes will be combined. The model needs to be retrained daily.
Given a large number of stores and the legacy data ingestion, which change will require the LEAST amount of
development effort?
A. Require that the stores switch to capturing their data locally on AWS Storage Gateway for loading into Amazon S3,
then use AWS Glue to do the transformation.
B. Deploy an Amazon EMR cluster running Apache Spark with the transformation logic, and have the cluster run each
day on the accumulating records in Amazon S3, outputting new/transformed records to Amazon S3.
C. Spin up a fleet of Amazon EC2 instances with the transformation logic, have them transform the data records
accumulating on Amazon S3, and output the transformed records to Amazon S3.
D. Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms
raw record attributes into simple transformed values using SQL.
Correct Answer: D


A Machine Learning Specialist is working for a credit card processing company and receives an unbalanced dataset
containing credit card transactions. It contains 99,000 valid transactions and 1,000 fraudulent transactions. The
Specialist is asked to score a model that was run against the dataset. The Specialist has been advised that identifying
valid transactions is equally as important as identifying fraudulent transactions. What metric is BEST suited to score the
model?
A. Precision
B. Recall
C. Area Under the ROC Curve (AUC)
D. Root Mean Square Error (RMSE)
Correct Answer: C
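
To make the difference between the candidate metrics concrete, here is a minimal sketch in plain Python. It assumes a scaled-down dataset with the same 99:1 ratio (990 valid, 10 fraudulent) and a hypothetical degenerate model that labels every transaction as valid. The function names and the data are illustrative assumptions, not part of the question.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, treating label 1 (fraud) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of true positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Pairwise comparison is fine for small data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scaled-down data: 990 valid (0) and 10 fraudulent (1).
y_true = [0] * 990 + [1] * 10
# Hypothetical degenerate model: everything is "valid", constant score.
y_pred = [0] * 1000
scores = [0.0] * 1000

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
prec = precision(tp, fp)
rec = recall(tp, fn)
auc = roc_auc(y_true, scores)

print(f"accuracy={accuracy:.2f} precision={prec:.2f} recall={rec:.2f} auc={auc:.2f}")
```

Under these assumptions the model reaches 99% accuracy while finding no fraud at all, and its AUC is 0.5, the same as random ranking. Precision and recall each describe only the positive class, whereas AUC reflects how well the scores separate both classes.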


A company's Machine Learning Specialist needs to improve the training speed of a time-series forecasting model
using TensorFlow. The training is currently implemented on a single-GPU machine and takes approximately 23 hours to
complete. The training needs to be run daily.
The model accuracy is acceptable, but the company anticipates a continuous increase in the size of the training data
and a need to update the model on an hourly, rather than a daily, basis. The company also wants to minimize coding
effort and infrastructure changes.
What should the Machine Learning Specialist do to the training solution to allow it to scale for future demand?
A. Do not change the TensorFlow code. Change the machine to one with a more powerful GPU to speed up the training.
B. Change the TensorFlow code to implement a Horovod distributed framework supported by Amazon SageMaker.
Parallelize the training to as many machines as needed to achieve the business goals.
C. Switch to using a built-in AWS SageMaker DeepAR model. Parallelize the training to as many machines as needed
to achieve business goals.
D. Move the training to Amazon EMR and distribute the workload to as many machines as needed to achieve the
business goals.
Correct Answer: B

A Machine Learning Specialist receives customer data for an online shopping website. The data includes demographics,
past visits, and locality information. The Specialist must develop a machine learning approach to identify customer
shopping patterns, preferences, and trends to enhance the website for better service and smart recommendations.
Which solution should the Specialist recommend?
A. Latent Dirichlet Allocation (LDA) for the given collection of discrete data to identify patterns in the customer database.
B. A neural network with a minimum of three layers and random initial weights to identify patterns in the customer database.
C. Collaborative filtering based on user interactions and correlations to identify patterns in the customer database.
D. Random Cut Forest (RCF) over random subsamples to identify patterns in the customer database.
Correct Answer: C


A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning
Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query
this data?
A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.
B. Use AWS Glue to catalog the data and Amazon Athena to run queries.
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries.
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries.
Correct Answer: B


A Machine Learning Specialist uploads a dataset to an Amazon S3 bucket protected with server-side encryption using AWS KMS.
How should the ML Specialist define the Amazon SageMaker notebook instance so it can read the same dataset from
Amazon S3?
A. Define security group(s) to allow all HTTP inbound/outbound traffic and assign those security group(s) to the Amazon
SageMaker notebook instance.
B. Configure the Amazon SageMaker notebook instance to have access to the VPC. Grant permission in the KMS key
policy to the notebook's KMS role.
C. Assign an IAM role to the Amazon SageMaker notebook with S3 read access to the dataset. Grant permission in the
KMS key policy to that role.
D. Assign the same KMS key used to encrypt data in Amazon S3 to the Amazon SageMaker notebook instance.
Correct Answer: C
Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/encryption-at-rest.html
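
Option C combines two pieces: an IAM role with S3 read access and a grant in the KMS key policy for that role. A rough sketch of what those two policy documents might look like is shown below, built as Python dictionaries and printed as JSON. The bucket name, account ID, key ID, and role name are hypothetical placeholders, and the exact set of actions a real deployment needs may differ.

```python
import json

# Hypothetical identifiers, used only for illustration.
BUCKET = "example-dataset-bucket"
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleNotebookRole"

# Identity policy attached to the notebook's IAM role: read access to the dataset.
role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

# Statement added to the KMS key policy: lets the same role decrypt objects.
# In a key policy, "Resource": "*" refers to the key the policy is attached to.
key_policy_statement = {
    "Sid": "AllowNotebookRoleToDecrypt",
    "Effect": "Allow",
    "Principal": {"AWS": ROLE_ARN},
    "Action": ["kms:Decrypt", "kms:DescribeKey"],
    "Resource": "*",
}

print(json.dumps(role_policy, indent=2))
print(json.dumps(key_policy_statement, indent=2))
```

The point of the sketch is that both grants are needed: S3 permissions alone do not let the role decrypt objects encrypted with the KMS key.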


A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate
VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS
volume and needs to take a snapshot of that EBS volume. However, the ML Specialist cannot find the Amazon
SageMaker notebook instance's EBS volume or Amazon EC2 instance within the VPC.
Why is the instance not visible to the ML Specialist in the VPC?
A. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run
outside of VPCs.
B. Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts.
C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.
D. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts.
Correct Answer: C
Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html


A Machine Learning Specialist needs to move and transform data in preparation for training. Some of the data needs to
be processed in near-real-time and other data can be moved hourly. There are existing Amazon EMR MapReduce jobs
to perform cleaning and feature engineering on the data.
Which of the following services can feed data to the MapReduce jobs? (Select TWO.)
B. Amazon Kinesis
C. AWS Data Pipeline
D. Amazon Athena
E. Amazon ES
Correct Answer: BC


A Machine Learning Specialist prepared the following graph displaying the results of k-means for k = [1:10].
[Figure: k-means results for k = 1 to 10]

Considering the graph, what is a reasonable selection for the optimal choice of k?
A. 1
B. 4
C. 7
D. 10
Correct Answer: C
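
Questions of this kind are usually read with the elbow heuristic: pick the k after which the within-cluster sum of squares (WCSS) stops dropping sharply. A minimal sketch of that heuristic follows. The WCSS values are invented for illustration and are not read off the graph in the question, and `elbow_k` is a hypothetical helper name.

```python
def elbow_k(wcss):
    """Return the k with the sharpest bend in the WCSS curve.

    wcss[i] is the WCSS for k = i + 1. The bend at an interior point is
    measured by the discrete second difference of the curve.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Invented WCSS values for k = 1..10 (NOT taken from the question's graph).
hypothetical_wcss = [1000, 600, 300, 260, 230, 205, 185, 170, 158, 150]

print(elbow_k(hypothetical_wcss))
```

The second difference is only one way to locate the bend. In practice the curve is often inspected by eye, which is what the question asks for.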


A Machine Learning Specialist is building a convolutional neural network (CNN) that will classify 10 types of animals.
The Specialist has built a series of layers in a neural network that will take an input image of an animal, pass it through a
series of convolutional and pooling layers, and then finally pass it through a dense and fully connected layer with 10
nodes. The Specialist would like to get an output from the neural network that is a probability distribution of how likely it
is that the input image belongs to each of the 10 classes.
Which function will produce the desired output?
A. Dropout
B. Smooth L1 loss
C. Softmax
D. Rectified linear units (ReLU)
Correct Answer: C
Reference: https://towardsdatascience.com/building-a-convolutional-neural-network-cnn-in-keras-329fbbadc5f5
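
A small sketch can show why softmax yields a probability distribution while ReLU does not. The ten logit values below are hypothetical outputs of the 10-node dense layer, chosen only for illustration.

```python
import math


def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relu(values):
    """Clamp negative values to zero; the result need not sum to 1."""
    return [max(0.0, v) for v in values]


# Hypothetical outputs of the final 10-node dense layer.
logits = [2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 3.0, -0.5, 1.5, 0.2]

probs = softmax(logits)
rectified = relu(logits)

print("softmax sum:", round(sum(probs), 6))
print("relu sum:", round(sum(rectified), 6))
print("most likely class index:", probs.index(max(probs)))
```

Softmax keeps the ordering of the logits, so the class with the largest raw score also gets the largest probability.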


A manufacturing company asks its Machine Learning Specialist to develop a model that classifies defective parts into
one of eight defect types. The company has provided roughly 100,000 images per defect type for training. During the
initial training of the image classification model, the Specialist notices that the validation accuracy is 80%, while the
training accuracy is 90%. It is known that human-level performance for this type of image classification is around 90%.
What should the Specialist consider to fix this issue?
A. A longer training time
B. Making the network larger
C. Using a different optimizer
D. Using some form of regularization
Correct Answer: D

