MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Questions 4

A company is developing an ML model to predict customer satisfaction. The company needs to use survey feedback and the past satisfaction level of customers to predict the future satisfaction level of customers.

The dataset includes a column named Feedback that contains long text responses. The dataset also includes a column named Satisfaction Level that contains three distinct values for past customer satisfaction: High, Medium, and Low. The company must apply encoding methods to transform the data in each column.

Which solution will meet these requirements?

Options:

Apply one-hot encoding to the Feedback column and the Satisfaction Level column.

Apply one-hot encoding to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

Apply label encoding to the Feedback column. Apply binary encoding to the Satisfaction Level column.

Apply tokenization to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

Buy Now

Questions 5

A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results.

An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs.

Which solution will meet these requirements?

Options:

Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.

Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.

Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Buy Now

Questions 6

An ML engineer is setting up a CI/CD pipeline for an ML workflow in Amazon SageMaker AI. The pipeline must automatically retrain, test, and deploy a model whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer also needs to track model versions for auditing.

Which solution will meet these requirements?

Options:

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and track model versions.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

Use AWS Lambda and Amazon EventBridge to retrain and deploy the model and track versions via logs.

Manually retrain and deploy the model using SageMaker notebook instances and track versions with AWS CloudTrail.

Buy Now

Questions 7

An ML engineer wants to use Amazon SageMaker Data Wrangler to perform preprocessing on a dataset. The ML engineer wants to use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has a range of thousands of values that differ only by spelling errors. The ML engineer needs to apply an encoding method so that after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

Options:

Perform ordinal encoding to represent categories of the feature.

Perform similarity encoding to represent categories of the feature.

Perform one-hot encoding to represent categories of the feature.

Perform target encoding to represent categories of the feature.

Buy Now

Questions 8

A company collects customer data daily and stores it as compressed files in an Amazon S3 bucket partitioned by date. Each month, analysts process the data, check data quality, and upload results to Amazon QuickSight dashboards.

An ML engineer needs to automatically check data quality before the data is sent to QuickSight, with the LEAST operational overhead.

Which solution will meet these requirements?

Options:

Run an AWS Glue crawler monthly and use AWS Glue Data Quality rules to check data quality.

Run an AWS Glue crawler and create a custom AWS Glue job with PySpark to evaluate data quality.

Use AWS Lambda with Python scripts triggered by S3 uploads to evaluate data quality.

Send S3 events to Amazon SQS and use Amazon CloudWatch Insights to evaluate data quality.

Buy Now

Questions 9

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

Use a custom Amazon SageMaker AI notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Buy Now

Questions 10

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model's performance?

Options:

Accuracy

Area Under the ROC Curve (AUC)

F1 score

Mean absolute error (MAE)

Buy Now

Questions 11

An ML engineer is using an Amazon SageMaker AI shadow test to evaluate a new model that is hosted on a SageMaker AI endpoint. The shadow test requires significant GPU resources for high performance. The production variant currently runs on a less powerful instance type.

The ML engineer needs to configure the shadow test to use a higher performance instance type for a shadow variant. The solution must not affect the instance type of the production variant.

Which solution will meet these requirements?

Options:

Modify the existing ProductionVariant configuration in the endpoint to include a ShadowProductionVariants list. Specify the larger instance type for the shadow variant.

Create a new endpoint configuration with two ProductionVariant definitions. Configure one definition for the existing production variant and one definition for the shadow variant with the larger instance type. Use the UpdateEndpoint action to apply the new configuration.

Create a separate SageMaker AI endpoint for the shadow variant that uses the larger instance type. Create an AWS Lambda function that routes a portion of the traffic to the shadow endpoint. Assign the Lambda function to the original endpoint.

Use the CreateEndpointConfig action to define a new configuration. Specify the existing production variant in the configuration and add a separate ShadowProductionVariants list. Specify the larger instance type for the shadow variant. Use the CreateEndpoint action and pass the new configuration to the endpoint.

Buy Now

Questions 12

An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.

Which solution will improve the model’s performance?

Options:

Optimize for accuracy. Use image augmentation on the less common images.

Optimize for F1 score. Use image augmentation on the less common images.

Optimize for accuracy. Use SMOTE to generate synthetic images.

Optimize for F1 score. Use SMOTE to generate synthetic images.

Buy Now

Questions 13

A company has a large, unstructured dataset. The dataset includes many duplicate records across several key attributes.

Which solution on AWS will detect duplicates in the dataset with the LEAST code development?

Options:

Use Amazon Mechanical Turk jobs to detect duplicates.

Use Amazon QuickSight ML Insights to build a custom deduplication model.

Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.

Use the AWS Glue FindMatches transform to detect duplicates.

Buy Now

Questions 14

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data.

Which solution will meet this requirement with the LEAST operational effort?

Options:

Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.

Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.

Use AWS Glue DataBrew built-in features to oversample the minority class.

Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.

Buy Now

Questions 15

A company is gathering audio, video, and text data in various languages. The company needs to use a large language model (LLM) to summarize the gathered data that is in Spanish.

Which solution will meet these requirements in the LEAST amount of time?

Options:

Train and deploy a model in Amazon SageMaker to convert the data into English text. Train and deploy an LLM in SageMaker to summarize the text.

Use Amazon Transcribe and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Jurassic model to summarize the text.

Use Amazon Rekognition and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Anthropic Claude model to summarize the text.

Use Amazon Comprehend and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Stable Diffusion model to summarize the text.

Buy Now

Questions 16

A company uses Amazon Athena to query a dataset in Amazon S3. The dataset has a target variable that the company wants to predict.

The company needs to use the dataset in a solution to determine if a model can predict the target variable.

Which solution will provide this information with the LEAST development effort?

Options:

Create a new model by using Amazon SageMaker Autopilot. Report the model's achieved performance.

Implement custom scripts to perform data pre-processing, multiple linear regression, and performance evaluation. Run the scripts on Amazon EC2 instances.

Configure Amazon Macie to analyze the dataset and to create a model. Report the model's achieved performance.

Select a model from Amazon Bedrock. Tune the model with the data. Report the model's achieved performance.

Buy Now

Questions 17

An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.

Which instance purchasing option will meet these requirements MOST cost-effectively?

Options:

Run the primary node, core nodes, and task nodes on On-Demand Instances.

Run the primary node, core nodes, and task nodes on Spot Instances.

Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Buy Now

Questions 18

A healthcare company wants to detect irregularities in patient vital signs that could indicate early signs of a medical condition. The company has an unlabeled dataset that includes patient health records, medication history, and lifestyle changes.

Which algorithm and hyperparameter should the company use to meet this requirement?

Options:

Use the Amazon SageMaker AI XGBoost algorithm. Set max_depth to greater than 100 to regulate tree complexity.

Use the Amazon SageMaker AI k-means clustering algorithm. Set k to determine the number of clusters.

Use the Amazon SageMaker AI DeepAR algorithm. Set epochs to the number of training iterations.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm. Set num_trees to greater than 100.

Buy Now

Questions 19

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

Options:

AWS::SageMaker::Model

AWS::SageMaker::Endpoint

AWS::SageMaker::NotebookInstance

AWS::SageMaker::Pipeline

Buy Now

Questions 20

A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML

engineer needs to prepare and store the data so that the company can use the data to train ML models.

Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.)

• Create an Amazon SageMaker batch transform job for data cleaning and feature engineering.

• Store the resulting data back in Amazon S3.

• Use Amazon Athena to infer the schemas and available columns.

• Use AWS Glue crawlers to infer the schemas and available columns.

• Use AWS Glue DataBrew for data cleaning and feature engineering.

MLA-C01 Question 20

Options:

Buy Now

Answer:

Explanation:

Step 1: Use AWS Glue crawlers to infer the schemas and available columns.

Step 2: Use AWS Glue DataBrew for data cleaning and feature engineering.

Step 3: Store the resulting data back in Amazon S3.

Step 1: Use AWS Glue Crawlers to Infer Schemas and Available Columns

Why? The data is stored in .csv files with unlabeled columns, and Glue Crawlers can scan the raw data in Amazon S3 to automatically infer the schema, including available columns, data types, and any missing or incomplete entries.

How? Configure AWS Glue Crawlers to point to the S3 bucket containing the .csv files, and run the crawler to extract metadata. The crawler creates a schema in the AWS Glue Data Catalog, which can then be used for subsequent transformations.

Step 2: Use AWS Glue DataBrew for Data Cleaning and Feature Engineering

Why? Glue DataBrew is a visual data preparation tool that allows for comprehensive cleaning and transformation of data. It supports imputation of missing values, renaming columns, feature engineering, and more without requiring extensive coding.

How? Use Glue DataBrew to connect to the inferred schema from Step 1 and perform data cleaning and feature engineering tasks like filling in missing rows/columns, renaming unlabeled columns, and creating derived features.

Step 3: Store the Resulting Data Back in Amazon S3

Why? After cleaning and preparing the data, it needs to be saved back to Amazon S3 so that it can be used for training machine learning models.

How? Configure Glue DataBrew to export the cleaned data to a specific S3 bucket location. This ensures the processed data is readily accessible for ML workflows.

Order Summary:

Use AWS Glue crawlers to infer schemas and available columns.

Use AWS Glue DataBrew for data cleaning and feature engineering.

Store the resulting data back in Amazon S3.

This workflow ensures that the data is prepared efficiently for ML model training while leveraging AWS services for automation and scalability.

Questions 21

An ML engineer wants to deploy an Amazon SageMaker AI model for inference. The payload sizes are less than 3 MB. Processing time does not exceed 45 seconds. The traffic patterns will be irregular or unpredictable.

Which inference option will meet these requirements MOST cost-effectively?

Options:

Asynchronous inference

Real-time inference

Serverless inference

Batch transform

Buy Now

Questions 22

An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from dosed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents.

The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras.

Which solution will improve the model's accuracy in the LEAST amount of time?

Options:

Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.

Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.

Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.

Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.

Buy Now

Questions 23

An ML engineer uses one ML framework to train multiple ML models. The ML engineer needs to optimize inference costs and host the models on Amazon SageMaker AI.

Which solution will meet these requirements MOST cost-effectively?

Options:

Create a multi-container inference endpoint for direct invocation.

Create a multi-model inference endpoint for all the models.

Create a multi-container inference endpoint for sequential invocation.

Create multiple single-model inference endpoints for each model.

Buy Now

Questions 24

An ML engineer is designing an AI-powered traffic management system. The system must use near real-time inference to predict congestion and prevent collisions.

The system must also use batch processing to perform historical analysis of predictions over several hours to improve the model. The inference endpoints must scale automatically to meet demand.

Which combination of solutions will meet these requirements? (Select TWO.)

Options:

Use Amazon SageMaker real-time inference endpoints with automatic scaling based on ConcurrentInvocationsPerInstance.

Use AWS Lambda with reserved concurrency and SnapStart to connect to SageMaker endpoints.

Use an Amazon SageMaker Processing job for batch historical analysis. Schedule the job with Amazon EventBridge.

Use Amazon EC2 Auto Scaling to host containers for batch analysis.

Use AWS Lambda for historical analysis.

Buy Now

Questions 25

A company has an existing Amazon SageMaker AI model (v1) on a production endpoint. The company develops a new model version (v2) and needs to test v2 in production before substituting v2 for v1.

The company needs to minimize the risk of v2 generating incorrect output in production and must prevent any disruption of production traffic during the change.

Which solution will meet these requirements?

Options:

Create a second production variant for v2. Assign 1% of the traffic to v2 and 99% to v1. Collect all output of v2 in Amazon S3. If v2 performs as expected, switch all traffic to v2.

Create a second production variant for v2. Assign 10% of the traffic to v2 and 90% to v1. Collect all output of v2 in Amazon S3. If v2 performs as expected, switch all traffic to v2.

Deploy v2 to a new endpoint. Turn on data capture for the production endpoint. Send 100% of the input data to v2.

Deploy v2 into a shadow variant that samples 100% of the inference requests. Collect all output in Amazon S3. If v2 performs as expected, promote v2 to production.

Buy Now

Questions 26

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take.

An ML engineer must implement a solution to optimize the data for query performance.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

Configure an AWS Lambda function to split the .csv files into smaller objects in the S3 bucket.

Configure an AWS Glue job to drop columns that have string type values and to save the results to the S3 bucket.

Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet format.

Configure an Amazon EMR cluster to process the data that is in the S3 bucket.

Buy Now

Questions 27

An ML engineer is configuring auto scaling for an inference component of a model that runs behind an Amazon SageMaker AI endpoint. The ML engineer configures SageMaker AI auto scaling with a target tracking scaling policy set to 100 invocations per model per minute. The SageMaker AI endpoint scales appropriately during normal business hours. However, the ML engineer notices that at the start of each business day, there are zero instances available to handle requests, which causes delays in processing.

The ML engineer must ensure that the SageMaker AI endpoint can handle incoming requests at the start of each business day.

Which solution will meet this requirement?

Options:

Reduce the SageMaker AI auto scaling cooldown period to the minimum supported value. Add an auto scaling lifecycle hook to scale the SageMaker AI instances.

Change the target metric to CPU utilization.

Modify the scaling policy target value to one.

Apply a step scaling policy that scales based on an Amazon CloudWatch alarm. Apply a second CloudWatch alarm and scaling policy to scale the minimum number of instances from zero to one at the start of each business day.

Buy Now

Questions 28

An ML engineer must choose the appropriate Amazon SageMaker algorithm to solve specific AI problems.

Select the correct SageMaker built-in algorithm from the following list for each use case. Each algorithm should be selected one time.

• Random Cut Forest (RCF) algorithm

• Semantic segmentation algorithm

• Sequence-to-Sequence (seq2seq) algorithm

MLA-C01 Question 28

Options:

Buy Now

Questions 29

A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS.

Which solution will meet these requirements with the LEAST development effort?

Options:

Use SageMaker AI built-in algorithms to train the proprietary datasets.

Use SageMaker AI script mode and premade images for ML frameworks.

Build a container on AWS that includes custom packages and a choice of ML frameworks.

Purchase similar production models through AWS Marketplace.

Buy Now

Questions 30

Which solution will meet these requirements?

Options:

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Buy Now

Questions 31

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.

Which file format will meet these requirements?

Options:

CSV files compressed with Snappy

JSON objects in JSONL format

JSON files compressed with gzip

Apache Parquet files

Buy Now

Questions 32

A company wants to build an anomaly detection ML model. The model will use large-scale tabular data that is stored in an Amazon S3 bucket. The company does not have expertise in Python, Spark, or other languages for ML.

An ML engineer needs to transform and prepare the data for ML model training.

Which solution will meet these requirements?

Options:

Prepare the data by using Amazon EMR Serverless applications that host Amazon SageMaker Studio notebooks.

Prepare the data by using the Amazon SageMaker Data Wrangler visual interface in Amazon SageMaker Canvas.

Run SQL queries from a JupyterLab space in Amazon SageMaker Studio. Process the data further by using pandas DataFrames.

Prepare the data by using a JupyterLab notebook in Amazon SageMaker Studio.

Buy Now

Questions 33

An ML engineer is collecting data to train a classification ML model by using Amazon SageMaker AI. The target column can have two possible values: Class A or Class B. The ML engineer wants to ensure that the number of samples for both Class A and Class B are balanced, without losing any existing training data. The ML engineer must test the balance of the training data.

Which solution will meet this requirement?

Options:

Use SageMaker Clarify to check for class imbalance (CI). If the value is equal to 0, then use random undersampling in SageMaker Data Wrangler to balance the classes.

Use SageMaker Clarify to check for class imbalance (CI). If the value is greater than 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Data Wrangler to balance the classes.

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is greater than 0, then use random undersampling in SageMaker Studio to balance the classes.

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is equal to 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Studio to balance the classes.

Buy Now

Questions 34

A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker AI endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.

Which solution will provide an explanation for the model's predictions?

Options:

Use SageMaker Model Monitor on the deployed model.

Use SageMaker Clarify on the deployed model.

Show the distribution of inferences from A/B testing in Amazon CloudWatch.

Add a shadow endpoint. Analyze prediction differences on samples.

Buy Now

Questions 35

A company is developing an ML model for a customer. The training data is stored in an Amazon S3 bucket in the customer's AWS account (Account A). The company runs Amazon SageMaker AI training jobs in a separate AWS account (Account B).

The company defines an S3 bucket policy and an IAM policy to allow reads to the S3 bucket.

Which additional steps will meet the cross-account access requirement?

Options:

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Buy Now

Questions 36

An ML engineer wants to deploy a workflow that processes streaming IoT sensor data and periodically retrains ML models. The most recent model versions must be deployed to production.

Which service will meet these requirements?

Options:

Amazon SageMaker Pipelines

Amazon Managed Workflows for Apache Airflow (MWAA)

AWS Lambda

Apache Spark

Buy Now

Questions 37

An ML engineer is developing a fraud detection model by using the Amazon SageMaker XGBoost algorithm. The model classifies transactions as either fraudulent or legitimate.

During testing, the model excels at identifying fraud in the training dataset. However, the model is inefficient at identifying fraud in new and unseen transactions.

What should the ML engineer do to improve the fraud detection for new transactions?

Options:

Increase the learning rate.

Remove some irrelevant features from the training dataset.

Increase the value of the max_depth hyperparameter.

Decrease the value of the max_depth hyperparameter.

Buy Now

Questions 38

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

Options:

Use a multi-model serverless endpoint. Enable caching.

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

Buy Now

Questions 39

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company's internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

Options:

Use Amazon Kendra Experience Builder.

Use Amazon Aurora PostgreSQL with the pgvector extension.

Use Amazon RDS for PostgreSQL with the pgvector extension.

Use the AWS Glue Data Catalog metadata repository.

Buy Now

Questions 40

An ML engineer is training an ML model to identify medical patients for disease screening. The tabular dataset for training contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.

The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.

What should the ML engineer do to transform the data and make the data suitable for training?

Options:

Apply principal component analysis (PCA) to oversample the minority class in the training dataset.

Apply Synthetic Minority Oversampling Technique (SMOTE) to generate new synthetic samples of the minority class in the training dataset.

Randomly oversample the majority class in the validation dataset.

Apply k-means clustering to undersample the minority class in the test dataset.

Buy Now

Questions 41

A company has developed a new ML model. The company requires online model validation on 10% of the traffic before the company fully releases the model in production. The company uses an Amazon SageMaker endpoint behind an Application Load Balancer (ALB) to serve the model.

Which solution will set up the required online validation with the LEAST operational overhead?

Options:

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

Create a new SageMaker endpoint. Use production variants to add the new model to the new endpoint. Monitor the number of invocations by using Amazon CloudWatch.

Configure the ALB to route 10% of the traffic to the new model at the existing SageMaker endpoint. Monitor the number of invocations by using AWS CloudTrail.

Buy Now

Answer:

Explanation:

Scenario: The company wants to perform online validation of a new ML model on 10% of the traffic before fully deploying the model in production. The setup must have minimal operational overhead.

Why Use SageMaker Production Variants?

Built-In Traffic Splitting: Amazon SageMaker endpoints support production variants, allowing multiple models to run on a single endpoint. You can direct a percentage of incoming traffic to each variant by adjusting the variant weights.

Ease of Management: Using production variants eliminates the need for additional infrastructure like separate endpoints or custom ALB configurations.

Monitoring with CloudWatch: SageMaker automatically integrates with CloudWatch, enabling real-time monitoring of model performance and invocation metrics.

Steps to Implement:

Deploy the New Model as a Production Variant:

Update the existing SageMaker endpoint to include the new model as a production variant. This can be done via the SageMaker console, CLI, or SDK.

Example SDK Code:

import boto3

sm_client = boto3.client('sagemaker')

response = sm_client.update_endpoint_weights_and_capacities(

EndpointName='existing-endpoint-name',

DesiredWeightsAndCapacities=[

{'VariantName': 'current-model', 'DesiredWeight': 0.9},

{'VariantName': 'new-model', 'DesiredWeight': 0.1}

]

)

Set the Variant Weight:

Assign a weight of 0.1 to the new model and 0.9 to the existing model. This ensures 10% of traffic goes to the new model while the remaining 90% continues to use the current model.

Monitor the Performance:

Use Amazon CloudWatch metrics, such as InvocationCount and ModelLatency, to monitor the traffic and performance of each variant.

Validate the Results:

Analyze the performance of the new model based on metrics like accuracy, latency, and failure rates.

Why Not the Other Options?

Option B: Setting the weight to 1 directs all traffic to the new model, which does not meet the requirement of splitting traffic for validation.

Option C: Creating a new endpoint introduces additional operational overhead for traffic routing and monitoring, which is unnecessary given SageMaker's built-in production variant capability.

Option D: Configuring the ALB to route traffic requires manual setup and lacks SageMaker's seamless variant monitoring and traffic splitting features.

Conclusion:

Using production variants with a weight of 0.1 for the new model on the existing SageMaker endpoint provides the required traffic split for online validation with minimal operational overhead.

[References:, Amazon SageMaker Endpoints, SageMaker Production Variants, Monitoring SageMaker Endpoints with CloudWatch, , , ]

Questions 42

An ML engineer needs to organize a large set of text documents into topics. The ML engineer will not know what the topics are in advance. The ML engineer wants to use built-in algorithms or pre-trained models available through Amazon SageMaker AI to process the documents.

Which solution will meet these requirements?

Options:

Use the BlazingText algorithm to identify the relevant text and to create a set of topics based on the documents.

Use the Sequence-to-Sequence algorithm to summarize the text and to create a set of topics based on the documents.

Use the Object2Vec algorithm to create embeddings and to create a set of topics based on the embeddings.

Use the Latent Dirichlet Allocation (LDA) algorithm to process the documents and to create a set of topics based on the documents.

Buy Now

Questions 43

A company is building a deep learning model on Amazon SageMaker. The company uses a large amount of data as the training dataset. The company needs to optimize the model's hyperparameters to minimize the loss function on the validation dataset.

Which hyperparameter tuning strategy will accomplish this goal with the LEAST computation time?

Options:

Hyperbaric!

Grid search

Bayesian optimization

Random search

Buy Now

Questions 44

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

Options:

Configure the competitor's name as a blocked phrase in Amazon Q Business.

Configure an Amazon Q Business retriever to exclude the competitor's name.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Buy Now

Questions 45

A company has a large collection of chat recordings from customer interactions after a product release. An ML engineer needs to create an ML model to analyze the chat data. The ML engineer needs to determine the success of the product by reviewing customer sentiments about the product.

Which action should the ML engineer take to complete the evaluation in the LEAST amount of time?

Options:

Use Amazon Rekognition to analyze sentiments of the chat conversations.

Train a Naive Bayes classifier to analyze sentiments of the chat conversations.

Use Amazon Comprehend to analyze sentiments of the chat conversations.

Use random forests to classify sentiments of the chat conversations.

Buy Now

Questions 46

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

Options:

Use SageMaker AI file mode to load and process the images in batches.

Reduce the batch size of the model and increase the number of pre-processing threads.

Reduce the quality of the training images in the S3 bucket.

Convert the images into RecordIO format and use the lazy loading pattern.

Buy Now

Questions 47

An ML engineer has a custom container that performs k-fold cross-validation and logs an average F1 score during training. The ML engineer wants Amazon SageMaker AI Automatic Model Tuning (AMT) to select hyperparameters that maximize the average F1 score.

How should the ML engineer integrate the custom metric into SageMaker AI AMT?

Options:

Define the average F1 score in the TrainingInputMode parameter.

Define a metric definition in the tuning job that uses a regular expression to capture the average F1 score from the training logs.

Publish the average F1 score as a custom Amazon CloudWatch metric.

Write the F1 score to a JSON file in Amazon S3 and reference it in ObjectiveMetricName.

Buy Now

Questions 48

Case study

After the data is aggregated, the ML engineer must implement a solution to automatically detect anomalies in the data and to visualize the result.

Which solution will meet these requirements?

Options:

Use Amazon Athena to automatically detect the anomalies and to visualize the result.

Use Amazon Redshift Spectrum to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.

Use AWS Batch to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Buy Now

Answer:

Explanation:

Amazon SageMaker Data Wrangler is a comprehensive tool that streamlines the process of data preparation and offers built-in capabilities for anomaly detection and visualization.

Key Features of SageMaker Data Wrangler:

Data Importation: Connects seamlessly to various data sources, including Amazon S3 and on-premises databases, facilitating the aggregation of transaction logs, customer profiles, and MySQL tables.

Anomaly Detection: Provides built-in analyses to detect anomalies in time series data, enabling the identification of outliers that may indicate fraudulent activities.

Visualization: Offers a suite of visualization tools, such as histograms and scatter plots, to help understand data distributions and relationships, which are crucial for feature engineering and model development.

Implementation Steps:

Data Aggregation:

Import data from Amazon S3 and on-premises MySQL databases into SageMaker Data Wrangler.

Utilize Data Wrangler's data flow interface to combine and preprocess datasets, ensuring a unified dataset for analysis.

Anomaly Detection:

Apply the anomaly detection analysis feature to identify outliers in the dataset.

Configure parameters such as the anomaly threshold to fine-tune the detection sensitivity.

Visualization:

Use built-in visualization tools to create charts and graphs that depict data distributions and highlight anomalies.

Interpret these visualizations to gain insights into potential fraud patterns and feature interdependencies.

Advantages of Using SageMaker Data Wrangler:

Integrated Workflow: Combines data preparation, anomaly detection, and visualization within a single interface, streamlining the ML development process.

Operational Efficiency: Reduces the need for multiple tools and complex integrations, thereby minimizing operational overhead.

Scalability: Handles large datasets efficiently, making it suitable for extensive transaction logs and customer profiles.

By leveraging SageMaker Data Wrangler, the ML engineer can effectively detect anomalies and visualize results, facilitating the development of a robust fraud detection model.

Analyze and Visualize - Amazon SageMaker

Transform Data - Amazon SageMaker

Questions 49

A company runs an ML model on Amazon SageMaker AI. The company uses an automatic process that makes API calls to create training jobs for the model. The company has new compliance rules that prohibit the collection of aggregated metadata from training jobs.

Which solution will prevent SageMaker AI from collecting metadata from the training jobs?

Options:

Opt out of metadata tracking for any training job that is submitted.

Ensure that training jobs are running in a private subnet in a custom VPC.

Encrypt the training data with an AWS Key Management Service (AWS KMS) customer managed key.

Reconfigure the training jobs to use only AWS Nitro instances.

Buy Now

Questions 50

An ML engineer needs to run intensive model training jobs each month that can take 48–72 hours. The jobs can be interrupted and resumed. The engineer has a fixed budget and needs the most cost-effective compute option.

Which solution will meet these requirements?

Options:

Purchase Reserved Instances with partial upfront payment.

Purchase On-Demand Instances.

Purchase SageMaker AI Savings Plans.

Purchase Spot Instances that use automated checkpoints.

Buy Now

Questions 51

A company's dataset for prediction analytics contains duplicate records, missing data, and unusually extreme high or low values. The company needs a solution to resolve the data quality issues quickly. The solution must maintain data integrity and have the LEAST operational overhead.

Which solution will meet these requirements?

Options:

Use AWS Glue DataBrew to delete duplicate records, fill missing values with medians, and replace extreme values with values in a normal range.

Configure an AWS Glue job to identify records with missing values and extreme measurements and delete them.

Create an Amazon EMR Spark job to replace missing values with zeros and merge duplicate records.

Use Amazon SageMaker Data Wrangler to delete duplicates, apply statistical modeling for missing values, and apply outlier detection algorithms.

Buy Now

Questions 52

An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize the production inference data in the same way as the training data before passing the production inference data to the model for predictions.

Which solution will meet this requirement?

Options:

Apply statistics from a well-known dataset to normalize the production samples.

Keep the min-max normalization statistics from the training set. Use these values to normalize the production samples.

Calculate a new set of min-max normalization statistics from a batch of production samples. Use these values to normalize all the production samples.

Calculate a new set of min-max normalization statistics from each production sample. Use these values to normalize all the production samples.

Buy Now

Questions 53

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

Ingest real-time data into Amazon Kinesis data streams. Deploy an Amazon SageMaker endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.

Buy Now

Questions 54

An ML model is deployed in production. The model has performed well and has met its metric thresholds for months.

An ML engineer who is monitoring the model observes a sudden degradation. The performance metrics of the model are now below the thresholds.

What could be the cause of the performance degradation?

Options:

Lack of training data

Drift in production data distribution

Compute resource constraints

Model overfitting

Buy Now

Questions 55

Case study

The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.

Which algorithm should the ML engineer use to meet this requirement?

Options:

LightGBM

Linear learner

К-means clustering

Neural Topic Model (NTM)

Buy Now

Questions 56

A company has a conversational AI assistant that sends requests through Amazon Bedrock to an Anthropic Claude large language model (LLM). Users report that when they ask similar questions multiple times, they sometimes receive different answers. An ML engineer needs to improve the responses to be more consistent and less random.

Which solution will meet these requirements?

Options:

Increase the temperature parameter and the top_k parameter.

Increase the temperature parameter. Decrease the top_k parameter.

Decrease the temperature parameter. Increase the top_k parameter.

Decrease the temperature parameter and the top_k parameter.

Buy Now

Questions 57

A company needs to combine data from multiple sources. The company must use Amazon Redshift Serverless to query an AWS Glue Data Catalog database and underlying data that is stored in an Amazon S3 bucket.

Select and order the correct steps from the following list to meet these requirements. Select each step one time or not at all. (Select and order three.)

• Attach the IAM role to the Redshift cluster.

• Attach the IAM role to the Redshift namespace.

• Create an external database in Amazon Redshift to point to the Data Catalog schema.

• Create an external schema in Amazon Redshift to point to the Data Catalog database.

• Create an IAM role for Amazon Redshift to use to access only the S3 bucket that contains underlying data.

• Create an IAM role for Amazon Redshift to use to access the Data Catalog and the S3 bucket that contains underlying data.

MLA-C01 Question 57

Options:

Buy Now

Questions 58

An ML engineer is developing a neural network to run on new user data. The dataset has dozens of floating-point features. The dataset is stored as CSV objects in an Amazon S3 bucket. Most objects and columns are missing at least one value. All features are relatively uniform except for a small number of extreme outliers. The ML engineer wants to use Amazon SageMaker Data Wrangler to handle missing values before passing the dataset to the neural network.

Which solution will provide the MOST complete data?

Options:

Drop samples that are missing values.

Impute missing values with the mean value.

Impute missing values with the median value.

Drop columns that are missing values.

Buy Now

Questions 59

An ML engineer wants to re-train an XGBoost model at the end of each month. A data team prepares the training data. The training dataset is a few hundred megabytes in size. When the data is ready, the data team stores the data as a new file in an Amazon S3 bucket.

The ML engineer needs a solution to automate this pipeline. The solution must register the new model version in Amazon SageMaker Model Registry within 24 hours.

Which solution will meet these requirements?

Options:

Create an AWS Lambda function that runs one time each week to poll the S3 bucket for new files. Invoke the Lambda function asynchronously. Configure the Lambda function to start the pipeline if the function detects new data.

Create an Amazon CloudWatch rule that runs on a schedule to start the pipeline every 30 days.

Create an S3 Lifecycle rule to start the pipeline every time a new object is uploaded to the S3 bucket.

Create an Amazon EventBridge rule to start an AWS Step Functions TrainingStep every time a new object is uploaded to the S3 bucket.

Buy Now

Questions 60

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Buy Now

Questions 61

An ML engineer is training a simple neural network model. The model’s performance improves initially and then degrades after a certain number of epochs.

Which solutions will mitigate this problem? (Select TWO.)

Options:

Enable early stopping on the model.

Increase dropout in the layers.

Increase the number of layers.

Increase the number of neurons.

Investigate and reduce the sources of model bias.

Buy Now

Questions 62

A company is developing a generative AI conversational interface to assist customers with payments. The company wants to use an ML solution to detect customer intent. The company does not have training data to train a model.

Which solution will meet these requirements?

Options:

Fine-tune a sequence-to-sequence (seq2seq) algorithm in Amazon SageMaker JumpStart.

Use an LLM from Amazon Bedrock with zero-shot learning.

Use the Amazon Comprehend DetectEntities API.

Run an LLM from Amazon Bedrock on Amazon EC2 instances.

Buy Now

Answer:

Explanation:

The key requirement in this scenario is detecting customer intent without having any training data. According to AWS Machine Learning and Generative AI documentation, zero-shot learning is specifically designed for situations where labeled training data is unavailable. Zero-shot learning allows a pre-trained large language model (LLM) to perform tasks it has not been explicitly trained on by leveraging its general knowledge and language understanding.

Amazon Bedrock provides fully managed access to foundation models (FMs) and LLMs that support zero-shot and few-shot learning. By using an LLM from Amazon Bedrock, the company can directly infer customer intent from natural language inputs without building, training, or fine-tuning a custom model. This approach is ideal for conversational interfaces where rapid deployment and scalability are required.

Option A is incorrect because fine-tuning a sequence-to-sequence (seq2seq) model in Amazon SageMaker JumpStart still requires labeled training data. Since the company explicitly does not have training data, this option does not meet the requirement.

Option C is also incorrect because the Amazon Comprehend DetectEntities API is designed for named entity recognition (NER), such as detecting names, dates, locations, or monetary values. It does not perform intent detection and is not suitable for conversational AI intent classification.

Option D is partially misleading. While it is technically possible to run an LLM on Amazon EC2, this does not inherently solve the problem of intent detection without training data. Additionally, Amazon Bedrock already abstracts infrastructure management, scaling, and model hosting, making direct EC2 deployment unnecessary and less efficient.

Therefore, using an LLM from Amazon Bedrock with zero-shot learning is the most appropriate, scalable, and AWS-recommended solution for intent detection without training data.

Exam Code: MLA-C01

Exam Name: AWS Certified Machine Learning Engineer - Associate

Last Update: Feb 20, 2026

Questions: 207

PDF + Testing Engine

$49.5 ~~$164.99~~

Testing Engine

$37.5 ~~$124.99~~

PDF (Q&A)

$31.5 ~~$104.99~~

Spring Sale - 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: dm70dm

dumpsmate logo

Contact Email:

Hot Vendors

MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation: