DP-203 Data Engineering on Microsoft Azure Questions and Answers

Questions 4

You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction

dataset requirements.

What should you create?

Options:

a table that has an IDENTITY property

a system-versioned temporal table

a user-defined SEQUENCE object

a table that has a FOREIGN KEY constraint

Buy Now

Questions 5

You need to design the partitions for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 5

Options:

Buy Now

Questions 6

You need to design a data ingestion and storage solution for the Twitter feeds. The solution must meet the customer sentiment analytics requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area

NOTE: Each correct selection b worth one point.

DP-203 Question 6

Options:

Buy Now

Questions 7

You need to design a data storage structure for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 7

Options:

Buy Now

Questions 8

You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 8

Options:

Buy Now

Questions 9

You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements.

Which Azure Storage functionality should you include in the solution?

Options:

change feed

soft delete

time-based retention

lifecycle management

Buy Now

Questions 10

You have an enterprise data warehouse in Azure Synapse Analytics.

Using PolyBase, you create an external table named [Ext].[Items] to query Parquet files stored in Azure Data Lake Storage Gen2 without importing the data to the data warehouse.

The external table has three columns.

You discover that the Parquet files have a fourth column named ItemID.

Which command should you run to add the ItemID column to the external table?

DP-203 Question 10

Options:

Option A

Option B

Option C

Option D

Buy Now

Questions 11

You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool.

You plan to keep a record of changes to the available fields.

The supplier data contains the following columns.

DP-203 Question 11

Which three additional columns should you add to the data to create a Type 2 SCD? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

surrogate primary key

foreign key

effective start date

effective end date

last modified date

business key

Buy Now

Questions 12

You need to implement versioned changes to the integration pipelines. The solution must meet the data integration requirements.

In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.

DP-203 Question 12

Options:

Buy Now

Questions 13

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.

You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named container1.

You plan to insert data from the files in container1 into Table1 and transform the data. Each row of data in the files will produce one row in the serving layer of Table1.

You need to ensure that when the source data files are loaded to container1, the DateTime is stored as an additional column in Table1.

Solution: You use an Azure Synapse Analytics serverless SQL pool to create an external table that has an additional DateTime column.

Does this meet the goal?

Options:

Yes

Buy Now

Questions 14

You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Poo 11 and a storage account. The storage account contains a blob container. The blob container contains multiple CSV files.

You plan to load the files into Pool! by using the following code.

DP-203 Question 14

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

DP-203 Question 14

Options:

Buy Now

Questions 15

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:

A workload for data engineers who will use Python and SQL.

A workload for jobs that will run notebooks that use Python, Scala, and SOL.

A workload that data scientists will use to perform ad hoc analysis in Scala and R.

The enterprise architecture team at your company identifies the following standards for Databricks environments:

The data engineers must share a cluster.

The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.

All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.

You need to create the Databricks clusters for the workloads.

Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?

Options:

Yes

Buy Now

Questions 16

You are designing an enterprise data warehouse in Azure Synapse Analytics that will contain a table named Customers. Customers will contain credit card information.

You need to recommend a solution to provide salespeople with the ability to view all the entries in Customers.

The solution must prevent all the salespeople from viewing or inferring the credit card information.

What should you include in the recommendation?

Options:

data masking

Always Encrypted

column-level security

row-level security

Buy Now

Questions 17

You have an Azure Stream Analytics query. The query returns a result set that contains 10,000 distinct values for a column named clusterID.

You monitor the Stream Analytics job and discover high latency.

You need to reduce the latency.

Which two actions should you perform? Each correct answer presents a complete solution.

NOTE: Each correct selection is worth one point.

Options:

Add a pass-through query.

Add a temporal analytic function.

Scale out the query by using PARTITION BY.

Convert the query to a reference query.

Increase the number of streaming units.

Buy Now

Questions 18

You are building an Azure Stream Analytics job to identify how much time a user spends interacting with a feature on a webpage.

The job receives events based on user actions on the webpage. Each row of data represents an event. Each event has a type of either 'start' or 'end'.

You need to calculate the duration between start and end events.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 18

Options:

Buy Now

Questions 19

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:

A workload for data engineers who will use Python and SQL.

A workload for jobs that will run notebooks that use Python, Scala, and SOL.

A workload that data scientists will use to perform ad hoc analysis in Scala and R.

The enterprise architecture team at your company identifies the following standards for Databricks environments:

The data engineers must share a cluster.

The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.

All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.

You need to create the Databricks clusters for the workloads.

Solution: You create a Standard cluster for each data scientist, a Standard cluster for the data engineers, and a High Concurrency cluster for the jobs.

Does this meet the goal?

Options:

Yes

Buy Now

Questions 20

You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account.

You need to output the count of tweets during the last five minutes every five minutes. Each tweet must only

be counted once.

Which windowing function should you use?

Options:

a five-minute Session window

a five-minute Sliding window

a five-minute Tumbling window

a five-minute Hopping window that has one-minute hop

Buy Now

Questions 21

You have an Azure subscription that contains an Azure Data Lake Storage Gen2 account named storage1. Storage1 contains a container named container1. Container1 contains a directory named directory1. Directory1 contains a file named file1.

You have an Azure Active Directory (Azure AD) user named User1 that is assigned the Storage Blob Data Reader role for storage1.

You need to ensure that User1 can append data to file1. The solution must use the principle of least privilege.

Which permissions should you grant? To answer, drag the appropriate permissions to the correct resources. Each permission may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

DP-203 Question 21

Options:

Buy Now

Questions 22

You have an Azure Synapse Analytics serverless SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named storage1. The AllowedBlobpublicAccess porperty is disabled for storage1.

You need to create an external data source that can be used by Azure Active Directory (Azure AD) users to access storage1 from Pool1.

What should you create first?

Options:

an external resource pool

a remote service binding

database scoped credentials

an external library

Buy Now

Questions 23

You have an Azure subscription that contains the resources shown in the following table.

DP-203 Question 23

You need to ensure that you can Spark notebooks in ws1. The solution must ensure secrets from kv1 by using UAMI1. What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 23

Options:

Buy Now

Questions 24

You have an Azure Databricks workspace that contains a Delta Lake dimension table named Tablet. Table1 is a Type 2 slowly changing dimension (SCD) table. You need to apply updates from a source table to Table1. Which Apache Spark SQL operation should you use?

Options:

CREATE

UPDATE

MERGE

ALTER

Buy Now

Questions 25

You have an Azure Data Factory instance named ADF1 and two Azure Synapse Analytics workspaces named WS1 and WS2.

ADF1 contains the following pipelines:

P1: Uses a copy activity to copy data from a nonpartitioned table in a dedicated SQL pool of WS1 to an Azure Data Lake Storage Gen2 account

P2: Uses a copy activity to copy data from text-delimited files in an Azure Data Lake Storage Gen2 account to a nonpartitioned table in a dedicated SQL pool of WS2

You need to configure P1 and P2 to maximize parallelism and performance.

Which dataset settings should you configure for the copy activity if each pipeline? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 25

Options:

Buy Now

Questions 26

You plan to develop a dataset named Purchases by using Azure databricks Purchases will contain the following columns:

• ProductID

• ItemPrice

• lineTotal

• Quantity

• StorelD

• Minute

• Month

• Hour

• Year

• Day

You need to store the data to support hourly incremental load pipelines that will vary for each StoreID. the solution must minimize storage costs. How should you complete the rode? To answer, select the appropriate options In the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 26

Options:

Buy Now

Questions 27

You have an Azure Synapse Analystics dedicated SQL pool that contains a table named Contacts. Contacts contains a column named Phone.

You need to ensure that users in a specific role only see the last four digits of a phone number when querying the Phone column.

What should you include in the solution?

Options:

a default value

dynamic data masking

row-level security (RLS)

column encryption

table partitions

Buy Now

Questions 28

You have an Azure Databricks workspace and an Azure Data Lake Storage Gen2 account named storage!

New files are uploaded daily to storage1.

• Incrementally process new files as they are upkorage1 as a structured streaming source. The solution must meet the following requirements:

• Minimize implementation and maintenance effort.

• Minimize the cost of processing millions of files.

• Support schema inference and schema drift.

Which should you include in the recommendation?

Options:

Auto Loader

Apache Spark FileStreamSource

COPY INTO

Azure Data Factory

Buy Now

Questions 29

You are building a data flow in Azure Data Factory that upserts data into a table in an Azure Synapse Analytics dedicated SQL pool.

You need to add a transformation to the data flow. The transformation must specify logic indicating when a row from the input data must be upserted into the sink.

Which type of transformation should you add to the data flow?

Options:

join

select

surrogate key

alter row

Buy Now

Questions 30

You need to create an Azure Data Factory pipeline to process data for the following three departments at your company: Ecommerce, retail, and wholesale. The solution must ensure that data can also be processed for the entire company.

How should you complete the Data Factory data flow script? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

DP-203 Question 30

Options:

Buy Now

Questions 31

you have a project in Azure DevOps that contains a repository named Repo1. Repo1 contains a branch named main.

You create a new Azure Synapse workspace named Workspace1.

You need to create data processing pipelines in Workspace1. The solution must meet the following requirements:

• Pipeline artifacts must be stored in Repo1.

• Source control must be provided for pipeline artifacts.

• All development must be performed in a feature branch.

which four actions should you perform in sequence in Synapse Studio? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

DP-203 Question 31

Options:

Buy Now

Questions 32

You have an Azure subscription that contains an Azure Synapse Analytics serverless SQL pool. You run the following query in the pool.

DP-203 Question 32

For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point.

DP-203 Question 32

Options:

Buy Now

Questions 33

You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account

You need to output the count of tweets during the last five minutes every five minutes. Each tweet must only be counted once.

Which windowing function should you use?

Options:

a five-minute Tumbling window

a five-minute Sliding window

a five-minute Hopping window that has a one-minute hop

a five-minute Session window

Buy Now

Questions 34

You have an Azure Stream Analytics job that read data from an Azure event hub.

You need to evaluate whether the job processes data as quickly as the data arrives or cannot keep up.

Which metric should you review?

Options:

InputEventLastPunctuationTime

Input Sources Receive

Late input Events

Backlogged input Events

Buy Now

Questions 35

You have an Azure SQL database named Database1 and two Azure event hubs named HubA and HubB. The data consumed from each source is shown in the following table.

DP-203 Question 35

You need to implement Azure Stream Analytics to calculate the average fare per mile by driver.

How should you configure the Stream Analytics input for each source? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 35

Options:

Buy Now

Questions 36

You have an Azure Data Lake Storage account that contains a staging zone.

You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes mapping data Flow, and then inserts the data info the data warehouse.

Does this meet the goal?

Options:

Yes

Buy Now

Questions 37

You have an Azure Data Lake Storage Gen2 container.

Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files.

You need to design a data archiving solution that meets the following requirements:

New data is accessed frequently and must be available as quickly as possible.

Data that is older than five years is accessed infrequently but must be available within one second when requested.

Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible.

Costs must be minimized while maintaining the required availability.

How should you manage the data? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point

DP-203 Question 37

Options:

Buy Now

Questions 38

You have an on-premises data warehouse that includes the following fact tables. Both tables have the following columns: DateKey, ProductKey, RegionKey. There are 120 unique product keys and 65 unique region keys.

DP-203 Question 38

Queries that use the data warehouse take a long time to complete.

You plan to migrate the solution to use Azure Synapse Analytics. You need to ensure that the Azure-based solution optimizes query performance and minimizes processing skew.

What should you recommend? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point

DP-203 Question 38

Options:

Buy Now

Questions 39

You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an Azure Synapse Analytics dedicated SQL pool.

You have a table that was created by using the following Transact-SQL statement.

DP-203 Question 39

Which two columns should you add to the table? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

[EffectiveScarcDate] [datetime] NOT NULL,

[CurrentProduccCacegory] [nvarchar] (100) NOT NULL,

[EffectiveEndDace] [dacecime] NULL,

[ProductCategory] [nvarchar] (100) NOT NULL,

[OriginalProduccCacegory] [nvarchar] (100) NOT NULL,

Buy Now

Questions 40

You have an Azure Data Factory pipeline that is triggered hourly.

The pipeline has had 100% success for the past seven days.

The pipeline execution fails, and two retries that occur 15 minutes apart also fail. The third failure returns the following error.

DP-203 Question 40

What is a possible cause of the error?

Options:

The parameter used to generate year=2021/month=01/day=10/hour=06 was incorrect.

From 06:00 to 07:00 on January 10, 2021, there was no data in wwi/BIKES/CARBON.

From 06:00 to 07:00 on January 10, 2021, the file format of data in wwi/BIKES/CARBON was incorrect.

The pipeline was triggered too early.

Buy Now

Questions 41

You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake Storage Gen2 container.

Which resource provider should you enable?

Options:

Microsoft.Sql

Microsoft-Automation

Microsoft.EventGrid

Microsoft.EventHub

Buy Now

Questions 42

You are designing an anomaly detection solution for streaming data from an Azure IoT hub. The solution must meet the following requirements:

Send the output to Azure Synapse.

Identify spikes and dips in time series data.

Minimize development and configuration effort.

Which should you include in the solution?

Options:

Azure Databricks

Azure Stream Analytics

Azure SQL Database

Buy Now

Questions 43

You have an Azure subscription that contains an Azure Synapse Analytics account. The account is integrated with an Azure Repos repository named Repo1 and contains a pipeline named Pipeline1. Repo1 contains the branches shown in the following table.

DP-203 Question 43

From featuredev, you develop and test changes to Pipeline1. You need to publish the changes. What should you do first?

Options:

From featuredev. create a pull request.

From main, create a pull request.

Add a Publish_config.json file to the root folder of the collaboration branch.

Switch to live mode.

Buy Now

Questions 44

You have an Azure Synapse Analytics dedicated SQL pool that hosts a database named DB1 You need to ensure that D81 meets the following security requirements:

• When credit card numbers show in applications, only the last four digits must be visible.

• Tax numbers must be visible only to specific users.

What should you use for each requirement? To answer, select the appropriate options in the answer area

NOTE: Each correct selection is worth one point.

DP-203 Question 44

Options:

Buy Now

Questions 45

You are developing an application that uses Azure Data Lake Storage Gen 2.

You need to recommend a solution to grant permissions to a specific application for a limited time period.

What should you include in the recommendation?

Options:

Azure Active Directory (Azure AD) identities

shared access signatures (SAS)

account keys

role assignments

Buy Now

Questions 46

You have an Azure subscription that contains a logical Microsoft SQL server named Server1. Server1 hosts an Azure Synapse Analytics SQL dedicated pool named Pool1.

You need to recommend a Transparent Data Encryption (TDE) solution for Server1. The solution must meet the following requirements:

Track the usage of encryption keys.

Maintain the access of client apps to Pool1 in the event of an Azure datacenter outage that affects the availability of the encryption keys.

What should you include in the recommendation? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 46

Options:

Buy Now

Questions 47

What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?

Options:

a server-level virtual network rule

a database-level virtual network rule

a database-level firewall IP rule

a server-level firewall IP rule

Buy Now

Questions 48

What should you recommend using to secure sensitive customer contact information?

Options:

data labels

column-level security

row-level security

Transparent Data Encryption (TDE)

Buy Now

Questions 49

Which Azure Data Factory components should you recommend using together to import the daily inventory data from the SQL server to Azure Data Lake Storage? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

DP-203 Question 49

Options:

Buy Now

Questions 50

What should you do to improve high availability of the real-time data processing solution?

Options:

Deploy identical Azure Stream Analytics jobs to paired regions in Azure.

Deploy a High Concurrency Databricks cluster.

Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.

Set Data Lake Storage to use geo-redundant storage (GRS).

Buy Now

Exam Code: DP-203

Exam Name: Data Engineering on Microsoft Azure

Last Update: Dec 4, 2025

Questions: 361

PDF + Testing Engine

$52.5 ~~$174.99~~

Testing Engine

$40.5 ~~$134.99~~

PDF (Q&A)

$34.5 ~~$114.99~~

Big Cyber Monday Sale - 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: dm70dm

dumpsmate logo

Contact Email:

Hot Vendors

DP-203 Data Engineering on Microsoft Azure Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation: