Databricks-Certified-Data-Engineer-Associate Databricks Certified Data Engineer Associate Exam Questions and Answers

Questions 4

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

Records that violate the expectation cause the job to fail.

Questions 5

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

Options:

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.

They can schedule the query to run every 1 day from the Jobs UI.

They can schedule the query to run every 12 hours from the Jobs UI.

Questions 6

Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?

Options:

None of these

Data lake

Data warehouse

All of these

Data lakehouse

Questions 7

Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.

A data engineer has created an ETL pipeline using Delta Live Tables to manage their company's travel reimbursement details. They want to ensure that if the location details have not been provided by the employee, the pipeline is terminated.

How can the scenario be implemented?

Options:

CONSTRAINT valid_location EXPECT (location = NULL)

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL UPDATE

CONSTRAINT valid_location EXPECT (location != NULL) ON DROP ROW

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL
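
The difference between the expectation actions can be illustrated with a small pure-Python simulation. This is only a sketch of the documented behavior, not the Delta Live Tables API: `apply_expectation`, `ExpectationFailure`, and the sample rows are hypothetical names invented for this illustration, and the violation counter stands in for the metrics that a real pipeline records in its event log.

```python
class ExpectationFailure(Exception):
    """Simulated pipeline failure for an ON VIOLATION FAIL UPDATE expectation."""


def apply_expectation(rows, predicate, on_violation=None):
    """Return (rows written to the target, number of violations).

    on_violation=None       -> keep violating rows, only count them
    on_violation="DROP ROW" -> drop violating rows, count them
    on_violation="FAIL UPDATE" -> raise on the first violating row
    """
    kept = []
    violations = 0
    for row in rows:
        if predicate(row):
            kept.append(row)
            continue
        violations += 1
        if on_violation == "FAIL UPDATE":
            raise ExpectationFailure(f"expectation violated by {row!r}")
        if on_violation == "DROP ROW":
            continue  # the row never reaches the target
        kept.append(row)  # no action given: the row is still written
    return kept, violations


# Hypothetical reimbursement records; the second one has no location.
batch = [
    {"employee": "a", "location": "Berlin"},
    {"employee": "b", "location": None},
]


def has_location(row):
    # Python's "is not None" plays the role of SQL's IS NOT NULL here.
    return row["location"] is not None


kept_default, n_default = apply_expectation(batch, has_location)
kept_drop, n_drop = apply_expectation(batch, has_location, "DROP ROW")
```

In this sketch only the `"FAIL UPDATE"` mode stops processing; the other two modes let the batch complete and differ only in whether the violating row reaches the target.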

Questions 8

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

Records that violate the expectation cause the job to fail.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

Questions 9

A data architect has determined that a table of the following format is necessary:

[Image not available: Question 9 table format]

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

[Image not available: Question 9 code blocks]

Options:

Option A

Option B

Option C

Option D

Option E

Questions 10

Which of the following is stored in the Databricks customer's cloud account?

Options:

Databricks web application

Cluster management metadata

Repos

Data

Notebooks

Questions 11

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

Options:

dbfs:/user/hive/database/customer360

dbfs:/user/hive/warehouse

dbfs:/user/hive/customer360

More information is needed to determine the correct response

Questions 12

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Options:

There was a type mismatch between the specific schema and the inferred schema

JSON data is a text-based format

Auto Loader only works with string data

All of the fields had at least one null value

Auto Loader cannot infer the schema of ingested data

Questions 13

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Image not available: Question 13 code block]

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:

trigger("5 seconds")

trigger()

trigger(once="5 seconds")

trigger(processingTime="5 seconds")

trigger(continuous="5 seconds")

Questions 14

Which of the following describes a scenario in which a data team will want to utilize cluster pools?

Options:

An automated report needs to be refreshed as quickly as possible.

An automated report needs to be made reproducible.

An automated report needs to be tested to identify errors.

An automated report needs to be version-controlled across multiple collaborators.

An automated report needs to be runnable by all stakeholders.

Questions 15

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

Options:

DROP

IGNORE

MERGE

APPEND

INSERT
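
The idea of writing only records that are not already present can be sketched in plain Python. This is a simplified model of an insert-only merge (match on a key, insert when there is no match), not Delta Lake code; `merge_insert_only`, the `id` key, and the sample rows are hypothetical.

```python
def merge_insert_only(target, updates, key):
    """Insert rows from `updates` into `target` unless the key already exists.

    `target` is a dict keyed by the match column; returns the number of
    rows actually inserted.
    """
    inserted = 0
    for row in updates:
        if row[key] not in target:  # "when not matched": insert the row
            target[row[key]] = row
            inserted += 1
        # "when matched": leave the existing row untouched
    return inserted


# Hypothetical target table and incoming batch; id 2 already exists.
table = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
incoming = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

inserted_count = merge_insert_only(table, incoming, key="id")
```

A plain append of `incoming` would have written the row with `id` 2 a second time; matching on the key first is what prevents the duplicate.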

Questions 16

Which of the following commands will return the location of database customer360?

Options:

DESCRIBE LOCATION customer360;

DROP DATABASE customer360;

DESCRIBE DATABASE customer360;

ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user');

USE DATABASE customer360;

Questions 17

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

Options:

pyspark.sql.types.DateType

datetime

pyspark.sql.types.TimestampType

Cron syntax

There is no way to represent and submit this information programmatically
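
As an illustration of expressing a schedule as text rather than form selections, the sketch below builds one schedule definition and reuses it for several jobs. The field names `quartz_cron_expression`, `timezone_id`, and `pause_status` follow the schedule object of the Databricks Jobs API as commonly documented and should be checked against the current API reference; `build_schedule` and the job names are hypothetical, and nothing here is sent to a workspace.

```python
import json


def build_schedule(cron_expression, timezone_id="UTC"):
    """Return a schedule dict for a Quartz-style cron expression.

    Quartz expressions have 6 fields (seconds, minutes, hours,
    day-of-month, month, day-of-week) plus an optional year field.
    """
    fields = cron_expression.split()
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 cron fields, got {len(fields)}")
    return {
        "quartz_cron_expression": cron_expression,
        "timezone_id": timezone_id,
        "pause_status": "UNPAUSED",
    }


# 06:30:00 on every weekday; "?" leaves day-of-month unspecified.
weekday_schedule = build_schedule("0 30 6 ? * MON-FRI")

# Reuse the same schedule for several (hypothetical) jobs.
job_settings = [
    {"name": name, "schedule": weekday_schedule}
    for name in ("ingest_orders", "ingest_customers")
]

payload = json.dumps(job_settings[0], indent=2)
```

Because the schedule is a single string, copying it to another job is a matter of reusing `weekday_schedule` instead of re-entering each value by hand.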

Questions 18

Which two components function in the Databricks platform architecture's control plane? (Choose two.)

Options:

Virtual Machines

Compute Orchestration

Serverless Compute

Compute

Unity Catalog

Questions 19

Which of the following commands will return the number of null values in the member_id column?

Options:

SELECT count(member_id) FROM my_table;

SELECT count(member_id) - count_null(member_id) FROM my_table;

SELECT count_if(member_id IS NULL) FROM my_table;

SELECT null(member_id) FROM my_table;

SELECT count_null(member_id) FROM my_table;
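
How `count` treats null values can be checked with a small experiment. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a SQL engine, which is an assumption made only so the example runs anywhere. SQLite has no `count_if` function, so `sum(member_id IS NULL)` is used as the closest equivalent of counting rows that satisfy a condition; the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (member_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [(1,), (None,), (3,), (None,), (None,)],  # three of five values are NULL
)

# count(column) skips NULLs, so it returns the number of non-null values.
non_null = conn.execute("SELECT count(member_id) FROM my_table").fetchone()[0]

# count(*) counts every row regardless of NULLs.
total = conn.execute("SELECT count(*) FROM my_table").fetchone()[0]

# Conditional count: "member_id IS NULL" evaluates to 1 or 0 per row in SQLite.
nulls = conn.execute(
    "SELECT sum(member_id IS NULL) FROM my_table"
).fetchone()[0]

conn.close()
```

The conditional count and `total - non_null` agree, which shows why `count(member_id)` on its own cannot report the number of nulls.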

Questions 20

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:

[Image not available: Question 20 code block]

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Options:

Replace predict with a stream-friendly prediction function

Replace schema(schema) with option("maxFilesPerTrigger", 1)

Replace "transactions" with the path to the location of the Delta table

Replace format("delta") with format("stream")

Replace spark.read with spark.readStream

Questions 21

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

Options:

Unity Catalog

Data Explorer

Delta Lake

Delta Live Tables

Auto Loader

Questions 22

Which query is performing a streaming hop from raw data to a Bronze table?

[Image not available: Question 22 queries]

Options:

Option A

Option B

Option C

Option D

Questions 23

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which of the following approaches can be used to identify the owner of new_table?

Options:

Review the Permissions tab in the table's page in Data Explorer

All of these options can be used to identify the owner of the table

Review the Owner field in the table's page in Data Explorer

Review the Owner field in the table's page in the cloud storage solution

There is no way to identify the owner of the table

Questions 24

A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).

Which of the following code blocks creates this SQL UDF?

Options:

[Image not available: Question 24 Option 1]

Questions 25

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.

Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?