
Databricks Certified Associate Developer for Apache Spark 3.5 - Python: Questions and Answers

Question 4

What is the behavior of the function date_sub(start, days) if a negative value is passed into the days parameter?

Options:

A.

The same start date will be returned

B.

An error message of an invalid parameter will be returned

C.

The number of days specified will be added to the start date

D.

The number of days specified will be removed from the start date
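
As a quick illustration of the function under discussion: Spark's date_sub() accepts a negative days argument, in which case it moves the date forward (the single-column DataFrame below is made up):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2025-06-01",)], ["d"])

# date_sub with a negative offset is equivalent to date_add with the
# corresponding positive offset: both columns below show 2025-06-04.
df.select(
    F.date_sub(F.col("d"), -3).alias("date_sub_minus_3"),
    F.date_add(F.col("d"), 3).alias("date_add_3"),
).show()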

Question 5

A data engineer wants to process a streaming DataFrame that receives sensor readings every second with columns sensor_id, temperature, and timestamp. The engineer needs to calculate the average temperature for each sensor over the last 5 minutes while the data is streaming.

Which code implementation achieves the requirement?

The four candidate implementations (A-D) are provided as code images in the original exam and are not reproduced here.

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D
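
The options cannot be reproduced here, but a common way to express this requirement in Structured Streaming is sketched below; readings_df stands in for the streaming DataFrame described above, and the watermark value and console sink are assumptions:

from pyspark.sql import functions as F

# Average temperature per sensor over 5-minute event-time windows.
# The watermark bounds how late data may arrive before window state is dropped.
avg_temps = (
    readings_df
    .withWatermark("timestamp", "5 minutes")
    .groupBy(
        F.window(F.col("timestamp"), "5 minutes"),
        F.col("sensor_id"),
    )
    .agg(F.avg("temperature").alias("avg_temperature"))
)

query = (
    avg_temps.writeStream
    .outputMode("update")
    .format("console")
    .start()
)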

Question 6

A Data Analyst is working on the DataFrame sensor_df, which contains two columns: record_datetime and record, where record holds an array of structs with the fields sensor_id, status, and health.

Which code fragment returns a DataFrame that splits the record column into separate columns and has one array item per row?


Options:

A.

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

exploded_df = exploded_df.select("record_datetime", "sensor_id", "status", "health")

B.

exploded_df = exploded_df.select(

"record_datetime",

"record_exploded.sensor_id",

"record_exploded.status",

"record_exploded.health"

)

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

C.

exploded_df = exploded_df.select(

"record_datetime",

"record_exploded.sensor_id",

"record_exploded.status",

"record_exploded.health"

)

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

D.

exploded_df = exploded_df.select("record_datetime", "record_exploded")
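
Independent of which option the exam keys as correct, the explode-then-select pattern referenced by the options can be sketched as follows, with a small made-up sensor_df:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Made-up data: one row per record_datetime, each holding an array of structs.
data = [("2025-06-01 00:00:00", [(1, "OK", 0.9), (2, "WARN", 0.4)])]
schema = "record_datetime STRING, record ARRAY<STRUCT<sensor_id: INT, status: STRING, health: DOUBLE>>"
sensor_df = spark.createDataFrame(data, schema)

# explode() produces one output row per array element; the struct fields are
# then addressed with dot notation on the exploded column.
exploded_df = sensor_df.withColumn("record_exploded", F.explode("record"))
exploded_df = exploded_df.select(
    "record_datetime",
    "record_exploded.sensor_id",
    "record_exploded.status",
    "record_exploded.health",
)
exploded_df.show(truncate=False)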

Question 7

What is the benefit of using Pandas on Spark for data transformations?

Options:

A.

It is available only with Python, thereby reducing the learning curve.

B.

It computes results immediately using eager execution, making it simple to use.

C.

It runs on a single node only, utilizing the memory with memory-bound DataFrames and hence cost-efficient.

D.

It executes queries faster using all the available cores in the cluster as well as provides Pandas’s rich set of features.
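
As background, the pandas API on Spark (pyspark.pandas) offers a pandas-like interface whose work is planned and executed by Spark across the cluster; a small sketch, where the file path and column names are hypothetical:

import pyspark.pandas as ps

# Looks like pandas, but the operations are executed by Spark,
# so they can scale beyond a single machine's memory.
psdf = ps.read_csv("/data/sales.csv")            # hypothetical path
result = psdf.groupby("region")["amount"].mean()  # hypothetical columns
print(result.head())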

Question 8

Which configuration can be enabled to optimize the conversion between Pandas and PySpark DataFrames using Apache Arrow?

Options:

A.

spark.conf.set("spark.pandas.arrow.enabled", "true")

B.

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

C.

spark.conf.set("spark.sql.execution.arrow.enabled", "true")

D.

spark.conf.set("spark.sql.arrow.pandas.enabled", "true")

Question 9

Which command overwrites an existing JSON file when writing a DataFrame?

Options:

A.

df.write.mode("overwrite").json("path/to/file")

B.

df.write.overwrite.json("path/to/file")

C.

df.write.json("path/to/file", overwrite=True)

D.

df.write.format("json").save("path/to/file", mode="overwrite")

Question 10

The following code fragment results in an error:

@F.udf(T.IntegerType())
def simple_udf(t: str) -> str:
    return answer * 3.14159

Which code fragment should be used instead?

Options:

A.

@F.udf(T.IntegerType())
def simple_udf(t: int) -> int:
    return t * 3.14159

B.

@F.udf(T.DoubleType())
def simple_udf(t: float) -> float:
    return t * 3.14159

C.

@F.udf(T.DoubleType())
def simple_udf(t: int) -> int:
    return t * 3.14159

D.

@F.udf(T.IntegerType())
def simple_udf(t: float) -> float:
    return t * 3.14159
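
Background only, not a statement of the exam's keyed answer: a UDF's declared Spark return type should be consistent with the Python value the function actually returns, and multiplying by 3.14159 produces a float, which corresponds to DoubleType. A minimal, self-contained sketch:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import types as T

spark = SparkSession.builder.getOrCreate()

@F.udf(T.DoubleType())
def times_pi(t: float) -> float:
    # Returns a Python float, consistent with the declared DoubleType.
    return t * 3.14159

df = spark.createDataFrame([(1.0,), (2.0,)], ["t"])
df.select(times_pi("t").alias("t_pi")).show()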

Question 11

What is the benefit of Adaptive Query Execution (AQE)?

Options:

A.

It allows Spark to optimize the query plan before execution but does not adapt during runtime.

B.

It enables the adjustment of the query plan during runtime, handling skewed data, optimizing join strategies, and improving overall query performance.

C.

It optimizes query execution by parallelizing tasks and does not adjust strategies based on runtime metrics like data skew.

D.

It automatically distributes tasks across nodes in the clusters and does not perform runtime adjustments to the query plan.
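
For reference, AQE is driven by runtime configuration; a short sketch of the switches most often mentioned alongside it (assumes an active spark session):

spark.conf.set("spark.sql.adaptive.enabled", "true")                      # enable AQE (on by default in recent Spark)
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")   # merge small post-shuffle partitions
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")             # split skewed partitions during joins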

Question 12

An MLOps engineer is building a Pandas UDF that applies a language model that translates English strings into Spanish. The initial code is loading the model on every call to the UDF, which is hurting the performance of the data pipeline.

The initial code is:


def in_spanish_inner(df: pd.Series) -> pd.Series:
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())

How can the MLOps engineer change this code to reduce how many times the language model is loaded?

Options:

A.

Convert the Pandas UDF to a PySpark UDF

B.

Convert the Pandas UDF from a Series → Series UDF to a Series → Scalar UDF

C.

Run the in_spanish_inner() function in a mapInPandas() function call

D.

Convert the Pandas UDF from a Series → Series UDF to an Iterator[Series] → Iterator[Series] UDF
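
For illustration only, not the exam's keyed answer: an Iterator[pd.Series] -> Iterator[pd.Series] pandas UDF runs its setup code once per task and then streams over the batches, so a model only needs to be loaded once per task instead of once per call. A sketch that reuses the question's get_translation_model() helper and a hypothetical input column:

from typing import Iterator
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

@pandas_udf(StringType())
def in_spanish(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the (hypothetical) model once, then reuse it for every batch
    # handled by this task.
    model = get_translation_model(target_lang="es")
    for series in batches:
        yield series.apply(model)

# Usage on a hypothetical DataFrame column:
df.withColumn("spanish", in_spanish("english_text"))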

Question 13

Given a CSV file with the content:

(The CSV file's content is shown as an image in the original exam.)

And the following code:

from pyspark.sql.types import *

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])

spark.read.schema(schema).csv(path).collect()

What is the resulting output?

Options:

A.

[Row(name='bambi'), Row(name='alladin', age=20)]

B.

[Row(name='alladin', age=20)]

C.

[Row(name='bambi', age=None), Row(name='alladin', age=20)]

D.

The code throws an error due to a schema mismatch.
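
Background, using a hypothetical file rather than the exam's image: with an explicit schema, Spark's default CSV read mode (PERMISSIVE) turns values that cannot be cast to the declared type into null instead of failing the read. A sketch:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
])

# Hypothetical file contents:
#   bambi,hello
#   alladin,20
rows = spark.read.schema(schema).csv("/tmp/people.csv").collect()
# With the default mode=PERMISSIVE, the unparseable age becomes None for that row:
# [Row(name='bambi', age=None), Row(name='alladin', age=20)]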

Question 14

The following code fragment results in an error:

(The erroneous code fragment is shown as an image in the original exam.)

Which code fragment should be used instead?

The four candidate fragments (A-D) are likewise provided as code images in the original exam and are not reproduced here.

Question 15

What is the difference between df.cache() and df.persist() for a Spark DataFrame?

Options:

A.

Both cache() and persist() can be used to set the default storage level (MEMORY_AND_DISK_SER)

B.

Both functions perform the same operation. The persist() function provides improved performance as its default storage level is DISK_ONLY.

C.

persist() - Persists the DataFrame with the default storage level (MEMORY_AND_DISK_SER) and cache() - Can be used to set different storage levels to persist the contents of the DataFrame.

D.

cache() - Persists the DataFrame with the default storage level (MEMORY_AND_DISK) and persist() - Can be used to set different storage levels to persist the contents of the DataFrame
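
For reference, both calls in a sketch (df and other_df are placeholders, and the DISK_ONLY level is just an example of an explicit choice):

from pyspark import StorageLevel

df.cache()                               # uses the default storage level
other_df.persist(StorageLevel.DISK_ONLY) # persist() also accepts an explicit storage level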

Question 16

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.

How should this issue be resolved?

Options:

A.

Add more executor instances to the cluster

B.

Increase the driver memory on the client machine

C.

Switch the deployment mode to cluster mode

D.

Switch the deployment mode to local mode

Question 17

A data engineer is streaming data from Kafka and requires:

Minimal latency

Exactly-once processing guarantees

Which trigger mode should be used?

Options:

A.

.trigger(processingTime='1 second')

B.

.trigger(continuous=True)

C.

.trigger(continuous='1 second')

D.

.trigger(availableNow=True)
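
For reference only (not an indication of the keyed answer), the trigger styles named in the options are expressed on the streaming writer as follows; the console sink and the streaming DataFrame df are placeholders:

# Micro-batch execution, started every second:
q1 = df.writeStream.format("console").trigger(processingTime="1 second").start()

# Continuous processing with a 1-second checkpoint interval:
q2 = df.writeStream.format("console").trigger(continuous="1 second").start()

# Process all currently available data in batches, then stop:
q3 = df.writeStream.format("console").trigger(availableNow=True).start()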

Question 18

A developer initializes a SparkSession:


spark = SparkSession.builder \
    .appName("Analytics Application") \
    .getOrCreate()

Which statement describes the spark SparkSession?

Options:

A.

The getOrCreate() method explicitly destroys any existing SparkSession and creates a new one.

B.

A SparkSession is unique for each appName, and calling getOrCreate() with the same name will return an existing SparkSession once it has been created.

C.

If a SparkSession already exists, this code will return the existing session instead of creating a new one.

D.

A new SparkSession is created every time the getOrCreate() method is invoked.
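
A quick way to observe getOrCreate() behaviour in a single Python process (the second appName is arbitrary):

from pyspark.sql import SparkSession

s1 = SparkSession.builder.appName("Analytics Application").getOrCreate()
s2 = SparkSession.builder.appName("Another Name").getOrCreate()

# getOrCreate() returns the already-running session if one exists,
# so both variables point to the same object:
print(s1 is s2)   # True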

Question 19

A developer notices that all the post-shuffle partitions in a dataset are smaller than the value set for spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold.

Which type of join will Adaptive Query Execution (AQE) choose in this case?

Options:

A.

A Cartesian join

B.

A shuffled hash join

C.

A broadcast nested loop join

D.

A sort-merge join
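
For reference, the configuration involved (the value shown is illustrative, not a recommendation):

spark.conf.set("spark.sql.adaptive.enabled", "true")
# AQE compares post-shuffle partition sizes against this threshold when
# deciding at runtime whether to replace the planned sort-merge join with
# a shuffled hash join.
spark.conf.set("spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold", "64MB")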

Question 20

Given this view definition:

df.createOrReplaceTempView("users_vw")

Which approach can be used to query the users_vw view after the session is terminated?

Options:

A.

Query the users_vw using Spark

B.

Persist the users_vw data as a table

C.

Recreate the users_vw and query the data using Spark

D.

Save the users_vw definition and query using Spark
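
For context: a view created with createOrReplaceTempView() is scoped to the current SparkSession, so making the data available after the session ends generally means persisting it. A sketch with a hypothetical table name users_tbl:

# Session-scoped: gone once this SparkSession terminates.
df.createOrReplaceTempView("users_vw")

# Persisted to the catalog/metastore: survives the session and can be
# queried from a new session.
df.write.saveAsTable("users_tbl")
spark.sql("SELECT * FROM users_tbl").show()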

Question 21

A data engineer is running a Spark job to process a dataset of 1 TB stored in distributed storage. The cluster has 10 nodes, each with 16 CPUs. Spark UI shows:

Low number of Active Tasks

Many tasks complete in milliseconds

Fewer tasks than available CPUs

Which approach should be used to adjust the partitioning for optimal resource allocation?

Options:

A.

Set the number of partitions equal to the total number of CPUs in the cluster

B.

Set the number of partitions to a fixed value, such as 200

C.

Set the number of partitions equal to the number of nodes in the cluster

D.

Set the number of partitions by dividing the dataset size (1 TB) by a reasonable partition size, such as 128 MB
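
A back-of-the-envelope sketch of the size-based calculation mentioned in the options (the 128 MB target is an assumption, and df is a placeholder):

dataset_size_bytes = 1 * 1024**4          # 1 TB
target_partition_bytes = 128 * 1024**2    # ~128 MB per partition

num_partitions = dataset_size_bytes // target_partition_bytes   # 8192 partitions
repartitioned_df = df.repartition(num_partitions)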

Question 22

What is the risk associated with converting a large Pandas API on Spark DataFrame back to a Pandas DataFrame?

Options:

A.

The conversion will automatically distribute the data across worker nodes

B.

The operation will fail if the Pandas DataFrame exceeds 1000 rows

C.

Data will be lost during conversion

D.

The operation will load all data into the driver's memory, potentially causing memory overflow
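
For context, converting back to plain pandas collects the entire dataset onto the driver; a small sketch (the tiny DataFrame below stands in for a large one):

import pyspark.pandas as ps

psdf = ps.DataFrame({"x": range(10)})   # stand-in for a large pandas-on-Spark DataFrame

# to_pandas() materializes the full dataset in the driver's memory,
# so it should only be used when the result is known to be small.
pdf = psdf.to_pandas()

# The equivalent for a regular Spark DataFrame:
# pdf2 = spark_df.toPandas()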

Question 23

An engineer notices a significant increase in the job execution time during the execution of a Spark job. After some investigation, the engineer decides to check the logs produced by the Executors.

How should the engineer retrieve the Executor logs to diagnose performance issues in the Spark application?

Options:

A.

Locate the executor logs on the Spark master node, typically under the /tmp directory.

B.

Use the spark-submit command with the --verbose flag to print the logs to the console.

C.

Use the Spark UI to select the stage and view the executor logs directly from the stages tab.

D.

Fetch the logs by running a Spark job with the spark-sql CLI tool.

Question 24

A data scientist is working with a Spark DataFrame called customerDF that contains customer information. The DataFrame has a column named email with customer email addresses. The data scientist needs to split this column into username and domain parts.

Which code snippet splits the email column into username and domain columns?

Options:

A.

customerDF.select(

col("email").substr(0, 5).alias("username"),

col("email").substr(-5).alias("domain")

)

B.

customerDF.withColumn("username", split(col("email"), "@").getItem(0)) \

.withColumn("domain", split(col("email"), "@").getItem(1))

C.

customerDF.withColumn("username", substring_index(col("email"), "@", 1)) \

.withColumn("domain", substring_index(col("email"), "@", -1))

D.

customerDF.select(

regexp_replace(col("email"), "@", "").alias("username"),

regexp_replace(col("email"), "@", "").alias("domain")

)
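
A quick usage illustration of the split-on-'@' style that appears in the options, using a made-up row (demo_customers stands in for customerDF) and the active spark session:

from pyspark.sql import functions as F

demo_customers = spark.createDataFrame([("jane.doe@example.com",)], ["email"])

result = (demo_customers
          .withColumn("username", F.split(F.col("email"), "@").getItem(0))
          .withColumn("domain", F.split(F.col("email"), "@").getItem(1)))
result.show(truncate=False)
# username = jane.doe, domain = example.com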

Question 25

An engineer wants to join two DataFrames df1 and df2 on the respective employee_id and emp_id columns:

df1: employee_id INT, name STRING

df2: emp_id INT, department STRING

The engineer uses:

result = df1.join(df2, df1.employee_id == df2.emp_id, how='inner')

What is the behaviour of the code snippet?

Options:

A.

The code fails to execute because the column names employee_id and emp_id do not match automatically

B.

The code fails to execute because it must use on='employee_id' to specify the join column explicitly

C.

The code fails to execute because PySpark does not support joining DataFrames with a different structure

D.

The code works as expected because the join condition explicitly matches employee_id from df1 with emp_id from df2
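
For reference, an explicit join condition on differently named key columns is standard PySpark; a purely illustrative sketch that also drops the duplicated key column afterwards:

result = df1.join(df2, df1.employee_id == df2.emp_id, how="inner")

# Both key columns are kept in the joined result; one can be dropped if not needed:
result = result.drop(df2.emp_id)
result.select("employee_id", "name", "department").show()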

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.5-Python
Last Update: Jun 1, 2025
Questions: 85
