Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Databricks Certified Associate Developer for Apache Spark 3.5-Python Questions and Answers

Question 4

What is the behavior of the function date_sub(start, days) if a negative value is passed into the days parameter?

Options:

The same start date will be returned

An error message of an invalid parameter will be returned

The number of days specified will be added to the start date

The number of days specified will be removed from the start date

Question 5

A data engineer wants to process a streaming DataFrame that receives sensor readings every second with columns sensor_id, temperature, and timestamp. The engineer needs to calculate the average temperature for each sensor over the last 5 minutes while the data is streaming.

Which code implementation achieves the requirement?

[Image: code for Options A through D, not reproduced]

Options:

Option A

Option B

Option C

Option D

Question 6

A Data Analyst is working on the DataFrame sensor_df, which contains two columns:

Which code fragment returns a DataFrame that splits the record column into separate columns and has one array item per row?

[Image: sensor_df schema, not reproduced]

Options:

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))
exploded_df = exploded_df.select("record_datetime", "sensor_id", "status", "health")

exploded_df = exploded_df.select(
    "record_datetime",
    "record_exploded.sensor_id",
    "record_exploded.status",
    "record_exploded.health"
)

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))
exploded_df = exploded_df.select(
    "record_datetime",
    "record_exploded.sensor_id",
    "record_exploded.status",
    "record_exploded.health"
)

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))
exploded_df = exploded_df.select("record_datetime", "record_exploded")

Question 7

What is the benefit of using Pandas on Spark for data transformations?

Options:

It is available only with Python, thereby reducing the learning curve.

It computes results immediately using eager execution, making it simple to use.

It runs on a single node only, using memory-bound DataFrames, and is hence cost-efficient.

It executes queries faster using all the available cores in the cluster and provides Pandas’s rich set of features.

Question 8

Which configuration can be enabled to optimize the conversion between Pandas and PySpark DataFrames using Apache Arrow?

Options:

spark.conf.set("spark.pandas.arrow.enabled", "true")

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

spark.conf.set("spark.sql.execution.arrow.enabled", "true")

spark.conf.set("spark.sql.arrow.pandas.enabled", "true")

Question 9

Which command overwrites an existing JSON file when writing a DataFrame?

Options:

df.write.mode("overwrite").json("path/to/file")

df.write.overwrite.json("path/to/file")

df.write.json("path/to/file", overwrite=True)

df.write.format("json").save("path/to/file", mode="overwrite")

Question 10

The following code fragment results in an error:

@F.udf(T.IntegerType())
def simple_udf(t: str) -> str:
    return answer * 3.14159

Which code fragment should be used instead?

Options:

@F.udf(T.IntegerType())
def simple_udf(t: int) -> int:
    return t * 3.14159

@F.udf(T.DoubleType())
def simple_udf(t: float) -> float:
    return t * 3.14159

@F.udf(T.DoubleType())
def simple_udf(t: int) -> int:
    return t * 3.14159

@F.udf(T.IntegerType())
def simple_udf(t: float) -> float:
    return t * 3.14159

Question 11

What is the benefit of Adaptive Query Execution (AQE)?

Options:

It allows Spark to optimize the query plan before execution but does not adapt during runtime.

It enables the adjustment of the query plan during runtime, handling skewed data, optimizing join strategies, and improving overall query performance.

It optimizes query execution by parallelizing tasks and does not adjust strategies based on runtime metrics like data skew.

It automatically distributes tasks across nodes in the clusters and does not perform runtime adjustments to the query plan.

Question 12

An MLOps engineer is building a Pandas UDF that applies a language model that translates English strings into Spanish. The initial code is loading the model on every call to the UDF, which is hurting the performance of the data pipeline.

The initial code is:

def in_spanish_inner(df: pd.Series) -> pd.Series:
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())

How can the MLOps engineer change this code to reduce how many times the language model is loaded?

Options:

Convert the Pandas UDF to a PySpark UDF

Convert the Pandas UDF from a Series → Series UDF to a Series → Scalar UDF

Run the in_spanish_inner() function in a mapInPandas() function call

Convert the Pandas UDF from a Series → Series UDF to an Iterator[Series] → Iterator[Series] UDF

Question 13

Given a CSV file with the content:

[Image: CSV file content, not reproduced]

And the following code:

from pyspark.sql.types import *

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])

spark.read.schema(schema).csv(path).collect()

What is the resulting output?

Options:

[Row(name='bambi'), Row(name='alladin', age=20)]

[Row(name='alladin', age=20)]

[Row(name='bambi', age=None), Row(name='alladin', age=20)]

The code throws an error due to a schema mismatch.

Question 14

The following code fragment results in an error:

[Image: code fragment that results in an error, not reproduced]

Which code fragment should be used instead?

[Image: candidate code fragments, not reproduced]

Options:

Question 15

What is the difference between df.cache() and df.persist() in a Spark DataFrame?

Options:

Both cache() and persist() can be used to set the default storage level (MEMORY_AND_DISK_SER)

Both functions perform the same operation. The persist() function provides improved performance as its default storage level is DISK_ONLY.

persist() - Persists the DataFrame with the default storage level (MEMORY_AND_DISK_SER) and cache() - Can be used to set different storage levels to persist the contents of the DataFrame.

cache() - Persists the DataFrame with the default storage level (MEMORY_AND_DISK) and persist() - Can be used to set different storage levels to persist the contents of the DataFrame.

Question 16

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.

How should this issue be resolved?

Options:

Add more executor instances to the cluster

Increase the driver memory on the client machine

Switch the deployment mode to cluster mode

Switch the deployment mode to local mode

Question 17

A data engineer is streaming data from Kafka and requires:

Minimal latency

Exactly-once processing guarantees

Which trigger mode should be used?

Options:

.trigger(processingTime='1 second')

.trigger(continuous=True)

.trigger(continuous='1 second')

.trigger(availableNow=True)

Question 18

A developer initializes a SparkSession:

spark = SparkSession.builder \
    .appName("Analytics Application") \
    .getOrCreate()

Which statement describes the spark SparkSession?

Options:

The getOrCreate() method explicitly destroys any existing SparkSession and creates a new one.

A SparkSession is unique for each appName, and calling getOrCreate() with the same name will return an existing SparkSession once it has been created.

If a SparkSession already exists, this code will return the existing session instead of creating a new one.

A new SparkSession is created every time the getOrCreate() method is invoked.

Question 19

A developer notices that all the post-shuffle partitions in a dataset are smaller than the value set for spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold.

Which type of join will Adaptive Query Execution (AQE) choose in this case?

Options:

A Cartesian join

A shuffled hash join

A broadcast nested loop join

A sort-merge join

Question 20

Given this view definition:

df.createOrReplaceTempView("users_vw")

Which approach can be used to query the users_vw view after the session is terminated?

Options:

Query the users_vw using Spark

Persist the users_vw data as a table

Recreate the users_vw and query the data using Spark

Save the users_vw definition and query using Spark

Question 21

A data engineer is running a Spark job to process a dataset of 1 TB stored in distributed storage. The cluster has 10 nodes, each with 16 CPUs. Spark UI shows:

Low number of Active Tasks

Many tasks complete in milliseconds

Fewer tasks than available CPUs

Which approach should be used to adjust the partitioning for optimal resource allocation?

Options:

Set the number of partitions equal to the total number of CPUs in the cluster

Set the number of partitions to a fixed value, such as 200

Set the number of partitions equal to the number of nodes in the cluster

Set the number of partitions by dividing the dataset size (1 TB) by a reasonable partition size, such as 128 MB
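
Back-of-the-envelope arithmetic for the dataset-size / partition-size guideline, in plain Python. It assumes binary units (1 TB = 1024^4 bytes, 128 MB = 128 * 1024^2 bytes) and one CPU per task; the helper name suggested_partitions is hypothetical.

```python
import math

TB = 1024**4
MB = 1024**2


def suggested_partitions(dataset_bytes: int, target_partition_bytes: int = 128 * MB) -> int:
    """Number of partitions needed so each holds about target_partition_bytes."""
    return math.ceil(dataset_bytes / target_partition_bytes)


nodes, cpus_per_node = 10, 16
total_cpus = nodes * cpus_per_node         # 160 concurrent tasks at one CPU per task
partitions = suggested_partitions(1 * TB)  # 8192 partitions of about 128 MB
waves = partitions / total_cpus            # about 51 rounds of tasks

print(total_cpus, partitions, waves)
```

In Spark, the result could then feed df.repartition(partitions); for file sources, spark.sql.files.maxPartitionBytes (128MB by default) controls the input split size.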

Question 22

What is the risk of converting a large Pandas API on Spark DataFrame back to a Pandas DataFrame?

Options:

The conversion will automatically distribute the data across worker nodes

The operation will fail if the Pandas DataFrame exceeds 1000 rows

Data will be lost during conversion

The operation will load all data into the driver's memory, potentially causing memory overflow

Question 23

An engineer notices a significant increase in the job execution time during the execution of a Spark job. After some investigation, the engineer decides to check the logs produced by the Executors.

How should the engineer retrieve the Executor logs to diagnose performance issues in the Spark application?

Options:

Locate the executor logs on the Spark master node, typically under the /tmp directory.

Use the command spark-submit with the --verbose flag to print the logs to the console.

Use the Spark UI to select the stage and view the executor logs directly from the stages tab.

Fetch the logs by running a Spark job with the spark-sql CLI tool.

Question 24

A data scientist is working with a Spark DataFrame called customerDF that contains customer information. The DataFrame has a column named email with customer email addresses. The data scientist needs to split this column into username and domain parts.

Which code snippet splits the email column into username and domain columns?

Options:

customerDF.select(
    col("email").substr(0, 5).alias("username"),
    col("email").substr(-5).alias("domain")
)

customerDF.withColumn("username", split(col("email"), "@").getItem(0)) \
    .withColumn("domain", split(col("email"), "@").getItem(1))

customerDF.withColumn("username", substring_index(col("email"), "@", 1)) \
    .withColumn("domain", substring_index(col("email"), "@", -1))

customerDF.select(
    regexp_replace(col("email"), "@", "").alias("username"),
    regexp_replace(col("email"), "@", "").alias("domain")
)

Question 25

An engineer wants to join two DataFrames df1 and df2 on the respective employee_id and emp_id columns:

df1: employee_id INT, name STRING

df2: emp_id INT, department STRING

The engineer uses:

result = df1.join(df2, df1.employee_id == df2.emp_id, how='inner')

What is the behaviour of the code snippet?

Options:

The code fails to execute because the column names employee_id and emp_id do not match automatically

The code fails to execute because it must use on='employee_id' to specify the join column explicitly

The code fails to execute because PySpark does not support joining DataFrames with a different structure

The code works as expected because the join condition explicitly matches employee_id from df1 with emp_id from df2

Exam Code: Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.5-Python

Last Update: Jun 1, 2025

Questions: 85
