Databricks Certified Data Engineer Associate exam question set, version 2 (File 3 questions)

The questions in this set are drawn 100% from the question bank of the Databricks certification exam. The set comprises 6 files of questions and answers with detailed explanations, to help readers better understand the lakehouse architecture (File 3 50 Question.pdf)

1 Question

Which of the following is true when building a Databricks SQL dashboard?

A A dashboard can only use results from one query

B Only one visualization can be developed with one query result

C A dashboard can only connect to one schema/Database

D More than one visualization can be developed using a single query result

E A dashboard can only have one refresh schedule

2 Question

John Smith, a newly joined member of the Marketing team, currently has read access to the sales tables but cannot update them. Which of the following commands gives him the ability to update the table?

A GRANT UPDATE ON TABLE table_name TO john.smith@marketing.com

B GRANT USAGE ON TABLE table_name TO john.smith@marketing.com

C GRANT MODIFY ON TABLE table_name TO john.smith@marketing.com

D GRANT UPDATE TO TABLE table_name ON john.smith@marketing.com

E GRANT MODIFY TO TABLE table_name ON john.smith@marketing.com

3 Question

A new user who currently has no access to the catalog or schema requests access to the customer table in the sales schema. Because the customer table contains sensitive information, you create a view on the table that excludes the sensitive columns and grant access to the view with GRANT SELECT ON view_name TO user@company.com. When the user queries the view, they get the error "view does not exist". What is preventing the user from accessing the view, and how do you fix it?

A User requires SELECT on the underlying table

B User requires to be put in a special group that has access to PII data

C User has to be the owner of the view

D User requires USAGE privilege on Sales schema

E User needs ADMIN privilege on the view

4 Question

How do you access or use tables in Unity Catalog?

A schema_name.table_name

B schema_name.catalog_name.table_name

C catalog_name.table_name

D catalog_name.database_name.schema_name.table_name

E catalog_name.schema_name.table_name
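
For readers who want to experiment with table references, here is a minimal sketch of building a fully qualified name from its parts, assuming Unity Catalog's three-level namespace of catalog, schema, and table. The helper `qualified_name` and the catalog name `main` are hypothetical and chosen only for illustration.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Join the three namespace levels, backtick-quoting each part."""
    parts = [catalog, schema, table]
    return ".".join("`" + part + "`" for part in parts)


# "main" is an assumed catalog name; sales and customer echo question 3.
name = qualified_name("main", "sales", "customer")
query = "SELECT * FROM " + name
print(query)
```

In a notebook the resulting string could be passed to `spark.sql`; that session object is not part of this sketch.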

5 Question

How do you upgrade an existing workspace managed table to a Unity Catalog table?

A ALTER TABLE table_name SET UNITY_CATALOG = TRUE

B Create table catalog_name.schema_name.table_name

as select * from hive_metastore.old_schema.old_table

C Create table table_name as select * from hive_metastore.old_schema.old_table

D Create table table_name format = UNITY as select * from old_table_name

E Create or replace table_name format = UNITY using deep clone old_table_name

6 Question

Which of the following statements is correct when choosing between a lakehouse and a data warehouse?

A Traditional data warehouses have special indexes which are optimized for machine learning

B Traditional data warehouses can serve low query latency with high reliability for BI workloads

C SQL support is only available for traditional data warehouses; lakehouses support Python and Scala

D Traditional data warehouses are the preferred choice if we need to support ACID; the lakehouse does not support ACID.

E The lakehouse replaces the current dependency on data lakes and data warehouses, uses an open standard storage format, and supports low-latency BI workloads.

7 Question

Where are interactive notebook results stored in the Databricks product architecture?

A Data plane

B Control plane

C Data and Control plane

D JDBC data source

E Databricks web application

8 Question

Which of the following statements are true about a lakehouse?

A Lakehouse only supports Machine learning workloads and Data warehouses support BI workloads

B Lakehouse only supports end-to-end streaming workloads and Data warehouses support Batch workloads

C Lakehouse does not support ACID

D Lakehouse does not support SQL

E Lakehouse supports Transactions

9 Question

Which of the following SQL commands can be used to insert, update, or delete rows based on a condition that checks whether a row exists?

A MERGE INTO table_name

B COPY INTO table_name

C UPDATE table_name

D INSERT INTO OVERWRITE table_name

E INSERT IF EXISTS table_name

10 Question

While investigating a data issue, you realized that a process accidentally updated the table. You want to query the same table as of yesterday's version so you can review what the prior version looked like. What is the best way to query the historical data for your analysis?

A SELECT * FROM TIME_TRAVEL(table_name) WHERE time_stamp = 'timestamp'

B TIME_TRAVEL FROM table_name WHERE time_stamp = date_sub(current_date(), 1)

C SELECT * FROM table_name TIMESTAMP AS OF date_sub(current_date(), 1)

D DESCRIBE HISTORY table_name AS OF date_sub(current_date(), 1)

E SHOW HISTORY table_name AS OF date_sub(current_date(), 1)
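
As a side note on Delta time travel, the sketch below builds a query string that pins a read to yesterday's date using the `TIMESTAMP AS OF` clause with a date literal. The helper `time_travel_query` is hypothetical, and the date is passed in explicitly so the result is reproducible; in a notebook the string would typically be handed to `spark.sql`, which is assumed and not used here.

```python
from datetime import date, timedelta


def time_travel_query(table: str, today: date) -> str:
    """Return a SELECT that reads the table as of the day before `today`."""
    yesterday = today - timedelta(days=1)
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{yesterday.isoformat()}'"


# Fixed example date; date.today() could be used instead.
print(time_travel_query("table_name", date(2024, 2, 29)))
# prints: SELECT * FROM table_name TIMESTAMP AS OF '2024-02-28'
```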

11 Question

While investigating a data issue, you tried to review yesterday's version of the table with the command below, but found that time travel could no longer return the historical data, even though the table history (DESCRIBE HISTORY table_name) shows that the table was updated yesterday. What could be the reason you cannot access this data?

SELECT * FROM table_name TIMESTAMP AS OF date_sub(current_date(), 1)

A You currently do not have access to view historical data

B By default, historical data is cleaned every 180 days in DELTA

C A command VACUUM table_name RETAIN 0 was run on the table

D Time travel is disabled

E Time travel must be enabled before you query previous data

12 Question

You have accidentally deleted records from a table called transactions. What is the easiest way to restore the deleted records or the previous state of the table? Before the delete, the table was at version 3; after the delete, it is at version 4.

A RESTORE TABLE transactions FROM VERSION as of 4

B RESTORE TABLE transactions TO VERSION as of 3

C INSERT INTO OVERWRITE transactions

SELECT * FROM transactions VERSION AS OF 3

MINUS

SELECT * FROM transactions

D INSERT INTO OVERWRITE transactions

SELECT * FROM transactions VERSION AS OF 4

INTERSECT

13 Question

Create a schema called bronze using location '/mnt/delta/bronze', and check if the schema exists before creating it.

A CREATE SCHEMA IF NOT EXISTS bronze LOCATION '/mnt/delta/bronze'

B CREATE SCHEMA bronze IF NOT EXISTS LOCATION '/mnt/delta/bronze'

C if IS_SCHEMA('bronze'): CREATE SCHEMA bronze LOCATION '/mnt/delta/bronze'

D Schema creation is not available in metastore, it can only be done in Unity catalog UI

E Cannot create schema without a database

14 Question

How do you check the location of an existing schema in Delta Lake?

A Run SQL command SHOW LOCATION schema_name

B Check unity catalog UI

C Use Data explorer

D Run SQL command DESCRIBE SCHEMA EXTENDED schema_name

E Schemas are stored internally in external Hive metastores like MySQL or SQL Server

15 Question

Which of the SQL commands below creates a global temporary view?

A CREATE OR REPLACE TEMPORARY VIEW view_name

AS SELECT * FROM table_name

B CREATE OR REPLACE LOCAL TEMPORARY VIEW view_name

AS SELECT * FROM table_name

C CREATE OR REPLACE GLOBAL TEMPORARY VIEW view_name

AS SELECT * FROM table_name

D CREATE OR REPLACE VIEW view_name

AS SELECT * FROM table_name

E CREATE OR REPLACE LOCAL VIEW view_name

AS SELECT * FROM table_name

16 Question

When you drop a managed table using the SQL syntax DROP TABLE table_name, how does it impact the metadata, history, and data stored in the table?

A Drops table from metastore, drops metadata, history, and data in storage.

B Drops table from metastore and data from storage but keeps metadata and history in storage

C Drops table from metastore, metadata and history but keeps the data in storage

D Drops table but keeps metadata, history and data in storage

E Drops table and history but keeps metadata and data in storage

17 Question

The team has decided to use table properties to identify a business owner for each table. Which of the following table DDL statements lets you populate a table property identifying the business owner of a table?

A CREATE TABLE inventory (id INT, units FLOAT)

SET TBLPROPERTIES business_owner = 'supply chain'

B CREATE TABLE inventory (id INT, units FLOAT)

TBLPROPERTIES (business_owner = 'supply chain')

C CREATE TABLE inventory (id INT, units FLOAT)

SET (business_owner = 'supply chain')

D CREATE TABLE inventory (id INT, units FLOAT)

SET PROPERTY (business_owner = 'supply chain')

E CREATE TABLE inventory (id INT, units FLOAT)

SET TAG (business_owner = 'supply chain')

18 Question

The data science team has reported that the table is missing a column for average price, which can be calculated from units sold and sales amount. Which of the following SQL statements lets you reload the data with the additional column?

A INSERT OVERWRITE sales

SELECT *, salesAmt/unitsSold as avgPrice FROM sales

B CREATE OR REPLACE TABLE sales

AS SELECT *, salesAmt/unitsSold as avgPrice FROM sales

C MERGE INTO sales USING (SELECT *, salesAmt/unitsSold as avgPrice FROM sales)

D OVERWRITE sales AS SELECT *, salesAmt/unitsSold as avgPrice FROM sales

E COPY INTO SALES AS SELECT *, salesAmt/unitsSold as avgPrice FROM sales

19 Question

You are working on a process to load external CSV files into a Delta table using the COPY INTO command, but after running the command a second time, no data was loaded into the table. Why is that?

COPY INTO table_name
FROM 'dbfs:/mnt/raw/*.csv'
FILEFORMAT = CSV

A COPY INTO only works one time data load

B Run REFRESH TABLE sales before running COPY INTO

C COPY INTO did not detect new files after the last load

D Use incremental = TRUE option to load new files

E COPY INTO does not support incremental load, use AUTO LOADER
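
To make the idea of an idempotent file load concrete, here is a toy model in plain Python. It is not how Databricks implements COPY INTO; it only assumes the documented behaviour that files already loaded from the source location are skipped on later runs. The class `ToyCopyInto` and the file names are hypothetical.

```python
class ToyCopyInto:
    """Toy loader that remembers which source files it has already ingested."""

    def __init__(self):
        self.loaded_files = set()
        self.rows = []

    def copy_into(self, source_files):
        """Load rows only from unseen files; return the number of rows loaded."""
        loaded = 0
        for path, rows in sorted(source_files.items()):
            if path in self.loaded_files:
                continue  # already ingested on an earlier run, so skip it
            self.rows.extend(rows)
            self.loaded_files.add(path)
            loaded += len(rows)
        return loaded


table = ToyCopyInto()
files = {"a.csv": ["row1", "row2"], "b.csv": ["row3"]}
first_run = table.copy_into(files)   # both files are new
second_run = table.copy_into(files)  # nothing new arrived in between
print(first_run, second_run)
```

Adding a new file to `files` and calling `copy_into` again loads only that file's rows.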

20 Question

What is the main difference between the below two commands?

INSERT OVERWRITE table_name

SELECT * FROM table

CREATE OR REPLACE TABLE table_name

AS SELECT * FROM table

A INSERT OVERWRITE replaces data by default, CREATE OR REPLACE replaces data and schema by default

B INSERT OVERWRITE replaces data and schema by default, CREATE OR REPLACE replaces data by default

C INSERT OVERWRITE maintains historical data versions by default, CREATE OR REPLACE clears the historical data versions by default

D INSERT OVERWRITE clears historical data versions by default, CREATE OR REPLACE maintains the historical data versions by default

E Both are same and results in identical outcomes

21 Question

Which of the following functions can be used to convert JSON string to Struct data type?

A TO_STRUCT (json value)

B FROM_JSON (json value)

C FROM_JSON (json value, schema of json)

D CONVERT (json value, schema of json)

E CAST (json value as STRUCT)

22 Question

You are working on a marketing team request to identify customers with the same information in two tables, CUSTOMERS_2021 and CUSTOMERS_2020. Each table contains 25 columns with the same schema. You want to identify rows that match between the two tables across all columns. Which of the following SQL statements can be used to do this?

A SELECT * FROM CUSTOMERS_2021

UNION

SELECT * FROM CUSTOMERS_2020

B SELECT * FROM CUSTOMERS_2021

UNION ALL

SELECT * FROM CUSTOMERS_2020

C SELECT * FROM CUSTOMERS_2021 C1

INNER JOIN CUSTOMERS_2020 C2

ON C1.CUSTOMER_ID = C2.CUSTOMER_ID

D SELECT * FROM CUSTOMERS_2021

INTERSECT

SELECT * FROM CUSTOMERS_2020

E SELECT * FROM CUSTOMERS_2021

EXCEPT

SELECT * FROM CUSTOMERS_2020
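
The set operators in these options can be tried out locally. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Databricks SQL (an assumption made purely for convenience), with three columns instead of 25 and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers_2020 (customer_id INTEGER, name TEXT, city TEXT);
    CREATE TABLE customers_2021 (customer_id INTEGER, name TEXT, city TEXT);
    INSERT INTO customers_2020 VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome');
    INSERT INTO customers_2021 VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Paris');
""")


def run(set_operator):
    """Combine the two tables with the given set operator; return sorted rows."""
    sql = f"SELECT * FROM customers_2021 {set_operator} SELECT * FROM customers_2020"
    return sorted(conn.execute(sql).fetchall())


in_both = run("INTERSECT")    # rows identical in every column
only_2021 = run("EXCEPT")     # 2021 rows with no exact match in 2020
distinct_rows = run("UNION")  # all rows, duplicates removed
all_rows = run("UNION ALL")   # all rows, duplicates kept
print(in_both, only_2021, len(distinct_rows), len(all_rows))
```

Customer 2 appears in both tables with a different city, so a join on `customer_id` alone would pair those rows even though they are not identical across all columns.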

23 Question

You want to process data based on two variables: one checks whether the department is supply chain, and the other checks whether the process flag is set to True.

A if department = "supply chain" & process:

B if department == "supply chain" && process:

C if department == "supply chain" & process == TRUE:

D if department == "supply chain" & if process == TRUE:

E if department == "supply chain" and process:
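
As a quick refresher on Python conditionals, the following sketch combines an equality test with a boolean flag. The function name `should_process` is hypothetical; the variable names follow the question.

```python
def should_process(department: str, process: bool) -> bool:
    """Return True only for the supply chain department with the flag set."""
    # `==` compares values, and `and` is Python's short-circuiting logical AND.
    if department == "supply chain" and process:
        return True
    return False


print(should_process("supply chain", True))
print(should_process("supply chain", False))
print(should_process("marketing", True))
```

In Python a single `=` is assignment and is not allowed inside an `if` test, `&&` is not an operator, and `&` is the bitwise operator, which binds more tightly than `==`.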

24 Question

You were asked to create a notebook that takes department as a parameter and processes the data accordingly. Which of the following statements stores the notebook parameter in a Python variable?

A SET department = dbutils.widgets.get("department")

B ASSIGN department == dbutils.widgets.get("department")

C department = dbutils.widgets.get("department")

D department = notebook.widget.get("department")

E department = notebook.param.get("department")

25 Question

Which of the following statements can successfully read the notebook widget and pass the python variable to a SQL statement in a Python notebook cell?

A order_date = dbutils.widgets.get("widget_order_date")

spark.sql(f"SELECT * FROM sales WHERE orderDate = 'f{order_date }'")

B order_date = dbutils.widgets.get("widget_order_date")

spark.sql(f"SELECT * FROM sales WHERE orderDate = 'order_date' ")

C order_date = dbutils.widgets.get("widget_order_date")

spark.sql(f"SELECT * FROM sales WHERE orderDate = '${order_date }' ")

D order_date = dbutils.widgets.get("widget_order_date")

spark.sql(f"SELECT * FROM sales WHERE orderDate = '{order_date}' ")

E order_date = dbutils.widgets.get("widget_order_date")

spark.sql("SELECT * FROM sales WHERE orderDate = order_date")
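
The string handling in these options can be checked outside Databricks. In this sketch a plain variable with a made-up value stands in for `dbutils.widgets.get("widget_order_date")`, since `dbutils` only exists inside a Databricks notebook, and the query is only built, not executed.

```python
# Stand-in for the widget value; the date itself is invented for illustration.
order_date = "2024-02-29"

# With the f prefix, {order_date} is replaced by the variable's value.
query = f"SELECT * FROM sales WHERE orderDate = '{order_date}'"

# Without the f prefix, the braces stay in the string as literal text.
literal = "SELECT * FROM sales WHERE orderDate = '{order_date}'"

print(query)
print(literal)
```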

26 Question

The Spark command below creates a summary of each customerId and the number of times it appears in the events_log Delta table, and writes it to a summary table as a one-time micro-batch. Fill in the blanks to complete the query.

spark._____
  .format("delta")
  .table("events_log")
  .groupBy("customerId")
  .count()
  ._____
  .format("delta")
  .outputMode("complete")
  .option("checkpointLocation", "/tmp/delta/eventsByCustomer/_checkpoints/")
  .trigger(_____)
  .table("target_table")

A writeStream, readStream, once

B readStream, writeStream, once

C writeStream, processingTime = once

D writeStream, readStream, once = True

E readStream, writeStream, once = True

27 Question

You would like to build a Spark streaming process that reads from a Kafka queue and writes to a Delta table every 15 minutes. What is the correct trigger option?

A trigger("15 minutes")

B trigger(process "15 minutes")

C trigger(processingTime = 15)

D trigger(processingTime = "15 Minutes")

E trigger(15)

28 Question

Which of the following scenarios is the best fit for the AUTO LOADER solution?

A Efficiently process new data incrementally from cloud object storage

B Incrementally process new streaming data from Apache Kafka into Delta Lake

C Incrementally process new data from relational databases like MySQL

D Efficiently copy data from data lake location to another data lake location

E Efficiently move data incrementally from one delta table to another delta table

29 Question

You set up AUTO LOADER to process millions of files a day and noticed that the load process was slow, so you scaled up the Databricks cluster, but Auto Loader's performance still did not improve. What is the best way to resolve this?

A AUTO LOADER is not suitable to process millions of files a day

B Setup a second AUTO LOADER process to process the data

C Increase the maxFilesPerTrigger option to a sufficiently high number

D Copy the data from cloud storage to local disk on the cluster for faster access

E Merge files to one large file

30 Question

The current ELT pipeline receives data from the operations team once a day, so you set up an AUTO LOADER process with trigger(once = True) and scheduled a job to run once a day. The operations team recently rolled out a new feature that lets them send data every 1 minute. What changes do you need to make to AUTO LOADER to process the data every 1 minute?

A Convert AUTO LOADER to structured streaming

B Change AUTO LOADER trigger to trigger(processingTime = "1 minute")

C Setup a job cluster run the notebook once a minute

D Enable stream processing

E Change AUTO LOADER trigger to ("1 minute")

31 Question

What is the purpose of the bronze layer in a Multi-hop Medallion architecture?

A Copy of raw data, easy to query and ingest data for downstream processes.

B Powers ML applications

C Data quality checks, corrupt data quarantined

D Contain aggregated data that is to be consumed into Silver

E Reduces data storage by compressing the data

32 Question

Posted: 29/02/2024, 15:36
