Databricks Certified Data Engineer Associate exam question bank, version 2 (File 3, answers)

Document information

The questions in this set are taken 100% from the question pool of the Databricks certification exam. The set consists of 6 files of questions and answers with detailed explanations, to help everyone better understand the lakehouse architecture. (File 3 answer.pdf)

QUESTION 1
Which of the following is true when building a Databricks SQL dashboard?
A. A dashboard can only use results from one query
B. Only one visualization can be developed with one query result
C. A dashboard can only connect to one schema/database
D. More than one visualization can be developed using a single query result
E. A dashboard can only have one refresh schedule
The answer is D: more than one visualization can be developed using a single query result. In the query editor pane, the "+ Add visualization" tab can be used to create many visualizations from a single query result.

QUESTION 2
A newly joined member of the Marketing team, John Smith, currently has read access to the sales table but does not have access to update it. Which of the following commands helps you accomplish this?
A. GRANT UPDATE ON TABLE table_name TO john.smith@marketing.com
B. GRANT USAGE ON TABLE table_name TO john.smith@marketing.com
C. GRANT MODIFY ON TABLE table_name TO john.smith@marketing.com
D. GRANT UPDATE TO TABLE table_name ON john.smith@marketing.com
E. GRANT MODIFY TO TABLE table_name ON john.smith@marketing.com
The answer is C: GRANT MODIFY ON TABLE table_name TO john.smith@marketing.com
https://docs.microsoft.com/en-us/azure/databricks/security/access-control/table-acls/object-privileges#privileges

QUESTION 3
A new user who currently has no access to the catalog or schema is requesting access to the customer table in the sales schema. Because the customer table contains sensitive information, you decided to create a view on the table that excludes the sensitive columns and granted access to the view (GRANT SELECT ON view_name TO user@company.com), but when the user tries to query the view, they get the error "view does not exist". What is the issue preventing the user from accessing the view, and how do you fix it?
A. The user requires SELECT on the underlying table
B. The user needs to be put in a special group that has access to PII data
C. The user has to be the owner of the view
D. The user requires the USAGE privilege on the sales schema
E. The user needs the ADMIN privilege on the view
The answer is D: the user requires the USAGE privilege on the sales schema. See Data object privileges – Azure Databricks | Microsoft Docs.
GRANT USAGE ON SCHEMA sales TO user@company.com;
USAGE does not give any abilities by itself, but it is an additional requirement to perform any action on a schema object.
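Putting questions 2 and 3 together, here is a minimal sketch of the grants involved; the schema, table, and view names are hypothetical:

-- USAGE on the schema is a prerequisite for any action on its objects
GRANT USAGE ON SCHEMA sales TO `john.smith@marketing.com`;
-- SELECT allows reading the table; MODIFY adds INSERT, UPDATE, and DELETE
GRANT SELECT ON TABLE sales.transactions TO `john.smith@marketing.com`;
GRANT MODIFY ON TABLE sales.transactions TO `john.smith@marketing.com`;
-- For the masked view in question 3, grant on the view, not the base table
GRANT USAGE ON SCHEMA sales TO `user@company.com`;
GRANT SELECT ON VIEW sales.customer_masked TO `user@company.com`;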
QUESTION 4
How do you access or use tables in Unity Catalog?
A. schema_name.table_name
B. schema_name.catalog_name.table_name
C. catalog_name.table_name
D. catalog_name.database_name.schema_name.table_name
E. catalog_name.schema_name.table_name
The answer is E: catalog_name.schema_name.table_name
Note: "database" and "schema" are analogous and are used interchangeably in Unity Catalog. FYI, a catalog is registered under a metastore; by default every workspace has a default metastore called hive_metastore, and with Unity Catalog you have the ability to create metastores and share them across multiple workspaces.

QUESTION 5
How do you upgrade an existing workspace managed table to a Unity Catalog table?
A. ALTER TABLE table_name SET UNITY_CATALOG = TRUE
B. CREATE TABLE catalog_name.schema_name.table_name AS SELECT * FROM hive_metastore.old_schema.old_table
C. CREATE TABLE table_name AS SELECT * FROM hive_metastore.old_schema.old_table
D. CREATE TABLE table_name FORMAT = UNITY AS SELECT * FROM old_table_name
E. CREATE OR REPLACE table_name FORMAT = UNITY USING DEEP CLONE old_table_name
The answer is B: CREATE TABLE catalog_name.schema_name.table_name AS SELECT * FROM hive_metastore.old_schema.old_table
Basically, we are moving the data from the internal Hive metastore to a metastore and catalog registered in Unity Catalog. Note: for a managed table the data is copied to a different storage account, so for large tables this can take a lot of time. For an external table the process is different.
Managed table: Upgrade a managed table to Unity Catalog
External table: Upgrade an external table to Unity Catalog
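A minimal sketch of the upgrade from question 5; the catalog (main), schema, and table names are hypothetical:

-- Copy a hive_metastore managed table into a Unity Catalog table.
-- The data is physically copied, so large tables can take a long time.
CREATE TABLE main.sales.customers
AS SELECT * FROM hive_metastore.sales.customers;
-- Verify the copy before dropping or renaming the old table.
SELECT count(*) FROM main.sales.customers;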
QUESTION 6
Which of these statements is correct when choosing between a lakehouse and a data warehouse?
A. Traditional data warehouses have special indexes which are optimized for machine learning
B. Traditional data warehouses can serve low query latency with high reliability for BI workloads
C. SQL support is only available in traditional data warehouses; lakehouses support Python and Scala
D. Traditional data warehouses are the preferred choice if we need to support ACID; the lakehouse does not support ACID
E. The lakehouse replaces the current dependency on data lakes and data warehouses, uses an open standard storage format, and supports low-latency BI workloads
The answer is E. The lakehouse replaces the current dependency on data lakes and data warehouses for modern data companies that desire:
· Open, direct access to data stored in standard data formats
· Indexing protocols optimized for machine learning and data science
· Low query latency and high reliability for BI and advanced analytics

QUESTION 7
Where are interactive notebook results stored in the Databricks product architecture?
A. Data plane
B. Control plane
C. Data and control plane
D. JDBC data source
E. Databricks web application
The answer is C: data and control plane. Only job results are stored in the data plane (your storage); interactive notebook results are stored in a combination of the control plane (partial results for presentation in the UI) and customer storage.
https://docs.microsoft.com/en-us/azure/databricks/getting-started/overview#--high-level-architecture
How do you change this behavior? You can change it in the Workspace/Admin Console settings for that workspace. Once enabled, all interactive results are stored in the customer account (data plane), except for the new notebook visualization feature Databricks recently introduced, which still stores some metadata in the control plane irrespective of this setting; please refer to the documentation for more details.
Why is this important to know? I recently worked on a project where we had to deal with sensitive customer information, and we had a security requirement that all of the data, including notebook results, had to be stored in the data plane.

QUESTION 8
Which of the following statements are true about a lakehouse?
A. A lakehouse only supports machine learning workloads, and data warehouses support BI workloads
B. A lakehouse only supports end-to-end streaming workloads, and data warehouses support batch workloads
C. A lakehouse does not support ACID
D. A lakehouse does not support SQL
E. A lakehouse supports transactions
The answer is E: a lakehouse supports transactions. See What Is a Lakehouse? – The Databricks Blog.

QUESTION 9
Which of the following SQL commands can be used to insert, update, or delete rows based on a condition that checks whether the row(s) exist?
A. MERGE INTO table_name
B. COPY INTO table_name
C. UPDATE table_name
D. INSERT INTO OVERWRITE table_name
E. INSERT IF EXISTS table_name
The answer is A: MERGE INTO table_name. Here is additional documentation for your review:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html
MERGE INTO target_table_name [target_alias]
USING source_table_reference [source_alias]
ON merge_condition
[ WHEN MATCHED [ AND condition ] THEN matched_action ] [...]
[ WHEN NOT MATCHED [ AND condition ] THEN not_matched_action ] [...]
matched_action: { DELETE | UPDATE SET * | UPDATE SET { column1 = value1 } [, ...] }
not_matched_action: { INSERT * | INSERT (column1 [, ...]) VALUES (value1 [, ...]) }
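A minimal upsert sketch with MERGE INTO; the table and column names are hypothetical:

-- Upsert a batch of customer updates: update rows that already exist,
-- insert the ones that do not.
MERGE INTO customers AS t
USING customer_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.email = s.email, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, email, updated_at)
  VALUES (s.customer_id, s.email, s.updated_at);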
QUESTION 10
While investigating a data issue you realized that a process accidentally updated the table. You want to query the same table with yesterday's version of the data so you can review what the prior version looked like. What is the best way to query the historical data for this analysis?
A. SELECT * FROM TIME_TRAVEL(table_name) WHERE time_stamp = 'timestamp'
B. TIME_TRAVEL FROM table_name WHERE time_stamp = date_sub(current_date(), 1)
C. SELECT * FROM table_name TIMESTAMP AS OF date_sub(current_date(), 1)
D. DESCRIBE HISTORY table_name AS OF date_sub(current_date(), 1)
E. SHOW HISTORY table_name AS OF date_sub(current_date(), 1)
The answer is C: SELECT * FROM table_name TIMESTAMP AS OF date_sub(current_date(), 1)
FYI, time travel supports two ways: using a timestamp or using a version number.
Timestamp:
SELECT count(*) FROM my_table TIMESTAMP AS OF '2019-01-01'
SELECT count(*) FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1)
SELECT count(*) FROM my_table TIMESTAMP AS OF '2019-01-01 01:30:00.000'
Version number:
SELECT count(*) FROM my_table VERSION AS OF 5238
SELECT count(*) FROM my_table@v5238
SELECT count(*) FROM delta.`/path/to/my/table@v5238`
https://databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes.html

QUESTION 11
While investigating a data issue, you wanted to review yesterday's version of the table using the command below, but you realized that you are no longer able to view the historical data, even though the table history (DESCRIBE HISTORY table_name) shows the table was updated yesterday. What could be the reason you cannot access this data?
SELECT * FROM table_name TIMESTAMP AS OF date_sub(current_date(), 1)
A. You currently do not have access to view historical data
B. By default, historical data is cleaned every 180 days in Delta
C. The command VACUUM table_name RETAIN 0 HOURS was run on the table
D. Time travel is disabled
E. Time travel must be enabled before you can query previous data
The answer is C: VACUUM table_name RETAIN 0 HOURS was run. The VACUUM command recursively vacuums directories associated with the Delta table and removes data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. The default retention threshold is 7 days. When VACUUM table_name RETAIN 0 HOURS is run, all of the historical versions of the data are lost; time travel can only provide the current state.

QUESTION 12
You have accidentally deleted records from a table called transactions. What is the easiest way to restore the deleted records, that is, the previous state of the table? Prior to the delete the table is at version N, and after the delete it is at version N + 1.
A. RESTORE TABLE transactions FROM VERSION AS OF N
B. RESTORE TABLE transactions TO VERSION AS OF N
C. INSERT INTO OVERWRITE transactions SELECT * FROM transactions VERSION AS OF N MINUS SELECT * FROM transactions
D. INSERT INTO OVERWRITE transactions SELECT * FROM transactions VERSION AS OF N
E. INSERT INTO OVERWRITE transactions SELECT * FROM transactions VERSION AS OF N INTERSECT SELECT * FROM transactions
The answer is B. See RESTORE (Databricks SQL) | Databricks on AWS.
RESTORE [TABLE] table_name [TO] time_travel_version
Time travel supports using a timestamp or a version number:
time_travel_version: { TIMESTAMP AS OF timestamp_expression | VERSION AS OF version }
timestamp_expression can be any one of:
'2018-10-18T22:15:12.013Z', that is, a string that can be cast to a timestamp
cast('2018-10-18 13:36:32 CEST' as timestamp)
'2018-10-18', that is, a date string
current_timestamp() - interval 12 hours
date_sub(current_date(), 1)
Any other expression that is or can be cast to a timestamp

QUESTION 13
Create a schema called bronze using the location '/mnt/delta/bronze', and check whether the schema exists before creating it.
A. CREATE SCHEMA IF NOT EXISTS bronze LOCATION '/mnt/delta/bronze'
B. CREATE SCHEMA bronze IF NOT EXISTS LOCATION '/mnt/delta/bronze'
C. if IS_SCHEMA('bronze'): CREATE SCHEMA bronze LOCATION '/mnt/delta/bronze'
D. Schema creation is not available in the metastore; it can only be done in the Unity Catalog UI
E. A schema cannot be created without a database
The answer is A.
https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-schema.html
CREATE SCHEMA [ IF NOT EXISTS ] schema_name [ LOCATION schema_directory ]

QUESTION 14
How do you check the location of an existing schema in Delta Lake?
A. Run the SQL command SHOW LOCATION schema_name
B. Check the Unity Catalog UI
C. Use Data Explorer
D. Run the SQL command DESCRIBE SCHEMA EXTENDED schema_name
E. Schemas are stored internally in external Hive metastores like MySQL or SQL Server
The answer is D: DESCRIBE SCHEMA EXTENDED schema_name returns the schema's metadata, including its location.
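Tying questions 10 through 14 together, a minimal recovery sketch; the table and schema names are hypothetical and the version number is illustrative:

-- Find the version that preceded the accidental delete
DESCRIBE HISTORY transactions;
-- Review yesterday's data without changing the table
SELECT * FROM transactions TIMESTAMP AS OF date_sub(current_date(), 1);
-- Roll the table back to the pre-delete version (5 is illustrative)
RESTORE TABLE transactions TO VERSION AS OF 5;
-- Create a schema at an explicit location, then confirm where it lives
CREATE SCHEMA IF NOT EXISTS bronze LOCATION '/mnt/delta/bronze';
DESCRIBE SCHEMA EXTENDED bronze;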
