Final Exam: Fachinformatiker Daten- und Prozessanalyse (English)
Knowledge check for the entire course 'Fachinformatiker Daten- und Prozessanalyse (English)': 43 multiple-choice questions from 13 modules, with answers and explanations
FI-DPA 02 Process Modeling with BPMN (EN) 4 questions
1. What is the main purpose of Business Process Model and Notation (BPMN) in process modeling?
- A) The complete automation of business processes without human intervention
- B) The standardized graphical representation of business processes for all stakeholders
- C) The pure documentation of existing processes without optimization possibilities
- D) The replacement of all other process modeling methods such as UML or flowcharts
Correct Answer: B. BPMN provides a standardized graphical representation of business processes that both technical and non-technical stakeholders can understand. Option A is incorrect because BPMN models processes but does not by itself automate them. Option C is incorrect because BPMN is used not only for documentation but also for identifying optimization potential. Option D is incorrect because BPMN is a specialized notation that complements other modeling methods rather than replacing them.
2. What does a "Pool" primarily represent in BPMN notation?
- A) A single activity or task within a process
- B) A condition or rule that controls the process flow
- C) A point in the process where something happens
- D) A participant role or an organizational unit
Correct Answer: D. A pool represents a participant role or an organizational unit and separates the entire process from other processes. Option A describes an activity, not a pool. Option B describes a gateway, not a pool. Option C describes an event, not a pool.
3. Which element in BPMN determines the flow of the process based on conditions, rules, or parallelism?
- A) Event
- B) Lane
- C) Gateway
- D) Pool
Correct Answer: C. A Gateway determines the flow of the process based on conditions, rules, or parallelism. An Event (Option A) represents a point where something happens. A Lane (Option B) divides a pool into horizontal strips. A Pool (Option D) represents a participant role or organizational unit.
4. Which of the following steps is the first in the practical application of BPMN process modeling?
- A) Drawing the main process flow with Start and End Event
FI-DPA 03 Data Modeling and Schemas (EN) 4 questions
Which of the following data models is a variant of the Star Schema where dimension tables are further normalized and have hierarchical relationships?
- A) Cube Schema
- B) Snowflake Schema
- C) Galaxy Schema
- D) Flat File Schema
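Correct Answer: B. The Snowflake Schema is an extension of the Star Schema in which dimension tables are further normalized into hierarchies. A Galaxy Schema (fact constellation) instead combines several fact tables that share dimensions, and "Cube Schema" and "Flat File Schema" do not describe normalized variants of the Star Schema.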
Which type of table contains the numeric measurements (facts) of a data warehouse, such as sales figures or revenues?
- A) Dimension Table
- B) Attribute Table
- C) Fact Table
- D) Master Data Table
Correct Answer: C. The fact table contains the numeric measurements, while the other options contain descriptive data or other types of reference data.
Which SCD strategy (Slowly Changing Dimensions) manages changes in dimension tables by adding a historical value?
- A) SCD Type 1 (Overwrite)
- B) SCD Type 2 (Historization)
- C) SCD Type 3 (Addition with historical value)
- D) SCD Type 4 (New Table)
Correct Answer: C. SCD Type 3 adds new columns to store historical values, while Type 1 overwrites old values and Type 2 creates a complete historization with time periods.
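The difference between the three strategies is easiest to see on a single changing attribute. The following is a minimal sketch with plain Python dictionaries; the column names (`city`, `previous_city`, `valid_from`, `valid_to`) and the helper functions are hypothetical illustrations and are not taken from the module.

```python
from datetime import date

def scd_type1(row, new_city):
    """Type 1: overwrite the value, no history is kept."""
    return {**row, "city": new_city}

def scd_type3(row, new_city):
    """Type 3: keep the prior value in an additional column."""
    return {**row, "previous_city": row["city"], "city": new_city}

def scd_type2(rows, new_city, change_date):
    """Type 2: close the current row and append a new versioned row."""
    current = rows[-1]
    closed = {**current, "valid_to": change_date}
    new = {**current, "city": new_city,
           "valid_from": change_date, "valid_to": None}
    return rows[:-1] + [closed, new]

customer = {"customer_id": 1, "city": "Bonn"}  # hypothetical dimension row

t1 = scd_type1(customer, "Berlin")
t3 = scd_type3(customer, "Berlin")
history = [{**customer, "valid_from": date(2020, 1, 1), "valid_to": None}]
t2 = scd_type2(history, "Berlin", date(2024, 6, 1))

print(t1)
print(t3)
print(t2)
```

Type 3 keeps exactly one earlier value per tracked column, which is why it suits cases where only the previous state matters; Type 2 grows by one row per change.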
Which of the following data models consists of a central fact table and multiple directly connected dimension tables without further normalization?
- A) Snowflake Schema
- B) Star Schema
- C) Normalized Schema
- D) Entity-Relationship Schema
Correct Answer: B. The Star Schema is characterized by a
FI-DPA 04 ETL/ELT Pipelines (EN) 1 question
What is the main difference between ETL and ELT?
- A) ETL processes data in the cloud, ELT locally
- B) In ETL, transformation occurs before loading, in ELT after loading
- C) ETL always uses staging areas, ELT does not
- D) ELT is only suitable for Big Data environments
Correct Answer: B. The essential difference lies in the timing of the transformation: ETL transforms data before loading into the target system, while ELT loads data first and then transforms it in the target system.
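As an illustration of the timing difference, the sketch below runs the same cleanup once before loading (ETL) and once inside the target after loading (ELT). An in-memory SQLite database stands in for the target system; the table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw source rows: untrimmed names, amounts as text.
raw = [("  Alice ", "12.50"), ("BOB", "7.00")]

con = sqlite3.connect(":memory:")

# ETL: transform in the pipeline first, then load the cleaned rows.
cleaned = [(name.strip(), float(amount)) for name, amount in raw]
con.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
con.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)

# ELT: load the raw rows as-is, then transform inside the target system.
con.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
con.executemany("INSERT INTO sales_raw VALUES (?, ?)", raw)
con.execute(
    "CREATE TABLE sales_elt AS "
    "SELECT TRIM(customer) AS customer, CAST(amount AS REAL) AS amount "
    "FROM sales_raw"
)

etl_rows = con.execute("SELECT * FROM sales_etl ORDER BY amount").fetchall()
elt_rows = con.execute("SELECT * FROM sales_elt ORDER BY amount").fetchall()
print(etl_rows)
print(elt_rows)
```

Both paths end with the same cleaned table; what differs is where the transformation runs and whether the raw data is kept in the target.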
FI-DPA 05 Measuring and Assuring Data Quality (EN) 4 questions
Which of the following data quality criteria ensures that data matches across different systems?
- A) Completeness
- B) Consistency
- C) Accuracy
- D) Validity
Correct Answer: B. Consistency ensures that data matches across different systems or datasets. Completeness refers to the presence of all expected data, accuracy to the correctness of values, and validity to compliance with defined formats and rules.
Which tool is presented in the module as an open-source framework for creating, validating, and documenting data quality expectations?
- A) Pandas
- B) NumPy
- C) Great Expectations
- D) SQLAlchemy
Correct Answer: C. Great Expectations is the framework presented in the module for automated monitoring of data quality. Pandas and NumPy are libraries for data manipulation and numerical calculations, and SQLAlchemy is a toolkit for SQL databases.
Which method is described in the module as a systematic process for examining the characteristics of data holdings to understand structure, content, and quality?
- A) Data Cleansing
- B) Data Profiling
- C) Data Modeling
- D) Data Aggregation
Correct Answer: B. Data profiling is the systematic process for examining the characteristics of data holdings. Data cleansing refers to the removal of errors, data modeling to structure definition, and data aggregation to the summarization of data.
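A very small profiling pass might look like the sketch below. It uses only the Python standard library, and the records, column names, and the `profile` helper are hypothetical; it is meant to show the kind of metrics profiling collects, not the tooling used in the module.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records with some missing values.
records = [
    {"age": 34, "country": "DE"},
    {"age": None, "country": "DE"},
    {"age": 51, "country": "AT"},
    {"age": 29, "country": None},
]

def profile(rows, column):
    """Collect basic structure, content, and quality metrics for one column."""
    values = [r[column] for r in rows]
    present = [v for v in values if v is not None]
    result = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "top": Counter(present).most_common(1),
    }
    # Numeric columns additionally get simple distribution statistics.
    if present and all(isinstance(v, (int, float)) for v in present):
        result.update(min=min(present), max=max(present),
                      mean=mean(present), median=median(present))
    return result

age_profile = profile(records, "age")
country_profile = profile(records, "country")
print(age_profile)
print(country_profile)
```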
Which of the following Python libraries is recommended in the module for performing data profiling with statistical metrics and distributions?
- A) TensorFlow
- B) Matplotlib
FI-DPA 06 SQL for Analytics (EN) 4 questions
What is the main difference between Window Functions and regular aggregate functions in SQL?
- A) Window Functions can only be applied to numeric data
- B) Window Functions do not group rows but perform calculations on a window of rows
- C) Window Functions always require a GROUP BY clause
- D) Window Functions can only be used with the DISTINCT clause
Correct Answer: B. Window Functions operate on a window of rows without collapsing them, while regular aggregate functions combined with GROUP BY return one value per group. Option A is incorrect because Window Functions work on various data types. Option C is incorrect because Window Functions work without GROUP BY. Option D is incorrect because Window Functions do not depend on the DISTINCT clause.
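The contrast can be sketched by running both query styles against the same table. The example uses Python's built-in `sqlite3` module (window functions require SQLite 3.25 or newer); the `sales` table and its rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 300), ("South", 200)],
)

# Aggregate with GROUP BY: one output row per region.
grouped = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window function: every input row is kept and carries its region total.
windowed = con.execute(
    "SELECT region, amount, "
    "SUM(amount) OVER (PARTITION BY region) AS region_total "
    "FROM sales ORDER BY region, amount"
).fetchall()

print(grouped)
print(windowed)
```

The grouped query returns two rows, while the windowed query returns all three original rows with the total attached to each.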
What is the main advantage of using Common Table Expressions (CTEs) in complex SQL queries?
- A) CTEs always improve query performance
- B) CTEs enable recursive queries
- C) CTEs increase the readability and modularity of queries
- D) CTEs can only be used with SELECT statements
Correct Answer: C. CTEs improve readability and modularity by breaking down complex queries into logical, named parts. Option A is incorrect because CTEs do not always improve performance. Option B is true in itself, since recursive queries are written as CTEs, but it is not the main advantage. Option D is incorrect because many database systems also allow CTEs with INSERT, UPDATE, and DELETE.
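The following sketch shows how two named CTEs turn a nested query into readable steps. It runs on an in-memory SQLite database through Python's `sqlite3` module; the `orders` table, the CTE names, and the threshold of 100 are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", 50), ("A", 70), ("B", 20), ("C", 200)],
)

query = """
WITH customer_totals AS (      -- step 1: total per customer
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
big_customers AS (             -- step 2: keep only large totals
    SELECT customer, total
    FROM customer_totals
    WHERE total >= 100
)
SELECT customer, total FROM big_customers ORDER BY customer
"""

big = con.execute(query).fetchall()
print(big)
```

Each CTE can be read and tested on its own, which is the modularity the answer refers to.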
What is the main purpose of Pivot/Unpivot operations in SQL?
- A) Compress data to save storage space
- B) Move data between different tables
- C) Change data structure by transforming rows into columns and vice versa
- D) Encrypt data to increase security
Correct Answer: C. Pivot/Unpivot operations change the data structure by transforming rows into columns (Pivot) or columns into rows (Unpivot), often for reports or dashboards. Option A is incorrect because it's not primarily about compression. Option B is incorrect because it's not about moving data. Option D is incorrect because it's not about encryption.
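Not every database offers a dedicated PIVOT keyword, so the sketch below pivots with conditional aggregation, a portable alternative, and then unpivots the result again in Python. It uses the built-in `sqlite3` module; the `revenue` table and the quarter labels are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE revenue (region TEXT, quarter TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?)",
    [("North", "Q1", 10), ("North", "Q2", 15),
     ("South", "Q1", 7), ("South", "Q2", 9)],
)

# Pivot: one row per region, one column per quarter.
pivoted = con.execute("""
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount END) AS q2
    FROM revenue
    GROUP BY region
    ORDER BY region
""").fetchall()

# Unpivot: turn the quarter columns back into rows.
unpivoted = [
    (region, quarter, amount)
    for region, q1, q2 in pivoted
    for quarter, amount in (("Q1", q1), ("Q2", q2))
]

print(pivoted)
print(unpivoted)
```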
What can you NOT directly deduce from an EXPLAIN plan of a SQL query?
- A) The estimated costs of the query
- B) The exact records that will be returned by the query
- C) The used indexes
- D) The join methods employed
Correct Answer: B. An EXPLAIN plan shows how the database will execute the query, including estimated costs, indexes, and join methods, but it does not show the actual data that will be returned. Option A is incorrect because estimated costs are shown. Option C is incorrect because used indexes are shown. Option D is incorrect because join methods are shown.
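To see that a plan describes the access path rather than the result rows, the sketch below asks SQLite for a query plan on an empty table. Plan output is database-specific: SQLite's `EXPLAIN QUERY PLAN` names the index used but, unlike some other systems, does not print cost estimates. The table and index names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
con.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# The table holds no rows, yet the planner can still describe its strategy.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?", ("A",)
).fetchall()

# The last field of each plan row is the human-readable detail text.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The exact wording of the detail text varies between SQLite versions, so it is safer to look for the index name than to compare the full string.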
FI-DPA 07 BI Tools: Power BI and Metabase (EN) 3 questions
What is the main difference between DAX and Excel formulas?
- A) DAX can only work with numbers, Excel formulas also with text
- B) DAX is optimized for complex data models, Excel formulas for simple spreadsheet calculations
- C) DAX does not support cell references, only whole tables
- D) DAX can only be used in Power BI, Excel formulas are universally applicable
Correct Answer: B. DAX was developed for BI tools and works on the columns and tables of a data model, while Excel formulas are optimized for cell-based spreadsheet calculations. Option A is incorrect because DAX can also work with text, and Option D is incorrect because DAX is also used in other Microsoft products.
Which principle is particularly important when designing dashboards?
- A) Maximum information density to show all relevant data at a glance
- B) Use as many different chart types as possible for variety
- C) Consistent visualization and focusing on the most important KPIs
- D) Avoid interaction options to reduce confusion
Correct Answer: C. Consistent visualization and a focus on the most important KPIs are crucial for clear, understandable dashboards. Too much information or too many chart types can overwhelm users, while interaction options improve the user experience.
What is the main advantage of Metabase compared to Power BI?
- A) Metabase offers significantly more visualization options
- B) Metabase does not require SQL knowledge for basic operations
- C) Metabase is better suited for large enterprise environments
- D) Metabase provides more advanced DAX functions
Correct Answer: B. Metabase is designed to allow users without deep SQL knowledge to explore data and create dashboards, while Power BI requires more technical knowledge for similar tasks.
FI-DPA 08 KPI Systems and Reporting (EN) 2 questions
1. Which of the following statements describes Balanced Scorecards most accurately?
- A) A pure financial system for measuring profitability
- B) A strategic management system that divides corporate objectives into four perspectives
- C) A method for pure process optimization without strategic relevance
- D) A tool for pure data collection without analysis functions
Correct Answer: B. The Balanced Scorecard divides corporate objectives into four perspectives: Finance, Customers, Internal Processes, and Learning & Growth. Option A is incorrect because the Balanced Scorecard goes beyond finances. Option C is incorrect because it includes strategic goals. Option D is incorrect because it includes analysis functions.
2. What criteria must SMART-KPIs fulfill?
- A) Simple, Measurable, Achievable, Relevant, Time-bound
- B) Specific, Measurable, Achievable, Relevant, Time-bound
- C) Strategic, Measurable, Actionable, Realistic, Timely
- D) Significant, Measurable, Applicable, Reliable, Targeted
Correct Answer: B. SMART-KPIs must be Specific, Measurable, Achievable, Relevant, and Time-bound. Option A is incorrect because "Simple" is not part of the acronym. Option C is incorrect because "Strategic" and "Timely" are not the terms the acronym stands for. Option D is incorrect because its terms do not fit the SMART concept.
FI-DPA 09 Process Mining: Celonis and Disco (EN) 4 questions
What is the primary function of Process Mining?
- A) The manual design of business processes
- B) The objective analysis of business processes based on event logs
- C) The development of new software for process management
- D) The creation of organizational charts
Correct Answer: B. Process Mining enables the objective analysis of business processes based on event logs to understand and optimize actual flows. Option A describes process design, Option C software development, and Option D organizational structure, but not the core function of Process Mining.
What is an Event-Log in the context of Process Mining?
- A) A log of system errors and exceptions
- B) A structured dataset that captures the sequence of events in a process with timestamps, case IDs, and activity information
- C) A collection of user comments on process steps
- D) A report on employee performance
Correct Answer: B. An Event-Log is a structured dataset that captures the sequence of events in a process with timestamps, case IDs, and activity information and forms the basis for every process mining analysis. Option A describes error logs, Option C user feedback, and Option D performance evaluations, but not the specific structure of an Event-Log.
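The structure described here can be sketched as a list of events, each with a case ID, an activity, and a timestamp. The field names, activities, and the `traces` helper below are hypothetical; the point is that ordering events by timestamp within each case yields the sequence (trace) a process mining tool analyzes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log; events arrive unordered and mixed across cases.
event_log = [
    {"case_id": "C1", "activity": "Create order", "timestamp": "2024-01-01T09:00"},
    {"case_id": "C2", "activity": "Create order", "timestamp": "2024-01-01T09:05"},
    {"case_id": "C1", "activity": "Ship goods", "timestamp": "2024-01-02T10:00"},
    {"case_id": "C1", "activity": "Approve order", "timestamp": "2024-01-01T11:00"},
    {"case_id": "C2", "activity": "Cancel order", "timestamp": "2024-01-01T12:00"},
]

def traces(log):
    """Group events by case and order each case's activities by time."""
    by_case = defaultdict(list)
    ordered = sorted(log, key=lambda e: datetime.fromisoformat(e["timestamp"]))
    for event in ordered:
        by_case[event["case_id"]].append(event["activity"])
    return dict(by_case)

case_traces = traces(event_log)
print(case_traces)
```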
What is the main goal of a Conformance Checking analysis?
- A) The prediction of future process changes
- B) The verification of the alignment between the actual process and the target model
- C) The automation of process steps
- D) The visualization of process flows without prior knowledge
Correct Answer: B. Conformance Checking verifies the alignment between the actual process (as depicted in the event log) and the target model to uncover deviations, inefficiencies, and violations. Option A describes trend analyses, Option C process automation, and Option D discovery analyses, but not the core of Conformance Checking.
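A strongly simplified version of this comparison is sketched below: the target model is reduced to a set of allowed transitions, and each observed trace is checked step by step. Real tools use more elaborate techniques such as token replay or alignments; the activity names and the `deviations` helper are hypothetical.

```python
# Hypothetical target model expressed as allowed direct transitions.
allowed = {
    ("START", "Create order"),
    ("Create order", "Approve order"),
    ("Approve order", "Ship goods"),
    ("Ship goods", "END"),
}

def deviations(trace):
    """Return every step of the trace that the target model does not allow."""
    path = ["START", *trace, "END"]
    return [step for step in zip(path, path[1:]) if step not in allowed]

conforming = deviations(["Create order", "Approve order", "Ship goods"])
violating = deviations(["Create order", "Ship goods"])

print(conforming)
print(violating)
```

An empty list means the observed case follows the target model; a non-empty list pinpoints where the actual process deviated, here a shipment without approval.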
Which of the following statements best describes the purpose of a Performance Analysis in Process Mining?
- A) The identification of the most common proc
FI-DPA 10 Statistics Basics for Analysts (EN) 4 questions
Which of the following measures of dispersion is in the same unit as the original data?
- A) Variance
- B) Standard deviation
- C) Range
- D) Interquartile range
Correct Answer: B. The standard deviation is the square root of the variance and is therefore expressed in the same unit as the original data, whereas the variance is in squared units. Range and interquartile range are also expressed in the original unit, so the question is best read as contrasting the standard deviation with the variance.
What is the main distinction between correlation and causality?
- A) Correlation is always linear, causality is nonlinear
- B) Correlation describes a relationship, causality describes a cause-effect relationship
- C) Correlation can only exist between numerical variables, causality also between categorical ones
- D) Correlation is always positive, causality can also be negative
Correct Answer: B. Correlation merely describes that two variables vary together, while causality means that a change in one variable directly leads to a change in the other. An observed correlation alone is not sufficient to establish causality, because a third variable or coincidence may explain it.
Which measure of central tendency is most robust to outliers?
- A) Mean
- B) Mode
- C) Median
- D) Arithmetic mean
Correct Answer: C. The median is the middle value in ordered data and is barely affected by extreme values (outliers). The mean and arithmetic mean are identical and are strongly influenced by outliers, while the mode is also robust but not always uniquely defined.
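A quick numerical check makes the robustness visible. The salary figures below are hypothetical; the sketch adds one extreme value and compares how far the mean and the median move.

```python
from statistics import mean, median

salaries = [3000, 3200, 3400, 3600, 3800]  # hypothetical monthly salaries
with_outlier = salaries + [50000]          # one extreme value added

mean_shift = mean(with_outlier) - mean(salaries)
median_shift = median(with_outlier) - median(salaries)

print(f"mean shifts by {mean_shift:.2f}")
print(f"median shifts by {median_shift:.2f}")
```

The single outlier moves the mean by several thousand, while the median changes only by the distance to the next middle value.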
What is the first step in a systematic hypothesis test?
- A) Calculate the test statistic
- B) Formulate the null and alternative hypotheses
- C) Choose the significance level
- D) Collect the data
Correct Answer: B. The first step in hypothesis testing is to clearly formulate the null hypothesis (which assumes no effect) and the alternative hypothesis (which represents the research claim). This must be done before collecting data or calculating statistics.
FI-DPA 11 Data Protection, GDPR, and Anonymization (EN) 3 questions
What is the main goal of k-anonymity in data anonymization?
- A) Maximizing data precision
- B) Preventing the identification of individuals
- C) Reducing data volume
- D) Accelerating data processing
Correct Answer: B. k-anonymity aims to prevent the identification of individuals by ensuring that each record shares the same values for its quasi-identifying attributes with at least k-1 other records. The other options do not describe the main goal of k-anonymity.
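The value of k for a dataset can be determined by grouping records on their quasi-identifiers and taking the smallest group size. The sketch below assumes already generalized data; the records, the choice of `zip` and `age_band` as quasi-identifiers, and the `k_of` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records with generalized quasi-identifiers.
records = [
    {"zip": "101**", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "101**", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip": "101**", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "102**", "age_band": "40-49", "diagnosis": "diabetes"},
    {"zip": "102**", "age_band": "40-49", "diagnosis": "flu"},
]

def k_of(rows, quasi_identifiers):
    """Return the size of the smallest group sharing all quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

k = k_of(records, ["zip", "age_band"])
print(f"The dataset is {k}-anonymous with respect to zip and age_band")
```

Here the smaller group contains two records, so every person is indistinguishable from at least one other with respect to the chosen quasi-identifiers.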
What is a key characteristic of Differential Privacy?
- A) Complete removal of all personal data
- B) Ensuring that adding or removing a single person does not significantly change analysis results
- C) Reversibility of anonymization upon request
- D) Guaranteeing 100% accuracy of data analysis
Correct Answer: B. Differential Privacy is based on the mathematical guarantee that the presence or absence of any single individual's data changes the analysis result only within a bounded amount, typically achieved by adding calibrated random noise. The other options do not describe the core concept of Differential Privacy.
What is meant by processing of data by third parties in the context of GDPR?
- A) Processing of data by a controller without external help
- B) Processing of data by a third party on behalf of the controller with a contractual agreement
- C) Automated processing of data without human intervention
- D) Processing of data for advertising purposes without consent
Correct Answer: B. Processing of data by third parties means that a data processor processes personal data on behalf of the controller, with a contractual agreement in place to ensure compliance with data protection regulations.
FI-DPA 12 Project: Process Discovery Case Study (EN) 3 questions
What is the primary goal of Process-Discovery analysis?
- A) Checking whether actual processes match target models
- B) Identification of unknown or implicit process models from event data
- C) Calculation of process key performance indicators for monitoring
- D) Automation of business processes
Correct Answer: B. Process-Discovery aims to identify unknown process models from event data without making predefined assumptions. Option A describes Conformance Checking, Option C describes KPI monitoring, and Option D describes process automation.
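One common starting point for discovery is to count which activity directly follows which across all cases. The sketch below builds such a directly-follows count from a few hypothetical traces; it is only a first step and not a full discovery algorithm.

```python
from collections import Counter

# Hypothetical traces, one ordered list of activities per case.
traces = [
    ["Create order", "Approve order", "Ship goods"],
    ["Create order", "Approve order", "Ship goods"],
    ["Create order", "Cancel order"],
]

# Count each pair of consecutive activities over all traces.
directly_follows = Counter(
    pair for trace in traces for pair in zip(trace, trace[1:])
)

for (source, target), count in directly_follows.most_common():
    print(f"{source} -> {target}: {count}")
```

The resulting edges and their frequencies form a simple process map derived purely from the data, without a predefined model.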
Which format is typically used for Event Logs in Process-Mining analyses?
- A) XML
- B) CSV
- C) XES
- D) JSON
Correct Answer: C. XES (eXtensible Event Stream) is the standard format for Event Logs in Process-Mining and was specifically developed for this purpose. CSV and JSON can also be used but are general-purpose formats. XES is itself XML-based, but generic XML without the XES schema is not the standard format.
What is the main difference between Process Discovery and Conformance Checking?
- A) Process Discovery uses KPI dashboards, Conformance Checking does not
- B) Process Discovery identifies unknown processes, Conformance Checking checks compliance with target models
- C) Process Discovery requires Event Logs, Conformance Checking does not
- D) Process Discovery is for real data, Conformance Checking only for simulated data
Correct Answer: B. Process Discovery identifies unknown p
FI-DPA 13 Machine Learning: Fundamentals and Algorithms (EN) 3 questions
What is the main difference between supervised and unsupervised learning?
- A) Supervised learning always uses neural networks, unsupervised learning does not
- B) Supervised learning requires labeled data, unsupervised learning works with unlabeled data
- C) Supervised learning is always more accurate than unsupervised learning
- D) Supervised learning can only work with numerical data, unsupervised learning can also work with categorical data
Correct Answer: B. The key difference lies in the use of labeled data in supervised learning, while unsupervised learning works without predefined labels. Option A is incorrect as both learning forms include various algorithms. Option C is not generally valid as accuracy depends on the problem statement. Option D is incorrect as both learning forms can work with different data types.
Which category of machine learning does predicting house prices based on features like size, location, and year of construction belong to?
- A) Classification
- B) Clustering
- C) Regression
- D) Principal Component Analysis
Correct Answer: C. Regression is the prediction of continuous values like prices. Classification would be incorrect as it categorizes data. Clustering is unsupervised learning and PCA is used for dimensionality reduction, not prediction.
What problem arises when a machine learning model is too closely fitted to the training data?
- A) Underfitting
- B) Overfitting
- C) The Bias-Variance Dilemma
- D) The Problem of High Dimensionality
Correct Answer: B.
FI-DPA 14 ML Pipeline: Data, Training, Evaluation (EN) 4 questions
What is the main purpose of Feature Engineering in an ML pipeline?
- A) Reducing data size for faster processing
- B) Transforming and selecting features to improve model performance
- C) Fully automating the data process
- D) Eliminating all categorical variables from the dataset
Correct Answer: B. Feature Engineering aims to improve ML model performance through targeted transformation and selection of features, while the other options only represent partial aspects or misinterpretations of this process.
Why is a dataset split into training, validation, and test sets?
- A) To increase the amount of training data available
- B) To ensure the model can be evaluated on unseen data
- C) To reduce computational requirements
- D) To allow for parallel processing of data
Correct Answer: B. The split ensures that the model's performance can be evaluated on data it has never seen during training, providing an unbiased assessment of how well it will perform in real-world scenarios.
What is the primary benefit of using Cross-Validation?
- A) It reduces the time needed for model training
- B) It provides a more robust estimate of model performance
- C) It eliminates the need for a test set
- D) It automatically selects the best model
Correct Answer: B. Cross-Validation provides a more robust estimate of model performance by averaging results across multiple different splits of the data, reducing the impact of how the data is divided.
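How the "multiple different splits" arise can be sketched with a small k-fold index generator. The helper `k_fold_indices` is a hypothetical illustration in plain Python and, for simplicity, does not shuffle the data first, which a real setup usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Every sample is used for validation exactly once; training a model on each `train` part and averaging the scores on the `validation` parts gives the more robust estimate described above.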
Which metric is most appropriate for evaluating a model on imbalanced data?
- A) Accuracy
- B) F1-Score
- C) Number of features
- D) Training time
Correct Answer: B. For imbalanced data, the F1-Score is more appropriate than Accuracy because it considers both Precision and Recall, providing a better measure of the model's performance on the minority class.
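The effect is easy to reproduce with a trivial classifier that always predicts the majority class. The labels below are hypothetical (5 positives among 100 samples), and the `scores` helper computes both metrics from scratch.

```python
# Hypothetical imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # a "model" that always predicts the majority class

def scores(y_true, y_pred):
    """Return (accuracy, f1) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1

accuracy, f1 = scores(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")
```

The accuracy looks excellent although not a single minority-class case is found, whereas the F1-Score of zero exposes the problem.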