Modul 3 von 15 · 📖 4 min Lesezeit · ⏱ 30 min gesamt
FI-DPA 03 Datenmodellierung und Schemata (EN)
Inhaltsverzeichnis (6 Abschnitte)
FI-DPA 03 Data Modeling and Schemas
In this module, you will learn the fundamentals of data modeling for data warehouse systems. You will understand the differences between Star Schema and Snowflake architecture, be able to identify fact and dimension tables, and know which SCD strategies (Slowly Changing Dimensions) are suitable for which use cases.
Concepts and Background
- Star Schema
- A data model consisting of a central fact table and multiple dimension tables that are directly connected to the fact table. The dimension tables are normalized to avoid redundancies.
- Snowflake Schema
- A variant of the Star Schema where dimension tables are further normalized and have hierarchical relationships. The structure resembles a snowflake pattern, hence the name.
- Fact Table
- Contains the numeric measurements (facts) of a data warehouse, such as sales figures or revenues. Each row represents an event point in time and references the associated dimensions via foreign keys.
- Dimension Table
- Contains descriptive attributes that can be used to analyze the facts, such as product information, customer data, or time information. Dimension tables are typically slowly changing.
- SCD Strategies (Slowly Changing Dimensions)
- Methods for managing changes in dimension tables. The most common strategies are SCD Type 1 (Overwrite), SCD Type 2 (Historization), and SCD Type 3 (Addition with historical value).
Architecture Diagram
flowchart TD
F[Faktentabelle
Verkäufe] --> D1[Produkt-Dimension]
F --> D2[Zeit-Dimension]
F --> D3[Kunden-Dimension]
F --> D4[Standort-Dimension]
D1 --> D1_1[Produktkategorie]
D1 --> D1_2[Produktgruppe]
D2 --> D2_1[Quartal]
D2 --> D2_2[Jahr]
D3 --> D3_1[Region]
D3 --> D3_2[Länderspezifika]
Practical Steps
- Define the business cases and the associated key metrics to be modeled. This forms the basis for the fact design.
- Identify all relevant dimensions needed for analyzing the key metrics, such as time, product, customer, and location.
- Design the fact table with the numeric measurements and foreign keys to the dimensions. Ensure the fact table is in third normal form.
- Create the dimension tables with the descriptive attributes. Normalize the dimension tables according to the chosen architecture (Star or Snowflake).
- Implement the SCD strategy for each dimension table. Type 2 is most often required for historical analyses.
- Create the database tables with the appropriate primary and foreign key relationships. Use constraints for data integrity.
- Implement ETL processes to populate the tables with data from source systems. This includes extraction, transformation, and loading.
- Test the data model with real data and validate the correctness of the aggregations and relationships.
Common Pitfalls
Further Resources
- Microsoft Learn: Data Modeling in Data Warehouses
- Ralph Kimball: The Data Warehouse Toolkit
- IBM Developer: Star Schema vs. Snowflake Schema
- Talend: Slowly Changing Dimensions Guide
- O'Reilly: Data Modeling for SQL Server
Knowledge Check
Four questions for self-assessment. Click on each question to see the correct answer and explanation.
Which of the following data models is a variant of the Star Schema where dimension tables are further normalized and have hierarchical relationships?
- A) Cube Schema
- B) Snowflake Schema
- C) Galaxy Schema
- D) Flat File Schema
Correct Answer: B. The Snowflake Schema is an extension of the Star Schema with further normalized dimension tables, while the other options are not common data models for data warehouse systems.
Which type of table contains the numeric measurements (facts) of a data warehouse, such as sales figures or revenues?
- A) Dimension Table
- B) Attribute Table
- C) Fact Table
- D) Master Data Table
Correct Answer: C. The fact table contains the numeric measurements, while the other options contain descriptive data or other types of reference data.
Which SCD strategy (Slowly Changing Dimensions) manages changes in dimension tables by adding a historical value?
- A) SCD Type 1 (Overwrite)
- B) SCD Type 2 (Historization)
- C) SCD Type 3 (Addition with historical value)
- D) SCD Type 4 (New Table)
Correct Answer: C. SCD Type 3 adds new columns to store historical values, while Type 1 overwrites old values and Type 2 creates a complete historization with time periods.
Which of the following data models consists of a central fact table and multiple directly connected dimension tables without further normalization?
- A) Snowflake Schema
- B) Star Schema
- C) Normalized Schema
- D) Entity-Relationship Schema
Correct Answer: B. The Star Schema is characterized by a