Principles of Database Management

The Practical Guide to Storing, Managing and Analyzing Big and Small Data

Cambridge University Press — Order on Amazon

Book cover This comprehensive textbook teaches the fundamentals of database design, modeling, systems, data storage, and the evolving world of data warehousing, governance and more. Written by experienced educators and experts in big data, analytics, data quality, and data integration, it provides an up-to-date approach to database management. This full-color, illustrated text has a balanced theory-practice focus, covering essential topics, from established database technologies to recent trends, like Big Data, NoSQL, and more. Fundamental concepts are supported by real-world examples, query and code walkthroughs, and figures, making it perfect for introductory courses for advanced undergraduates and graduate students in information systems or computer science.

Chinese book cover We are happy to announce that the Chinese translation of our book is now available.

→ Buy on DangDang or JD.com.
→ Download all course slides in PPT format in a ZIP archive (Mandarin version). 以ZIP存档下载PPT格式的所有课程幻灯片（普通话版）。

＊ Download a presentation covering the book's features ＊

Endorsements

Features

Principles of Database Management provides readers with the comprehensive database management information to understand and apply the fundamental concepts of database design and modeling, database systems, data storage, and the evolving world of data warehousing, governance and more. Designed for those studying database management for information management or computer science, this well-illustrated textbook has a well-balanced theory-practice focus and covers the essential topics, from established database technologies up to recent trends like Big Data, NoSQL, and analytics. On-going case studies, drill-down boxes that reveal deeper insights on key topics, retention questions at the end of every section of a chapter, and connections boxes that show the relationship between concepts throughout the text are included to provide the practical tools to get started in database administration.

Given the above considerations, the key distinctive features of our book are:

Full-color illustrations throughout the text
Extensive coverage of important trending topics, including data warehousing, business intelligence, data integration, data quality, data governance, Big Data, and analytics
An online playground with diverse environments, including MySQL for querying; MongoDB; Neo4j Cypher; and a tree structure visualization environment
Hundreds of examples to illustrate and clarify the concepts discussed that can be reproduced on the book’s companion online playground
Case studies, review questions, problems, and exercises in every chapter
Additional cases, problems, and exercises in the appendix
Extra lecture materials including video lectures and slides in English and Mandarin

Target audience

The target audience of our book consists of:

Under- or postgraduate students taking courses on database management in BSc and MSc programmes on Information Management and/or Computer Science
Business professionals who would like to refresh or update their knowledge on database management
Database administrators, database developers or database programmers interested in new developments in the area

The book can also be used by tutors in courses such as the following:

Principles of Database Management
Database Modelling
Database Systems
Data Management
Data Modelling

It can also be useful to universities working out degrees in e.g. Big Data & Analytics and Data Science.

Part 1: Databases and Database Design

Chapter 1: Fundamental Concepts of Database Management (Show/hide details)
- Applications of Database Technology
- Key Definitions
- File versus Database Approach to Data Management
  - The File-Based Approach
  - The Database Approach
- Elements of a Database System
  - Database Model Versus Instances
  - Data Model
  - The Three Layer Architecture
  - Catalog
  - Database Users
  - Database Languages
- Advantages of Database Systems and Database Management
  - Data Independence
  - Database Modelling
  - Managing Structured, Semi-Structured and Unstructured Data
  - Managing Data Redundancy
  - Specifying Integrity Rules
  - Concurrency Control
  - Backup and Recovery Facilities
  - Data Security
  - Performance Utilities
Chapter 2: Architecture and Categorization of DBMSs (Show/hide details)
- Architecture of a DBMS
  - Connection and Security Manager
  - DDL Compiler
  - Query Processor
  - Storage Manager
  - DBMS utilities
  - DBMS interfaces
- Categorization of DBMSs
  - Categorization Based on Data model
  - Categorization Based upon Degree of Simultaneous Access
  - Categorization Based on Architecture
  - Categorization Based on Usage
Chapter 3: Conceptual Data Modeling (Show/hide details)
- Phases of Database Design
- The Entity Relationship Model
  - Entity Types
  - Attribute Types
  - Relationship Types
  - Weak Entity Types
  - Ternary Relationship Types
  - Examples of the ER Model
  - Limitations of the ER Model
- The Enhanced Entity Relationship (EER) Model
  - Specialization/Generalization
  - Categorization
  - Aggregation
  - Examples of the EER Model
  - Designing an EER model
- The UML Class Diagram
  - Recap of Object Orientation
  - Classes
  - Variables
  - Access Modifiers
  - Associations
  - Specialization/Generalization
  - Aggregation
  - UML Example
  - Advanced UML Modeling Concepts
  - UML versus EER
Chapter 4: Organizational Aspects of Data Management (Show/hide details)
- Data Management
  - Catalogs and the Role of Metadata
  - Metadata Modelling
  - Data Quality
  - Data Governance
- Roles in Data Management
  - Information Architect
  - Database Designer
  - Data Owner
  - Data Steward
  - Database Administrator (DBA)
  - Data Scientist

Part 2: Types of Database Systems

Chapter 5: Legacy Databases (Show/hide details)
- The Hierarchical Model
- The Codasyl Model
Chapter 6: Relational Databases: The Relational Model (Show/hide details)
- The Relational Model
  - Basic Concepts
  - Formal Definitions
  - Types of Keys
  - Relational Constraints
  - Example Relational Data Model
- Normalization
  - Insertion, Deletion and Update Anomalies in an Unnormalized Relational Model
  - Informal Normalization Guidelines
  - Functional Dependencies and Prime Attribute Type
  - Normalization Forms
- Mapping a Conceptual ER Model to a Relational Model
  - Mapping Entity Types
  - Mapping Relationship Types
  - Mapping Multivalued Attribute Types
  - Mapping Weak Entity Types
  - Putting it All Together
- Mapping a Conceptual EER Model to a Relational Model
  - Mapping an EER Specialization
  - Mapping an EER Categorization
  - Mapping an EER Aggregation
Chapter 7: Relational Databases: Structured Query Language (SQL) (Show/hide details)
- Relational Database Management Systems and SQL
  - Key Characteristics of SQL
  - Three-Layer Database Architecture
- SQL Data Definition Language (SQL DDL)
  - Key DDL Concepts
  - DDL Example
  - Referential Integrity Constraints
  - Drop and ALTER Command
- SQL Data Manipulation Language (SQL DML)
  - SQL SELECT statement
  - SQL INSERT Statement
  - SQL DELETE Statement
  - SQL UPDATE Statement
- SQL Views
- SQL Indexes
- SQL Prvileges
- SQL for Metadata Management
Chapter 8: Object Oriented Databases and Object Persistence (Show/hide details)
- Recap: Basic Concepts of OO
- Advanced Concepts of OO
  - Method Overloading
  - Inheritance
  - Method Overriding
  - Polymorphism and Dynamic Binding
- Basic Principles of Object Perstistence
  - Serialization
- OODBMS
  - Object Identifiers
  - ODMG Standard
  - Object Model
  - Object Definition Language (ODL)
  - Object Query Language (OQL)
  - Language Bindings
- Evaluating OODBMSs
Chapter 9: Extended Relational Databases (Show/hide details)
- Limitations of the Relational Model
- Active RDBMS Extensions
  - Triggers
  - Stored Procedures
- Object-relational RDBMS Extensions
  - User-Defined Types (UDTs)
  - User-Defined Functions (UDFs)
  - Inheritance
  - Behavior
  - Polymorphism
  - Collection Types
  - Large Objects
- Recursive SQL Queries
Chapter 10: XML Databases (Show/hide details)
- Extensible Markup Language
  - Basic Concepts
  - Document Type Definition and XML Schema Definition
  - Extensible Stylesheet Language
  - Namespaces
  - XPath
- Processing XML Documents
- Storage of XML Documents
  - The Document-Oriented Approach for Storing XML Documents
  - The Data-Oriented Approach for Storing XML Documents
  - The Combined Approach for Storing XML Documents
- Differences between XML Data and Relational Data
- Mappings between XML Documents and (Object-)relational Data
  - Table-Based Mapping
  - Schema-Oblivious Mapping
- Schema-aware Mapping
  - SQL/XML
- Searching XML Data
  - Full-text search
  - Keyword-Based Search
  - Structured Search with XQuery
  - Semantic Search with RDF and SPARQL
- XML for Information Exchange
  - Message Oriented Middleware (MOM)
  - SOAP-Based Web Services
  - REST-Based Web Services
  - Web Services and Databases
- Other Data Representation Formats
Chapter 11: NoSQL Databases (Show/hide details)
- The NoSQL Movement
  - The End of the ‘One Size Fits All’ Era?
  - The Emergence of the NoSQL Movement
- Key-value Stores
  - From Keys to Hashes
  - Horizontal Scaling
  - An Example: Memcached
  - Request Coordination
  - Consistent Hashing
  - Replication and Redundancy
  - Eventual Consistency
  - Stabilization
  - Integrity Constraints and Querying
- Tuple and Document Stores
  - Items with Keys
  - Filters and Queries
  - Complex Queries and Aggregation with MapReduce
  - SQL After All…
- Column-oriented Databases
- Graph Based Databases
  - Cypher Overview
  - Exploring a Social Graph
- Other NoSQL Categories

Part 3: Physical Data Storage, Transaction Management and Database Access

Chapter 12: Physical File Organization and Indexing (Show/hide details)
- Storage Hardware and Physical Database Design
  - The Storage Hierarchy
  - Internals of Hard Disk Drives
  - From Logical Concepts to Physical Constructs
- Record Organization
- File Organization
  - Introductory Concepts: Search Keys, Primary and Secondary File Organization
  - Heap File Organization
  - Sequential File Organization
  - Random File Organization (Hashing)
  - Indexed Sequential File Organization
  - List Data Organization (Linear and Nonlinear Lists)
  - Secondary Indexes and Inverted Files
  - B-Trees and B+-Trees
Chapter 13: Physical Database Organization (Show/hide details)
- Physical Database Organization and Database Access Methods
  - From Database to Tablespace
  - Index Design
  - Database Access Methods
  - Join Implementations
- Enterprise Storage Subsystems and Business Continuity
  - Disk Arrays and RAID
  - Enterprise Storage Subsystems
  - Business Continuity
Chapter 14: Basics of Transaction Management (Show/hide details)
- Transactions, Recovery and Concurrency Control
- Transactions and Transaction Management
  - Delineating Transactions and the Transaction Lifecycle
  - DBMS Components Involved in Transaction Management
  - The Logfile
- Recovery
  - Types of Failures
  - System Recovery
  - Media Recovery
- Concurrency Control
  - Typical Concurrency Problems
  - Schedules and Serial Schedules
  - Serializable Schedules
  - Optimistic and Pessimistic Schedulers
  - Locking and Locking Protocols
- The ACID Properties of Transactions
Chapter 15: Accessing Databases and Database APIs (Show/hide details)
- Database System Architectures
  - Centralized System Architectures
  - Tiered System Architectures
- Classification of Database APIs
  - Proprietary versus Universal APIs
  - Embedded Versus Call-level APIs
  - Early Binding Versus Late Binding
- Universal Database APIs
  - ODBC
  - OLE DB and ADO
  - ADO.NET
  - Java DataBase Connectivity (JDBC)
  - Intermezzo: SQL Injection and Access Security
  - SQLJ
  - Intermezzo: Embedded APIs versus Embedded DBMSs
  - Language-Integrated Querying
- Object Persistence and Object Relational Mapping APIs
  - Object Persistence with Enterprise JavaBeans
  - Object Persistence with the Java Persistence API
  - Object Persistence with Java Database Objects
  - Object Persistence in Other Host Languages
- Database API Summary
- Database Access in the World Wide Web
  - Introduction: The Original Web Server
  - The Common Gateway Interface: Towards Dynamic Web Pages
  - Client-side Scripting: The Desire for a Richer Web
  - JavaScript as a Platform
  - DBMSs Adapt: REST, Other Web Services and a Look Ahead
Chapter 16: Data Distribution and Distributed Transaction Management (Show/hide details)
- Distributed Systems and Distributed Databases
- Architectural Implications of Distributed Databases
- Fragmentation, Allocation and Replication
  - Vertical Fragmentation
  - Horizontal Fragmentation (Sharding)
  - Mixed Fragmentation
  - Replication
  - Distribution and Replication of Metadata
- Transparency
- Distributed Query Processing
- Distributed Transaction Management and Concurrency Control
  - Primary Site and Primary Copy 2PL
  - Distributed 2PL
  - The Two-Phase Commit Protocol (2PC)
  - Optimistic Concurrency and Loosely Coupled Systems
  - Compensation-Based Transaction Models
- Eventual Consistency and Base Transactions
  - Horizontal Fragmentation and Consistent Hashing
  - The CAP Theorem
  - BASE Transactions
  - Multi-Version Concurrency Control and Vector Clocks
  - Quorum-Based Consistency

Part 4: Data Warehousing, Data Governance and (Big) Data Analytics

Chapter 17: Data Warehousing and Business Intelligence (Show/hide details)
- Operational versus Tactical/Strategic Decision Making
- Data Warehouse Definition
- Data Warehouse Schemas
  - Star Schema
  - Snowflake Schema
  - Fact Constellation
  - Specific Schema Issues
- The Extraction Transformation and Loading (ETL) Process
- Data Marts
- Virtual Data Warehouses and Virtual Data Marts
- Operation Data Store
- Data Warehouses versus Data Lakes
- Business Intelligence
  - Query and Reporting
  - Pivot Tables
  - On-Line Analytical Processing (OLAP)
Chapter 18: Data Integration, Data Quality and Data Governance (Show/hide details)
- Data and Process Integration
  - Convergence of Analytical and Operational Data Needs
  - Data Integration and Data Integration Patterns
  - Data Services and Data Flows in the Context of Data and Process Integration
- Searching Unstructured Data and Enterprise Search
  - Principles of Full Text Search
  - Indexing Full Text Documents
  - Web Search Engines
  - Enterprise Search
- Data Quality and Master Data Management
- Data Governance
  - Total Data Quality Management (TQDM)
  - Capability Maturity Model Integration (CMMI)
  - Data Management Body of Knowledge (DMBOK)
  - Control Objectives for Information and Related Technology (COBIT)
  - Information Technology Infrastructure Library (ITIL)
- Outlook
Chapter 19: Big Data (Show/hide details)
- The 5 V's of Big Data
- Hadoop
  - History of Hadoop
  - The Hadoop Stack
- SQL on Hadoop
  - HBase: The First Database on Hadoop
  - Pig
  - Hive
- Apache Spark
  - Spark Core
  - Spark SQL
  - MLlib, Spark Streaming and GraphX
Chapter 20: Analytics (Show/hide details)
- The Analytics Process Model
- Example Analytics Applications
- Data Scientist Job Profile
- Data Preprocessing
  - Denormalizing Data for Analysis
  - Sampling
  - Exploratory Analysis
  - Missing Values
  - Outlier Detection and Handling
- Types of Analytics
  - Predictive Analytics
  - Evaluating Predictive Models
  - Descriptive Analytics
  - Social Network Analytics
- Post Processing of Analytical Models
- Critical Success Factors for Analytical Models
- Economic Perspective on Analytics
  - Total Cost of Ownership (TCO)
  - Return on Investment (ROI)
  - In- versus Outsourcing
  - On-Premise versus Cloud Solutions
  - Open Source versus Commercial Software
- Improving the ROI of Analytics
  - New Sources of Data
  - Data Quality
  - Management Support
  - Organizational Aspects
  - Cross-Fertilization
- Privacy and Security
  - Overall Considerations Regarding Privacy and Security
  - The RACI Matrix
  - Accessing Internal Data
  - Privacy Regulation

About the authors

Wilfried Lemahieu is a professor at KU Leuven, Faculty of Economics and Business, where he also holds the position of Dean. His teaching, for which he was awarded a ‘best teacher recognition’ includes Database Management, Enterprise Information Management and Management Informatics. His research focuses on big data storage and integration, data quality, business process management and service-oriented architectures. In this context, he collaborates extensively with a variety of industry partners, both local and international. His research is published in renowned international journals and he is a frequent lecturer for both academic and industry audiences. See feb.kuleuven.be/wilfried.lemahieu for further details.

Bart Baesens is a professor of Big Data and Analytics at KU Leuven (Belgium) and a lecturer at the University of Southampton (United Kingdom). He has done extensive research on Big Data & Analytics, Credit Risk Modeling, Fraud Detection and Marketing Analytics. He wrote more than 200 scientific papers some of which have been published in well-known international journals (e.g. MIS Quarterly, Machine Learning, Management Science, MIT Sloan Management Review and IEEE Transactions on Knowledge and Data Engineering) and presented at international top conferences (e.g. ICIS, KDD, CAISE). He received various best paper and best speaker awards. Bart is the author of 6 books: Credit Risk Management: Basic Concepts (Oxford University Press, 2009), Analytics in a Big Data World (Wiley, 2014), Beginning Java Programming (Wiley, 2015), Fraud Analytics using Descriptive, Predictive and Social Network Techniques (Wiley, 2015), Credit Risk Analytics (Wiley, 2016) and Profit-Driven Business Analytics (Wiley, 2017). He sold more than 15.000 copies of these books worldwide, some of which have been translated in Chinese, Russian and Korean. His research is summarized at www.dataminingapps.com. He also regularly tutors, advises and provides consulting support to international firms with respect to their big data, analytics and credit risk management strategy.

Seppe vanden Broucke works as an assistant professor at the Faculty of Economics and Business, KU Leuven, Belgium. His research interests include business data mining and analytics, machine learning, process management and process mining. His work has been published in well-known international journals and presented at top conferences. He is also author of the book Beginning Java Programming (Wiley, 2015) of which more than 4000 copies were sold and which was also translated in Russian. Seppe's teaching includes Advanced Analytics, Big Data and Information Management courses. He also frequently teaches for industry and business audiences. See seppe.net for further details.