Principles of Database Management

The Practical Guide to Storing, Managing and Analyzing Big and Small Data

To be published by Cambridge University Press

Book cover This book covers the principles of database management. It starts by defining databases and the various steps of database design. A next part zooms into different types of database systems: pre-relational, relational, object oriented, XML and no-SQL databases. Subsequent chapters discuss transaction management and physical data storage aspects. Also data access and data integration in an x-tier environment are extensively covered. The book concludes by discussing data warehousing, big data and analytics. Throughout the book, we will include various examples and case studies to illustrate and clarify the concepts discussed. Every chapter will conclude with a set of self-study questions such that the book can be easily used as a textbook by colleague instructors. We will also extensively report on both our research and industry experience on the topic to further illustrate the practical impact of the concepts discussed.

We won't use your e-mail for any other messages.

Features

The book is the result of having taught an undergraduate database management class and a postgraduate advanced database management class for more than 10 years now. The undergraduate class was attended by students from a variety of different backgrounds (on average 120 students per year). Throughout these years, we have found no good textbook which covers the material in a comprehensive way without getting flooded by theoretical detail and losing focus. Hence, after having teamed up together, we decided to start writing a book ourselves. We believe the outstanding features of our book originate from the strength of our author team.

Given the above considerations, the key distinctive features of our book are:

  • The right balance between theory and practice
  • End-to-end coverage starting with legacy technologies to emerging trends such as Big Data, NoSQL databases, data quality, etc.
  • A unique perspective on how lessons learnt from past data management could be relevant in today’s technology setting (e.g. navigational access and its perils in Codasyl and XML/OO databases)
  • A critical reflection and accompanying risk management considerations when implementing the various technologies considered
  • The inclusion of exercises and case studies originating from a diversified and complimentary business experience
  • An encompassing cross chapter use case that is revisited and expanded in each chapter

Target audience

The target audience of our book consists of:

  • Under- or postgraduate students taking courses on database management in BSc and MSc programmes on Information Management and/or Computer Science
  • Business professionals who would like to refresh or update their knowledge on database management
  • Database administrators, database developers or database programmers interested in new developments in the area

The book can also be used by tutors in courses such as the following:

  • Principles of Database Management
  • Database Modelling
  • Database Systems
  • Data Management
  • Data Modelling
It can also be useful to universities working out degrees in e.g. Big Data & Analytics and Data Science.

Table of contents

Part 1: Databases and Database Design

  • Chapter 1: Fundamental Concepts of Database Management (Show/hide details)
    • Applications of Database Technology
    • Key Definitions
    • File versus Database Approach to Data Management
      • The File-Based Approach
      • The Database Approach
    • Elements of a Database System
      • Database Model Versus Instances
      • Data Model
      • The Three Layer Architecture
      • Catalog
      • Database Users
      • Database Languages
    • Advantages of Database Systems and Database Management
      • Data Independence
      • Database Modelling
      • Managing Structured, Semi-Structured and Unstructured Data
      • Managing Data Redundancy
      • Specifying Integrity Rules
      • Concurrency Control
      • Backup and Recovery Facilities
      • Data Security
      • Performance Utilities
  • Chapter 2: Architecture and Categorization of DBMSs (Show/hide details)
    • Architecture of a DBMS
      • Connection and Security Manager
      • DDL Compiler
      • Query Processor
      • Storage Manager
      • DBMS utilities
      • DBMS interfaces
    • Categorization of DBMSs
      • Categorization Based on Data model
      • Categorization Based upon Degree of Simultaneous Access
      • Categorization Based on Architecture
      • Categorization Based on Usage
  • Chapter 3: Conceptual Data Modeling (Show/hide details)
    • Phases of Database Design
    • The Entity Relationship Model
      • Entity Types
      • Attribute Types
      • Relationship Types
      • Weak Entity Types
      • Ternary Relationship Types
      • Examples of the ER Model
      • Limitations of the ER Model
    • The Enhanced Entity Relationship (EER) Model
      • Specialization/Generalization
      • Categorization
      • Aggregation
      • Examples of the EER Model
      • Designing an EER model
    • The UML Class Diagram
      • Recap of Object Orientation
      • Classes
      • Variables
      • Access Modifiers
      • Associations
      • Specialization/Generalization
      • Aggregation
      • UML Example
      • Advanced UML Modeling Concepts
      • UML versus EER
  • Chapter 4: Organizational Aspects of Data Management (Show/hide details)
    • Data Management
      • Catalogs and the Role of Metadata
      • Metadata Modelling
      • Data Quality
      • Data Governance
    • Roles in Data Management
      • Information Architect
      • Database Designer
      • Data Owner
      • Data Steward
      • Database Administrator (DBA)
      • Data Scientist

Part 2: Types of Database Systems

  • Chapter 5: Legacy Databases (Show/hide details)
    • The Hierarchical Model
    • The Codasyl Model
  • Chapter 6: Relational Databases: The Relational Model (Show/hide details)
    • The Relational Model
      • Basic Concepts
      • Formal Definitions
      • Types of Keys
      • Relational constraints
      • Example Relational Data Model
    • Normalization
      • Insertion, Deletion and Update Anomalies in an Unnormalized Relational Model
      • Informal Normalization Guidelines
      • Functional Dependencies and Prime Attribute Type
      • Normalization Forms
    • Mapping a Conceptual ER Model to a Relational Model
      • Mapping Entity Types
      • Mapping Relationship Types
      • Mapping Multivalued Attribute Types
      • Mapping Weak Entity Types
      • Putting it All Together
    • Mapping a Conceptual EER Model to a Relational Model
      • Mapping an EER Specialization
      • Mapping an EER Categorization
      • Mapping an EER Aggregation
  • Chapter 7: Relational Databases: Structured Query Language (SQL) (Show/hide details)
    • Relational Database Management Systems and SQL
      • Key Characteristics of SQL
      • Three-Layer Database Architecture
    • SQL Data Definition Language (SQL DDL)
      • Key DDL Concepts
      • DDL Example
      • Referential Integrity Constraints
      • Drop and ALTER Command
    • SQL Data Manipulation Language (SQL DML)
      • SQL SELECT statement
      • SQL INSERT Statement
      • SQL DELETE Statement
      • SQL UPDATE Statement
    • SQL Views
    • SQL Indexes
    • SQL Prvileges
    • SQL for Metadata Management
  • Chapter 8: Object Oriented Databases and Object Persistence (Show/hide details)
    • Recap: Basic Concepts of OO
    • Advanced Concepts of OO
      • Method Overloading
      • Inheritance
      • Method Overriding
      • Polymorphism and Dynamic Binding
    • Basic Principles of Object Perstistence
      • Serialization
    • OODBMS
      • Object Identifiers
      • ODMG Standard
      • Object Model
      • Object Definition Language (ODL)
      • Object Query Language (OQL)
      • Language Bindings
    • Evaluating OODBMSs
  • Chapter 9: Extended Relational Databases (Show/hide details)
    • Limitations of the Relational Model
    • Active RDBMS Extensions
      • Triggers
      • Stored Procedures
    • Object-relational RDBMS Extensions
      • User-Defined Types (UDTs)
      • User-Defined Functions (UDFs)
      • Inheritance
      • Behavior
      • Polymorphism
      • Collection Types
      • Large Objects
    • Recursive SQL Queries
  • Chapter 10: XML Databases (Show/hide details)
    • Extensible Markup Language
      • Basic Concepts
      • Document Type Definition and XML Schema Definition
      • Extensible Stylesheet Language
      • Namespaces
      • XPath
    • Processing XML Documents
    • Storage of XML Documents
      • The Document-Oriented Approach for Storing XML Documents
      • The Data-Oriented Approach for Storing XML Documents
      • The Combined Approach for Storing XML Documents
    • Differences between XML Data and Relational Data
    • Mappings between XML Documents and (Object-)relational Data
      • Table-Based Mapping
      • Schema-Oblivious Mapping
    • Schema-aware Mapping
      • SQL/XML
    • Searching XML Data
      • Full-text search
      • Keyword-Based Search
      • Structured Search with XQuery
      • Semantic Search with RDF and SPARQL
    • XML for Information Exchange
      • Message Oriented Middleware (MOM)
      • SOAP-Based Web Services
      • REST-Based Web Services
      • Web Services and Databases
    • Other Data Representation Formats
  • Chapter 11: NoSQL Databases (Show/hide details)
    • The NoSQL Movement
      • The End of the ‘One Size Fits All’ Era?
      • The Emergence of the NoSQL Movement
    • Key-value Stores
      • From Keys to Hashes
      • Horizontal Scaling
      • An Example: Memcached
      • Request Coordination
      • Consistent Hashing
      • Replication and Redundancy
      • Eventual Consistency
      • Stabilization
      • Integrity Constraints and Querying
    • Tuple and Document Stores
      • Items with Keys
      • Filters and Queries
      • Complex Queries and Aggregation with MapReduce
      • SQL After All…
    • Column-oriented Databases
    • Graph Based Databases
      • Cypher Overview
      • Exploring a Social Graph
    • Other NoSQL Categories

Part 3: Physical Data Storage, Transaction Management and Database Access

  • Chapter 12: Physical File Organization and Indexing (Show/hide details)
    • Storage Hardware and Physical Database Design
      • The Storage Hierarchy
      • Internals of Hard Disk Drives
      • From Logical Concepts to Physical Constructs
    • Record Organization
    • File Organization
      • Introductory Concepts: Search Keys, Primary and Secondary File Organization
      • Heap File Organization
      • Sequential File Organization
      • Random File Organization (Hashing)
      • Indexed Sequential File Organization
      • List Data Organization (Linear and Nonlinear Lists)
      • Secondary Indexes and Inverted Files
      • B-Trees and B+-Trees
  • Chapter 13: Physical Database Organization (Show/hide details)
    • Physical Database Organization and Database Access Methods
      • From Database to Tablespace
      • Index Design
      • Database Access Methods
      • Join Implementations
    • Enterprise Storage Subsystems and Business Continuity
      • Disk Arrays and RAID
      • Enterprise Storage Subsystems
      • Business Continuity
  • Chapter 14: Basics of Transaction Management (Show/hide details)
    • Transactions, Recovery and Concurrency Control
    • Transactions and Transaction Management
      • Delineating Transactions and the Transaction Lifecycle
      • DBMS Components Involved in Transaction Management
      • The Logfile
    • Recovery
      • Types of Failures
      • System Recovery
      • Media Recovery
    • Concurrency Control
      • Typical Concurrency Problems
      • Schedules and Serial Schedules
      • Serializable Schedules
      • Optimistic and Pessimistic Schedulers
      • Locking and Locking Protocols
    • The ACID Properties of Transactions
  • Chapter 15: Accessing Databases and Database APIs (Show/hide details)
    • Database System Architectures
      • Centralized System Architectures
      • Tiered System Architectures
    • Classification of Database APIs
      • Proprietary versus Universal APIs
      • Embedded Versus Call-level APIs
      • Early Binding Versus Late Binding
    • Universal Database APIs
      • ODBC
      • OLE DB and ADO
      • ADO.NET
      • Java DataBase Connectivity (JDBC)
      • Intermezzo: SQL Injection and Access Security
      • SQLJ
      • Intermezzo: Embedded APIs versus Embedded DBMSs
      • Language-Integrated Querying
    • Object Persistenct and Object Relational Mapping APIs
      • Object Persistence with Enterprise JavaBeans
      • Object Persistence with the Java Persistence API
      • Object Persistence with Java Database Objects
      • Object Persistence in Other Host Languages
    • Database API Summary
    • Database Access in the World Wide Web
      • Introduction: The Original Web Server
      • The Common Gateway Interface: Towards Dynamic Web Pages
      • Client-side Scripting: The Desire for a Richer Web
      • JavaScript as a Platform
      • DBMSs Adapt: REST, Other Web Services and a Look Ahead
  • Chapter 16: Data Distribution and Distributed Transaction Management (Show/hide details)
    • Distributed Systems and Distributed Databases
    • Architectural Implications of Distributed Databases
    • Fragmentation, Allocation and Replication
      • Vertical Fragmentation
      • Horizontal Fragmentation (Sharding)
      • Mixed Fragmentation
      • Replication
      • Distribution and Replication of Metadata
    • Transparency
    • Distributed Query Processing
    • Distributed Transaction Management and Concurrency Control
      • Primary Site and Primary Copy 2PL
      • Distributed 2PL
      • The Two-Phase Commit Protocol (2PC)
      • Optimistic Concurrency and Loosely Coupled Systems
      • Compensation-Based Transaction Models
    • Eventual Consistency and Base Transactions
      • Horizontal Fragmentation and Consistent Hashing
      • The CAP Theorem
      • BASE Transactions
      • Multi-Version Concurrency Control and Vector Clocks
      • Quorum-Based Consistency

Part 4: Data Warehousing, Data Governance and (Big) Data Analytics

  • Chapter 17: Data Warehousing and Business Intelligence (Show/hide details)
    • Operational versus Tactical/Strategic Decision Making
    • Data Warehouse Definition
    • Data Warehouse Schemas
      • Star Schema
      • Snowflake Schema
      • Fact Constellation
      • Specific Schema Issues
    • The Extraction Transformation and Loading (ETL) Process
    • Data Marts
    • Virtual Data Warehouses and Virtual Data Marts
    • Operation Data Store
    • Data Warehouses versus Data Lakes
    • Business Intelligence
      • Query and Reporting
      • Pivot Tables
      • On-Line Analytical Processing (OLAP)
  • Chapter 18: Data Integration, Data Quality and Data Governance (Show/hide details)
    • Data and Process Integration
      • Convergence of Analytical and Operational Data Needs
      • Data Integration and Data Integration Patterns
      • Data Services and Data Flows in the Context of Data and Process Integration
    • Searching Unstructured Data and Enterprise Search
      • Principles of Full Text Search
      • Indexing Full Text Documents
      • Web Search Engines
      • Enterprise Search
    • Data Quality and Master Data Management
    • Data Governance
      • Total Data Quality Management (TQDM)
      • Capability Maturity Model Integration (CMMI)
      • Data Management Body of Knowledge (DMBOK)
      • Control Objectives for Information and Related Technology (COBIT)
      • Information Technology Infrastructure Library (ITIL)
    • Outlook
  • Chapter 19: Big Data (Show/hide details)
    • The 5 V's of Big Data
    • Hadoop
      • History of Hadoop
      • The Hadoop Stack
    • SQL on Hadoop
      • HBase: The First Database on Hadoop
      • Pig
      • Hive
    • Apache Spark
      • Spark Core
      • Spark SQL
      • MLlib, Spark Streaming and GraphX
  • Chapter 20: Analytics (Show/hide details)
    • The Analytics Process Model
    • Example Analytics Applications
    • Data Scientist Job Profile
    • Data Preprocessing
      • Denormalizing Data for Analysis
      • Sampling
      • Exploratory Analysis
      • Missing Values
      • Outlier Detection and Handling
    • Types of Analytics
      • Predictive Analytics
      • Evaluating Predictive Models
      • Descriptive Analytics
      • Social Network Analytics
    • Post Processing of Analytical Models
    • Critical Success Factors for Analytical Models
    • Economic Perspective on Analytics
      • Total Cost of Ownership (TCO)
      • Return on Investment (ROI)
      • In- versus Outsourcing
      • On-Premise versus Cloud Solutions
      • Open Source versus Commercial Software
    • Improving the ROI of Analytics
      • New Sources of Data
      • Data Quality
      • Management Support
      • Organizational Aspects
      • Cross-Fertilization
    • Privacy and Security
      • Overall Considerations Regarding Privacy and Security
      • The RACI Matrix
      • Accessing Internal Data
      • Privacy Regulation

What you will find on this site

After the book is released, readers will be able to work with an interactive environment to:

  • Play around with SQL queries
  • Play around with a MongoDB NoSQL database
  • Play around with a Neo4j graph database
  • Play around with an HBase database

In addition, we will provide extra material (video lectures, slides, ...) and maintain an errata list here as well.

About the authors

Wilfried Lemahieu is full professor and Vice Dean for Education at the Faculty of Economics and Business, KU Leuven, Belgium. His teaching includes Database Management, Enterprise Information Management and Management Informatics. His current research focuses on big data storage & analytics, data quality, business process management and service oriented architectures. His findings have been published in international journals such as Decision Support Systems, Applied Soft Computing, International Journal of Information Management and Data & Knowledge Engineering. He is also a frequent lecturer for both academic and business audiences and has an extensive track record in research collaborations with industry. See feb.kuleuven.be/wilfried.lemahieu for further details.

Bart Baesens is a professor at KU Leuven (Belgium), and a lecturer at the University of Southampton (United Kingdom). He has done extensive research on big data & analytics, customer relationship management, web analytics, fraud detection, and credit risk management. His findings have been published in well-known international journals (e.g. Machine Learning, Management Science, IEEE Transactions on Neural Networks, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Evolutionary Computation, Journal of Machine Learning Research, …) and presented at international top conferences. He is author of the books Credit Risk Management: Basic Concepts, Analytics in a Big Data World and Fraud Analytics using Descriptive, Predictive and Social Network Techniques and teaches e-learning courses on Advanced Analytics in a Big Data World and Credit Risk Modeling. His research is summarized at www.dataminingapps.com. He also regularly tutors, advises and provides consulting support to international firms with respect to their big data, analytics and credit risk management strategy.

Seppe vanden Broucke is an assistant professor at the Faculty of Economics and Business, KU Leuven, Belgium. His research interests include business data mining and analytics, machine learning, process management, and process mining. His work has been published in well-known international journals and presented at top conferences. Seppe's teaching includes Advanced Analytics, Big Data and Information Management courses. He also frequently teaches for industry and business audiences. See seppe.net for further details.


© Principles of Database Management 2017.
Send us your feedback.