According to the decision tree, an applicant with Income > $50.000 and High Debt=Yes is classified as:
Good Risk (incorrect)
Bad Risk (correct)

MC Decision trees can be used in the following applications:
Credit risk scoring (incorrect)
Credit risk scoring and churn prediction (correct)
Credit risk scoring, churn prediction and customer profile segmentation (incorrect)
Credit risk scoring, churn prediction, customer profile segmentation and market basket analysis. (incorrect)

MC Consider a data set with a multiclass target variable as follows: 25% bad payers, 25% poor payers, 25% medium payers and 25% good payers. In this case, the entropy will be:
Minimal (incorrect)
Maximal (correct)

MC Which of the following measures cannot be used to make the splitting decision in a regression tree?
Mean Squared Error (MSE) (incorrect)
ANOVA/F-test (incorrect)
Entropy (correct)

MC Bootstrapping refers to:
Drawing samples with replacement. (correct)
Drawing samples without replacement. (incorrect)

MC Clustering, association rules and sequence rules are examples of:
Predictive analytics (incorrect)
Descriptive analytics (correct)

MC Given the following five transactions:

T1 {K, A, D, B}

T2 {D, A, C, E, B}

T3 {C, A, B, D}

T4 {B, A, E}

T5 {B, E, D}

Consider the association rule R: A -> BD.

Which statement is correct?
The support of R is 100% and the confidence is 75%. (incorrect)
The support of R is 60% and the confidence is 100%. (incorrect)
The support of R is 75% and the confidence is 60%. (incorrect)
The support of R is 60% and the confidence is 75%. (correct)

MC The aim of clustering is to come up with clusters such that the...
homogeneity within a cluster is minimized and the heterogeneity between clusters is maximized. (incorrect)
homogeneity within a cluster is maximized and the heterogeneity between clusters is minimized. (incorrect)
homogeneity within a cluster is minimized and the heterogeneity between clusters is minimized. (incorrect)
homogeneity within a cluster is maximized and the heterogeneity between clusters is maximized. (correct)

MC What statement about the adjacency matrix representing a social network is not true?
It is a symmetric matrix. (incorrect)
It is sparse since it contains a lot of non-zero elements. (correct)
It can include weights. (incorrect)
It has the same number of rows and columns. (incorrect)

MC Which statement is CORRECT?
The geodesic represents the longest path between two nodes. (incorrect)
The betweenness counts the number of times that a node or edge occurs in the geodesics of the network. (correct)
The graph theoretic center is the node with the highest minimum distance to all other nodes. (incorrect)
The closeness is always higher than the betweenness. (incorrect)

MC Featurization in the context of social networks refers to...
selecting the most predictive features. (incorrect)
adding more local features to the data set. (incorrect)
making features (=inputs) out of the network characteristics. (correct)
adding more nodes to the network. (incorrect)

MC Which of the following activities are part of the post-processing step?
Model interpretation and validation (incorrect)
Sensitivity analysis (incorrect)
Model representation (incorrect)
All of these activities (correct)

MC Is the following statement true or false?
"All given success factors of an analytical model, i.e. relevance, performance, interpretability, efficiency, economical cost and regulatory compliance, are always equally important."
True (incorrect)
False (correct)

MC Which of the following costs should be included in a Total Cost of Ownership (TCO) analysis?
Acquisition costs (incorrect)
Ownership and operation costs (incorrect)
Post-ownership costs (incorrect)
All of these costs (correct)

MC Which statement is NOT CORRECT?
ROI analysis offers a common firm-wide language to compare multiple investment opportunities and decide which one(s) to go for. (incorrect)
For companies like Facebook, Amazon, Netflix and Google a positive ROI is obvious since they essentially thrive on data and analytics. (incorrect)
Although the benefit component is usually not that difficult to approximate, the costs are much harder to precisely quantify. (correct)
Negative ROI of analytics often boils down to the lack of good quality data, management support and a company-wide data-driven decision culture. (incorrect)

MC Which of the following is not a risk when outsourcing analytics?
The fact that all analytical activities need to be outsourced (correct)
The exchange of confidential information (incorrect)
Continuity of the partnership (incorrect)
Dilution of competitive advantage due to e.g. mergers and acquisitions. (incorrect)

MC Which of the following is not an advantage of open source software for analytics?
It is available for free. (incorrect)
A world-wide network of developers can work on it. (incorrect)
It has been thoroughly engineered and extensively tested, validated and completely documented. (correct)
It can be used in combination with commercial software. (incorrect)

MC Which statement is CORRECT?
When using on-premise solutions, maintenance or upgrade projects may even go unnoticed. (incorrect)
An important advantage of cloud-based solutions concerns the scalability and economies of scale offered. More capacity (e.g. servers) can be added on the fly whenever needed. (correct)
The big footprint access to data management and analytics capabilities is a serious drawback of cloud-based solutions. (incorrect)
On-premise solutions catalyze improved collaboration across business departments and geographical locations. (incorrect)

MC Which of the following are interesting data sources to consider to boost the performance of analytical models?
Network data (incorrect)
External data (incorrect)
Unstructured data such as text data and multimedia data (incorrect)
All of the above (correct)

MC Which statement is CORRECT?
Quality of data is key to the success of any analytical exercise since it has a direct and measurable impact on the quality of the analytical model and hence its economic value. (correct)
Data preprocessing activities such as handling missing values, duplicate data or outliers are preventive measures for dealing with data quality issues. (incorrect)
Data owners are the data quality experts who are in charge of assessing data quality by performing extensive and regular data quality checks. (incorrect)
Data stewards can request data scientists to check or complete the value of a field. (incorrect)

MC To guarantee maximum independence and organizational impact of analytics, it is important that...
the Chief Data Officer (CDO) or Chief Analytics Officer (CAO) reports to the CIO or CFO. (incorrect)
the CIO takes care of all analytical responsibilities. (incorrect)
a Chief Data Officer or Chief Analytics Officer is added to the executive committee who directly reports to the CEO. (correct)
analytics is supervised only locally in the business units. (incorrect)

MC What is the correct ranking of the following analytics applications in terms of maturity?
Marketing Analytics (most mature), Risk Analytics (medium mature), HR Analytics (least mature). (incorrect)
Risk Analytics (most mature), Marketing Analytics (medium mature), HR Analytics (least mature). (correct)
Risk Analytics (most mature), HR Analytics (medium mature), Marketing Analytics (least mature). (incorrect)
HR Analytics (most mature), Marketing Analytics (medium mature), Risk Analytics (least mature). (incorrect)

MC Consider the following split:

What is the gain of this split (based on the entropy measure)?
-0,18872 (incorrect)
0,18872 (incorrect)
0,31128 (correct)

MC Given the following transactions:

a. Harry Potter, Twilight, Game of Thrones

b. Harry Potter, Game of Thrones, Kite Runner

c. Twilight, Game of Thrones, Kite Runner

d. Harry Potter, Kite Runner, Pride and Prejudice

e. Twilight, Game of Thrones, Pride and Prejudice

Consider the following association rule: Twilight AND Game of Thrones => Kite Runner

Which statement is correct?
The support = 1/5, confidence = 1/3 and lift = 5/9. (correct)
The support = 1/3, confidence = 1/5 and lift = 1/9. (incorrect)
The support = 1/5, confidence = 1/3 and lift = 4/9. (incorrect)
The support = 1/2, confidence = 1/5 and lift = 5/9. (incorrect)

MC Given the following statements:

i. When using featurization, the network is summarized in a set of features, such as betweenness and closeness.

ii. The betweenness of a node is its average distance to all other nodes in the network.

Which of these statements are correct?
Both statements are correct. (incorrect)
Statement i is correct, but statement ii is not correct. (correct)
Statement ii is correct, but statement i is not correct. (incorrect)
Both statements are not correct. (incorrect)

MC Given the following two statements:

i. Missing values are meaningless and should always be discarded.

ii. In outlier detection and handling, it is crucial to differentiate between valid and invalid values.

Which of these statements are correct?
Both statements are correct. (incorrect)
Statement i is correct but statement ii is not correct. (incorrect)
Statement ii is correct but statement i is not correct. (correct)
Both statements are not correct. (incorrect)

MC In the context of churn prediction, what does the precision represent?
The percentage of correctly classified observations (churners and non-churners). (incorrect)
The percentage of churners correctly labeled by the model as churner. (incorrect)
The percentage of non-churners labeled by the model as non-churner. (incorrect)
The percentage of predicted churners who are actually churners. (correct)

MC Which of the following data sources can be used to improve the ROI of analytics?
Network data (explicit or implicit) (incorrect)
Data pooling firms (incorrect)
Macroeconomic data (incorrect)
All of the above. (correct)

MC Which of the following is the definition for the precision?
(TP+TN)/(TP+TN+FP+FN) (incorrect)
(FP+FN)/(TP+TN+FP+FN) (incorrect)
TP/(TP+FN) (incorrect)
TP/(TP+FP) (correct)

MC Which of the following is the definition for the sensitivity?
(TP+TN)/(TP+TN+FP+FN) (incorrect)
(FP+FN)/(TP+TN+FP+FN) (incorrect)
TP/(TP+FN) (correct)
TP/(TP+FP) (incorrect)

MC Which statement is CORRECT?
In logistic regression, ordinary least squares (OLS) is used to determine the parameter values. (incorrect)
When predicting a categorical value, logistic regression can be used. (correct)
In logistic regression, if variable xi increases with one unit, the new odds become the old odds multiplied by βi. (incorrect)
In logistic regression, the doubling amount is equal to log(2) x βi. (incorrect)

MC Which statement is CORRECT?
When building decision trees, we should choose the split with the lowest gain. (incorrect)
When building decision trees, overfitting occurs when the error on the training set keeps on increasing. (incorrect)
When building decision trees, the impurity can be measured with entropy, gini, MSE or ANOVA with the F-test. (correct)
Decision trees essentially model a linear decision boundary. (incorrect)

MC Which statement is CORRECT?
Divisive hierarchical clustering starts from all observations in individual clusters and merges the ones that are most similar until all observations make up a single cluster. (incorrect)
A dendrogram can be used to decide upon the optimal number of clusters. It is a tree-like diagram that records the sequences of merges. (correct)
A key advantage of k-means clustering is that the number of clusters need not be specified prior to the analysis. (incorrect)
The single linkage method in hierarchical clustering defines the distance between two clusters as the biggest distance, or the distance between the two most dissimilar objects. (incorrect)

MC Which statement is CORRECT?
The data scientist is typically considered to be the owner of the data. (incorrect)
Data security and data privacy are essentially the same thing. (incorrect)
As the RACI matrix is dynamic, the different roles should be re-evaluated regularly. (correct)
In terms of anonymization, a technical key is a conversion of a natural key, so that tables can no longer be joined and anonymity can be guaranteed. (incorrect)

MC Which statement is NOT CORRECT?
When evaluating a predictive model for a sufficiently large data set (> 1000 observations), we can split it up in a training and test set. (incorrect)
Cross-validation can be used for small data sets (< 1000 observations). (incorrect)
Leave-one-out cross-validation on a data set with n observations will result in n-1 analytical models. (correct)
A side benefit of cross-validation is that you can calculate a standard deviation and confidence interval for the performance measure. (incorrect)

MC Which statement is NOT CORRECT?
The diagonal of a ROC curve represents a random scorecard, whereby sensitivity equals 1-specificity for all cutoff points. (incorrect)
The lower the area under the ROC curve (AUC), the better the model performs. (correct)
Classification measures like accuracy, specificity, sensitivity, recall and precision are dependent on the cutoff value. (incorrect)
The main advantage of performance measures such as ROC and AUC is their independence of the cutoff value. (incorrect)

MC Which statement is NOT CORRECT?
The lift curve can be summarized by reporting top decile lift. (incorrect)
There is a linear relation between AR and AUC: AR = 2 x AUC - 1 (incorrect)
The Pearson correlation coefficient always varies between -1 and +1. (incorrect)
The coefficient of determination R² is often used to measure the performance of classification models. (correct)

MC Which statement is NOT CORRECT?
Data pre-processing activities such as handling missing values, duplicate data, or outliers are preventive measures for dealing with data quality issues. (correct)
Data stewards are the data quality experts who oversee assessing data quality by performing extensive and regular data quality checks. (incorrect)
Data owners can correct the data in case of data quality issues. (incorrect)
The causes of data quality issues are often deeply rooted within the core organizational processes and culture, and the IT infrastructure and architecture of an organization. (incorrect)

MC Which statement is NOT CORRECT?
SQL privileges and views can be used for access control. (incorrect)
Label-Based Access Control (LBAC) is a control mechanism to protect your data against unauthorized access and can differentiate between the level of authorization that is granted to users. (incorrect)
The EU approach to privacy protection relies on industry-specific legislation and self-regulation. (correct)
The RACI matrix defines the following roles: Responsible, Accountable, Consulted and Informed. (incorrect)
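
Worked check: the arithmetic behind the two association-rule questions and the four-class entropy question can be reproduced with a short script. This is only a sketch. The helper names `rule_metrics` and `entropy` are illustrative and not part of the quiz, and lift is assumed to follow the usual definition, confidence divided by the support of the consequent. Under that definition the book rule Twilight AND Game of Thrones => Kite Runner has lift 5/9.

```python
from fractions import Fraction
from math import log2


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of antecedent -> consequent as exact fractions."""
    n = len(transactions)
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    # Count transactions containing the full rule, the antecedent, and the consequent.
    count_both = sum(1 for t in transactions if both <= t)
    count_ante = sum(1 for t in transactions if antecedent <= t)
    count_cons = sum(1 for t in transactions if consequent <= t)
    support = Fraction(count_both, n)
    confidence = Fraction(count_both, count_ante)
    # Assumed definition: lift = confidence / support(consequent).
    lift = confidence / Fraction(count_cons, n)
    return support, confidence, lift


def entropy(proportions):
    """Shannon entropy in bits of a class distribution."""
    return -sum(p * log2(p) for p in proportions if p > 0)


# Transactions T1..T5 from the rule R: A -> BD question.
letters = [set("KADB"), set("DACEB"), set("CABD"), set("BAE"), set("BED")]

# Transactions a..e from the book question.
books = [
    {"Harry Potter", "Twilight", "Game of Thrones"},
    {"Harry Potter", "Game of Thrones", "Kite Runner"},
    {"Twilight", "Game of Thrones", "Kite Runner"},
    {"Harry Potter", "Kite Runner", "Pride and Prejudice"},
    {"Twilight", "Game of Thrones", "Pride and Prejudice"},
]

r_letters = rule_metrics(letters, {"A"}, {"B", "D"})
r_books = rule_metrics(books, {"Twilight", "Game of Thrones"}, {"Kite Runner"})

print("A -> BD:", r_letters)            # support 3/5, confidence 3/4
print("T AND GoT -> KR:", r_books)      # support 1/5, confidence 1/3, lift 5/9
print("4 equal classes:", entropy([0.25] * 4), "bits")  # log2(4) = 2, the maximum
```

Exact fractions are used so the results can be compared directly with the answer options, without rounding.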