MC OLAP (On-Line Analytical Processing) can help in which of the following steps of the analytics process? Data collection incorrect Data visualization correct Data transformation incorrect Data denormalization incorrect
MC The GIGO principle mainly relates to which aspect of the analytics process? Data selection incorrect Data transformation incorrect Data cleaning incorrect The GIGO principle applies to all of these listed aspects correct
MC Which statement is CORRECT? Missing values should always be replaced or removed. incorrect Outliers should always be replaced or removed. incorrect Missing values and outliers can potentially provide useful information and should be analyzed before they are removed/replaced. correct Missing values and outliers should both always be replaced or removed. incorrect
MC Which of the following strategies can be used to deal with missing values? Keep incorrect Delete incorrect Replace/impute incorrect All of these strategies can be applied correct
MC Outlying observations which represent erroneous data are treated using... missing value procedures. correct truncation or capping. incorrect
MC Given the following decision tree:



According to the decision tree, an applicant with Income > $50,000 and High Debt=Yes is classified as: Good Risk incorrect Bad Risk correct
MC Decision trees can be used in the following applications: Credit risk scoring incorrect Credit risk scoring and churn prediction correct Credit risk scoring, churn prediction and customer profile segmentation incorrect Credit risk scoring, churn prediction, customer profile segmentation and market basket analysis. incorrect
MC Consider a data set with a multiclass target variable as follows: 25% bad payers, 25% poor payers, 25% medium payers and 25% good payers. In this case, the entropy will be: Minimal incorrect Maximal correct
MC Which of the following measures cannot be used to make the splitting decision in a regression tree? Mean Squared Error (MSE) incorrect ANOVA/F-test incorrect Entropy correct
MC Bootstrapping refers to: Drawing samples with replacement. correct Drawing samples without replacement. incorrect
MC Clustering, association rules and sequence rules are examples of: Predictive analytics incorrect Descriptive analytics correct
MC Given the following five transactions:

T1 {K, A, D, B}
T2 {D, A, C, E, B}
T3 {C, A, B, D}
T4 {B, A, E}
T5 {B, E, D}

Consider the association rule R: A -> BD.

Which statement is correct? The support of R is 100% and the confidence is 75%. incorrect The support of R is 60% and the confidence is 100%. incorrect The support of R is 75% and the confidence is 60%. incorrect The support of R is 60% and the confidence is 75%. correct
MC The aim of clustering is to come up with clusters such that the... homogeneity within a cluster is minimized and the heterogeneity between clusters is maximized. incorrect homogeneity within a cluster is maximized and the heterogeneity between clusters is minimized. incorrect homogeneity within a cluster is minimized and the heterogeneity between clusters is minimized. incorrect homogeneity within a cluster is maximized and the heterogeneity between clusters is maximized. correct
MC Which statement about the adjacency matrix representing a social network is NOT true? It is a symmetric matrix. incorrect It is sparse since it contains a lot of non-zero elements. correct It can include weights. incorrect It has the same number of rows and columns. incorrect
MC Which statement is CORRECT? The geodesic represents the longest path between two nodes. incorrect The betweenness counts the number of times that a node or edge occurs in the geodesics of the network. correct The graph theoretic center is the node with the highest minimum distance to all other nodes. incorrect The closeness is always higher than the betweenness. incorrect
MC Featurization in the context of social networks refers to... selecting the most predictive features. incorrect adding more local features to the data set. incorrect making features (=inputs) out of the network characteristics. correct adding more nodes to the network. incorrect
MC Which of the following activities are part of the post-processing step? Model interpretation and validation incorrect Sensitivity analysis incorrect Model representation incorrect All of these activities correct
MC Is the following statement true or false?
"All given success factors of an analytical model, i.e. relevance, performance, interpretability, efficiency, economic cost and regulatory compliance, are always equally important." True incorrect False correct
MC Which of the following costs should be included in a Total Cost of Ownership (TCO) analysis? Acquisition costs incorrect Ownership and operation costs incorrect Post-ownership costs incorrect All of these costs correct
MC Which statement is NOT CORRECT? ROI analysis offers a common firm-wide language to compare multiple investment opportunities and decide which one(s) to go for. incorrect For companies like Facebook, Amazon, Netflix and Google a positive ROI is obvious since they essentially thrive on data and analytics. incorrect Although the benefit component is usually not that difficult to approximate, the costs are much harder to precisely quantify. correct Negative ROI of analytics often boils down to the lack of good quality data, management support and a company-wide data-driven decision culture. incorrect
MC Which of the following is not a risk when outsourcing analytics? The fact that all analytical activities need to be outsourced correct The exchange of confidential information incorrect Continuity of the partnership incorrect Dilution of competitive advantage due to e.g. mergers and acquisitions. incorrect
MC Which of the following is not an advantage of open source software for analytics? It is available for free. incorrect A world-wide network of developers can work on it. incorrect It has been thoroughly engineered and extensively tested, validated and completely documented. correct It can be used in combination with commercial software. incorrect
MC Which statement is CORRECT? When using on-premise solutions, maintenance or upgrade projects may even go unnoticed. incorrect An important advantage of cloud-based solutions concerns the scalability and economies of scale offered. More capacity (e.g. servers) can be added on the fly whenever needed. correct The big footprint access to data management and analytics capabilities is a serious drawback of cloud-based solutions. incorrect On-premise solutions catalyze improved collaboration across business departments and geographical locations. incorrect
MC Which of the following are interesting data sources to consider to boost the performance of analytical models? Network data incorrect External data incorrect Unstructured data such as text data and multimedia data incorrect All of the above correct
MC Which statement is CORRECT? Quality of data is key to the success of any analytical exercise since it has a direct and measurable impact on the quality of the analytical model and hence its economic value. correct Data preprocessing activities such as handling missing values, duplicate data or outliers are preventive measures for dealing with data quality issues. incorrect Data owners are the data quality experts who are in charge of assessing data quality by performing extensive and regular data quality checks. incorrect Data stewards can request data scientists to check or complete the value of a field. incorrect
MC To guarantee maximum independence and organizational impact of analytics, it is important that... the Chief Data Officer (CDO) or Chief Analytics Officer (CAO) reports to the CIO or CFO. incorrect the CIO takes care of all analytical responsibilities. incorrect a Chief Data Officer or Chief Analytics Officer is added to the executive committee who directly reports to the CEO. correct analytics is supervised only locally in the business units. incorrect
MC What is the correct ranking of the following analytics applications in terms of maturity? Marketing Analytics (most mature), Risk Analytics (medium mature), HR Analytics (least mature). incorrect Risk Analytics (most mature), Marketing Analytics (medium mature), HR Analytics (least mature). correct Risk Analytics (most mature), HR Analytics (medium mature), Marketing Analytics (least mature). incorrect HR Analytics (most mature), Marketing Analytics (medium mature), Risk Analytics (least mature). incorrect
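
Worked check for the association rule question (transactions T1 to T5, rule R: A -> BD). The sketch below is only an illustration of how the keyed answer can be verified, assuming the usual definitions: support is the share of all transactions containing every item of the rule, and confidence is the share of transactions containing the antecedent that also contain the consequent. The helper name `count_containing` and the variable names are hypothetical and not part of the question set.

```python
# Transactions exactly as listed in the question.
transactions = {
    "T1": {"K", "A", "D", "B"},
    "T2": {"D", "A", "C", "E", "B"},
    "T3": {"C", "A", "B", "D"},
    "T4": {"B", "A", "E"},
    "T5": {"B", "E", "D"},
}


def count_containing(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


antecedent = {"A"}
consequent = {"B", "D"}

# Transactions containing A, B and D together (T1, T2, T3).
n_rule = count_containing(antecedent | consequent, transactions)
# Transactions containing the antecedent A (T1, T2, T3, T4).
n_antecedent = count_containing(antecedent, transactions)

rule_support = n_rule / len(transactions)
rule_confidence = n_rule / n_antecedent

print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Three of the five transactions contain A, B and D, and four contain A, which agrees with the option marked correct (support 60%, confidence 75%).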