MC Which statement is CORRECT?
The geodesic represents the longest path between two nodes. incorrect
The betweenness counts the number of times that a node or edge occurs in the geodesics of the network. correct
The graph theoretic center is the node with the highest minimum distance to all other nodes. incorrect
The closeness is always higher than the betweenness. incorrect

MC The aim of clustering is to come up with clusters such that the...
homogeneity within a cluster is maximized and the heterogeneity between clusters is maximized. correct
homogeneity within a cluster is maximized and the heterogeneity between clusters is minimized. incorrect
homogeneity within a cluster is minimized and the heterogeneity between clusters is maximized. incorrect
homogeneity within a cluster is minimized and the heterogeneity between clusters is minimized. incorrect

MC Which statement is CORRECT?
When using on premise solutions, maintenance or upgrade projects may even go unnoticed. incorrect
On premise solutions catalyze improved collaboration across business departments and geographical locations. incorrect
The big footprint access to data management and analytics capabilities is a serious drawback of cloud based solutions. incorrect
An important advantage of cloud based solutions concerns the scalability and economies of scale offered. More capacity (e.g. servers) can be added on the fly whenever needed. correct

MC Consider a data set with a multiclass target variable as follows: 25% bad payers, 25% poor payers, 25% medium payers and 25% good payers. In this case, the entropy will be:
Maximal correct
Minimal incorrect

MC Which of the following are interesting data sources to consider to boost the performance of analytical models?
Network data incorrect
External data incorrect
Unstructured data such as text data and multimedia data incorrect
All of the above correct

MC Bootstrapping refers to:
Drawing samples without replacement. incorrect
Drawing samples with replacement. correct

MC Given the following decision tree:



According to the decision tree, an applicant with Income > $50.000 and High Debt=Yes is classified as:
Good Risk incorrect
Bad Risk correct

MC Which statement is CORRECT?
Data owners are the data quality experts who are in charge of assessing data quality by performing extensive and regular data quality checks. incorrect
The graph theoretic center is the node with the highest minimum distance to all other nodes. incorrect
Outlying observations which represent erroneous data are treated using missing value procedures. correct
Featurization in the context of neural networks refers to adding more nodes to the network. incorrect

MC Which of the following measures cannot be used to make the splitting decision in a regression tree?
Mean Squared Error (MSE) incorrect
ANOVA/F-test incorrect
Entropy correct

MC Which of the following is not an advantage of open source software for analytics?
A world-wide network of developers can work on it. incorrect
It is available for free. incorrect
It has been thoroughly engineered and extensively tested, validated and completely documented. correct
It can be used in combination with commercial software. incorrect
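
As a worked illustration of the entropy and regression tree questions above, the following minimal sketch computes the entropy of a class distribution and the size-weighted MSE of a candidate split. It is a sketch under stated assumptions, not part of the quiz: the function names and the sample target values are hypothetical and chosen only for illustration. Entropy is computed from class proportions, so it needs a categorical target, whereas MSE works directly on a continuous target.

```python
import math


def entropy(proportions):
    """Shannon entropy in bits of a class distribution given as proportions."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


def mse(values):
    """Mean squared error when the node predicts the mean of its values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def weighted_split_mse(left, right):
    """Size-weighted MSE of the two child nodes after a candidate split."""
    n = len(left) + len(right)
    return (len(left) * mse(left) + len(right) * mse(right)) / n


# Four equally likely classes (25% each): entropy reaches log2(4) = 2 bits.
balanced = entropy([0.25, 0.25, 0.25, 0.25])
# An illustrative skewed distribution has lower entropy.
skewed = entropy([0.7, 0.1, 0.1, 0.1])

# Hypothetical continuous target values in the two children of a split.
left, right = [10.0, 12.0, 11.0], [30.0, 32.0, 31.0]
parent_mse = mse(left + right)
child_mse = weighted_split_mse(left, right)

print(f"balanced entropy: {balanced:.3f} bits, skewed entropy: {skewed:.3f} bits")
print(f"parent MSE: {parent_mse:.3f}, weighted child MSE: {child_mse:.3f}")
```

With four equally likely classes the entropy equals log2(4), its upper bound for four classes, which matches the "Maximal" answer. The split lowers the weighted MSE relative to the parent node, which is the kind of comparison a regression tree can make without any class labels.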