MC What do the 5 V's of Big Data stand for? Volume, Variety, Velocity, Veracity, Value. correct Volume, Visualization, Velocity, Variety, Value. incorrect Volume, Variety, Velocity, Variability, Value. incorrect Volume, Versatile, Velocity, Visualization, Value. incorrect
MC Which statement is NOT CORRECT? Velocity in Big Data refers to data "in movement". incorrect Volume in Big Data refers to data "at rest". incorrect Veracity in Big Data refers to data "in change". correct Variety in Big Data refers to data "in many forms". incorrect
MC Which components does the base Hadoop stack include? NDFS, MapReduce, and YARN. incorrect HDFS, MapReduce, and YARN. correct HDFS, Map, and Reduce. incorrect HDFS, Spark, and YARN. incorrect
MC Which statement is CORRECT? DataNodes in HDFS store a registry of metadata. incorrect The HDFS NameNode sends regular heartbeat messages to its DataNodes. incorrect HDFS is composed of a NameNode, DataNodes, and an optional SecondaryNameNode. correct Both the SecondaryNameNode and the primary NameNode can simultaneously handle requests from clients. incorrect
MC Which statement is NOT CORRECT? A mapper in Hadoop maps each element in a collection to one or more output elements. incorrect A reducer in Hadoop reduces a collection of elements to one or more output elements. incorrect Reducer workers in Hadoop will start once all mapper workers have finished. correct A MapReduce pipeline in Hadoop can include an optional Sorter to sort the final output. incorrect
MC Which statement is NOT CORRECT? Apart from handling MapReduce programs, YARN can also be used to manage other types of applications. incorrect YARN's JobHistoryServer keeps a log of all finished jobs. incorrect NodeManagers in YARN are responsible for setting up containers on the node hosting a particular (sub)task. incorrect The YARN ApplicationMaster contains a scheduler which will hold submitted jobs in a queue until they are deemed ready to start. correct
MC Which of the following commands is not part of HBase? Place correct Put incorrect Get incorrect Describe incorrect
MC Which statement is CORRECT? HBase can be considered a NoSQL database. correct HBase offers a SQL engine to query its data. incorrect MapReduce programs cannot be used with HBase. Data is accessed using simple put and get commands instead. incorrect HBase works well on large clusters as well as on small ones with only a few nodes. incorrect
MC Pig is... A programming language that can be used to query HDFS data. incorrect A project offering a programming language that is more user-friendly than writing MapReduce programs. correct A database which runs on Hadoop. incorrect A SQL engine which runs on top of Hadoop. incorrect
MC Which statement is NOT CORRECT? Hive offers an SQL engine to query Hadoop data. incorrect Hive's query language is not as feature-complete as the full SQL standard. incorrect Hive offers a JDBC interface. incorrect Hive queries run much faster than hand-written MapReduce programs. correct
MC Which of the following schema handling methods does Hive apply? Schema on write incorrect Schema on load incorrect Schema on read correct Schema on query incorrect
MC Which statement is NOT CORRECT? RDDs allow for two forms of operations: transformations and actions. incorrect RDDs represent an abstract, immutable data structure. incorrect RDDs are structured and represent a collection of columnar objects. correct RDDs offer failure protection by tracking the lineage of operations that are applied on them. incorrect
MC Which of the following is not one of the reasons why Spark programs are generally faster than MapReduce programs? Because Spark tries to keep its RDDs in memory as long as possible. incorrect Because Spark uses a directed acyclic graph (DAG) instead of MapReduce. incorrect Because RDD transformations are "lazily" applied. incorrect Because Mesos can be used as a resource manager instead of YARN. correct
MC Which statement is NOT CORRECT? Spark SQL exposes DataFrame and Dataset APIs, which use RDDs under the hood together with a high-performance SQL query engine. incorrect Spark SQL can be used from within Java, Python, Scala, and R. incorrect Spark SQL can be used through ODBC and JDBC interfaces. incorrect Spark SQL DataFrames need to be created by loading a file. correct
MC Which statement is CORRECT? One of the disadvantages of Spark is that it does not support streaming data. incorrect One of the disadvantages of Spark is that its streaming and machine learning APIs are still mostly RDD-based. correct One of the disadvantages of Spark is that it has no way to deal with graph-based data. incorrect One of the disadvantages of Spark is that its streaming API does not allow joining multiple streams. incorrect
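Several of the RDD items above hinge on four properties: transformations versus actions, immutability, lazy evaluation, and lineage-based recovery. The sketch below illustrates them with a hypothetical, pure-Python class called `ToyRDD`. It is not Spark's API and ignores partitioning, caching, and distribution; the class, its fields, and the `lineage()` helper are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical, single-machine stand-in for an RDD (not Spark's API).

    Instances are immutable (frozen); each one stores only its parent and the
    operation that derives it from that parent, i.e. its lineage.
    """

    source: Optional[tuple] = None                 # base data, root node only
    parent: Optional["ToyRDD"] = None              # upstream dataset
    op: Optional[Callable[[Iterable[Any]], Iterator[Any]]] = None
    name: str = "source"                           # label used in lineage()

    # Transformations: build and return a NEW ToyRDD; nothing is computed yet.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(parent=self, op=lambda xs: (f(x) for x in xs), name="map")

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, op=lambda xs: (x for x in xs if pred(x)), name="filter")

    def lineage(self) -> List[str]:
        """Names of the operations from the base data down to this dataset."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))

    def _compute(self) -> Iterator[Any]:
        # Replay the lineage from the base data. Recomputing like this is the
        # idea behind rebuilding lost data without having replicated it.
        if self.parent is None:
            return iter(self.source or ())
        return self.op(self.parent._compute())

    # Actions: force evaluation of the whole chain and return a result.
    def collect(self) -> List[Any]:
        return list(self._compute())

    def count(self) -> int:
        return sum(1 for _ in self._compute())


if __name__ == "__main__":
    calls = []

    def square(x):
        calls.append(x)  # record every element the map function actually sees
        return x * x

    base = ToyRDD(source=(1, 2, 3, 4, 5))
    evens_squared = base.filter(lambda x: x % 2 == 0).map(square)
    print(calls)                    # [] : transformations alone ran nothing
    print(evens_squared.lineage())  # ['source', 'filter', 'map']
    print(evens_squared.collect())  # [4, 16]
    print(calls)                    # [2, 4]
    print(base.collect())           # [1, 2, 3, 4, 5] : base is unchanged
```

Calling `filter` and `map` only records the chain. The `square` function runs for the first time when the `collect` action is called, and `base` still yields its original elements afterwards.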