MC	What do the 5 V's of Big Data stand for?	Volume, Variety, Velocity, Veracity, Value.	correct	Volume, Visualization, Velocity, Variety, Value.	incorrect	Volume, Variety, Velocity, Variability, Value.	incorrect	Volume, Versatile, Velocity, Visualization, Value.	incorrect	
MC	Which statement is NOT CORRECT?	Velocity in Big Data refers to data "in movement".	incorrect	Volume in Big Data refers to data "at rest".	incorrect	Veracity in Big Data refers to data "in change".	correct	Variety in Big Data refers to data "in many forms".	incorrect	
MC	Which components does the base Hadoop stack include?	NDFS, MapReduce, and YARN.	incorrect	HDFS, MapReduce and YARN.	correct	HDFS, Map and Reduce.	incorrect	HDFS, Spark and YARN.	incorrect	
MC	Which statement is CORRECT?	DataNodes in HDFS store a registry of metadata.	incorrect	The HDFS NameNode sends regular heartbeat messages to its DataNodes.	incorrect	HDFS is composed of a NameNode, DataNodes, and an optional SecondaryNameNode.	correct	Both the SecondaryNameNode and primary NameNode can simultaneously handle requests from clients.	incorrect	
MC	Which statement is NOT CORRECT?	A mapper in Hadoop maps each element in a collection to one or more output elements.	incorrect	A reducer in Hadoop reduces a collection of elements to one or more output elements.	incorrect	Reducer workers in Hadoop will start once all mapper workers have finished.	correct	A MapReduce pipeline in Hadoop can include an optional Sorter to sort the final output.	incorrect	
MC	Which statement is NOT CORRECT?	Apart from handling MapReduce programs, YARN can also be used to manage other types of applications.	incorrect	YARN's JobHistoryServer keeps a log of all finished jobs.	incorrect	NodeManagers in YARN are responsible for setting up containers on the node hosting a particular (sub)task.	incorrect	The YARN ApplicationMaster contains a scheduler which will hold submitted jobs in a queue until they are deemed ready to start.	correct	
MC	Which of the following commands is NOT part of HBase?	Place	correct	Put	incorrect	Get	incorrect	Describe	incorrect	
MC	Which statement is CORRECT?	HBase can be considered as a NoSQL database.	correct	HBase offers a SQL engine to query its data.	incorrect	MapReduce programs cannot be used with HBase. Data is accessed using simple put and get commands instead.	incorrect	HBase works well on large clusters as well as small ones having a few nodes.	incorrect	
MC	Pig is...	A programming language that can be used to query HDFS data.	incorrect	A project offering a programming language to provide more user-friendliness compared to MapReduce programs.	correct	A database which runs on Hadoop.	incorrect	A SQL engine which runs on top of Hadoop.	incorrect	
MC	Which statement is NOT CORRECT?	Hive offers an SQL engine to query Hadoop data.	incorrect	Hive's query language is not as feature complete as the full SQL standard.	incorrect	Hive offers a JDBC interface.	incorrect	Hive queries run much faster than hand-written MapReduce programs.	correct	
MC	Which of the following schema handling methods does Hive apply?	Schema on write	incorrect	Schema on load	incorrect	Schema on read	correct	Schema on query	incorrect	
MC	Which statement is NOT CORRECT?	RDDs allow for two forms of operations: transformations and actions.	incorrect	RDDs represent an abstract, immutable data structure.	incorrect	RDDs are structured and represent a collection of columnar objects.	correct	RDDs offer failure protection by tracking the lineage of operations that are applied on them.	incorrect	
MC	Which of the following is not one of the reasons why Spark programs are generally faster than MapReduce operations?	Because Spark tries to keep its RDDs in memory as long as possible.	incorrect	Because Spark uses a directed acyclic graph instead of MapReduce.	incorrect	Because RDD transformations are "lazily" applied.	incorrect	Because Mesos can be used as a resource manager instead of YARN.	correct	
MC	Which statement is NOT CORRECT?	Spark SQL exposes DataFrame and Dataset APIs which use RDDs under the hood together with a performant SQL query engine.	incorrect	Spark SQL can be used from within Java, Python, Scala and R.	incorrect	Spark SQL can be used through ODBC and JDBC interfaces.	incorrect	Spark SQL DataFrames need to be created by loading a file.	correct	
MC	Which statement is CORRECT?	One of the disadvantages of Spark is that it does not support streaming data.	incorrect	One of the disadvantages of Spark is that its streaming and machine learning APIs are still mostly RDD based.	correct	One of the disadvantages of Spark is that it has no way to deal with graph based data.	incorrect	One of the disadvantages of Spark is that its streaming API does not allow joining multiple streams.	incorrect