Big data and data science - 2024-2025

Questions for the course "Big data and data science".


Questions in the quiz (18)


What are the three V's in Big data?

  • Volume

  • Variety

  • Velocity

What does the term 'Volume' refer to in Big data?

  • The amount of data

  • The size of the data

What does the term 'Variety' refer to in Big data?

  • The different types of data

  • The different sources of data

What does the term 'Velocity' refer to in Big data?

  • The speed of data

  • The rate at which data is generated

What is Horizontal scaling?

  • Adding more machines to the system

  • Adding more nodes to the system

What is Vertical scaling?

  • Adding more power to the system

  • Adding more memory to the system

  • Adding more CPU to the system

  • Adding more storage to the system

What are some of the benefits of using vertical scaling?

  • Easier to manage

  • Less expensive to start with

What are some of the benefits of using horizontal scaling?

  • Easier to scale

  • Less expensive in the long run

  • Commodity hardware

What are some caveats of using horizontal scaling?

  • Customized software

  • Load balancing

  • Network latency

  • Data consistency

  • Data partitioning

  • Data distribution
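
The partitioning and distribution caveats can be illustrated with a minimal sketch of hash partitioning. The function names, the `user-N` keys, and the choice of MD5 with a modulo are assumptions made for illustration, not a scheme prescribed by the course.

```python
import hashlib

def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def partition(keys, num_nodes):
    """Group keys by the node that would store them."""
    nodes = {i: [] for i in range(num_nodes)}
    for key in keys:
        nodes[node_for_key(key, num_nodes)].append(key)
    return nodes

keys = [f"user-{i}" for i in range(1000)]  # made-up keys
three_nodes = partition(keys, 3)
four_nodes = partition(keys, 4)

# Count keys that land on a different node once a fourth node is added.
moved = sum(1 for k in keys if node_for_key(k, 3) != node_for_key(k, 4))
print({n: len(v) for n, v in three_nodes.items()}, "moved:", moved)
```

With plain modulo hashing, adding a node typically relocates a large share of the keys, which is one reason horizontally scaled systems often need customized software for partitioning and rebalancing.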

What does ETL mean?

  • Extract, Transform, Load

What does ELT mean?

  • Extract, Load, Transform

What does EtLT mean?

  • Extract, transform, Load, Transform
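
The difference between ETL, ELT, and EtLT is the order of the steps. A minimal sketch follows, assuming an in-memory list stands in for the target store; the records and the helpers `extract`, `clean`, and `cast` are hypothetical.

```python
def extract():
    # Hypothetical raw records from a source system.
    return [{"name": " Ada ", "amount": "10"}, {"name": "Bob ", "amount": "5"}]

def clean(rows):
    # Light transform: trim whitespace only.
    return [{**r, "name": r["name"].strip()} for r in rows]

def cast(rows):
    # Heavier transform: convert amounts to integers.
    return [{**r, "amount": int(r["amount"])} for r in rows]

def etl():
    warehouse = []
    warehouse.extend(cast(clean(extract())))  # transform fully, then load
    return warehouse

def elt():
    warehouse = []
    warehouse.extend(extract())               # load the raw data first
    warehouse[:] = cast(clean(warehouse))     # transform inside the target
    return warehouse

def etlt():
    warehouse = []
    warehouse.extend(clean(extract()))        # light transform, then load
    warehouse[:] = cast(warehouse)            # heavier transform in the target
    return warehouse

print(etl() == elt() == etlt())
```

All three orderings end with the same cleaned records here; what changes is where the transformation work runs and whether the raw data ever lands in the target.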

Why use Hadoop over GFS (Google File System)?

  • Open source

  • Publicly available (GFS is proprietary to Google)

What are some key characteristics of the Parquet file format?

  • Column-based

  • Column optimized search

  • Optimized for MapReduce processing

  • Schema stored at the end of the file

  • Quick querying of values in a column

  • Good at computing aggregates or averages

  • Good for nested data structures

  • Supports schema evolution but not schema changes
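
Why a column-based layout suits aggregates can be sketched in plain Python. This is only an illustration of the idea, not the Parquet file format itself, and the sample records are made up.

```python
rows = [
    {"id": 1, "city": "Oslo", "price": 10.0},
    {"id": 2, "city": "Bergen", "price": 20.0},
    {"id": 3, "city": "Oslo", "price": 30.0},
]

# Row-based layout: an average has to walk through every whole record.
avg_from_rows = sum(r["price"] for r in rows) / len(rows)

# Column-based layout: each column is kept together, so an aggregate
# only needs to read the one column it cares about.
columns = {name: [r[name] for r in rows] for name in rows[0]}
avg_from_columns = sum(columns["price"]) / len(columns["price"])

print(avg_from_rows, avg_from_columns)
```

On disk, the column layout means a query on `price` can skip the bytes of `id` and `city` entirely, and values of the same type stored together tend to compress well.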

What are some key characteristics of the Avro file format?

  • Row based

  • Supports schema changes and evolution

  • Optimized for record exchanges

  • Schema stored in human readable format at the beginning of the file

  • Good for sharing entire records between applications

  • Good for logging & auditing
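
Schema evolution in Avro can be sketched as follows. The schemas use Avro's JSON notation, but the record name `LogEvent`, the field names, and the `resolve` helper are hypothetical; real Avro libraries perform a more complete schema resolution (type promotion, aliases, and so on).

```python
import json

# Schema the data was written with (version 1).
writer_schema = json.loads("""
{"type": "record", "name": "LogEvent",
 "fields": [{"name": "user", "type": "string"},
            {"name": "action", "type": "string"}]}
""")

# Newer schema used by the reader: adds a field with a default value.
reader_schema = json.loads("""
{"type": "record", "name": "LogEvent",
 "fields": [{"name": "user", "type": "string"},
            {"name": "action", "type": "string"},
            {"name": "source", "type": "string", "default": "unknown"}]}
""")

def resolve(record, schema):
    """Simplified resolution: fill defaults for fields missing in the record."""
    out = {}
    for field in schema["fields"]:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value or default for {field['name']}")
    return out

old_record = {"user": "ada", "action": "login"}
print(resolve(old_record, reader_schema))
```

Because the new field carries a default, records written with the old schema stay readable, which is the kind of change the row-based, record-exchange use case relies on.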

What are some key characteristics of Sqoop?

  • Tool for importing structured data (SQL, etc.) into Hadoop

  • Not event driven

  • No longer maintained (the project is archived)

What are some key characteristics of Flume?

  • Designed for importing logs into Hadoop

  • Importing unstructured data

  • Works well with streaming data sources

  • Fault tolerant

  • Linearly scalable

  • Event driven

  • Can be used to buffer incoming data

What are the main characteristics of Kafka?

  • Distributed streaming platform

  • Publish-subscribe messaging system

  • Fault tolerant
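
The publish-subscribe idea can be sketched with a toy in-memory broker. This is not the Kafka client API; `TinyBroker`, its methods, and the topic and group names are hypothetical, and it leaves out distribution, partitions, and fault tolerance entirely.

```python
from collections import defaultdict

class TinyBroker:
    """Toy broker: one append-only log per topic, one offset per consumer group."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> list of messages
        self.offsets = defaultdict(int)  # (group, topic) -> next offset to read

    def publish(self, topic, message):
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1  # offset of the new message

    def poll(self, group, topic, max_messages=10):
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch

broker = TinyBroker()
for event in ["click-1", "click-2", "click-3"]:
    broker.publish("clicks", event)

analytics_batch = broker.poll("analytics", "clicks")  # first group reads the log
billing_batch = broker.poll("billing", "clicks")      # second group reads it too
analytics_again = broker.poll("analytics", "clicks")  # nothing new for this group
```

Messages stay in the log after being read, so each consumer group tracks its own position and independent subscribers can consume the same stream.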