Description
MCS-226 Solved Assignment 2025 Available
Q1: Define the term data science. Describe its applications in two industries of your choice (e.g., healthcare, finance, e-commerce). What role does the data science lifecycle play in managing data projects?
Q2: Explain Exploratory Data Analysis (EDA) and its importance. What are the main steps in performing EDA on a new dataset? Describe two methods for detecting outliers and how handling outliers impacts data analysis.
Q3: Describe the role of statistical hypothesis testing in data analysis. What are Type I and Type II errors, and how do they affect decision-making? Provide an example of hypothesis testing in a real-world scenario.
Q4: Discuss the 4 Vs of big data (Volume, Velocity, Variety, and Veracity). Provide a real-world example of each, explaining how these characteristics create challenges in big data management.
Q5: Explain the Hadoop architecture with a focus on HDFS and the master/slave architecture. How do NameNode and DataNodes work together to store and manage large datasets? Provide a basic example of this storage process.
Q6: Compare Apache Spark, Hive, and HBase in terms of functionality, data processing methods, and use cases. When would Spark be preferred over traditional MapReduce, and why?
Q7: Describe the purpose and functionality of a *Bloom filter* in data stream processing. How does the Bloom filter efficiently check for element presence? Describe the Flajolet-Martin algorithm for cardinality estimation in data streams.
Q8: What is the PageRank algorithm, and how is it used in link analysis? Describe the concept of “flow of rank” in PageRank. Explain how the PageRank of a web page is calculated using the flow model.
Q9: Discuss the challenges of online advertising and recommendation systems. Explain the concept of collaborative filtering with an example, and discuss the role of clustering in social network analysis.
Q10: What is the Random Forest algorithm? Explain how it can be applied to classification problems. Write a program in R to implement a Random Forest classifier on a sample dataset and explain its output.