Course Details
Big Data Programming and Hadoop Analysis
Course Synopsis
Big Data Programming and Hadoop Analysis introduces learners to the core concepts of Big Data, the Hadoop ecosystem, and distributed data processing. The course focuses on hands-on experience with HDFS, YARN, MapReduce, Hive, Pig, and Apache Spark. Students will learn to build scalable data pipelines, manage Hadoop clusters, perform large-scale data analytics, and develop end-to-end Big Data solutions using real-world datasets in a hybrid learning environment.
Required Textbooks
Tom White, Hadoop: The Definitive Guide, O’Reilly Media.
Holden Karau et al., Learning Spark: Lightning-Fast Big Data Analytics, O’Reilly Media.
Recommended Resources
Hadoop and Spark official documentation, GitHub lab repositories, cloud platform free-tier resources, and selected research papers.
Completion Criteria
A student is deemed to have completed the module once all of the following criteria are met:
Has attended at least 90% of all classes held
Has received an average grade of at least 80% across all assignments
Has received an average of at least 60% in assessments
The tutor believes the student has grasped all core concepts and is ready to proceed to the next module
Prerequisites
Basic understanding of the Linux command line
Fundamental programming knowledge (Python or Java preferred)
Familiarity with SQL concepts
Basic understanding of data structures