Big Data Analytics with Hadoop and Spark


The last few years have seen significant advances in technologies for data collection, transmission, and storage. For the first time in history, thanks to these technologies, we are in a position to instrument a scientific phenomenon, or a business workflow, at a very fine granularity and collect the required data for the relevant time period. This simple approach can be applied to diverse phenomena: evolving traffic patterns in a city, how social media enhances or diminishes a brand, or how a protein folds and how that affects its function. Think of it as technology giving us a ringside seat to observe a phenomenon in unprecedented detail.
Though we might have earned a ringside seat, we still have to do some work to get a ringside view of the phenomenon. The data that is collected and stored is far from usable, and in its raw form it will not give us the understanding we are looking for. Significant analysis needs to be done on the data before we can learn from it and benefit from that learning. This task of analyzing large amounts of data to extract insights is the basis of the field of data science. A new set of technologies has mushroomed to support the use cases demanded by data science; these are the big data technologies.

This course is offered jointly by the LAB group, Chalmers University and Persistent under the SeSE.

Important Details:

Dates: Mon 25th May 2015 to Fri 29th May 2015
Location: Chalmers Technology University
Timing: 10:00 am to 5:00 pm


  • Data Science Methodology: You will learn an approach for conducting your data science experiment. This includes the high-level steps you need to follow to conduct and validate your experiment, as well as the common pitfalls and gotchas.
  • Tools & Technologies: You will get exposure to the tools and technologies used at each step of a data science experiment, from data collection all the way to data visualization. There will be assignments around Hadoop & Spark.
  • Case Studies: You will get a peek inside some successful big data projects in scientific and business domains where some of the methodology and techniques have been applied.
  • Labs and Hands-on Experience on Big Data Technologies: You will get hands-on experience working with a subset of the big data technologies in a lab setting, and you will leave the course with a working big data environment that you can use for your own data science experiments.


The course will progressively cover the steps involved in a typical data science experiment, from data ingestion to data visualization. Each day of the course will be divided into 3 parts:

  • An in-class lecture where a particular data science concept will be covered.
  • A hands-on lab session around the concept covered in the lecture, conducted under the guidance of the lab assistants. The lab sessions will vary in technical difficulty: for candidates who are comfortable with programming, the labs will focus on hands-on assignments; for others, the labs will be around setting up a data science experiment and evaluating outcomes.
  • A group discussion or a case study relevant to the concept that was covered in the class.

Tentative Agenda

| Mon, 25th May | Tue, 26th May | Wed, 27th May | Thu, 28th May | Fri, 29th May |
|---|---|---|---|---|
| Introduction to Big Data & Analytics, Problem Scope. End to end implementation (ShareInsights) | Data Preparation, Data Quality and Data Integration | Data Analysis, Advanced Analytics | Data Visualization | Wrap up and applying Big Data Technologies |
| Lunch Break | Lunch Break | Lunch Break | Lunch Break | Lunch Break |
| Big Data Methodology. Example from scientific domain | Labs on relevant technologies | Labs on relevant technologies | Labs on relevant technologies | Labs on relevant technologies |
| Coffee Break | Coffee Break | Coffee Break | Coffee Break | Coffee Break |
| Group Discussion on Problem | Case Study from the Industry | TBD | TBD | TBD |

Course Pre-requisites:

  • Data management & processing skills. Knowledge of at least one data processing tool: Excel, R, SQL, or a database.
  • Basic programming knowledge in a scripting language.
  • Participants are expected to bring a data science problem.
  • Good to have: knowledge of statistics.


To apply for the course, please visit the Course page