Big Data Analytics with Hadoop and Spark
The last few years have seen significant advances in technologies for data collection, transmission, and storage. For the first time in history, thanks to these technologies, we are in a position to instrument a scientific phenomenon or a business workflow at very fine granularity and collect the required data for the relevant time period. This simple approach can be applied to diverse phenomena: evolving traffic patterns in a city, how social media enhances or diminishes a brand, or how a protein folds and how that affects its function. Think of it as technology giving us a ringside seat to observe these phenomena in unprecedented detail.
Though we might have earned a ringside seat, we still have to do some work to get a ringside view of the phenomenon. The data that is collected and stored is far from usable, and in its raw form it will not give us the understanding we are looking for. Significant analysis needs to be done on the data before we can learn from it and benefit from that learning. This task of analyzing large amounts of data to extract insights is the basis of the field of data science. A new set of technologies has mushroomed to support the use cases demanded by data science: the big data technologies.
Dates: Mon 25th May 2015 to Fri 29th May 2015
Location: Chalmers Technology University
Timing: 10:00 am to 5:00 pm
- Data Science Methodology: You will learn an approach for conducting your data science experiment. This includes the high-level steps you need to follow to conduct and validate your experiment, as well as the common pitfalls and gotchas.
- Tools & Technologies: You will get exposure to the different tools and technologies used at each step of a data science experiment, from data collection all the way to data visualization. There will be assignments around Hadoop & Spark.
- Case Studies: You will get a peek inside some successful big data projects in scientific and business domains where some of the methodology and techniques have been applied.
- Labs and Hands-on Experience with Big Data Technologies: You will get hands-on experience working with a subset of the big data technologies in a lab setting, and you will leave the course with a working big data environment that you can use for your own data science experiments.
The course will progressively cover the steps involved in a typical data science experiment, from data ingestion to data visualization. Each day of the course will be divided into 3 parts:
- An in-class lecture where a particular data science concept will be covered.
- A hands-on lab session around the concept that was covered in the lecture. The lab session will be conducted under the guidance of the lab assistants. The lab sessions will vary in technical difficulty: for candidates who are comfortable with programming, the lab sessions will focus on hands-on assignments; for others, the labs will be around setting up a data science experiment and evaluating outcomes.
- A group discussion or a case study relevant to the concept that was covered in the class.
|Mon, 25th May||Tue, 26th May||Wed, 27th May||Thu, 28th May||Fri, 29th May|
|Introduction to Big Data & Analytics, Problem Scope. End-to-end implementation (ShareInsights)||Data Preparation, Data Quality and Data Integration||Data Analysis, Advanced Analytics||Data Visualization||Wrap-up and applying Big Data Technologies|
|Lunch Break||Lunch Break||Lunch Break||Lunch Break||Lunch Break|
|Big Data Methodology. Example from scientific domain||Labs on relevant technologies||Labs on relevant technologies||Labs on relevant technologies||Labs on relevant technologies|
|Coffee Break||Coffee Break||Coffee Break||Coffee Break||Coffee Break|
|Group Discussion on Problem||Case Study from the Industry.||TBD||TBD||TBD|
- Data management & processing skills. Knowledge of one of the data processing tools: Excel, R, SQL, or a database.
- Basic programming knowledge of a scripting language.
- You are expected to bring a data science problem.
- Good to have: Statistics Knowledge
To apply for the course, please visit the Course page