Big Data Analytics – systems and algorithms

The exponential increase in computational power and storage capacity over the last decades, combined with progress in data science, has enabled a major leap in the digital revolution. Today, data-driven and Big Data methods have far-reaching applications throughout society, for example in identifying important new materials, predicting and understanding environmental effects, and solving crimes and keeping society safe. In this course we introduce the notion of Big Data and study how such data can be stored, managed, queried and analyzed.

Goals

After the completion of the course you should be able to:

  • collect and store Big Data in a distributed computer environment
  • perform basic queries to a database operating on a distributed file system
  • account for basic principles of parallel computations
  • use the MapReduce concept to parallelize common data processing algorithms
  • account for how standard machine learning models must be modified in order to process Big Data
  • use machine learning tools for Big Data
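As a rough illustration of the MapReduce goal above, here is a minimal word-count sketch in Python, the language used in the course preliminaries. It is not course material: the function names `mapper`, `shuffle`, `reducer` and `word_count` are hypothetical, and a local thread pool stands in for the cluster nodes that a real MapReduce framework would distribute the work over.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def mapper(line: str) -> list[tuple[str, int]]:
    """Map step (hypothetical name): emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values that share one key."""
    return key, sum(values)


def word_count(lines: list[str], workers: int = 4) -> dict[str, int]:
    # Map calls are independent of each other, so they can run concurrently.
    # Threads are only a local stand-in for separate machines in a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(mapper, lines)
        all_pairs = [pair for chunk in mapped for pair in chunk]
    grouped = shuffle(all_pairs)
    # Each key is reduced independently, so this phase is parallelizable too.
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data beats ideas"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```

The point of the split is that only the shuffle step needs to see all intermediate pairs; the map and reduce functions each work on small, independent pieces of the data, which is what makes the pattern easy to parallelize.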

Content

The course introduces the main concepts and tools for storing, processing and analyzing Big Data that are necessary for professional work and research in data analytics.

  • Introduction to Big Data: concepts and tools
  • Basic principles of parallel computing
  • Introduction to databases
  • File systems and databases for Big Data
  • Querying Big Data
  • Resource management in a cluster environment
  • Parallelizing computations for Big Data
  • Basic Machine Learning algorithms
  • Machine Learning for Big Data

Schedule

Lecture week: March 20-24, 2017

Teachers

Patrick Lambrix, Christoph Kessler, Jose M Pena, Rickard Armiento, Valentina Ivanova, Zlatan Dragisic, Huanyu Li

Examination

  • Take-home assignment
  • Lab assignments

Preliminary schedule:

  • Monday and Tuesday morning: preliminaries (relational databases and Python, for participants who need a refresher on these topics)
  • Tuesday afternoon: introduction to Big Data Analytics, databases for Big Data
  • Wednesday: parallel programming, databases for Big Data, lab session
  • Thursday: parallel programming, lab session
  • Friday: machine learning for Big Data, lab session


For literature and additional information we refer to:
http://www.ida.liu.se/~patla00/courses/BDA/

Contact

Patrick Lambrix, patrick.lambrix@liu.se

Registration

Registration asks for the following information (* marks required fields):

  • Given name(s) *
  • Family name *
  • University/Affiliation *
  • Email *
  • Supervisor *
  • Subject of the PhD project
  • Notes to the organizers