Software Systems and Technologies for Big Data Applications.
- Basic principles of modern big data processing frameworks.
- Programming and use of such frameworks depending on the desired functionality: storage, querying, batch processing, graph processing, streaming, deep learning.
- Performance optimizations from the use of those frameworks.
ΗΥ360 and ΗΥ252, or instructor permission.
Christos Kozanitis (kozanitis [papaki] ics.forth.gr)
Angelos Sinogeorgos (sinog [papaki] csd.uoc.gr)
Monday - Wednesday 16:00 - 18:00 at H.206
- Instructor: schedule via email. Please include the text "543" in the subject line of your email.
- TA: schedule via email
Assigned paper readings
Online documentation of technologies that we study
- Class participation - reading discussion (30%)
- Programming assignments (30%)
- Project (40%)
Cloud credits by AWS. Students will receive credits for the compute and storage services of the Amazon cloud. To receive access to the platform, registered students should send their uoc email address through the submission folder of the first week of the class.
- Big Data and Data Science
- A Guide to functional programming with Scala
Apache Spark Architecture and programming model
- Spark architecture
- Spark operators
- lazy evaluation
- Spark SQL
- Spark tutorial + debugging advice
Introduction to Machine Learning
- Brief introduction
- supervised vs unsupervised learning
- example pipelines
- linear algebra review
Distributed Machine Learning
- Scalability challenges for common problems: linear regression, logistic regression
- Spark MLlib
- Non-numeric features: One Hot Encoding (OHE)
- OHE Sparsity
- Dimensionality reduction
- Multidimensional data
- Challenges of graph processing
- Streaming use cases
- Spark Streaming - Structured Streaming
- Serialization, Deserialization
- Avro, Thrift, Protocol Buffers
- Column storage
- Scale up vs scale out
- MNIST image recognition
- TensorFlow
- Instructor: Kozanitis Christos