Software Systems and Technologies for Big Data Applications.
- Basic principles of modern big data processing frameworks.
- Programming and use of such frameworks depending on the desired functionality: storage, querying, batch processing, graph processing, streaming, deep learning.
- Performance optimizations when using those frameworks.
ΗΥ360 and ΗΥ252, or instructor permission.
Konstantinos Solomos, Christoforos Leventis
Monday - Wednesday, 18:00 - 20:00, A.113
- Instructor: Monday 17:00-18:00, location TBD
- TA: TBD
Assigned paper readings
Material from all over the web
- Class participation - reading discussion (30%)
- Programming assignments (30%)
- Project (40%)
The course has received an AWS Educate grant from Amazon Web Services, which we highly appreciate. Registered students will receive credits to use the services of the Amazon cloud.
- Big Data and Data Science
- A Guide to functional programming with Scala
Apache Spark Architecture and programming model
- Spark architecture
- Spark operators
- Lazy evaluation
- Spark SQL
- Spark tutorial + debugging advice
Introduction to Machine Learning
- Brief introduction
- Supervised vs unsupervised learning
- Example pipelines
- Linear algebra review
Distributed Machine Learning
- Scalability challenges for common problems: linear regression, logistic regression
- Spark MLlib
- Non-numeric features: One-Hot Encoding (OHE)
- OHE Sparsity
- Dimensionality reduction
- Multidimensional data
- Challenges of graph processing
- Streaming use cases
- Spark Streaming - Structured Streaming
- Serialization, Deserialization
- Avro, Thrift, Protocol Buffers
- Column storage
- Scale up vs scale out
- MNIST image recognition
- TensorFlow
- Instructor: Kozanitis Christos