DATA3404: Scalable Data Management

2024 unit information

This unit of study provides a comprehensive overview of the internal mechanisms of data science platforms and of the systems that manage large data collections. These skills are needed for successful performance tuning and to understand the scalability challenges faced when processing Big Data. This unit builds upon the second-year unit DATA2001, 'Data Science - Big Data and Data Diversity', and correspondingly assumes a sound understanding of SQL and data analysis tasks.

The first part of this subject focuses on mechanisms for large-scale data management. It provides a deep understanding of the internal components of a data management platform. Topics include: physical data organisation and disk-based index structures, query processing and optimisation, and database tuning. The second part focuses on the large-scale management of big data in a distributed architecture. Topics include: distributed and replicated databases, information retrieval, data stream processing, and web-scale data processing.

The unit will be of interest to students seeking an introduction to data management tuning, disk-based data structures and algorithms, and information retrieval. It will be valuable to those pursuing careers as Software Engineers, Data Engineers, Database Administrators, and Big Data Platform specialists.

Unit details and rules

Managing faculty or University school:

Computer Science

Details

Code	DATA3404
Academic unit	Computer Science
Credit points	6

Enrolment rules

Prerequisites:	DATA2001 OR DATA2901 OR ISYS2120 OR INFO2120 OR INFO2820
Corequisites:	None
Prohibitions:	INFO3504 OR INFO3404
Assumed knowledge:	This unit of study assumes that students have previous knowledge of database structures and of SQL. The prerequisite material is covered in DATA2001 or ISYS2120. Familiarity with a programming language (e.g. Java or C) is also expected.

Learning outcomes

At the completion of this unit, you should be able to:

LO1. demonstrate experience with using/tuning data science platforms such as Apache Spark
LO2. understand different physical data organisations including data partitioning and data replication
LO3. understand disk-based indexing structures such as B-Trees, extensible hashing and bitmap indexes
LO4. understand the principles of query processing and query optimisation
LO5. understand the principles of (distributed) data science platforms
LO6. understand data sharding algorithms and data replication protocols
LO7. make effective physical data design decisions
LO8. identify a performance problem and be able to effectively tune the performance of a (distributed) data processing system

Unit availability

This section lists the session, attendance modes and locations the unit is available in. There is a unit outline for each of the unit availabilities, which gives you information about the unit including assessment details and a schedule of weekly activities.

The outline is published 2 weeks before the first day of teaching. You can look at previous outlines for a guide to the details of a unit.

Current year

Session	MoA	Location	Outline
Semester 1 2024	Normal day	Camperdown/Darlington, Sydney	View

Previous years

Session	MoA	Location	Outline
Semester 1 2020	Normal day	Camperdown/Darlington, Sydney	View
Semester 1 2021	Normal day	Remote	View
Semester 1 2022	Normal day	Camperdown/Darlington, Sydney	View
Semester 1 2022	Normal day	Remote	View
Semester 1 2023	Normal day	Camperdown/Darlington, Sydney	View
Semester 1 2023	Normal day	Remote	View

Modes of attendance (MoA)

This refers to the Mode of attendance (MoA) for the unit as it appears when you’re selecting your units in Sydney Student. Find more information about modes of attendance on our website.
