R for Water Professionals

Managing reliable water services requires not only a sufficient volume of water but also significant amounts of data. Water professionals continuously measure the flow and quality of the water and how customers perceive their service. This course teaches the basics of using the R language to solve water management problems.

Data and water are natural partners. Water utilities are awash with data, or even flooded by it. Data professionals, in turn, borrow the language of water: they build data pipelines and data lakes and make data flow from one place to another.

The term "digital water utility" has become a popular buzzword in the industry. A digital utility can only exist when the people who manage the water improve their data skills. This workshop introduces participants to analysing data using the R language for statistical computing.

This course does not provide an exhaustive introduction to data science programming. It is merely a teaser to inspire water professionals to ditch their spreadsheets and start writing code to create value from data.

The content of this course is available free of charge to employees of organisations that are members of Water Research Australia, Intelligent Water Networks (IWN) or the Smart Water Networks Forum (SWAN). Contact your representative to get free access.

This course opens with an introduction to the principles of data science and the R language. The following three sessions consist of realistic case studies where participants solve water management problems using R code.

Principles of Water Utility Data Science

This course is not only about the vocabulary and syntax of R, but also about producing good data science. This session introduces a framework for best practice in analysing data and sharing the results. This framework derives from the book Principles of Strategic Data Science. The three case studies each implement aspects of this framework.

Introduction to the R Language

The second session introduces the basic principles of the R language and applies these principles to measuring the flow in an open channel.
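
To give a flavour of this kind of exercise, the sketch below estimates the flow over a sharp-crested rectangular weir with the standard weir equation Q = (2/3) Cd sqrt(2g) b h^(3/2). The function name weir_flow, the discharge coefficient, the weir width and the head readings are all illustrative assumptions, not material taken from the course.

```r
# Flow over a sharp-crested rectangular weir (standard weir equation).
# All values below are illustrative assumptions, not course data.
weir_flow <- function(h, b, cd = 0.6, g = 9.81) {
  # h:  head above the weir crest (m)
  # b:  width of the weir (m)
  # cd: discharge coefficient (dimensionless, assumed value)
  # g:  gravitational acceleration (m/s2)
  (2 / 3) * cd * sqrt(2 * g) * b * h^(3 / 2)
}

head_m   <- c(0.05, 0.10, 0.15)          # hypothetical head readings (m)
flow_m3s <- weir_flow(head_m, b = 0.5)   # vectorised: one flow per reading (m3/s)
flow_ls  <- flow_m3s * 1000              # convert to litres per second

print(round(flow_ls, 1))
```

Because R arithmetic is vectorised, the same function handles a single reading or a whole column of measurements without an explicit loop.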

Case Study 1: Water Quality Regulations

In this first case study, participants apply their skills to laboratory testing data from an imaginary drinking water network. The case study revolves around checking the data for compliance with water quality regulations.
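
A compliance check of this kind might look like the minimal base R sketch below. The data frame, the zone names, the turbidity values and the limit of 5 NTU are all invented for illustration and do not represent the course data or any particular regulation.

```r
# Hypothetical laboratory results; zones, values and the limit are invented.
lab <- data.frame(
  zone      = c("A", "A", "A", "B", "B", "B"),
  turbidity = c(0.2, 0.4, 6.1, 0.3, 0.5, 0.8)  # NTU
)
limit_ntu <- 5  # assumed limit, for illustration only

# Flag each sample that exceeds the limit
lab$exceeds <- lab$turbidity > limit_ntu

# Count exceedances per zone; a zone complies when it has none
exceedances <- tapply(lab$exceeds, lab$zone, sum)
compliance <- data.frame(
  zone        = names(exceedances),
  exceedances = as.vector(exceedances),
  compliant   = as.vector(exceedances == 0)
)

print(compliance)
```

Real regulations often use percentile or rolling-period rules rather than a simple maximum, so the comparison step would change accordingly.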

The Tidyverse

The Tidyverse is a collection of R packages that extends the language with additional functionality to simplify manipulating, analysing and presenting data. The fourth session delves into the basic principles of visualising data with the ggplot2 library, using data from the first case study.
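
As a rough illustration of the layered style of ggplot2, the sketch below draws a box plot of turbidity by supply zone with a dashed reference line. The data values, the zone names and the 5 NTU reference line are hypothetical and only stand in for the case study data.

```r
library(ggplot2)

# Hypothetical turbidity results; values are invented for illustration.
lab <- data.frame(
  zone      = rep(c("A", "B"), each = 4),
  turbidity = c(0.2, 0.4, 6.1, 0.6, 0.3, 0.5, 0.8, 0.4)  # NTU
)

# Build the plot in layers: data and aesthetics, geometry, reference line, labels
p <- ggplot(lab, aes(x = zone, y = turbidity)) +
  geom_boxplot() +
  geom_hline(yintercept = 5, linetype = "dashed", colour = "red") +  # assumed limit
  labs(x = "Supply zone", y = "Turbidity (NTU)",
       title = "Turbidity by supply zone")

print(p)
```

Each `+` adds one layer, so the geometry or the labels can be changed without touching the rest of the plot.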

Case Study 2: Understanding Customer Perception

The data for the second case study consists of the results of a survey of American consumers about their perception of tap water services. Participants use the Tidyverse to clean, transform and visualise this data.
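
The clean and transform steps could be sketched with dplyr and tidyr as shown below. The column names, the city labels, the scores and the use of 99 as a missing-value code are assumptions made for this example and do not describe the actual survey.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey extract: one row per respondent, items scored 1 to 7.
survey <- tibble(
  id    = 1:5,
  city  = c("Boston", " Boston", "Denver", "Denver", NA),
  taste = c(6, 7, 3, 99, 5),   # 99 is an assumed missing-value code
  trust = c(5, 6, 2, 4, 6)
)

result <- survey %>%
  mutate(city  = trimws(city),          # clean: remove stray whitespace
         taste = na_if(taste, 99)) %>%  # clean: recode 99 as missing
  filter(!is.na(city)) %>%              # clean: drop respondents without a city
  pivot_longer(c(taste, trust),         # transform: one row per answer
               names_to = "item", values_to = "score") %>%
  group_by(city, item) %>%
  summarise(mean_score = mean(score, na.rm = TRUE), .groups = "drop")

print(result)
```

The long format produced by pivot_longer() is also the shape that ggplot2 expects, so the summary can be passed straight to a plot.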

Data Products

The fifth session focuses on the data science workflow as an iterative process to solve a data problem. This session also introduces R Markdown as a tool to report on the results of a data science project. In this session, students prepare a report to summarise the impact of proposed changes to water regulations on the data in the first case study.

Case Study 3: Analysing Water Consumption

In the last case study, participants use the analytical functions of the Tidyverse to analyse data from smart meters and find anomalies in water consumption.
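
One simple anomaly rule, shown here only as a sketch and not necessarily the method used in the course, is to flag meters that never record zero flow during a day, which can point to a leak. The meter names and hourly volumes below are invented.

```r
library(dplyr)

# Hypothetical hourly smart-meter volumes (litres) for two properties over one day.
meters <- tibble(
  meter  = rep(c("M1", "M2"), each = 24),
  hour   = rep(0:23, times = 2),
  litres = c(
    # M1: consumption drops to zero overnight
    0, 0, 0, 0, 0, 5, 40, 60, 20, 0, 0, 10,
    15, 0, 0, 0, 5, 30, 50, 25, 10, 5, 0, 0,
    # M2: never drops to zero, which suggests continuous flow
    8, 8, 9, 8, 8, 12, 45, 70, 30, 9, 8, 18,
    20, 8, 9, 8, 14, 35, 60, 30, 15, 12, 8, 8
  )
)

# Flag meters whose minimum hourly volume stays above zero
leaks <- meters %>%
  group_by(meter) %>%
  summarise(min_flow = min(litres),
            total    = sum(litres),
            .groups  = "drop") %>%
  mutate(possible_leak = min_flow > 0)

print(leaks)
```

A production rule would normally look at several days of data and allow for properties with legitimate continuous use before raising an alert.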