Predicting Reservoir Storage Capacity Using Machine Learning

This project is a data visualization web app built on the Django framework, used to analyze and predict trends in reservoir storage data. Machine learning techniques are applied to hindcast reservoir storage values from reservoir and climate time series data. Additionally, you can explore real-time values of surrounding reservoirs using interactive map visualizations and dropdowns. Here is the Live Link and Github Repo.

print("This is Res Tool")

Problem Description

The efficient management of water resources in Southern California is critical to the livelihood and health of its inhabitants. Southern California is a region often faced with prolonged seasons of drought. Water resource management is important for providing potable water to nearby communities, as well as irrigation to serve the agricultural needs of local farmers. Reservoirs are an integral tool in successful water resource management.

Reservoir managers employ predictive analytics to forecast future weather events and the resulting reservoir storage levels, ensuring that the needs of the community can be met. Machine learning models can be created and trained on historical data and forecasted climate data to make predictions about reservoir storage levels. This project examines six different machine learning (ML) models and their accuracy in predicting reservoir storage values for San Vicente Reservoir and El Capitan Reservoir in Southern California.

Built with:

  • Python - For server-side processing and backend functionality
  • PostgreSQL - As the database and RDBMS
  • Django - Web framework
  • scikit-learn - For machine learning and reservoir predictions
  • Plotly - For graphing and charts
  • Leaflet.js - For interactive map functionality in Res Map

Step 1 - Data Ingestion

Data for the models was captured from two different sources. Reservoir metadata, as well as historical time series data for storage value and surface elevation, was obtained through the United States Geological Survey (USGS) API. Historical time series climate data was obtained through the Applied Climate Information System (ACIS).

The code below shows the process of accessing and cleaning the USGS API reservoir data:

import datetime as dt

import pandas as pd

def getData(state_abbrev):
    # Query the USGS daily-values service for active reservoir storage sites
    # (parameter code 00054) in the given state, from 2014 through today
    url = "https://waterservices.usgs.gov/nwis/dv/?format=rdb&stateCd={}&startDT=2014-01-01&endDT={}&parameterCd=00054&siteStatus=active".format(state_abbrev, dt.datetime.now().strftime("%Y-%m-%d"))
    # RDB output is tab-separated with '#'-prefixed comment lines
    df = pd.read_csv(url, sep='\t', comment='#', on_bad_lines='skip')
    df = df.drop(df.index[0])  # drop the RDB field-definition row
    df = df.drop(columns=[df.columns[0], df.columns[4]])  # drop the agency code and qualifier columns
    df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')
    df = df.dropna(subset=['datetime'])  # discard rows with unparseable dates
    df.iloc[:, 2] = pd.to_numeric(df.iloc[:, 2], errors='coerce')
    df = df.dropna(subset=[df.columns[2]])  # discard rows with non-numeric storage values
    return df
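
The ACIS request is not shown above, but a minimal sketch of that ingestion step could look like the following, using the ACIS StnData web service; the station identifier and the daily elements requested (mean temperature and precipitation) are illustrative assumptions:

import pandas as pd
import requests

def getClimateData(station_id, start, end):
    # Query the ACIS web service for daily mean temperature and precipitation
    resp = requests.post(
        "https://data.rcc-acis.org/StnData",
        json={
            "sid": station_id,  # station identifier (illustrative; e.g. a COOP or GHCN id)
            "sdate": start,
            "edate": end,
            "elems": [{"name": "avgt"}, {"name": "pcpn"}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    df = pd.DataFrame(resp.json()["data"], columns=["datetime", "avg_temp", "precip"])
    df["datetime"] = pd.to_datetime(df["datetime"])
    # ACIS marks missing values with "M" and trace precipitation with "T"
    df = df.replace({"M": None, "T": "0.0"})
    df[["avg_temp", "precip"]] = df[["avg_temp", "precip"]].apply(pd.to_numeric, errors="coerce")
    return df.dropna()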

Step 2 - Model Construction

The ML models attempt to predict the storage value of San Vicente Reservoir and El Capitan Reservoir in Southern California. Each model uses historical temperature, precipitation, and past reservoir storage levels from the ingested data. Data from January 1, 2010 through December 31, 2021 forms the training set, while data from January 1, 2022 through October 31, 2023 forms the test set; the same date ranges were applied to both reservoirs. The test set deliberately spans nearly two years to increase the likelihood that it includes extreme weather events, such as droughts or periods of high precipitation.
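
To make the split concrete, here is a minimal sketch assuming the ingested series have been merged into a single DataFrame df with a DatetimeIndex; the column names are illustrative:

# df is assumed to hold merged climate and reservoir series:
# columns avg_temp, precip, and storage, indexed by date
df["storage_lagged"] = df["storage"].shift(1)  # past storage level as a feature
df = df.dropna()

feature_cols = ["avg_temp", "precip", "storage_lagged"]
train = df.loc["2010-01-01":"2021-12-31"]
test = df.loc["2022-01-01":"2023-10-31"]

X_train, y_train = train[feature_cols], train["storage"]
X_test, y_test = test[feature_cols], test["storage"]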

The model schema is outlined below:
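
As a rough sketch of that schema in code, the six regressors examined (listed under Res Predict below) could be constructed and trained with scikit-learn as follows; the hyperparameters shown are illustrative defaults rather than the project's tuned values:

from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

models = {
    "Neural Network": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000),
    "Gaussian Process": GaussianProcessRegressor(),
    "SVR": SVR(),
    "Decision Tree": DecisionTreeRegressor(),
    "Random Forest": RandomForestRegressor(n_estimators=100),
    "Nearest Neighbor": KNeighborsRegressor(),
}

# Fit each model on the 2010-2021 training window
for model in models.values():
    model.fit(X_train, y_train)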

Step 3 - Model Validation and Web App Deployment

Model predictions and actual values were plotted on top of each other using plotly.express. This allowed for easy comparisons and interactive graphs. From there, I set out to develop a web app to display the findings of my research. The web app has three different components:

Res Predict

Select a machine learning model to hindcast reservoir levels for San Vicente Reservoir and El Capitan Reservoir, and observe the differences in model prediction accuracy. Models used:

  • Neural Network
  • Gaussian Process
  • SVR
  • Decision Tree
  • Random Forest
  • Nearest Neighbor

The Neural Network model achieved up to 99.8% accuracy!
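
As a minimal sketch of how the models could be compared, assuming R-squared on the held-out window as the accuracy measure (the project's exact metric may differ) and reusing the plotly.express overlay from Step 3:

import pandas as pd
import plotly.express as px
from sklearn.metrics import r2_score

# Score each trained model on the 2022-2023 test window
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"{name}: R^2 = {r2_score(y_test, y_pred):.3f}")

# Overlay one model's predictions on the actual storage values
results = pd.DataFrame({
    "actual": y_test,
    "predicted": models["Neural Network"].predict(X_test),
})
fig = px.line(results, title="San Vicente Reservoir: actual vs. predicted storage")
fig.show()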

Res Select

Select a state and station from dynamically populated dropdowns to view the data for each location.
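
A minimal sketch of how the dependent dropdown could be served by a Django view; the Station model and its fields are hypothetical stand-ins for the project's actual schema:

from django.http import JsonResponse

from .models import Station  # hypothetical model holding USGS station metadata

def stations_for_state(request):
    # Return the stations in the selected state so the frontend can
    # repopulate the station dropdown
    state = request.GET.get("state")
    stations = Station.objects.filter(state=state).values("site_no", "name")
    return JsonResponse(list(stations), safe=False)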

Res Map

Select locations from an interactive map to view the data for each station.
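
One way to feed the Leaflet map from Django is to serve the station coordinates as GeoJSON, which Leaflet's L.geoJSON layer can consume directly; again, the Station model here is a hypothetical stand-in:

from django.http import JsonResponse

from .models import Station  # hypothetical model with latitude/longitude fields

def stations_geojson(request):
    # Serialize every station as a GeoJSON point feature for the map
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [s.longitude, s.latitude]},
            "properties": {"site_no": s.site_no, "name": s.name},
        }
        for s in Station.objects.all()
    ]
    return JsonResponse({"type": "FeatureCollection", "features": features})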

Conclusion and Takeaways

This project was excellent exposure to a new web development framework in Django and a new (to me) RDBMS in PostgreSQL, as well as a great application of data science principles and practices. The process of creating a full-stack data science application gave me meaningful and relevant experience, and I had a blast doing it!