Migrating CATPAW Development to Azure
Portions of this post build on concepts introduced in Managing Azure.
CATPAW - Computer-Aided Thinking, Primarily about Writing
From the CATPAW home screen…
In many ways, CATPAW is an online book about writing style–a guide to the choices we make in writing that connect us to our readers.
Rather than setting out rules to follow, CATPAW will help you make informed choices in context. The site accomplishes that goal in three ways:
- It explains the choices writers face.
- It uses computational tools to help you examine your own writing, letting you see what choices you have already made and what you might want to do differently.
- It places these choices in the context of advice from other prominent guides to writing.
CATPAW is a Python Flask web application that employs a number of Python packages including nltk: the Natural Language Toolkit.
CATPAW was created by Professor Erik Simpson with initial programming assistance from Alina Guha and me. Throughout its history, CATPAW development and collaboration have used a git workflow with code updates posted to private GitHub repositories.
Initial Development and Deployment to Azure
The first CATPAW code repository is/was https://github.com/alinejg/catpaw. This repo was initially deployed to the web via Azure under a “trial” account owned by Alina. When the trial subscription ended we took steps to move the deployment from Azure to Reclaim Cloud to avoid accumulating fees on Alina’s Azure account.
The README.md file from the aforementioned original project documents early development and the move to Reclaim Cloud. That original README.md was exported to a file named README-original.pdf, which is stored in our current development repo, CATPAW-Azure (see below). The PDF is available for download in this post’s Attachments section for convenience.
CATPAW Deployment to Reclaim Cloud
Deployment to Reclaim Cloud provided us with very few options, and all of them incurred fees throughout the development lifecycle. Fortunately, the fees were not too significant and they were covered with budget and credits secured by the DLAC.
Our deployment to Reclaim Cloud also encountered two technical challenges:
1. Due to the nature of Flask and its built-in web server, we had to maintain two versions of the code: one for the local/development web server, and a second for remote deployment behind a wsgi interface. A pair of .sh scripts were created to manually switch between versions. Specific differences between versions are documented in the aforementioned
2. Recently, the addition of new nltk elements and a pandas.DataFrame object introduced a configuration where the code would run locally, but the modified wsgi version would not successfully deploy to Reclaim Cloud.
While addressing technical problem #2, above, we discovered that returning to Azure might provide a more flexible and free, or very low-cost, alternative to Reclaim Cloud, and that new Azure VSCode extensions could be leveraged to simplify the move. Furthermore, these VSCode extensions provide nearly seamless integration with Azure App Service and, most importantly, there is no need to maintain separate local and wsgi versions of the code when deploying to Azure. Yay!
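To illustrate why one code path can serve both cases, here is a minimal sketch using only the Python standard library. It is not CATPAW's code: the names `app`, `call_app`, and `run_dev_server` are hypothetical, and a plain function stands in for the Flask application object, which is itself a WSGI callable. The idea is that a hosting platform's WSGI server imports `app` from the module, while a developer starts a local server from the same file.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI callable; a Flask application object plays this same role."""
    body = b"Hello from a single code path\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(path="/"):
    """Invoke `app` in-process, roughly the way a deployed WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a valid default WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def run_dev_server(host="127.0.0.1", port=8000):
    """Serve `app` locally for development (blocks until interrupted)."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()


# In a real module, the local entry point would be guarded so that it only
# runs when the file is executed directly, never when a WSGI server imports it:
#
#     if __name__ == "__main__":
#         run_dev_server()
```

Because the deployed server only imports `app`, and the guarded block only runs when the file is executed directly, nothing in the module needs to be swapped between environments.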
A search of the web for “azure python flask” returned a host of promising articles, and I chose to create a new CATPAW project repo, https://github.com/Digital-Grinnell/catpaw-azure, and to develop it using the guidance found in Deploying Flask web app on Microsoft Azure, and later in Quickstart: Deploy a Python (Django or Flask) web app to Azure App Service.
The original description of https://github.com/Digital-Grinnell/catpaw-azure was:
A restart of CATPAW work from 2022, this time destined for dev deployment in Azure App Service.
The README.md file from the development branch of https://github.com/Digital-Grinnell/catpaw-azure explains some of the repository’s early history and it can be downloaded here as
Pandas vs Polars
Also while addressing problem #2 (see above), we found that a successful local build using the Pandas DataFrame class could not be deployed to Reclaim Cloud or to Azure. The Azure error messages hinted at the platform’s inability to “build” Pandas due to a missing python.h file. Also, the Azure build took 10x as long with Pandas included as it did without Pandas.
I tried 8 different deployment configurations in hopes of making Pandas work in Azure, but they all failed with the same error. So, rather than perpetuating failures I elected to look for an alternative to Pandas to determine if that was indeed the source of the error. It was.
Ultimately I chose to rewrite a small portion of the app.py code to remove Pandas and replace it with “equivalent” code using the Polars DataFrame library. Polars not only works, both locally and in Azure, but its build time is one tenth of what Pandas required.
At the time of this writing (2023-01-12T12:44:01-06:00) the development branch of catpaw-azure is using Polars, while the main branch still employs the broken Pandas logic. I expect a merge of these branches and adoption of Polars soon.
There may be more to come, but for now… that’s a wrap.