Santa Barbara County Ocean Water Quality

Summary

This is a personal project which uses a Leaflet map to display ocean water test results provided by the Santa Barbara County Public Health Department’s Ocean Water Monitoring Program. The county tests beaches at least weekly and may issue warnings or close beaches if targeted bacterial counts exceed state standards. These targeted bacterial groups are indicators of fecal water contamination, and levels above state standards may present health hazards to ocean users. If a beach exceeds the standards, warnings are posted or the beach is closed, and the beach is then re-sampled at intervals until it is back within acceptable standards and restrictions can be lifted.

Water quality reports are posted as PDFs to the Ocean Water Monitoring page at least once a week. However, initial reports often lack data for some beaches, and beaches may need to be re-sampled, so amended PDFs are often released throughout the week. Test result data are not otherwise accessible on the internet. I saw this as an opportunity to expand my Python, database management, and web design skills. I designed a workflow and web map to display these data in an easily usable and mobile-friendly format. Report PDFs are downloaded, test result details are extracted, and the data are inserted into a Postgres database. These data are then served to a Leaflet map which displays the most recent test results.

The source code for this project can be found in its GitHub project folder within the GitHub repo for this website.

PDF Processing

Here is an example of a standard PDF ocean water quality report with all beaches filled in:

Since reports are released weekly, with possible amendments throughout the week, and the PDF report download URL is static, the PDFs need to be downloaded and processed daily. Rather than figuring out how to set up a Linux-based cron job, I found it easier to schedule daily processing using the Python APScheduler library. Since most reports are available by 8 am, the following code kicks off daily PDF downloading and processing at 16:30 UTC, 8:30 am PST:


from apscheduler.schedulers.background import BackgroundScheduler
from pytz import utc

# Run the scheduler in a background thread and trigger pdfjob daily at 16:30 UTC
sched = BackgroundScheduler(daemon=True, timezone=utc)
sched.add_job(pdfjob, trigger='cron', hour='16', minute='30')
sched.start()
					

To begin parsing a report PDF, it first needs to be downloaded from the County's website. This is a relatively simple task since the County’s test result PDF download URL is static and hasn't changed since I began this project. First the PDF is downloaded to a local drive with a meaningful name:


pdfName = f"Ocean_Water_Quality_Report_{datetime.now().strftime('%Y%m%d')}.pdf"
pdfLoc = pdfDest = os.path.join(app.root_path, 'static', 'documents', 'Water_Qual_PDFs', pdfName)
downloadURL = "http://countyofsb.org/uploadedFiles/phd/PROGRAMS/EHS/Ocean%20Water%20Weekly%20Results.pdf"
# Kick off script by downloading PDF
urlretrieve(downloadURL, pdfDest)
					

Extracting PDF contents was a bit of a challenge. The PDFs are machine generated, i.e. exported from a Word document, so OCR isn’t needed, but I had a hard time finding a Python library that extracted the contents correctly. After trying a couple of libraries, I came across the pdfplumber library, which not only extracted the contents well, but also had the ability to detect and pull out the data table content as a separate object.

This function extracts the contents of the PDF, passes them into a function to clean text (shown afterwards) and returns a dictionary containing the information:


def getPDFContents(pdfLoc):
	"""
	Extracts contents of PDF, including date and table information, and formats data by normalizing Unicode data and
	removing extra characters.

	:param pdfLoc: String. Location of local PDF file.
	:return: Dict:
	    ['text']: Original raw text of PDF
	    ['tab']: Raw table extracted from PDF
	    ['pdfDate']: datetime object parsed from the PDF title date ({full month name} {numeric day} {full year})
	    ['cleanedtext']: list of values from table, each row is a nested list containing normalized values
	"""
	pdfDict = {}
	# Open pdf and extract content
	with pdfplumber.open(pdfLoc) as pdf:
	    p1 = pdf.pages[0]
	    pdfDict['text'] = p1.extract_text()
	    raw_tab = p1.extract_tables()[0]
	    pdfDict['tab'] = raw_tab

	# Extract date from title and create a list of values
	pdfTitleList = pdfDict['text'].split("Sample Results for the Week of: ")[1].split(" ")
	# Pull the first 3 values from list, these are the date values
	dirtyTitle = f"{pdfTitleList[0]} {pdfTitleList[1]} {pdfTitleList[2]}"
	# Normalize and remove extra characters
	pdfDate = cleanText([dirtyTitle])[0]
	# Parse the date string into a datetime object
	pdfDict['pdfDate'] = datetime.strptime(pdfDate, '%B %d %Y')
	cleanedtext = []
	# Normalize the data within the raw table, each row is a nested list
	for beachdetails in raw_tab:
	    cleanedtext.append(cleanText(beachdetails))
	pdfDict['cleanedtext'] = cleanedtext
	return pdfDict
					

This function cleans the raw date and table contents so they can be parsed in a consistent manner. The normalization function was required to address Unicode issues I was encountering, such as regular and non-breaking spaces causing equality checks to fail.


def cleanText(textList):
    """
    Normalizes the Unicode text within the provided list. This is needed since the PDF conversion to text leads to
    some Unicode characters being a combination of two Unicode characters, where we want a single value for ease of use.
    Empty strings are changed to None, "<10" is changed to "0", ">" characters are removed, and extra characters and
    formatting are removed. This ensures that data can be cleanly inserted into Postgres with consistent values.

    :param textList: List of values to be cleaned.
    :return: List of cleaned values
    """
    text = []
    for item in textList:
        if item is None:
            # Keep None values as-is
            pass
        elif item == '':
            # Empty strings become None
            item = None
        elif item == "<10":
            item = "0"
        elif ">" in item:
            item = item.replace(">", "")
        else:
            # Normalize Unicode and strip newlines, non-standard hyphens, and thousands separators
            item = unicodedata.normalize("NFKD", item).replace("\n", "").replace("‐", "-").replace(",", "")
            if item == 'Results not available':
                item = None
        text.append(item)
    return text
					

Now that the PDF date and contents have been extracted, they can be used to check whether the PDF has been processed before, whether it is the first report of the week, or whether it contains data amendments. This is first done by calculating the MD5 hash of the PDF's contents:


import hashlib

# Hash the text of the PDF document; encode() is needed since hashlib expects bytes
hashedText = hashlib.md5(pdfDict['text'].encode()).hexdigest()
					

Next the MD5 hash and pdfDate are checked against the database to determine the status of the downloaded PDF:


def checkmd5(hash, pdfDate):
  """
  Checks if the downloaded PDF's MD5 hash is already in Postgres and returns result.

  :param hash: String
      New MD5 hash
  :param pdfDate: String
      New PDF Date
  :return: String
      "Exists" - Hash is already in Postgres
      "New" - Hash is not in Postgres and no other hashes exist for the PDF result week
      "Update" - Hash is not in Postgres but other hashes exist for the PDF result week
  """
  # Query Postgres with pdfDate of newly downloaded PDF
  session = Session()
  query = session.query(waterQualityMD5).filter(waterQualityMD5.pdfdate == pdfDate).all()
  hashList = []
  for i in query:
      hashList.append(i.md5)
  session.close()
  if hash in hashList:
      return "Exists"
  elif len(hashList) == 0:
      return "New"
  else:
      return "Update"
					

The PDF's status determines what happens next: if the PDF's hash is already in the database the script quits, if the PDF is new its contents are extracted, and if the PDF is an update amendment then just the new records are extracted:


def handlePDFStatus(pdfStatus, pdfLoc, hashedText, pdfDict, pdfName):
	"""
	Handles status of newly downloaded PDF. If the PDF's MD5 hash already exists in Postgres then the local file is
	deleted and the script quits. If the PDF is new or an update then it is uploaded to Google Drive and the contents
	are extracted. If the PDF contains re-sampled results or data fill-ins, then just those results are extracted; if the
	PDF is new then all non-null values are extracted.

	Parameters
	----------
	pdfStatus: String of PDF status
	pdfLoc: String of PDF location
	hashedText: String, MD5 hash of the PDF's text
	pdfDict: Dict of PDF contents
	pdfName: Name of PDF

	Returns
	-------
	Dictionary of formatted beach results, ready for Postgres insertion
	"""
	if pdfStatus == "Exists":
	    # PDF has been processed, remove local file
	    try:
	        os.remove(pdfLoc)
	        # Quit script, no further processing is needed
	        quit()
	else:
	    # PDF is new, it contains re-sampled or data fill-ins, upload file to Google Drive
	    try:
	        GoogleDriveUploadWaterQuality.addtoGDrive(pdfLoc, pdfName)
	    except Exception as e:
	        errorEmail.sendErrorEmail(script="addtoGDrive", exceptiontype=e.__class__.__name__, body=e)
	    # print("Finished with local PDF, removing it from system")
	    os.remove(pdfLoc)
	if checkResamp(pdfDict['cleanedtext']) == True:
	    # PDF contains re-sample results
	    # Create dictionary with re-sampled and data fill-ins
	    beachDict = genReSampleDict(pdfDict['cleanedtext'], pdfDict['pdfDate'])
	else:
	    # PDF doesn't contain re-sampled results but may contain data fill-ins
	    # Generate beach dictionary
	    beachDict = genDict(pdfDict['pdfDate'])
	    # Populate beach dictionary with results
	    beachDict = populateDict(pdfDict['cleanedtext'], beachDict, "No")
	    # If the new PDF contains updates but not re-sample data
	    if pdfStatus == "Update":
	        # Get the beaches with null values that are being updated
	        nullBeaches = DBQueriesWaterQuality.getNullBeaches(pdfDict['pdfDate'])
	        # Check if the key (beach name) is in the null beach list; if not, delete it from the beach results dict.
	        # Also delete any keys whose water quality values are still None, since it's possible
	        # that an updated PDF will not fill in all beaches
	        for beachKey in list(beachDict.keys()):
	            if (beachKey not in nullBeaches) or (beachDict[beachKey]['Total Coliform Results (MPN*)'] is None):
	                del beachDict[beachKey]
	# Remove any beaches that are still missing results from the dictionary
	makeNull(beachDict)
	# Insert MD5 hash into Postgres
	hashid = DBQueriesWaterQuality.insmd5(hashedText, pdfDict['pdfDate'], pdfName)
	# Insert records into Postgres, using the beachDict
	DBQueriesWaterQuality.insertWaterQual(beachDict, hashid)
	return beachDict
					

In this example, you can see a beach with an exceedance and two beaches missing results. The second image shows the PDF report from later in the week with re-sampling and data fill amendments.

The following code checks if any records are re-sampled records:


def checkResamp(tab):
	"""Return True if any row in the cleaned table is a re-sample record."""
	for sub_list in tab:
	    if "sample" in sub_list[0]:
	        return True
	return False
					

If the PDF contains any re-sampled results then a dictionary is generated with just those results, including any data fill-ins, using the following functions:


def genReSampleDict(tab, pdfDate):
  """
  Generates nested dictionary of re-sampled and fill-in records, with beach names and column names as keys.

  :param tab: Nested list of cleaned table records
  :param pdfDate: String of PDF Date
  :return:
  Nested dictionary containing re-sampled records and filled-in records, with beach names as keys
  """
  resampBeaches = []
  combinedBeaches = []
  resampTab = [tab[0]]
  newRecTab = [tab[0]]
  # Get list of null beaches
  nullBeaches = DBQueriesWaterQuality.getNullBeaches(pdfDate)
  # Iterate over all records in the table
  for row in range(1, len(tab)):
      # Check each beach name, index 0 in the nested list, to see if it contains "sample", meaning it was resampled
      if "sample" in tab[row][0]:
          resampRow = tab[row]
          resampRow[0] = resampRow[0].split(' Re')[0].rstrip(" ")
          # Add to resample beach list
          resampBeaches.append(resampRow[0])
          # Add to resample table
          for item in resampRow[1:]:
              if " " in item:
                  resampRow[resampRow.index(item)] = item.split(" ")[0]
          resampTab.append(resampRow)
      elif tab[row][0] in nullBeaches and tab[row][1] is not None:
          # Add beach name to the combined beaches list
          combinedBeaches.append(tab[row][0])
          # Add table row to the new records list
          newRecTab.append(tab[row])
  # Combine the beach names
  combinedBeaches = resampBeaches + combinedBeaches
  # Use the beach names to generate a template dictionary
  combinedDict = genDict(combinedBeaches)
  # Populate the dictionary with the re-sample data
  combinedDict = populateDict(resampTab, combinedDict, "Yes")
  # Populate the dictionary with the new record data
  combinedDict = populateDict(newRecTab, combinedDict, "No")
  return combinedDict
						

The following queries Postgres for null beaches from the PDF report date such that they can be filled in:


def getNullBeaches(pdfDate):
    """
    Returns a list of beaches with null values for the given PDF test week. Only called when an update/re-sample PDF is
    downloaded.

    :param pdfDate: String
        Date of new weekly PDF results
    :return: List[Strings,]
        Names of beaches with null test results
    """
    session = Session()
    query = session.query(waterQuality) \
        .join(waterQualityMD5) \
        .join(beaches) \
        .filter(waterQualityMD5.pdfdate == pdfDate) \
        .filter(or_(waterQuality.FecColi == None, waterQuality.Entero == None, waterQuality.TotColi == None)) \
        .all()
    nullbeaches = []
    for i in query:
        nullbeaches.append(i.beach_rel.BeachName)
    session.close()
    return nullbeaches
					

The following function generates the dictionary structure that will be populated with values:


def genDict(pdfDate):
	"""
	Generate a nested dictionary with beach names as keys at the upper level, and columns as keys at the
	nested level, values are set to '', except for the pdf date, so they can be filled in later.

	:param pdfDate: String of PDF date
	:return:
	Nested dict structured with keys and empty values.
	"""
	# Beaches to be included in dictionary
	beachList = ['Carpinteria State Beach', 'Summerland Beach', 'Hammond\'s', 'Butterfly Beach',
	             'East Beach @ Sycamore Creek',
	             'East Beach @ Mission Creek', 'Leadbetter Beach', 'Arroyo Burro Beach', 'Hope Ranch Beach',
	             'Goleta Beach',
	             'Sands @ Coal Oil Point', 'El Capitan State Beach', 'Refugio State Beach', 'Guadalupe Dunes',
	             'Jalama Beach',
	             'Gaviota State Beach']
	# Beaches with their foreign key values
	beachFK = {'Carpinteria State Beach': 1, 'Summerland Beach': 2, 'Hammond\'s': 3, 'Butterfly Beach': 4,
	           'East Beach @ Sycamore Creek': 5, 'East Beach @ Mission Creek': 6, 'Leadbetter Beach': 7,
	           'Arroyo Burro Beach': 8, 'Hope Ranch Beach': 9, 'Goleta Beach': 10,
	           'Sands @ Coal Oil Point': 11, 'El Capitan State Beach': 12, 'Refugio State Beach': 13,
	           'Guadalupe Dunes': 14,
	           'Jalama Beach': 15,
	           'Gaviota State Beach': 16}
	# Table columns, nested dict keys
	col = ['Total Coliform Results (MPN*)', 'Total Coliform State Health Standard (MPN*)',
	       "Fecal Coliform Results (MPN*)", 'Fecal Coliform State Health Standard (MPN*)',
	       'Enterococcus Results (MPN*)',
	       'Enterococcus State Health Standard (MPN*)', 'Exceeds FC:TC ratio standard **', 'Beach Status', 'fk']
	# Build dict structure, inner dict values are empty strings
	beachDict = {}
	for i in beachList:
	    beachDict[i] = {}
	    for c in col:
	        beachDict[i][c] = ''
	    beachDict[i]['Date'] = pdfDate
	    beachDict[i]['fk'] = beachFK[i]
	    beachDict[i]['resample'] = ''
	return beachDict
					

Now the beach results dictionary is populated with values:


def populateDict(tab, beachDict, resample):
  """
  Populates test results dictionary structure with test result values for each beach.

  Parameters
  ----------
  tab: Nested list with cleaned beach results
  beachDict: Dictionary with structure but empty values, will be mutated.
  resample: String of re-sample status ("Yes" or "No")

  Returns
  -------
  Mutates and returns input beachDict with beach test results.
  """
  col = ['Total Coliform Results (MPN*)', 'Total Coliform State Health Standard (MPN*)',
         "Fecal Coliform Results (MPN*)", 'Fecal Coliform State Health Standard (MPN*)',
         'Enterococcus Results (MPN*)',
         'Enterococcus State Health Standard (MPN*)', 'Exceeds FC:TC ratio standard **', 'Beach Status', 'fk']

  # Iterate over table skipping row one, which is column names, and use the row index number
  for row in range(1, len(tab)):
      # For every row in the table, iterate over the columns, ignoring the first column (beach name) since it is
      # the key value. Use the column index to look up the column name, which becomes the key in the nested dictionary
      for i in range(1, (len(tab[row]))):
          # col[i-1] is needed since the loop starts at index 1 to skip the beach name in the table row,
          # while the column name list starts at index 0, so the index is decreased by 1 to line the two up
          if tab[row][i] is not None:
              beachDict[tab[row][0]][col[i - 1]] = tab[row][i].rstrip(" ")
          else:
              beachDict[tab[row][0]][col[i - 1]] = None
          beachDict[tab[row][0]]['resample'] = resample
  return beachDict
					

Finally, since it's possible that the first report of the week will be missed or not processed properly, resulting in a re-sampled PDF being the first report of the week to be processed, any beaches missing data in the final results dictionary need to be removed:


def makeNull(beachDict):
  # Remove any beaches that have no Total Coliform result, mutating beachDict in place
  for i in list(beachDict.keys()):
      if not beachDict[i]["Total Coliform Results (MPN*)"]:
          beachDict.pop(i, None)
					
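
For reference, here is a minimal sketch of how the pdfjob function scheduled earlier might tie these pieces together. The exact wiring is an assumption; error handling and emailing are omitted:


def pdfjob():
    # Download the latest report PDF using the static County URL shown earlier
    pdfName = f"Ocean_Water_Quality_Report_{datetime.now().strftime('%Y%m%d')}.pdf"
    pdfLoc = os.path.join(app.root_path, 'static', 'documents', 'Water_Qual_PDFs', pdfName)
    urlretrieve(downloadURL, pdfLoc)
    # Extract and clean the PDF date and table contents
    pdfDict = getPDFContents(pdfLoc)
    # Hash the raw text to detect previously processed reports
    hashedText = hashlib.md5(pdfDict['text'].encode()).hexdigest()
    # Determine whether this report already exists, is brand new, or is an amendment
    pdfStatus = checkmd5(hashedText, pdfDict['pdfDate'])
    # Extract the appropriate records and insert them into Postgres
    handlePDFStatus(pdfStatus, pdfLoc, hashedText, pdfDict, pdfName)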

Data Storage - Postgres

Now that the data are extracted and processed, they need to be stored. I had already built a Postgres/PostGIS database on AWS RDS for my Mobile Livetracker so I expanded it to include these data. I did not leverage database relationships in my previous project, so I took the time to build multiple tables with foreign key relationships to familiarize myself with this essential aspect of database design.

Here is a quick visualization of the simple schema:

First the new MD5 hash value and PDF date are inserted into Postgres; water quality result records will use this entry as a foreign key:


def insmd5(MD5, pdfDate, pdfName):
  """
  Add water quality MD5 and other information to the Postgres database. After committing, call on the primary key, id,
  to get the persisted, auto-incremented, id. The record must be committed before this value is assigned.
  :param MD5: String of MD5 hash.
  :param pdfDate: String of pdfDate.
  :param pdfName: String of PDF name, without file location

  :return:
  Int, primary key of new MD5 entry
  """
  session = Session()
  newrec = waterQualityMD5(md5=MD5, pdfdate=pdfDate, pdfName=pdfName, insdate=datetime.now())
  session.add(newrec)
  session.commit()
  newId = newrec.id
  session.close()
  return newId
					

Finally, insert the water quality results for each beach into Postgres:


def insertWaterQual(beachDict, md5_fk):
  """
  Inserts water quality results into water quality database table with md5 foreign key relationship.

  Parameters
  ----------
  beachDict: Dictionary. Dictionary containing values to be inserted into database.
  md5_fk: String. Foreign key from md5 table.

  Returns
  -------
  None.
  """
  session = Session()
  inslist = []
  # Iterate over beaches in dictionary creating waterQuality objects for each beach key
  for key in beachDict.keys():
      inslist.append(
          waterQuality(beach_id=beachDict[key]['fk'], TotColi=beachDict[key]['Total Coliform Results (MPN*)'],
                       FecColi=beachDict[key]["Fecal Coliform Results (MPN*)"],
                       Entero=beachDict[key]['Enterococcus Results (MPN*)'],
                       ExceedsRatio=beachDict[key]['Exceeds FC:TC ratio standard **'],
                       BeachStatus=beachDict[key]['Beach Status'], resample=beachDict[key]['resample'],
                       md5_id=int(md5_fk)))
  # Add list of objects to session
  session.add_all(inslist)
  session.commit()
  session.close()
					

Here are the SQLAlchemy class models for the database, with relationship methods:


class waterQualityMD5(Base):
  __tablename__ = 'water_qual_md5'

  id = Column(Integer, primary_key=True)
  pdfdate = Column(Date)
  insdate = Column(Date)
  md5 = Column(String)
  pdfName = Column(String)

class waterQuality(Base):
  __tablename__ = "Water_Quality"

  id = Column(Integer, primary_key=True)
  TotColi = Column(Integer)
  FecColi = Column(Integer)
  Entero = Column(Integer)
  ExceedsRatio = Column(String)
  BeachStatus = Column(String)
  beach_id = Column(Integer, ForeignKey("Beaches.id"))
  md5_id = Column(Integer, ForeignKey("water_qual_md5.id"))
  resample = Column(String)

  beach_rel = relationship(beaches, backref="Water_Quality")
  hash_rel = relationship(waterQualityMD5, backref="Water_Quality")
						
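
The beaches model referenced by these relationships isn't shown above; a minimal sketch of it, assuming a GeoAlchemy2 point geometry column and SRID 4326, might look like:


from geoalchemy2 import Geometry

class beaches(Base):
  __tablename__ = "Beaches"

  id = Column(Integer, primary_key=True)
  BeachName = Column(String)
  # PostGIS point geometry, queried later with ST_X/ST_Y (SRID assumed)
  geom = Column(Geometry(geometry_type='POINT', srid=4326))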

Storing PDF Reports

Since the server is processing the PDFs, I needed an easy way to store and review PDFs that didn't involve connecting directly to the server through SSH. During development and testing I was storing PDFs in a Google Drive File Stream folder. I decided to continue storing PDFs in this folder from the server using the Google Drive API.

The Python library PyDrive simplified this process greatly. To authenticate the application and grant access to my Google Drive account using OAuth2.0, I followed the PyDrive guide here on a localhost version of the application. However, instead of using the local web server method, I used command line authentication. I did this in case I ever need to re-authenticate the application on the web server instead of on localhost. I am not sure that will ever be required, since it appears that authentication and granting of access only needs to occur once, even on localhost, as long as the correct settings are used and the credentials file/details are accessible by the application.
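
For reference, a minimal sketch of that one-time command line authentication, assuming PyDrive's CommandLineAuth() flow and the settings.yaml shown further below:


from pydrive.auth import GoogleAuth

# Load client config and credential-saving options from settings.yaml
gauth = GoogleAuth(settings_file='settings.yaml')
# Prints an authorization URL, then waits for the verification code to be pasted back in;
# with save_credentials enabled, the resulting token is written to the credentials file
gauth.CommandLineAuth()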

Code to authenticate with Google Drive API, establish connection to API, and to upload a PDF to a specific folder using its ID:


import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
# Change default location for client_secrets.json
GoogleAuth.DEFAULT_SETTINGS['client_config_file'] = os.path.join(app.root_path, 'credentials.json')
# Create authenticated GoogleDrive instance using settings from the setting.yaml file to auto-authenticate with saved
# credentials
gauth = GoogleAuth(settings_file=os.path.join(app.root_path, 'settings.yaml'))
# Establish connection with Google Drive API
drive = GoogleDrive(gauth)
# Create a new GoogleDriveFile instance with a PDF mimetype in the water quality PDF folder using the parent folder's ID
newfile = drive.CreateFile({"title": pdfname,
                            'mimeType': 'application/pdf',
                            'parents': [{'id': "1GRunRWB7SKmH3I0wWbtyJ_UOCDiHGAxO"}]})
# Read file and set the content of the new file instance
newfile.SetContentFile(pdfloc)
# Upload the file to Google Drive
newfile.Upload()
			

To enable auto-authentication on a remote machine I used the settings.yaml remote method with the following settings:


client_config_backend: settings
client_config:
  client_id: [ID]
  client_secret: [Secret]
save_credentials: True
save_credentials_backend: file
save_credentials_file: application/credentials.json

get_refresh_token: True

oauth_scope:
  - https://www.googleapis.com/auth/drive

Storing the files in my Google Drive account is primarily for QA/QCing any issues with the application and for retaining records, since the County doesn't provide historic reports on its website. These PDFs are not publicly available; however, in the future I would like to expand this web application and work with AWS S3 buckets to serve these historic reports as downloadable files.

Serving up Data for Leaflet

Now it’s time to query the data I worked so hard to get into the database. For simplicity, I decided to serve out just the most recent results for each beach. Flask templates and expressions are used to pass the data into the HTML document to be accessed by JavaScript and Leaflet. This process presented two major hurdles: querying and accessing data from related records, and getting the data into proper GeoJSON format for Leaflet. I had a hard time finding solid examples of how to set up SQLAlchemy models for foreign key relationships and how to query and extract information from them. Fortunately, this post contained example code on how to query the newest record per beach based on data from related tables.

Python query to get the most recent water quality results per beach:


records = session.query(waterQuality, waterQualityMD5, beaches, \
		sqlfunc.ST_GeometryType(beaches.geom), sqlfunc.st_x(beaches.geom), \
		sqlfunc.st_y(beaches.geom)) \
    .join(waterQualityMD5) \
    .join(beaches) \
    .distinct(waterQuality.beach_id)\
    .order_by(waterQuality.beach_id,  waterQualityMD5.insdate.desc()).all()
					

Working with GeoJSON data was difficult, as I wanted to avoid manually formatting the structure of the data. While I had a method developed for my Livetracker, I wanted a more direct method for generating feature collections. Fortunately, the geojson library worked perfectly and had solid example code. Working with the well-known binary (WKB) representation of coordinates, returned when selecting geometry from PostgreSQL/PostGIS, was also challenging. However, the GeoAlchemy2 library contains PostGIS query functions that return coordinates directly, avoiding the need to convert the binary geometry to text.

Python code to generate a FeatureCollection from the query results:


from geojson import Point, Feature, FeatureCollection

resultDict = {}
for i, item in enumerate(records):
	beachname = (item.waterQuality.beach_rel.BeachName)
	resultDict[beachname] = {}
	resultDict[beachname]['FecColi'] = item.waterQuality.FecColi
	resultDict[beachname]['TotColi'] = item.waterQuality.TotColi
	resultDict[beachname]['Entero'] = item.waterQuality.Entero
	resultDict[beachname]['ExceedsRatio'] = item.waterQuality.ExceedsRatio
	resultDict[beachname]['BeachStatus'] = (item.waterQuality.BeachStatus).rstrip()
	resultDict[beachname]['resample'] = (item.waterQuality.resample).rstrip()
	resultDict[beachname]['insDate'] = (item.waterQuality.hash_rel.insdate).strftime("%Y-%m-%d")
	resultDict[beachname]['pdfDate'] = (item.waterQuality.hash_rel.pdfdate).strftime("%Y-%m-%d")
	resultDict[beachname]['GeomType'] = (records[i][-3]).split("ST_")[1]
	resultDict[beachname]['Lon'] = round(records[i][-2], 5)
	resultDict[beachname]['Lat'] = round(records[i][-1], 5)
	resultDict[beachname]['Name'] = (item.waterQuality.beach_rel.BeachName).rstrip()
featList = []
for key in resultDict.keys():
	# Point takes items as long, lat. Point must have (())
	beachPoint = Point((resultDict[key]['Lon'], resultDict[key]['Lat']))
	feature = Feature(geometry=beachPoint, properties=resultDict[key])
	featList.append(feature)

featCollect = FeatureCollection(featList)
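
A minimal sketch of how a Flask view might pass this FeatureCollection to the page (the route, template name, and wrapper function are hypothetical), with Jinja's tojson filter making it available to the map's JavaScript:


from flask import render_template

@app.route('/waterquality')
def waterQualityMap():
    # Build the GeoJSON FeatureCollection from the query above (hypothetical wrapper)
    featCollect = getBeachFeatureCollection()
    # In the template: var beachData = {{ featCollect | tojson }};
    return render_template('waterquality.html', featCollect=featCollect)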

Leaflet

Finally, it’s time to display the data with a responsive web map. Loading and displaying the data in Leaflet with a legend and markers is relatively easy. The beach status property determines which icon to display for each beach, and up-to-date beach details are available as pop-ups. To facilitate navigating through the beaches dataset, I used the Leaflet Search plugin to enable auto-completion searching of beaches by name. I also included a modal that opens on page load, and can be reopened with a button, so users can read about the map and data limitations. The final step was to make the map mobile friendly. A viewport meta tag sizes content down on mobile screens so that all content and pop-up information is clearly visible and doesn't conflict with other map elements. JavaScript event listeners further reduce clutter by closing pop-ups and windows on user actions.

Final Thoughts

This was a fun project that allowed me to expand and refine many of my coding and web development skills while generating a product that’s useful for beach-goers. My next goal for this map is to add expandable graphs to the beach pop-up windows that show the history of each test result, along with the ability to download historic report PDFs served from an S3 bucket.