Verifying data: Since there’s no QC test column with the EOS research pier data, I want to see if there is a reason for some of the suspicious-looking data that is not being removed by the filters we’re using. Karina and I have reviewed the data together, and there are some events in the data we’re not sure about. For salinity, dissolved oxygen, and water temperature, the questionable data all falls between December 16, 2018 and December 26, 2018; for pH, it falls in July and August 2019. Below I’m graphing the complete raw data and will highlight the areas of concern in more detail as I go, but here is a synopsis:

- Salinity: Lower salinity event from December 16, 2018 - December 26, 2018.
- pH: Lower pH values in July 2019 and a few in August 2019. The lower values in August 2019 are removed by our filters, but there are too many in July 2019 for the filters to remove.
- Dissolved oxygen: Lower dissolved oxygen values from December 16, 2018 - December 26, 2018, the exact same dates as the questionable lower salinity data.
- Water temperature: Higher water temperature event from December 16, 2018 - December 26, 2018.

Did anything happen in December 2018 and July 2019 that could explain these patterns? Do you think these are real data, or is something off here?

*Note that these are just working graphs and code where I’m trying to get a sense of the data

Set up

rm(list=ls())

library(tidyverse)
library(ggpubr)
library(scales)
library(chron)
library(plotly)
library(taRifx)
library(aweek)
library(easypackages)
library(renv)
library(here)
library(ggthemes)
library(gridExtra)
library(patchwork)
library(tidyquant)
library(recipes) 
library(cranlogs)
library(knitr)
library(openair)

Salinity

eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_bouy_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

eos$date<-as.Date(eos$date, format = c("%m/%d/%Y"))

eos<-subset(eos, select=c(date, datetime, salinity))

eos%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Plot with better view

eos%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+ ylim(0,35)+
  labs(title="Salinity data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 2 rows containing missing values (geom_point).

We’re not sure if the low salinity event at the end of 2018 is real or if it is due to some sort of data collection error. It looks like it’s in December 2018.
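The warning from the `ylim(0, 35)` plot above says 2 rows were dropped, so it may be worth looking at which rows those are. Here is a rough sketch of one way to list them. It uses a small made-up data frame (`eos_demo`, hypothetical values) in place of the real file, and assumes the dropped rows are either missing or outside the 0 to 35 limits.

```r
# Toy stand-in for the eos data frame; values are made up for illustration.
eos_demo <- data.frame(
  date = as.Date(c("2018-12-14", "2018-12-15", "2018-12-16",
                   "2018-12-17", "2018-12-18")),
  salinity = c(31.2, 36.4, 24.8, NA, -0.3)
)

# Rows that ylim(0, 35) would drop: missing values or values outside the limits.
dropped <- subset(eos_demo, is.na(salinity) | salinity < 0 | salinity > 35)
print(dropped)
```

Running the same `subset()` call on the real `eos` data frame should show the rows behind the warning.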

Zooming in to area of concern

eos%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  ylim(0,35)+ 
  scale_x_date(limits = as.Date(c("2018-12-01", "2019-01-01")))+
  labs(title="Salinity data from EOS research pier: December 2018",
       subtitle="12/01/2018 - 01/01/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 252475 rows containing missing values (geom_point).

Looks like the data we’re not sure about is between December 16, 2018 and December 26, 2018.
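The "Removed 252475 rows" warning above is expected: setting limits on the x scale turns every point outside the limits into a missing value, and ggplot2 then drops it. One way to avoid the warning is to subset to the window first and then plot. A minimal sketch, using a made-up data frame (`eos_demo`, hypothetical values) rather than the real file:

```r
library(ggplot2)

# Toy stand-in for the eos data frame; values are made up for illustration.
eos_demo <- data.frame(
  date = as.Date(c("2018-11-30", "2018-12-10", "2018-12-16",
                   "2018-12-20", "2018-12-26", "2019-01-05")),
  salinity = c(31.0, 30.8, 24.5, 22.1, 25.3, 30.6)
)

# Keep only December 2018 before plotting, so the x scale discards nothing.
dec2018 <- subset(eos_demo,
                  date >= as.Date("2018-12-01") & date <= as.Date("2019-01-01"))

p <- ggplot(dec2018, aes(x = date, y = salinity, color = salinity)) +
  geom_point(alpha = 0.5) +
  ylim(0, 35)
print(p)
```

Another option is `coord_cartesian(xlim = ...)`, which zooms the view without removing any rows from the data.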

pH

eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_bouy_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

eos$date<-as.Date(eos$date, format = c("%m/%d/%Y"))

eos<-subset(eos, select=c(date, datetime, ph))

eos%>%
  ggplot(aes(x=date, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  labs(title="pH data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

The data we’re not sure about is in the summer of 2019.

Zooming in to area of concern

eos%>%
  ggplot(aes(x=date, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  scale_x_date(limits = as.Date(c("2019-06-01", "2019-09-01")))+
  labs(title="pH data from EOS research pier: June - August 2019",
       subtitle="06/01/2019 - 09/01/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 238170 rows containing missing values (geom_point).

The suspicious data is sprinkled throughout July 2019, with a little in August 2019. The August values are removed by our filters, but remnants of the lower pH pocket in July 2019 remain even after filtering. It looks like the dominant band in July is correct, but I’m not sure about those lower values.
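One possible way to separate the lower values from the dominant band is to compare each reading with the median for its day. This is only a sketch and not one of the filters we’re using: the data frame (`ph_demo`) is made up, and the 0.2 pH unit cutoff is a hypothetical value that would need tuning against the real series.

```r
# Toy stand-in for the pH series; values are made up for illustration.
ph_demo <- data.frame(
  date = as.Date(c(rep("2019-07-10", 5), rep("2019-07-11", 5))),
  ph = c(8.01, 8.02, 7.60, 8.00, 8.03, 7.98, 7.55, 7.99, 8.00, 7.97)
)

# Median pH for each day, repeated on every row of that day.
ph_demo$daily_median <- ave(ph_demo$ph, ph_demo$date, FUN = median)

# Hypothetical cutoff: flag readings more than 0.2 pH units below the daily median.
cutoff <- 0.2
ph_demo$below_band <- (ph_demo$daily_median - ph_demo$ph) > cutoff

print(ph_demo[ph_demo$below_band, ])
```

The daily median is used here because a handful of low readings barely move it, so the dominant band sets the reference level.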

Dissolved oxygen

eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_bouy_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)


eos$date<-as.Date(eos$date, format = c("%m/%d/%Y"))
eos<-subset(eos, select=c(date, datetime, odo))

eos%>%
  ggplot(aes(x=date, y=odo, color=odo))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Plot with better view

eos%>%
  ggplot(aes(x=date, y=odo, color=odo))+
  geom_point(alpha=0.5)+ylim(0,15)+
  labs(title="Dissolved oxygen data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 1 rows containing missing values (geom_point).

Looks like this happens during the same period as the low salinity data we’re not sure about.

Zooming in to area of concern

eos%>%
  ggplot(aes(x=date, y=odo, color=odo))+
  geom_point(alpha=0.5)+ylim(0,15)+
  scale_x_date(limits = as.Date(c("2018-12-01", "2019-01-01")))+
  labs(title="Dissolved oxygen data from EOS research pier: December 2018",
       subtitle="12/01/2018 - 01/01/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 252475 rows containing missing values (geom_point).

Same dates as the lower salinity data we’re not sure about: December 16, 2018 - December 26, 2018.
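To check the overlap with numbers rather than by eye, the low salinity and low dissolved oxygen readings can be flagged and cross-tabulated. A minimal sketch on a made-up data frame (`eos_demo`, hypothetical values); the cutoffs of 28 for salinity and 7 for `odo` are assumptions picked for the toy data, not values from our filters.

```r
# Toy stand-in for the eos data frame; values are made up for illustration.
eos_demo <- data.frame(
  date = as.Date(c("2018-12-14", "2018-12-15", "2018-12-16",
                   "2018-12-20", "2018-12-26", "2018-12-27")),
  salinity = c(31.0, 30.9, 24.0, 22.5, 25.1, 30.7),
  odo = c(8.1, 8.0, 5.2, 4.9, 5.5, 8.2)
)

# Hypothetical cutoffs for "low" readings; choose values that suit the real series.
low_sal <- eos_demo$salinity < 28
low_odo <- eos_demo$odo < 7

# Cross-tabulate the two flags: counts on the diagonal mean the lows coincide.
overlap <- table(low_salinity = low_sal, low_odo = low_odo)
print(overlap)

# Date range over which both variables are low at once.
both_low <- range(eos_demo$date[low_sal & low_odo])
print(both_low)
```

If the two events really share the same dates, nearly all counts should sit on the diagonal of the table.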

Water Temperature

eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_bouy_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

eos$date<-as.Date(eos$date, format = c("%m/%d/%Y"))
eos<-subset(eos, select=c(date, datetime, temperature))

eos%>%
  ggplot(aes(x=date, y=temperature, color=temperature))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Looks like the same period as the salinity and dissolved oxygen data in question.

Zooming in to area of concern

eos%>%
  ggplot(aes(x=date, y=temperature, color=temperature))+
  geom_point(alpha=0.5)+
  scale_x_date(limits = as.Date(c("2018-12-01", "2019-01-01")))+
  labs(title="Water temperature data from EOS research pier: December 2018",
       subtitle="12/01/2018 - 01/01/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 252475 rows containing missing values (geom_point).

Same December 16, 2018 - December 26, 2018 period.
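Since all three variables point at the same window, it may help to compare them side by side in one pass instead of one variable at a time. A rough sketch, again on a made-up data frame (`eos_demo`, hypothetical values) with the same column names as the real file:

```r
# Toy stand-in for the eos data frame; values are made up for illustration.
eos_demo <- data.frame(
  date = as.Date(c("2018-12-10", "2018-12-10", "2018-12-18",
                   "2018-12-18", "2018-12-28", "2018-12-28")),
  salinity = c(31.0, 31.2, 23.0, 24.0, 30.8, 31.0),
  odo = c(8.0, 8.2, 5.0, 5.4, 8.1, 8.3),
  temperature = c(12.0, 12.2, 14.8, 15.0, 12.1, 12.3)
)

# Mark rows that fall inside the December 16 - 26, 2018 window.
eos_demo$in_window <- eos_demo$date >= as.Date("2018-12-16") &
  eos_demo$date <= as.Date("2018-12-26")

# Mean of each variable inside versus outside the window.
window_means <- aggregate(cbind(salinity, odo, temperature) ~ in_window,
                          data = eos_demo, FUN = mean)
print(window_means)
```

On the real data, a clear drop in salinity and dissolved oxygen together with a rise in temperature inside the window would put a number on what the plots show.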