Trying the EOS data again using a different link that includes the QC flags/tests.
To keep this data set consistent with the others, I’d rather only exclude “failed” data and keep “suspect” data.
Here it looks like removing “failed” data only partially removes the events we were concerned about, so I may just have to remove them by hand. Some of the events that look off to us are too large to be removed by the rolling median and then the hourly median. Since I talked to Karina and Ryan, if I remove these events by hand, is that something I can just note in my methods (i.e., that certain events were removed at the suggestion of, or after conferring with, the data host)?
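To see why a long event can survive the smoothing, here is a minimal sketch on made-up data (not the EOS record). The 15-minute sampling interval, the window width `k = 5`, and the column names `sal`, `sal_roll`, and `hour` are all assumptions for illustration; base R's `runmed()` stands in for whatever rolling median is used in the real workflow.

```r
# Toy series: one reading every 15 minutes for 2 days (interval is an assumption)
set.seed(1)
n <- 4 * 24 * 2
toy <- data.frame(
  datetime = as.POSIXct("2018-12-01 00:00:00", tz = "UTC") + 900 * (0:(n - 1)),
  sal = 30 + rnorm(n, sd = 0.05)
)
toy$sal[10] <- 5        # isolated spike (a single reading)
toy$sal[100:160] <- 5   # sustained event (about 15 hours)

# Step 1: rolling median over a short window (k must be odd; k = 5 is an assumption)
toy$sal_roll <- stats::runmed(toy$sal, k = 5)

# Step 2: hourly median of the smoothed series
toy$hour <- format(toy$datetime, "%Y-%m-%d %H")
hourly <- aggregate(sal_roll ~ hour, data = toy, FUN = median)

# The isolated spike is smoothed away, but the sustained event passes through both steps
toy$sal_roll[10]
min(hourly$sal_roll)
```

In this toy case the single-reading spike disappears after step 1, while the sustained event is still present in the hourly medians, because it is longer than both the rolling window and the one-hour bin.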
Comments from Ryan’s email:
For December 2018:
that weird week happened when I was recovering from surgery I had on December 13th, 2018, and I have no clue why. I would recommend cutting out that data. It was wintertime, so maybe we had a power surge during a bad storm or something…
For July 2019:
I had installed the EOS sonde on June 20th of 2019 fresh back from YSI. I did a calibration and all looked good. However, the pH data eventually drifted as you say, and I found when I pulled it up for service on August 6th, 2019 that our turbidity sensor was returned loose and had been deployed as such. This let water drift into the turbidity port and throw bad values. This sensor has a biowiper on it and if it is compromised can throw off other sensors with the extra drawdown of voltage as water is shorting the turbidity sensor, pH being one of the sensors with collateral damage.
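If the events end up being removed by hand, one way to keep it documented is a small table of date windows with a reason for each. This is only a sketch on toy data: the 2019 window uses the deployment and service dates from the email as outer bounds (the actual start of the drift would need to be read off the plot), and the `bad_windows` table and its `reason` column are hypothetical names. The December 2018 week could be added as another row once its start and end dates are known.

```r
# Toy stand-in for the EOS table; column names follow the ones used in this document
toy <- data.frame(
  date = as.Date(c("2019-06-10", "2019-06-25", "2019-07-15", "2019-08-05", "2019-08-20")),
  sea_water_ph_reported_on_total_scale = c(8.0, 8.1, 7.2, 6.9, 8.0)
)

# Windows to remove by hand (hypothetical table). Treating both endpoints as
# part of the bad window is an assumption.
bad_windows <- data.frame(
  start = as.Date("2019-06-20"),
  end = as.Date("2019-08-06"),
  reason = "turbidity sensor deployed loose; pH drift"
)

# Flag every row that falls inside any listed window
in_window <- rep(FALSE, nrow(toy))
for (i in seq_len(nrow(bad_windows))) {
  in_window <- in_window |
    (toy$date >= bad_windows$start[i] & toy$date <= bad_windows$end[i])
}

# Blank the affected pH values but keep the rows
toy$sea_water_ph_reported_on_total_scale[in_window] <- NA
```

Keeping the windows in a table means the same list can be quoted directly in the methods section.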
Set up
rm(list=ls())
library(tidyverse)
library(ggpubr)
library(scales)
library(chron)
library(plotly)
library(taRifx)
library(aweek)
library(easypackages)
library(renv)
library(here)
library(ggthemes)
library(gridExtra)
library(patchwork)
library(tidyquant)
library(recipes)
library(cranlogs)
library(knitr)
library(openair)
Salinity
eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_new.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
names(eos)[2] <- "datetime"
eos$date<-as.Date(eos$date, format = c("%Y-%m-%d"))
eos%>%
ggplot(aes(x=date, y=sea_water_practical_salinity, color=sea_water_practical_salinity))+
geom_point(alpha=0.5)+
labs(title="Salinity data from EOS research pier: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Plot with better view
eos%>%
ggplot(aes(x=date, y=sea_water_practical_salinity, color=sea_water_practical_salinity))+
geom_point(alpha=0.5)+ ylim(0,35)+
labs(title="Salinity data from EOS research pier: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
## Warning: Removed 2 rows containing missing values (geom_point).
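The warning comes from `ylim(0,35)`, which drops points that are missing or fall outside the limits before plotting. A quick way to see which rows those are is to count them directly; this is a sketch on toy values, and the short column name `sal` is a stand-in for `sea_water_practical_salinity`.

```r
# Toy salinity values: one above the upper limit, one missing
toy <- data.frame(
  date = as.Date("2017-01-01") + 0:4,
  sal = c(30.1, 29.8, 36.2, NA, 30.4)
)

# Rows that a 0 to 35 y-limit would drop: missing or out-of-range values
dropped <- is.na(toy$sal) | toy$sal < 0 | toy$sal > 35
sum(dropped)
toy[dropped, ]
```

If the goal is only to zoom in, `coord_cartesian(ylim = c(0, 35))` in ggplot2 changes the view without discarding out-of-range points.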
Remove data that failed QC test, 4=failed
# Keep rows whose QC flag is not 4; this stays safe when no rows carry the flag
eos<-eos[!(eos$sea_water_practical_salinity_qc_agg %in% 4),]
eos%>%
ggplot(aes(x=date, y=sea_water_practical_salinity, color=sea_water_practical_salinity))+
geom_point(alpha=0.5)+ ylim(0,35)+
labs(title="Salinity data from EOS research pier: data flagged fail removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Still didn’t get rid of the event we weren’t sure about at the end of 2018.
Remove suspect, 3=suspect
# Keep rows whose QC flag is not 3; this stays safe when no rows carry the flag
eos<-eos[!(eos$sea_water_practical_salinity_qc_agg %in% 3),]
eos%>%
ggplot(aes(x=date, y=sea_water_practical_salinity, color=sea_water_practical_salinity))+
geom_point(alpha=0.5)+ ylim(0,35)+
labs(title="Salinity data from EOS research pier: data flagged fail and suspect removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
This gets rid of part of the event, which makes me think I should just remove all of it. It also removes some data that I’m not sure should be removed.
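An alternative to dropping whole rows is to blank only the flagged variable, so the other variables in the same row are untouched and the file does not need to be re-read for each section. A minimal sketch on toy data follows; `mask_flagged()` is a hypothetical helper, and it assumes every variable has a matching `_qc_agg` column as in the columns used here.

```r
# Toy table with two variables and their aggregate QC flags (3 = suspect, 4 = failed)
toy <- data.frame(
  sea_water_practical_salinity = c(30.1, 12.0, 29.9, 3.5),
  sea_water_practical_salinity_qc_agg = c(1, 3, 1, 4),
  sea_water_temperature = c(14.2, 14.1, 14.3, 14.0),
  sea_water_temperature_qc_agg = c(1, 1, 1, 1)
)

# Hypothetical helper: set `var` to NA wherever its QC flag is one of `flags`
mask_flagged <- function(df, var, flags) {
  qc_col <- paste0(var, "_qc_agg")      # assumes the "<variable>_qc_agg" naming
  flagged <- df[[qc_col]] %in% flags    # FALSE for NA flags, so those are kept
  df[[var]][flagged] <- NA
  df
}

failed_only <- mask_flagged(toy, "sea_water_practical_salinity", 4)
failed_and_suspect <- mask_flagged(toy, "sea_water_practical_salinity", c(3, 4))

# No temperature rows are flagged, so the table comes back unchanged
temp_clean <- mask_flagged(toy, "sea_water_temperature", 4)
```

Because the rows stay in place, the "failed only" and "failed and suspect" versions can be plotted side by side from the same table.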
pH
eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_new.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
names(eos)[2] <- "datetime"
eos$date<-as.Date(eos$date, format = c("%Y-%m-%d"))
eos%>%
ggplot(aes(x=date, y=sea_water_ph_reported_on_total_scale, color=sea_water_ph_reported_on_total_scale))+
geom_point(alpha=0.5)+
labs(title="pH data from EOS research pier: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Remove data that failed QC test, 4=failed
# Keep rows whose QC flag is not 4; this stays safe when no rows carry the flag
eos<-eos[!(eos$sea_water_ph_reported_on_total_scale_qc_agg %in% 4),]
eos%>%
ggplot(aes(x=date, y=sea_water_ph_reported_on_total_scale, color=sea_water_ph_reported_on_total_scale))+
geom_point(alpha=0.5)+
labs(title="pH data from EOS research pier: data flagged fail removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Remove suspect data, 3=suspect
# Keep rows whose QC flag is not 3; this stays safe when no rows carry the flag
eos<-eos[!(eos$sea_water_ph_reported_on_total_scale_qc_agg %in% 3),]
eos%>%
ggplot(aes(x=date, y=sea_water_ph_reported_on_total_scale, color=sea_water_ph_reported_on_total_scale))+
geom_point(alpha=0.5)+
labs(title="pH data from EOS research pier: data flagged fail and suspect removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Again, removes some of the event we were concerned about but not all.
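To check how much of an event the flags actually cover, it can help to tabulate flag counts by month before removing anything. This is a sketch on toy rows, not real EOS values; the `month` column and `flag_counts` name are made up for the example.

```r
# Toy rows: dates with the aggregate pH QC flag (1 = pass, 3 = suspect, 4 = failed)
toy <- data.frame(
  date = as.Date(c("2019-06-25", "2019-07-02", "2019-07-15", "2019-07-28", "2019-08-03")),
  sea_water_ph_reported_on_total_scale_qc_agg = c(1, 1, 3, 1, 4)
)

# Count readings per month and flag value
toy$month <- format(toy$date, "%Y-%m")
flag_counts <- table(
  month = toy$month,
  flag = toy$sea_water_ph_reported_on_total_scale_qc_agg
)
flag_counts
```

Months where the plot looks wrong but most readings still carry flag 1 are the ones the automated QC is missing, and so the candidates for removal by hand.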
Dissolved oxygen
eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_new.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
names(eos)[2] <- "datetime"
eos$date<-as.Date(eos$date, format = c("%Y-%m-%d"))
eos%>%
ggplot(aes(x=date, y=mass_concentration_of_oxygen_in_sea_water, color=mass_concentration_of_oxygen_in_sea_water))+
geom_point(alpha=0.5)+
labs(title="Dissolved oxygen data from EOS research pier: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Remove data that failed QC test, 4=failed
# Keep rows whose QC flag is not 4; this stays safe when no rows carry the flag
eos<-eos[!(eos$mass_concentration_of_oxygen_in_sea_water_qc_agg %in% 4),]
eos%>%
ggplot(aes(x=date, y=mass_concentration_of_oxygen_in_sea_water, color=mass_concentration_of_oxygen_in_sea_water))+
geom_point(alpha=0.5)+
labs(title="Dissolved oxygen data from EOS research pier: data flagged fail removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Remove suspect data, 3=suspect
# Keep rows whose QC flag is not 3; this stays safe when no rows carry the flag
eos<-eos[!(eos$mass_concentration_of_oxygen_in_sea_water_qc_agg %in% 3),]
eos%>%
ggplot(aes(x=date, y=mass_concentration_of_oxygen_in_sea_water, color=mass_concentration_of_oxygen_in_sea_water))+
geom_point(alpha=0.5)+
labs(title="Dissolved oxygen data from EOS research pier: data flagged fail and suspect removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
This didn’t remove many points, and the area of concern is still there.
Water Temperature
eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_new.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
names(eos)[2] <- "datetime"
eos$date<-as.Date(eos$date, format = c("%Y-%m-%d"))
eos%>%
ggplot(aes(x=date, y=sea_water_temperature, color=sea_water_temperature))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from EOS research pier: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Remove data that failed QC test, 4=failed
# Keep rows whose QC flag is not 4; this stays safe when no rows carry the flag
eos<-eos[!(eos$sea_water_temperature_qc_agg %in% 4),]
eos%>%
ggplot(aes(x=date, y=sea_water_temperature, color=sea_water_temperature))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from EOS research pier: data flagged fail removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
This only removed some of the area of concern.
Remove suspect data, 3=suspect
# Keep rows whose QC flag is not 3; this stays safe when no rows carry the flag
eos<-eos[!(eos$sea_water_temperature_qc_agg %in% 3),]
eos%>%
ggplot(aes(x=date, y=sea_water_temperature, color=sea_water_temperature))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from EOS research pier: data flagged fail and suspect removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
This removed most of the area of concern.