AIC example

Goal of this markdown: Walk through an example of how to use AIC to rank models from my field work where I look at the impact of envirnmental factors. I use the same approach for my experiment so I won’t run that code here but I’ll talk about it a little bit incase that’s useful for you. At the end I’ll show you how I graph predicted values from the model from my phenology model list. For my data we’re using mixed linear models (explained below) but I believe this approach works for linear models as well, you would just remove the grouping term.

Mixed linear models vs Linear Models
- Mixed model is the best approach because for my field data since my data is “clustered/grouped” by site but the sites are random representatives of Central SFE (grouped = random). Code for random term: (1 | term) (Don’t include if you’re not doign mixed model approach). I also use it for my experimental data and group it by “trial run” to account for any variation between the dates. - NOTE: for transformed models, denote term with “I.” This proved to be critical when graphing predicted models later. For example, lmer(sqrt(no.fuc.q) ~ week_nr + I(week_nr^2) + (1|site), data=all).

Phenology in field work:
- For “time” variable I use number of weeks since the first survey (with the first survey week set at 0) which will allow me to convert it to a numneric variable and use quadratic terms as needed. This also allows the model to interpolate back to a defined “0”.

Steps:
- What are your research questions and hypotheses? This is key in being strategic in the models you chose to run. Don’t list every posible combination of your variables, think about ecological mechanisms and the literature you’ve read to determine what you’re really interested in. This also includes thinking about what interactions terms you’ll include or not. I’ll list mine models below.
- Set up data. What variables are you interested in? How should you best set up your data? For me, I have my environemtal data summarized with daily ranges and “extreme” events. For each field survey I look at the effects of 30 days of field data 60 days before (not looking at the 30 days leading up to the field survey). When running models, your variables need to be summarized on one line in your data frame because when you run the model it’ll be looking at column names. Take time to make sure your data is set up properly for your research questions.
- Read in and set up data. For me this included creating a new column with “week number.”
- Run AIC with your list of models - Read and interpret output - Examine top ranked model(s) closer

My list of models I am addressing research question #3:
Is variation in salinity, air temperature, or pH associated with changes in abundances or reproductive effort of F. distichus populations in SFE?

After looking at my phenology models, we are more interested in seeing how juvenile abundance changes due to environmental factors. Reviewing the literature, we decided we are interested in extreme events and daily ranges. “Juv” below refers to juvenile density; if I had multiple factors measuring juvenile abundance I would use the same approach multiple times with a different variable (juvenile density with this list of explanatory factors, juvenile x in a seperate list and ranking). Here is the list was came up with:

Juv ~ adult cover + daily salinity range + high air temp event + daily pH range + adult cover: daily air temp event
Juv ~ adult cover + low salinity event + high air temp event + high pH event + adult cover : high air temp event
Juv ~ adult cover + daily salinity range + high air temp event + adult cover : high air temp event 4. Juv ~ adult cover + low salinity event + high air temp event + adult cover : high air temp event
Juv ~ adult cover + high air temp event + adult cover : high air temp event
Juv ~ adult cover + daily salinity range + high air temp event
Juv ~ low salinity event + high pH event + high air temp event
Juv ~ daily salinity range + daily pH range + high air temp event
Juv ~ adult cover + high air temp event
Juv ~ daily salinity range + daily pH range
Juv ~ low salinity event + high pH event
Juv ~ low salinity event + high air temp event

Helpful links:
- lme4 package: https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
- mixed movels: https://m-clark.github.io/mixed-models-with-R/introduction.html -https://ourcodingclub.github.io/tutorials/mixed-models/ -http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-definition
-https://ademos.people.uic.edu/Chapter18.html -https://www.statology.org/aic-in-r/ Understanding the output: https://campus.datacamp.com/courses/hierarchical-and-mixed-effects-models-in-r/linear-mixed-effect-models?ex=7 -https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/subject.html#12692

Set up.

rm(list=ls())

#AIC
library(AICcmodavg)
library(plm) #for random factors
library(lme4)
library(car)

#Table 
library(sjPlot) 
library(sjmisc)
library(sjlabelled)
library(insight)
library(tibble)


#extra- I'm bad at keeping track of what packages I use
library(tidyverse)
library(ggpubr)
library(scales)
library(chron)
library(plotly)
library(taRifx)
library(aweek)
library(easypackages)

## Error in get(genname, envir = envir) : object 'testthat_print' not found

library(renv)
library(here)
library(ggthemes)
library(gridExtra)
library(patchwork)
library(tidyquant)
library(recipes) 
library(cranlogs)
library(knitr)
library(openair)
library(xts)
library(dplyr)

Read in data-
Envirnomental data 30 days leading up to survey day (excluding day of survey) and adult thalli density from the field survey one month before. Juvenile density is the mean per survey per field site.

#read in data
one.month<-read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environment_field/envi.field.all.one.month.lag.csv",
    header = TRUE
  )
one.month$date<-as.Date(one.month$date, format = c("%Y-%m-%d"))

Convert date to number of weeks of surveys. Starting at week 0 being the first survey. This will make it a contiuous variable and then I can add a quadratic term to it. This is different that “week number of the year” so I have to use a customized approach. I was able to find a similar question online: https://stackoverflow.com/questions/54406803/assigning-custom-week-number-in-r/54407091, https://stackoverflow.com/questions/19780415/lubridate-week-to-find-consecutive-week-number-for-multi-year-periods

# subtracts minimun week number, +0 sets what week you want to start it on
one.month$week_nr <- (interval(min(one.month$date), one.month$date) %/% weeks(1)) + 0

AIC

#list of models
m1 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + daily.sal.range + max.daily.air.gt26 + daily.ph.range + lag.no.large.fuc.q:max.daily.air.gt26, data=one.month)

m2 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + max.daily.sal.lt10 + max.daily.air.gt26 + max.daily.ph.gt8 + lag.no.large.fuc.q:max.daily.air.gt26, data=one.month)

m3 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + daily.sal.range + max.daily.air.gt26 + lag.no.large.fuc.q:max.daily.air.gt26, data=one.month)

m4 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + max.daily.sal.lt10 + max.daily.air.gt26 + lag.no.large.fuc.q:max.daily.air.gt26, data=one.month)

m5 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + max.daily.air.gt26 + lag.no.large.fuc.q:max.daily.air.gt26, data=one.month)

m6 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + daily.sal.range + max.daily.air.gt26, data=one.month)

m7 <- lm(no.small.fuc.q ~ max.daily.sal.lt10 + max.daily.ph.gt8 + max.daily.air.gt26, data=one.month)

m8 <- lm(no.small.fuc.q ~ daily.sal.range + daily.ph.range + max.daily.air.gt26, data=one.month)

m9 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + max.daily.air.gt26, data=one.month)

m10 <- lm(no.small.fuc.q ~ daily.sal.range + daily.ph.range, data=one.month)

m11 <- lm(no.small.fuc.q ~ max.daily.sal.lt10 + max.daily.ph.gt8, data=one.month)

m12 <- lm(no.small.fuc.q ~ max.daily.sal.lt10 + max.daily.air.gt26, data=one.month)

#define list of models
models <- list(m1,m2,m3,m4,m5,m6,m7,m8,m9,m10,m11,m12)

#specify model names- just keeping them the same since I have too many. Here you can change the names, in the same order as defined above, to something like "salinity", "temp", ect
mod.names <- c('m1','m2','m3','m4', 'm5','m6','m7','m8', 'm9','m10', 'm11','m12')

#calculate AIC of each model
aictab(cand.set = models, modnames = mod.names)

## 
## Model selection based on AICc:
## 
##     K   AICc Delta_AICc AICcWt Cum.Wt      LL
## m1  7 395.48       0.00   0.96   0.96 -188.93
## m2  7 401.78       6.30   0.04   1.00 -192.09
## m8  5 424.88      29.40   0.00   1.00 -206.61
## m10 4 425.32      29.84   0.00   1.00 -208.12
## m11 4 430.60      35.12   0.00   1.00 -210.76
## m7  5 433.06      37.58   0.00   1.00 -210.70
## m6  5 515.41     119.92   0.00   1.00 -252.06
## m3  6 517.94     122.46   0.00   1.00 -252.06
## m9  4 521.93     126.45   0.00   1.00 -256.55
## m5  5 524.24     128.75   0.00   1.00 -256.48
## m4  6 526.78     131.30   0.00   1.00 -256.48
## m12 4 569.63     174.15   0.00   1.00 -280.43

From the output I see model 1 is ranked the heightest (lm(no.small.fuc.q ~ lag.no.large.fuc.q + daily.sal.range + max.daily.air.gt26 + daily.ph.range + lag.no.large.fuc.q:max.daily.air.gt26, data=one.month)). Going to take a closer look at that model.

m1 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + daily.sal.range + max.daily.air.gt26 + daily.ph.range + lag.no.large.fuc.q:max.daily.air.gt26, data=one.month)
Fitting and analyzing the top ranked model.

m1 <- lm(no.small.fuc.q ~ lag.no.large.fuc.q + daily.sal.range + max.daily.air.gt26 + daily.ph.range + lag.no.large.fuc.q:max.daily.air.gt26, data=one.month)
plot(m1)

m1

## 
## Call:
## lm(formula = no.small.fuc.q ~ lag.no.large.fuc.q + daily.sal.range + 
##     max.daily.air.gt26 + daily.ph.range + lag.no.large.fuc.q:max.daily.air.gt26, 
##     data = one.month)
## 
## Coefficients:
##                           (Intercept)                     lag.no.large.fuc.q  
##                             58.753199                               1.000281  
##                       daily.sal.range                     max.daily.air.gt26  
##                             -4.504252                              -0.790638  
##                        daily.ph.range  lag.no.large.fuc.q:max.daily.air.gt26  
##                             -1.891923                               0.002399

summary(m1)

## 
## Call:
## lm(formula = no.small.fuc.q ~ lag.no.large.fuc.q + daily.sal.range + 
##     max.daily.air.gt26 + daily.ph.range + lag.no.large.fuc.q:max.daily.air.gt26, 
##     data = one.month)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.061 -24.344  -7.122  12.649 104.767 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                           58.753199  20.286896   2.896  0.00666 **
## lag.no.large.fuc.q                     1.000281   1.342969   0.745  0.46165   
## daily.sal.range                       -4.504252   1.774704  -2.538  0.01605 * 
## max.daily.air.gt26                    -0.790638   0.882614  -0.896  0.37685   
## daily.ph.range                        -1.891923   3.429397  -0.552  0.58489   
## lag.no.large.fuc.q:max.daily.air.gt26  0.002399   0.094643   0.025  0.97993   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 33.42 on 33 degrees of freedom
##   (22 observations deleted due to missingness)
## Multiple R-squared:  0.2047, Adjusted R-squared:  0.0842 
## F-statistic: 1.699 on 5 and 33 DF,  p-value: 0.1624

anova(m1)

## Analysis of Variance Table
## 
## Response: no.small.fuc.q
##                                       Df Sum Sq Mean Sq F value  Pr(>F)  
## lag.no.large.fuc.q                     1   1580  1580.1  1.4150 0.24271  
## daily.sal.range                        1   5706  5706.1  5.1099 0.03051 *
## max.daily.air.gt26                     1   1857  1856.8  1.6628 0.20620  
## daily.ph.range                         1    341   340.9  0.3053 0.58429  
## lag.no.large.fuc.q:max.daily.air.gt26  1      1     0.7  0.0006 0.97993  
## Residuals                             33  36850  1116.7                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

confint(m1)

##                                            2.5 %      97.5 %
## (Intercept)                           17.4791983 100.0271992
## lag.no.large.fuc.q                    -1.7320093   3.7325716
## daily.sal.range                       -8.1149154  -0.8935885
## max.daily.air.gt26                    -2.5863298   1.0050534
## daily.ph.range                        -8.8690826   5.0852374
## lag.no.large.fuc.q:max.daily.air.gt26 -0.1901534   0.1949505

Table

#simple table
tab_model(m1)

	no.small.fuc.q
Predictors	Estimates	CI	p
(Intercept)	58.75	17.48 – 100.03	0.007
lag.no.large.fuc.q	1.00	-1.73 – 3.73	0.462
daily.sal.range	-4.50	-8.11 – -0.89	0.016
max.daily.air.gt26	-0.79	-2.59 – 1.01	0.377
daily.ph.range	-1.89	-8.87 – 5.09	0.585
lag.no.large.fuc.q * max.daily.air.gt26	0.00	-0.19 – 0.19	0.980
Observations	39
R² / R² adjusted	0.205 / 0.084

#modified table
tab_model(m1,  #models to table
  show.ci = 0.95, #show 95% CI
  show.se = TRUE, #show standar error
  show.stat = FALSE,
  emph.p = TRUE, #bold "significant" p values
  df.method = "kr",
  p.style = "numeric_stars",
  title = "Linear model for small thalli density",
  linebreak = TRUE,
  dv.labels = c("small thalli density" #change name of column
      ),
  pred.labels = c( #change names of rows
    "(Intercept)", 
    "large thalli density",
    "daily salinity range",
    "number of days with a maximum daily air temperature greater than 26C",
    "daily pH range",
    "large thalli density : number of days with a maximum daily air temperature greater than 26C"
      ),
  CSS = css_theme("cells")
  )

Linear model for small thalli density
	small thalli density
Predictors	Estimates	std. Error	CI	p
(Intercept)	58.75 ^**	20.29	17.48 – 100.03	0.007
large thalli density	1.00	1.34	-1.73 – 3.73	0.462
daily salinity range	-4.50 ^*	1.77	-8.11 – -0.89	0.016
number of days with a maximum daily air temperature greater than 26C	-0.79	0.88	-2.59 – 1.01	0.377
daily pH range	-1.89	3.43	-8.87 – 5.09	0.585
large thalli density : number of days with a maximum daily air temperature greater than 26C	0.00	0.09	-0.19 – 0.19	0.980
Observations	39
R² / R² adjusted	0.205 / 0.084
p<0.05 p<0.01 * p<0.001

Graphing the predicted values of the models and the predicted values of the model per site

I don’t do this with my model list above/my environmental data but I thought this might be helpful for you. I’ll walk though an example with my phenology model.

2.4 Does small thalli density change over time? sqrt(no.small.fuc.q ~ week_nr + week_nr^2 +((1|site)
Added position dodge to the site trend lines because they were stacked (https://stackoverflow.com/questions/10866047/jitter-geom-line)

#packages, to use ggpredict
library(ggeffects)

#read in data
all<-read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environment_field/envi.field.all.csv",
    header = TRUE
  )
all$date<-as.Date(all$date, format = c("%Y-%m-%d"))

#add week number
all$week_nr <- (interval(min(all$date), all$date) %/% weeks(1)) + 0


#remove rows with NA 
no.na<-all[!is.na(all$no.small.fuc.q), ]

#model-The I is important here!
mlm2.4<-lmer(sqrt(no.small.fuc.q) ~ week_nr + I(week_nr^2) + (1|site), data=no.na)

## Warning: Some predictor variables are on very different scales: consider
## rescaling

# Extract the prediction data frame
pred.mm <- ggpredict(mlm2.4, terms = "week_nr [all]") 

#add predicted values from the model to the dataset
no.na$predlm = predict(mlm2.4)

#assign colors to models (needs work)
COLORS <- c(group="black", pc.cc = "#66C2A5", nd.eos ="#FC8D62",  
            by.rb = "#8DA0CB" , hs.fp = "#E78AC3")


#plot
ggplot() +
  geom_ribbon(data=pred.mm,aes(x = x,
      ymin = predicted - std.error,
      ymax = predicted + std.error
    ),
    fill = "lightgrey",
    alpha = 0.5
  ) +
  geom_line(data= pred.mm, aes(x = x, y = predicted), size=1.5) +
  geom_line(data= no.na, aes(x= week_nr, y = predlm, colour = site), size = 1) +
  geom_point(data = no.na, aes(x = week_nr, y = sqrt(no.small.fuc.q), colour = site)) +
  labs(x = "Weeks since first survey",
       y = "sqrt(Small Thalli Density/0.25m^2)",
       title= "sqrt(Small Thalli Density/0.25m^2) = 10.25 - 0.33(week) + 0.0038(week)^2") +
  theme_minimal()+
  scale_color_manual(
    name= "Predictive Models",
    values = COLORS,
    labels = c("Overall Model",
      "Paradise Cay",
               "Point Chauncey",
               "Brickyard Park",
               "Horseshoe Bay"
               )
  )+
  theme_bw(base_size=12)

AIC example

Chelsey Wegener

8/25/2021