Augmented Dynamic Adaptive Model

This vignette briefly explains how to use the adam() function and the related auto.adam() from the smooth package. It does not aim to cover all aspects of the functions, but focuses on the main ones.

ADAM stands for "Augmented Dynamic Adaptive Model". It is a model that underlies ETS, ARIMA and regression, connecting them in a unified framework. The underlying model for ADAM is a Single Source of Error state space model, which is explained in detail separately in an online textbook.

The main philosophy of the adam() function is to be agnostic of the provided data. This means that it will work with ts, msts, zoo, xts, data.frame, numeric and other classes of data. The specification of seasonality in the model is done using a separate parameter lags, so you are not obliged to transform the existing data into something specific, and can use it as is. If you provide a matrix, a data.frame, a data.table or any other multivariate structure, then the function will use the first column for the response variable and the others for the explanatory ones. One thing that is currently assumed in the function is that the data is measured at a regular frequency. If this is not the case, you will need to introduce missing values manually.

In order to run the experiments in this vignette, we need to load the following packages:

require(greybox)
require(smooth)
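
To illustrate the class-agnostic behaviour described above, here is a minimal sketch with simulated daily data (assuming the zoo package is installed; the values are purely illustrative):

library(zoo)
# Daily data stored as a zoo object; the values are purely illustrative
yZoo <- zoo(rnorm(112, 1000, 20), order.by=as.Date("2024-01-01")+0:111)
# adam() takes the object as is; weekly seasonality is declared via lags
testModel <- adam(yZoo, "ANA", lags=c(1,7), h=14, holdout=TRUE)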

ADAM ETS

First and foremost, ADAM implements the ETS model, although in a more flexible way than (Hyndman et al. 2008): it supports different distributions for the error term, which are regulated via the distribution parameter. By default, the additive error model relies on the Normal distribution, while the multiplicative error one assumes the Gamma distribution. If you want to reproduce the classical ETS, you would need to specify distribution="dnorm". Here is an example of ADAM ETS(MMM) with the Normal distribution on the AirPassengers data:

testModel <- adam(AirPassengers, "MMM", lags=c(1,12), distribution="dnorm",
                  h=12, holdout=TRUE)
summary(testModel)
#> 
#> Model estimated using adam() function: ETS(MMM)
#> Response variable: AirPassengers
#> Distribution used in the estimation: Normal
#> Loss function type: likelihood; Loss function value: 477.8814
#> Coefficients:
#>             Estimate Std. Error Lower 2.5% Upper 97.5%  
#> alpha         0.3212     0.0458     0.2304      0.4117 *
#> beta          0.0000     0.0074     0.0000      0.0146  
#> gamma         0.6202     0.0834     0.4551      0.6788 *
#> level       106.3827     3.1064   100.2296    112.5209 *
#> trend         1.0103     0.0012     1.0078      1.0127 *
#> seasonal_1    0.9145     0.0267     0.8649      0.9829 *
#> seasonal_2    0.9730     0.0289     0.9234      1.0415 *
#> seasonal_3    1.0739     0.0320     1.0243      1.1424 *
#> seasonal_4    1.0370     0.0304     0.9875      1.1055 *
#> seasonal_5    0.9617     0.0286     0.9121      1.0302 *
#> seasonal_6    1.0795     0.0319     1.0299      1.1479 *
#> seasonal_7    1.1782     0.0346     1.1286      1.2467 *
#> seasonal_8    1.1520     0.0332     1.1024      1.2204 *
#> seasonal_9    1.0652     0.0316     1.0156      1.1336 *
#> seasonal_10   0.9080     0.0262     0.8584      0.9764 *
#> seasonal_11   0.8088     0.0250     0.7593      0.8773 *
#> 
#> Error standard deviation: 0.0401
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#>       AIC      AICc       BIC      BICc 
#>  989.7627  995.1311 1038.7703 1051.8768
plot(forecast(testModel,h=12,interval="prediction"))

You might notice that the summary contains more than what is reported by other smooth functions. It also produces standard errors for the estimated parameters based on the Fisher Information calculation. Note that this is computationally expensive, so if you have a model with more than 30 variables, the calculation of standard errors might take a while. As for the default print() method, it produces a shorter summary of the model, without the standard errors (similar to what es() does):

testModel
#> Time elapsed: 0.15 seconds
#> Model estimated using adam() function: ETS(MMM)
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 477.8814
#> Persistence vector g:
#>  alpha   beta  gamma 
#> 0.3212 0.0000 0.6202 
#> 
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#>       AIC      AICc       BIC      BICc 
#>  989.7627  995.1311 1038.7703 1051.8768 
#> 
#> Forecast errors:
#> ME: -19.008; MAE: 19.786; RMSE: 24.57
#> sCE: -86.896%; Asymmetry: -94.2%; sMAE: 7.538%; sMSE: 0.876%
#> MASE: 0.822; RMSSE: 0.784; rMAE: 0.26; rRMSE: 0.239

Also, note that prediction intervals in the case of multiplicative error models are approximate. It is advisable to use simulations instead (which is slower, but more accurate):

plot(forecast(testModel,h=18,interval="simulated"))

If you want to do residual diagnostics, then it is recommended to use the plot() function, something like this (you can select which of the plots to produce):

par(mfcol=c(3,4))
plot(testModel,which=c(1:11))
par(mfcol=c(1,1))
plot(testModel,which=12)

By default, ADAM estimates models via maximising the likelihood function. But there is also the loss parameter, which allows selecting from a list of already implemented loss functions (again, see the documentation for adam() for the full list) or using a function written by the user. Here is how to do the latter on the example of BJsales:

lossFunction <- function(actual, fitted, B){
  return(sum(abs(actual-fitted)^3))
}
testModel <- adam(BJsales, "AAN", silent=FALSE, loss=lossFunction,
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.11 seconds
#> Model estimated using adam() function: ETS(AAN)
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: custom; Loss function value: 599.2241
#> Persistence vector g:
#>  alpha   beta 
#> 1.0000 0.2269 
#> 
#> Sample size: 138
#> Number of estimated parameters: 4
#> Number of degrees of freedom: 134
#> Information criteria are unavailable for the chosen loss & distribution.
#> 
#> Forecast errors:
#> ME: 3.015; MAE: 3.129; RMSE: 3.866
#> sCE: 15.918%; Asymmetry: 91.7%; sMAE: 1.376%; sMSE: 0.029%
#> MASE: 2.626; RMSSE: 2.52; rMAE: 1.009; rRMSE: 1.009

Note that the function needs to have the parameters actual, fitted and B, which correspond to the vector of actual values, the vector of fitted values on each iteration and the vector of the parameters being optimised.
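
As another illustration, here is a sketch of a loss that uses B directly, adding a rough LASSO-style penalty on the optimised parameters (purely illustrative; note that with optimal initialisation B also contains the initial states, so this is not a recommendation, just a demonstration of the mechanism):

lossRegularised <- function(actual, fitted, B){
  # sum of squared errors plus a penalty on the optimised parameters;
  # the 0.1 weight is arbitrary and used only for illustration
  return(sum((actual-fitted)^2) + 0.1*sum(abs(B)))
}
testModel <- adam(BJsales, "AAN", loss=lossRegularised,
                  h=12, holdout=TRUE)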

The loss and distribution parameters are independent, so in the examples above we have assumed that the error term follows the Normal distribution, but we have estimated its parameters using a non-conventional loss, because we can. Some of the distributions assume that there is an additional parameter, which can either be estimated or provided by the user. These include the Asymmetric Laplace (distribution="dalaplace") with alpha, the Generalised Normal and Log-Generalised Normal (distribution=c("dgnorm","dlgnorm")) with shape and Student's t (distribution="dt") with nu:

testModel <- adam(BJsales, "MMN", silent=FALSE, distribution="dgnorm", shape=3,
                  h=12, holdout=TRUE)
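
Similarly, a sketch with the Asymmetric Laplace distribution, where alpha is not provided and is thus estimated together with the other parameters:

# alpha is omitted here, so adam() estimates it alongside the other parameters
testModel <- adam(BJsales, "MMN", silent=TRUE, distribution="dalaplace",
                  h=12, holdout=TRUE)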

The model selection in ADAM ETS relies on information criteria and works correctly only for loss="likelihood". There are several options for how to select the model; see them in the description of the function (?adam). The default one uses a branch-and-bound algorithm, similar to the one used in es(), but only considers additive trend models (the multiplicative trend ones are less stable and need more attention from a forecaster):

testModel <- adam(AirPassengers, "ZXZ", lags=c(1,12), silent=FALSE,
                  h=12, holdout=TRUE)
#> Forming the pool of models based on... ANN , ANA , MNM , MAM , Estimation progress:    71 %86 %100 %... Done!
testModel
#> Time elapsed: 0.59 seconds
#> Model estimated using adam() function: ETS(MAM)
#> With optimal initialisation
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 476.1101
#> Persistence vector g:
#>  alpha   beta  gamma 
#> 0.5266 0.0101 0.3637 
#> 
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#>       AIC      AICc       BIC      BICc 
#>  986.2201  991.5886 1035.2278 1048.3342 
#> 
#> Forecast errors:
#> ME: -12.98; MAE: 15.258; RMSE: 21.064
#> sCE: -59.34%; Asymmetry: -88%; sMAE: 5.813%; sMSE: 0.644%
#> MASE: 0.634; RMSSE: 0.672; rMAE: 0.201; rRMSE: 0.205

Note that the function produces point forecasts if h>0, but it will not generate a prediction interval. This is why you need to use the forecast() method (as shown in the first example in this vignette).

Similarly to es(), the function supports the combination of models, but it saves all the tested models in the output for potential reuse. Here is how it works:

testModel <- adam(AirPassengers, "CXC", lags=c(1,12),
                  h=12, holdout=TRUE)
testForecast <- forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95))
testForecast
#>          Point forecast Lower bound (5%) Lower bound (2.5%) Upper bound (95%)
#> Jan 1960       423.1287         397.5694           392.8295          449.3541
#> Feb 1960       403.3055         373.5007           368.0153          434.0662
#> Mar 1960       475.9436         434.6678           427.1265          518.7821
#> Apr 1960       467.5378         424.9935           417.2396          511.7755
#> May 1960       489.4768         443.5320           435.1722          537.3105
#> Jun 1960       562.7037         507.8616           497.9037          619.8908
#> Jul 1960       633.4825         569.6947           558.1344          700.0925
#> Aug 1960       626.0549         561.7787           550.1433          693.2334
#> Sep 1960       511.7904         458.4733           448.8303          567.5521
#> Oct 1960       447.9908         400.2813           391.6643          497.9391
#> Nov 1960       393.3073         350.6224           342.9222          438.0358
#> Dec 1960       437.3567         388.5920           379.8105          488.5234
#> Jan 1961       457.3498         401.6926           391.7306          516.0139
#> Feb 1961       435.7025         379.1365           369.0615          495.5412
#> Mar 1961       513.9186         442.6689           430.0466          589.5889
#> Apr 1961       504.5923         432.4348           419.6858          581.3776
#> May 1961       528.0166         452.5439           439.2090          608.3280
#> Jun 1961       606.7212         519.2780           503.8396          699.8217
#>          Upper bound (97.5%)
#> Jan 1960            454.5376
#> Feb 1960            440.1882
#> Mar 1960            527.3641
#> Apr 1960            520.6571
#> May 1960            546.9282
#> Jun 1960            631.4102
#> Jul 1960            713.5321
#> Aug 1960            706.8014
#> Sep 1960            578.8230
#> Oct 1960            508.0469
#> Nov 1960            447.0967
#> Dec 1960            498.9043
#> Jan 1961            527.9778
#> Feb 1961            507.7952
#> Mar 1961            605.1545
#> Apr 1961            597.2075
#> May 1961            624.8846
#> Jun 1961            719.0266
plot(testForecast)

Yes, we now support vectors for the levels in case you want to produce several. In fact, we also support the side parameter for prediction intervals, so you can extract specific quantiles without a hassle:

forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95,0.99), side="upper")
#>          Point forecast Upper bound (90%) Upper bound (95%) Upper bound (99%)
#> Jan 1960       423.1287          443.4268          449.3541          460.6141
#> Feb 1960       403.3055          427.0782          434.0662          447.3780
#> Mar 1960       475.9436          509.0030          518.7821          537.4598
#> Apr 1960       467.5378          501.6606          511.7755          531.1112
#> May 1960       489.4768          526.3617          537.3105          558.2529
#> Jun 1960       562.7037          606.7832          619.8908          644.9806
#> Jul 1960       633.4825          684.8065          700.0925          729.3714
#> Aug 1960       626.0549          677.8054          693.2334          722.7961
#> Sep 1960       511.7904          554.7388          567.5521          592.1123
#> Oct 1960       447.9908          486.4514          497.9391          519.9685
#> Nov 1960       393.3073          427.7407          438.0358          457.7865
#> Dec 1960       437.3567          476.7332          488.5234          511.1560
#> Jan 1961       457.3498          502.4440          516.0139          542.1165
#> Feb 1961       435.7025          481.6571          495.5412          522.2920
#> Mar 1961       513.9186          571.9732          589.5889          623.5898
#> Apr 1961       504.5923          563.4729          581.3776          615.9665
#> May 1961       528.0166          589.6014          608.3280          644.5044
#> Jun 1961       606.7212          678.1031          699.8217          741.7884

A brand new feature of the function is the possibility to use several frequencies (double / triple / quadruple / ... seasonal models). In order to show how this works, we will generate an artificial time series, inspired by half-hourly electricity demand, using the sim.gum() function:

ordersGUM <- c(1,1,1)
lagsGUM <- c(1,48,336)
initialGUM1 <- -25381.7
initialGUM2 <- c(23955.09, 24248.75, 24848.54, 25012.63, 24634.14, 24548.22, 24544.63, 24572.77,
                 24498.33, 24250.94, 24545.44, 25005.92, 26164.65, 27038.55, 28262.16, 28619.83,
                 28892.19, 28575.07, 28837.87, 28695.12, 28623.02, 28679.42, 28682.16, 28683.40,
                 28647.97, 28374.42, 28261.56, 28199.69, 28341.69, 28314.12, 28252.46, 28491.20,
                 28647.98, 28761.28, 28560.11, 28059.95, 27719.22, 27530.23, 27315.47, 27028.83,
                 26933.75, 26961.91, 27372.44, 27362.18, 27271.31, 26365.97, 25570.88, 25058.01)
initialGUM3 <- c(23920.16, 23026.43, 22812.23, 23169.52, 23332.56, 23129.27, 22941.20, 22692.40,
                 22607.53, 22427.79, 22227.64, 22580.72, 23871.99, 25758.34, 28092.21, 30220.46,
                 31786.51, 32699.80, 33225.72, 33788.82, 33892.25, 34112.97, 34231.06, 34449.53,
                 34423.61, 34333.93, 34085.28, 33948.46, 33791.81, 33736.17, 33536.61, 33633.48,
                 33798.09, 33918.13, 33871.41, 33403.75, 32706.46, 31929.96, 31400.48, 30798.24,
                 29958.04, 30020.36, 29822.62, 30414.88, 30100.74, 29833.49, 28302.29, 26906.72,
                 26378.64, 25382.11, 25108.30, 25407.07, 25469.06, 25291.89, 25054.11, 24802.21,
                 24681.89, 24366.97, 24134.74, 24304.08, 25253.99, 26950.23, 29080.48, 31076.33,
                 32453.20, 33232.81, 33661.61, 33991.21, 34017.02, 34164.47, 34398.01, 34655.21,
                 34746.83, 34596.60, 34396.54, 34236.31, 34153.32, 34102.62, 33970.92, 34016.13,
                 34237.27, 34430.08, 34379.39, 33944.06, 33154.67, 32418.62, 31781.90, 31208.69,
                 30662.59, 30230.67, 30062.80, 30421.11, 30710.54, 30239.27, 28949.56, 27506.96,
                 26891.75, 25946.24, 25599.88, 25921.47, 26023.51, 25826.29, 25548.72, 25405.78,
                 25210.45, 25046.38, 24759.76, 24957.54, 25815.10, 27568.98, 29765.24, 31728.25,
                 32987.51, 33633.74, 34021.09, 34407.19, 34464.65, 34540.67, 34644.56, 34756.59,
                 34743.81, 34630.05, 34506.39, 34319.61, 34110.96, 33961.19, 33876.04, 33969.95,
                 34220.96, 34444.66, 34474.57, 34018.83, 33307.40, 32718.90, 32115.27, 31663.53,
                 30903.82, 31013.83, 31025.04, 31106.81, 30681.74, 30245.70, 29055.49, 27582.68,
                 26974.67, 25993.83, 25701.93, 25940.87, 26098.63, 25771.85, 25468.41, 25315.74,
                 25131.87, 24913.15, 24641.53, 24807.15, 25760.85, 27386.39, 29570.03, 31634.00,
                 32911.26, 33603.94, 34020.90, 34297.65, 34308.37, 34504.71, 34586.78, 34725.81,
                 34765.47, 34619.92, 34478.54, 34285.00, 34071.90, 33986.48, 33756.85, 33799.37,
                 33987.95, 34047.32, 33924.48, 33580.82, 32905.87, 32293.86, 31670.02, 31092.57,
                 30639.73, 30245.42, 30281.61, 30484.33, 30349.51, 29889.23, 28570.31, 27185.55,
                 26521.85, 25543.84, 25187.82, 25371.59, 25410.07, 25077.67, 24741.93, 24554.62,
                 24427.19, 24127.21, 23887.55, 24028.40, 24981.34, 26652.32, 28808.00, 30847.09,
                 32304.13, 33059.02, 33562.51, 33878.96, 33976.68, 34172.61, 34274.50, 34328.71,
                 34370.12, 34095.69, 33797.46, 33522.96, 33169.94, 32883.32, 32586.24, 32380.84,
                 32425.30, 32532.69, 32444.24, 32132.49, 31582.39, 30926.58, 30347.73, 29518.04,
                 29070.95, 28586.20, 28416.94, 28598.76, 28529.75, 28424.68, 27588.76, 26604.13,
                 26101.63, 25003.82, 24576.66, 24634.66, 24586.21, 24224.92, 23858.42, 23577.32,
                 23272.28, 22772.00, 22215.13, 21987.29, 21948.95, 22310.79, 22853.79, 24226.06,
                 25772.55, 27266.27, 28045.65, 28606.14, 28793.51, 28755.83, 28613.74, 28376.47,
                 27900.76, 27682.75, 27089.10, 26481.80, 26062.94, 25717.46, 25500.27, 25171.05,
                 25223.12, 25634.63, 26306.31, 26822.46, 26787.57, 26571.18, 26405.21, 26148.41,
                 25704.47, 25473.10, 25265.97, 26006.94, 26408.68, 26592.04, 26224.64, 25407.27,
                 25090.35, 23930.21, 23534.13, 23585.75, 23556.93, 23230.25, 22880.24, 22525.52,
                 22236.71, 21715.08, 21051.17, 20689.40, 20099.18, 19939.71, 19722.69, 20421.58,
                 21542.03, 22962.69, 23848.69, 24958.84, 25938.72, 26316.56, 26742.61, 26990.79,
                 27116.94, 27168.78, 26464.41, 25703.23, 25103.56, 24891.27, 24715.27, 24436.51,
                 24327.31, 24473.02, 24893.89, 25304.13, 25591.77, 25653.00, 25897.55, 25859.32,
                 25918.32, 25984.63, 26232.01, 26810.86, 27209.70, 26863.50, 25734.54, 24456.96)
y <- sim.gum(orders=ordersGUM, lags=lagsGUM, nsim=1, frequency=336, obs=3360,
             measurement=rep(1,3), transition=diag(3), persistence=c(0.045,0.162,0.375),
             initial=cbind(initialGUM1,initialGUM2,initialGUM3))$data

We can then apply ADAM to this data:

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE)
testModel
#> Time elapsed: 1.19 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> With backcasting initialisation
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 19610.07
#> Persistence vector g:
#>  alpha   beta gamma1 gamma2 
#> 0.0344 0.0000 0.1947 0.2667 
#> Damping parameter: 0.2761
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 39232.14 39232.17 39268.23 39268.34 
#> 
#> Forecast errors:
#> ME: 237.839; MAE: 257.99; RMSE: 301.603
#> sCE: 263.377%; Asymmetry: 91.7%; sMAE: 0.85%; sMSE: 0.01%
#> MASE: 0.36; RMSSE: 0.293; rMAE: 0.038; rRMSE: 0.036

Note that the more lags you have, the more initial seasonal components the function will need to estimate, which is a difficult task. This is why we used initial="backcasting" in the example above: this speeds up the estimation by reducing the number of parameters to estimate. Still, the optimiser might not get close to the optimal value, so we can help it. First, we can give more time for the calculation, increasing the number of iterations via maxeval (the default value is 40 iterations for each estimated parameter, i.e. 40 × 5 = 200 in our case):

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE, maxeval=10000)
testModel
#> Time elapsed: 1.1 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> With backcasting initialisation
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 19610.07
#> Persistence vector g:
#>  alpha   beta gamma1 gamma2 
#> 0.0344 0.0000 0.1947 0.2667 
#> Damping parameter: 0.2761
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 39232.14 39232.17 39268.23 39268.34 
#> 
#> Forecast errors:
#> ME: 237.839; MAE: 257.99; RMSE: 301.603
#> sCE: 263.377%; Asymmetry: 91.7%; sMAE: 0.85%; sMSE: 0.01%
#> MASE: 0.36; RMSSE: 0.293; rMAE: 0.038; rRMSE: 0.036

This will take more time, but will typically lead to more refined parameters. You can control other parameters of the optimiser as well, such as algorithm, xtol_rel, print_level and others, which are explained in the documentation for the nloptr function from the nloptr package (run nloptr.print.options() for details); a short sketch of passing these options is shown after the next example. Second, we can give a different set of initial parameters for the optimiser; have a look at what the function saves:

testModel$B

and use this as a starting point for the reestimation (e.g. with a different algorithm):

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE, B=testModel$B)
testModel
#> Time elapsed: 0.54 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> With backcasting initialisation
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 19610.07
#> Persistence vector g:
#>  alpha   beta gamma1 gamma2 
#> 0.0344 0.0000 0.1947 0.2667 
#> Damping parameter: 0.5261
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 39232.14 39232.17 39268.23 39268.34 
#> 
#> Forecast errors:
#> ME: 237.839; MAE: 257.99; RMSE: 301.603
#> sCE: 263.377%; Asymmetry: 91.7%; sMAE: 0.85%; sMSE: 0.01%
#> MASE: 0.36; RMSSE: 0.293; rMAE: 0.038; rRMSE: 0.036
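
As mentioned above, the optimiser options can be passed directly to adam(). Here is a hedged sketch, where the algorithm choice and the tolerance value are illustrative, assuming they are forwarded to nloptr:

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=TRUE, h=336, holdout=TRUE,
                  # illustrative nloptr options; see nloptr.print.options()
                  algorithm="NLOPT_LN_NELDERMEAD", xtol_rel=1e-8, maxeval=5000)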

If you are ready to wait, you can change the initialisation to initial="optimal", which in our case will take much more time because of the number of estimated parameters: 389 for the chosen model. The estimation process in this case might take 20 to 30 times longer than in the example above.
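
A sketch of such a call (be prepared for a considerably longer computation time):

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="optimal",
                  silent=TRUE, h=336, holdout=TRUE)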

In addition, you can specify some parts of the initial state vector or some parts of the persistence vector. Here is an example:

testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=TRUE, h=336, holdout=TRUE, persistence=list(beta=0.1))
testModel
#> Time elapsed: 0.94 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> With backcasting initialisation
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 56718.36
#> Persistence vector g:
#>  alpha   beta gamma1 gamma2 
#> 0.1351 0.1000 0.2994 0.2959 
#> Damping parameter: 0.97
#> Sample size: 3024
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 3019
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 113446.7 113446.7 113476.8 113476.9 
#> 
#> Forecast errors:
#> ME: -5.17990895999144e+30; MAE: 5.17990895999144e+30; RMSE: 7.86197395954506e+31
#> sCE: -5.73609746029029e+30%; Asymmetry: -100%; sMAE: 1.70717186318163e+28%; sMSE: 6.71387787157461e+56%
#> MASE: 7.22189180203146e+27; RMSSE: 7.63866849811261e+28; rMAE: 7.66282174225378e+26; rRMSE: 9.49308201813626e+27
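
A similar sketch for the initial state vector, fixing only the level and letting the function initialise the rest (the value 25000 is arbitrary, chosen to roughly match the scale of the simulated series):

testModel <- adam(y, "MMdM", lags=c(1,48,336), silent=TRUE,
                  h=336, holdout=TRUE,
                  # only the level is provided; the other states are initialised as usual
                  initial=list(level=25000))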

The function also handles intermittent data (data with zeroes) and data with missing values. This is partially covered in the vignette on the oes() function. Here is a simple example:

testModel <- adam(rpois(120,0.5), "MNN", silent=FALSE, h=12, holdout=TRUE,
                  occurrence="odds-ratio")
testModel
#> Time elapsed: 0.03 seconds
#> Model estimated using adam() function: iETS(MNN)[O]
#> With optimal initialisation
#> Occurrence model type: Odds ratio
#> Distribution assumed in the model: Mixture of Bernoulli and Gamma
#> Loss function type: likelihood; Loss function value: 32.8972
#> Persistence vector g:
#> alpha 
#>     0 
#> 
#> Sample size: 108
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 103
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 221.2255 221.4563 234.6362 225.8121 
#> 
#> Forecast errors:
#> Asymmetry: -66.332%; sMSE: 20.13%; rRMSE: 0.645; sPIS: 1686.9%; sCE: -339.88%

Finally, adam() is typically faster than the es() function, because its code is more efficient and it uses a different optimisation algorithm with more finely tuned parameters by default. Let's compare:

adamModel <- adam(AirPassengers, "CCC",
                  h=12, holdout=TRUE)
esModel <- es(AirPassengers, "CCC",
              h=12, holdout=TRUE)
"adam:"
#> [1] "adam:"
adamModel
#> Time elapsed: 2.42 seconds
#> Model estimated: ETS(CCC)
#> Loss function type: likelihood
#> 
#> Number of models combined: 30
#> Sample size: 132
#> Average number of estimated parameters: 23.1795
#> Average number of degrees of freedom: 108.8205
#> 
#> Forecast errors:
#> ME: -13.996; MAE: 15.633; RMSE: 20.887
#> sCE: -63.984%; sMAE: 5.956%; sMSE: 0.633%
#> MASE: 0.649; RMSSE: 0.667; rMAE: 0.206; rRMSE: 0.203
"es():"
#> [1] "es():"
esModel
#> Time elapsed: 2.32 seconds
#> Model estimated: ETS(CCC)
#> Loss function type: likelihood
#> 
#> Number of models combined: 30
#> Sample size: 132
#> Average number of estimated parameters: 21.7697
#> Average number of degrees of freedom: 110.2303
#> 
#> Forecast errors:
#> ME: -13.056; MAE: 14.7; RMSE: 20.04
#> sCE: -59.688%; sMAE: 5.6%; sMSE: 0.583%
#> MASE: 0.61; RMSSE: 0.64; rMAE: 0.193; rRMSE: 0.195

ADAM ARIMA

As mentioned above, ADAM does not only contain ETS; it also contains the ARIMA model, which is regulated via the orders parameter. If you want a pure ARIMA, you need to switch off ETS, which is done via model="NNN":

testModel <- adam(BJsales, "NNN", silent=FALSE, orders=c(0,2,2),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.06 seconds
#> Model estimated using adam() function: ARIMA(0,2,2)
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 240.5926
#> ARMA parameters of the model:
#>         Lag 1
#> MA(1) -0.7483
#> MA(2) -0.0150
#> 
#> Sample size: 138
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 133
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 491.1852 491.6397 505.8214 506.9413 
#> 
#> Forecast errors:
#> ME: 2.959; MAE: 3.085; RMSE: 3.809
#> sCE: 15.619%; Asymmetry: 90.1%; sMAE: 1.357%; sMSE: 0.028%
#> MASE: 2.589; RMSSE: 2.483; rMAE: 0.995; rRMSE: 0.994

Given that both models are implemented in the same framework, they can be compared using information criteria.
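
For example, a minimal sketch comparing the two on BJsales via AICc (from the greybox package loaded earlier):

etsModel <- adam(BJsales, "AAN", h=12, holdout=TRUE)
arimaModel <- adam(BJsales, "NNN", orders=c(0,2,2), h=12, holdout=TRUE)
# both are estimated via likelihood, so the information criteria are comparable
c(ETS=AICc(etsModel), ARIMA=AICc(arimaModel))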

The functionality of ADAM ARIMA is similar to that of the msarima function in the smooth package, although there are several differences.

First, changing the distribution parameter allows switching between additive / multiplicative models. For example, distribution="dlnorm" will create an ARIMA equivalent to the one applied to the logarithms of the data:

testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dlnorm",
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.59 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12]
#> With optimal initialisation
#> Distribution assumed in the model: Log-Normal
#> Loss function type: likelihood; Loss function value: 463.8379
#> ARMA parameters of the model:
#>        Lag 1  Lag 12
#> AR(1) -0.775 -0.6012
#>         Lag 1  Lag 12
#> MA(1)  0.3353  0.2309
#> MA(2) -0.2053 -0.2530
#> 
#> Sample size: 132
#> Number of estimated parameters: 33
#> Number of degrees of freedom: 99
#> Information criteria:
#>       AIC      AICc       BIC      BICc 
#>  993.6758 1016.5737 1088.8082 1144.7113 
#> 
#> Forecast errors:
#> ME: -17.97; MAE: 18.5; RMSE: 23.208
#> sCE: -82.15%; Asymmetry: -95.1%; sMAE: 7.048%; sMSE: 0.782%
#> MASE: 0.768; RMSSE: 0.741; rMAE: 0.243; rRMSE: 0.225

Second, if you want a model with intercept / drift, you can request it using the constant parameter:

testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12), constant=TRUE,
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.48 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12] with drift
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 489.6563
#> Intercept/Drift value: 0.4986
#> ARMA parameters of the model:
#>         Lag 1  Lag 12
#> AR(1) -0.8542 -0.3225
#>         Lag 1 Lag 12
#> MA(1)  0.5112 0.2626
#> MA(2) -0.3057 0.1392
#> 
#> Sample size: 132
#> Number of estimated parameters: 34
#> Number of degrees of freedom: 98
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1047.312 1071.849 1145.328 1205.230 
#> 
#> Forecast errors:
#> ME: -9.808; MAE: 14.792; RMSE: 19.051
#> sCE: -44.837%; Asymmetry: -65.6%; sMAE: 5.635%; sMSE: 0.527%
#> MASE: 0.614; RMSSE: 0.608; rMAE: 0.195; rRMSE: 0.185

If the model contains non-zero differences, then the constant acts as a drift. Third, you can specify the parameters of ARIMA via the arma parameter in the following manner:

testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  arma=list(ar=c(0.1,0.1), ma=c(-0.96, 0.03, -0.12, 0.03)),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.29 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12]
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 534.8602
#> ARMA parameters of the model:
#>       Lag 1 Lag 12
#> AR(1)   0.1    0.1
#>       Lag 1 Lag 12
#> MA(1) -0.96  -0.12
#> MA(2)  0.03   0.03
#> 
#> Sample size: 132
#> Number of estimated parameters: 27
#> Number of degrees of freedom: 105
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1123.720 1138.259 1201.556 1237.050 
#> 
#> Forecast errors:
#> ME: 9.575; MAE: 17.082; RMSE: 19.148
#> sCE: 43.773%; Asymmetry: 61.9%; sMAE: 6.508%; sMSE: 0.532%
#> MASE: 0.709; RMSSE: 0.611; rMAE: 0.225; rRMSE: 0.186

Finally, the initials for the states can also be provided, although getting the correct ones might be a challenging task (you also need to know how many of them to provide; checking testModel$initial might help):

testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,0)), distribution="dnorm",
                  initial=list(arima=AirPassengers[1:24]),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.39 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,0)[12]
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 489.0635
#> ARMA parameters of the model:
#>         Lag 1  Lag 12
#> AR(1) -0.4129 -0.1071
#>        Lag 1
#> MA(1) 0.2143
#> MA(2) 0.0565
#> 
#> Sample size: 132
#> Number of estimated parameters: 31
#> Number of degrees of freedom: 101
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1040.127 1059.967 1129.494 1177.931 
#> 
#> Forecast errors:
#> ME: -13.907; MAE: 16.641; RMSE: 21.651
#> sCE: -63.574%; Asymmetry: -81.5%; sMAE: 6.34%; sMSE: 0.68%
#> MASE: 0.691; RMSSE: 0.691; rMAE: 0.219; rRMSE: 0.21

If you work with the ADAM ARIMA model, then there is no such thing as the "usual" bounds for the parameters, so the function will use bounds="admissible", checking the AR / MA polynomials in order to make sure that the model is stationary and invertible (aka stable).

Similarly to ETS, you can use different distributions and losses for the estimation. Note that the order selection for ARIMA is done in the auto.adam() function, not in adam()! However, if you provide orders=list(..., select=TRUE) in adam(), it will call auto.adam() and do the selection.
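
A minimal sketch of this shortcut:

testModel <- adam(BJsales, "NNN", orders=list(ar=2, i=2, ma=2, select=TRUE),
                  h=12, holdout=TRUE)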

Finally, ARIMA is typically slower than ETS, mainly because its initial states are more difficult to estimate due to the increased complexity of the model. If you want to speed things up, use initial="backcasting" and reduce the number of iterations via the maxeval parameter.
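
For instance, a hedged sketch for the SARIMA from above (the maxeval value here is illustrative):

testModel <- adam(AirPassengers, "NNN", lags=c(1,12),
                  orders=list(ar=c(1,1), i=c(1,1), ma=c(2,2)),
                  initial="backcasting", maxeval=1000,
                  h=12, holdout=TRUE)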

ADAM ETSX / ARIMAX / ETSX+ARIMA

Another important feature of ADAM is the introduction of explanatory variables. Unlike in es(), adam() expects a matrix for data and can work with a formula. If the latter is not provided, then it will use all explanatory variables. Here is a brief example:

BJData <- cbind(BJsales,BJsales.lead)
testModel <- adam(BJData, "AAN", h=18, silent=FALSE)

If you work with data.frame or similar structures, then you can use them directly: ADAM will extract the response variable, either assuming that it is in the first column or taking it from the provided formula (if you specify one via the formula parameter). Here is an example, where we create a matrix with lags and leads of an explanatory variable:

BJData <- cbind(as.data.frame(BJsales),as.data.frame(xregExpander(BJsales.lead,c(-7:7))))
colnames(BJData)[1] <- "y"
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, formula=y~xLag1+xLag2+xLag3)
testModel
#> Time elapsed: 0.11 seconds
#> Model estimated using adam() function: ETSX(ANN)
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 197.1386
#> Persistence vector g (excluding xreg):
#> alpha 
#>     1 
#> 
#> Sample size: 132
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 126
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 406.2773 406.9493 423.5741 425.2147 
#> 
#> Forecast errors:
#> ME: 1.181; MAE: 1.618; RMSE: 2.247
#> sCE: 9.409%; Asymmetry: 50.7%; sMAE: 0.716%; sMSE: 0.01%
#> MASE: 1.326; RMSSE: 1.438; rMAE: 0.723; rRMSE: 0.895

Similarly to es(), there is support for variables selection, but via the regressors parameter instead of xregDo; the function will then use the stepwise() function from the greybox package on the residuals of the model:

testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, regressors="select")

The same functionality is supported with ARIMA, so you can have, for example, ARIMAX(0,1,1), which is equivalent to ETSX(A,N,N):

testModel <- adam(BJData, "NNN", h=18, silent=FALSE, holdout=TRUE, regressors="select", orders=c(0,1,1))

The two models might differ because they have different initialisations in the optimiser and different bounds for the parameters (ARIMA relies on the invertibility condition, while ETS uses the usual (0,1) bounds by default). It is possible to make them identical if the number of iterations is increased and the initial parameters are the same. Here is an example of what happens when the two models have exactly the same parameters:

BJData <- BJData[,c("y",names(testModel$initial$xreg))];
testModel <- adam(BJData, "NNN", h=18, silent=TRUE, holdout=TRUE, orders=c(0,1,1),
                  initial=testModel$initial, arma=testModel$arma)
testModel
#> Time elapsed: 0 seconds
#> Model estimated using adam() function: ARIMAX(0,1,1)
#> With provided initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 513.2029
#> ARMA parameters of the model:
#>        Lag 1
#> MA(1) 0.2402
#> 
#> Sample size: 132
#> Number of estimated parameters: 1
#> Number of degrees of freedom: 131
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1028.406 1028.437 1031.289 1031.364 
#> 
#> Forecast errors:
#> ME: 0.636; MAE: 0.636; RMSE: 0.872
#> sCE: 5.063%; Asymmetry: 100%; sMAE: 0.281%; sMSE: 0.001%
#> MASE: 0.521; RMSSE: 0.558; rMAE: 0.284; rRMSE: 0.347
names(testModel$initial)[1] <- names(testModel$initial[[1]]) <- "level"
testModel2 <- adam(BJData, "ANN", h=18, silent=TRUE, holdout=TRUE,
                   initial=testModel$initial, persistence=testModel$arma$ma+1)
testModel2
#> Time elapsed: 0 seconds
#> Model estimated using adam() function: ETSX(ANN)
#> With provided initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 1e+300
#> Persistence vector g (excluding xreg):
#>  alpha 
#> 1.2402 
#> 
#> Sample size: 132
#> Number of estimated parameters: 1
#> Number of degrees of freedom: 131
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 1028.406 1028.437 1031.289 1031.364 
#> 
#> Forecast errors:
#> ME: 0.636; MAE: 0.636; RMSE: 0.872
#> sCE: 5.063%; Asymmetry: 100%; sMAE: 0.281%; sMSE: 0.001%
#> MASE: 0.521; RMSSE: 0.558; rMAE: 0.284; rRMSE: 0.347

Another feature of ADAM is time-varying parameters in the SSOE framework, which can be switched on via regressors="adapt":

testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, regressors="adapt")
testModel$persistence
#>        alpha       delta1       delta2       delta3       delta4       delta5 
#> 0.0004758253 0.3163533479 0.3151916749 0.3287991713 0.0026271472 0.3901947312

Note that the default number of iterations might not be sufficient to get close to the optimum of the function, so setting maxeval to something bigger might help. If you want to explore why the optimisation stopped, you can provide the print_level=41 parameter to the function, and it will print out the report from the optimiser. In the end, the default parameters are tuned to give a reasonable solution, but given the complexity of the model, they do not guarantee the best one all the time.
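
A sketch of such a call (assuming, as discussed earlier, that print_level and maxeval are passed on to the optimiser):

testModel <- adam(BJData, "ANN", h=18, silent=TRUE, holdout=TRUE,
                  regressors="adapt",
                  # illustrative optimiser settings; print_level=41 prints the report
                  maxeval=5000, print_level=41)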

Finally, you can produce a mixture of ETS, ARIMA and regression by using the respective parameters, like this:

testModel <- adam(BJData, "AAN", h=18, silent=FALSE, holdout=TRUE, orders=c(1,0,0))
summary(testModel)
#> 
#> Model estimated using adam() function: ETSX(AAN)+ARIMA(1,0,0)
#> Response variable: y
#> Distribution used in the estimation: Normal
#> Loss function type: likelihood; Loss function value: 48.844
#> Coefficients:
#>             Estimate Std. Error Lower 2.5% Upper 97.5%  
#> alpha         0.9979     0.1074     0.7852      1.0000 *
#> beta          0.2843     0.0800     0.1260      0.4424 *
#> phi1[1]      -0.1329     0.0290    -0.1902     -0.0756 *
#> level        85.4714     5.2323    75.1118     95.8134 *
#> trend        -0.0333     0.2609    -0.5498      0.4823  
#> ARIMAState1   2.5778     6.6721   -10.6325     15.7658  
#> xLag3         4.6433     0.1083     4.4289      4.8574 *
#> xLag7         0.4525     0.1194     0.2161      0.6884 *
#> xLag4         3.2369     0.1346     2.9705      3.5029 *
#> xLag6         1.1217     0.1439     0.8367      1.4062 *
#> xLag5         1.9471     0.1522     1.6458      2.2480 *
#> 
#> Error standard deviation: 0.3674
#> Sample size: 132
#> Number of estimated parameters: 12
#> Number of degrees of freedom: 120
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 121.6880 124.3098 156.2816 162.6826

This might be handy when you explore high frequency data, want to add calendar events, apply ETS and add AR/MA errors to it.

Finally, if you estimate an ETSX or ARIMAX model and want to speed things up, it is recommended to use initial="backcasting", which initialises the dynamic part of the model via backcasting and uses optimisation only for the parameters of the explanatory variables:

testModel <- adam(BJData, "AAN", h=18, silent=TRUE, holdout=TRUE, initial="backcasting")
summary(testModel)
#> 
#> Model estimated using adam() function: ETSX(AAN)
#> Response variable: y
#> Distribution used in the estimation: Normal
#> Loss function type: likelihood; Loss function value: 46.8982
#> Coefficients:
#>       Estimate Std. Error Lower 2.5% Upper 97.5%  
#> alpha   0.7692     0.0940     0.5831      0.9551 *
#> beta    0.4800     0.2973     0.0000      0.7692  
#> xLag3   4.5775     2.6406    -0.6491      9.7981  
#> xLag7   0.4197     2.6482    -4.8219      5.6555  
#> xLag4   3.1574     2.3485    -1.4909      7.8005  
#> xLag6   1.0822     2.3468    -3.5627      5.7220  
#> xLag5   1.8379     2.2317    -2.5793      6.2501  
#> 
#> Error standard deviation: 0.3562
#> Sample size: 132
#> Number of estimated parameters: 8
#> Number of degrees of freedom: 124
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 109.7964 110.9671 132.8588 135.7170

Auto ADAM

While the original adam() function allows selecting ETS components and explanatory variables, it does not allow selecting the most suitable distribution and / or ARIMA components. This is what the auto.adam() function is for.

In order to do the selection of the most appropriate distribution, you need to provide a vector of those that you want to check:

testModel <- auto.adam(BJsales, "XXX", silent=FALSE,
                       distribution=c("dnorm","dlaplace","ds"),
                       h=12, holdout=TRUE)
#> Evaluating models with different distributions... dnorm ,  Selecting ARIMA orders... 
#> Selecting differences... 
#> Selecting ARMA... |
#> The best ARIMA is selected. dlaplace ,  Selecting ARIMA orders... 
#> Selecting differences... 
#> Selecting ARMA... |
#> The best ARIMA is selected. ds ,  Selecting ARIMA orders... 
#> Selecting differences... 
#> Selecting ARMA... |-
#> The best ARIMA is selected. Done!
testModel
#> Time elapsed: 0.55 seconds
#> Model estimated using auto.adam() function: ETS(AAdN) with drift
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 236.8501
#> Intercept/Drift value: 0.6397
#> Persistence vector g:
#>  alpha   beta 
#> 0.9550 0.2839 
#> Damping parameter: 0.8551
#> Sample size: 138
#> Number of estimated parameters: 7
#> Number of degrees of freedom: 131
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 487.7002 488.5617 508.1910 510.3135 
#> 
#> Forecast errors:
#> ME: 0.297; MAE: 1.051; RMSE: 1.323
#> sCE: 1.57%; Asymmetry: 8.8%; sMAE: 0.462%; sMSE: 0.003%
#> MASE: 0.882; RMSSE: 0.862; rMAE: 0.339; rRMSE: 0.345

This process can also be done in parallel, either on an automatically selected number of cores (e.g. parallel=TRUE) or on a number specified by the user (e.g. parallel=4):

testModel <- auto.adam(BJsales, "ZZZ", silent=FALSE, parallel=TRUE,
                       h=12, holdout=TRUE)

If you want to add ARIMA or regression components, you can do it in exactly the same way as for the adam() function. Here is an example of ETS+ARIMA:

testModel <- auto.adam(BJsales, "AAN", orders=list(ar=2,i=0,ma=0), silent=TRUE,
                       distribution=c("dnorm","dlaplace","ds","dgnorm"),
                       h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.42 seconds
#> Model estimated using auto.adam() function: ETS(AAN)+ARIMA(2,0,0)
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 240.5239
#> Persistence vector g:
#>  alpha   beta 
#> 0.2789 0.2134 
#> 
#> ARMA parameters of the model:
#>        Lag 1
#> AR(1) 0.7589
#> AR(2) 0.2321
#> 
#> Sample size: 138
#> Number of estimated parameters: 9
#> Number of degrees of freedom: 129
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 499.0478 500.4540 525.3930 528.8575 
#> 
#> Forecast errors:
#> ME: 2.999; MAE: 3.119; RMSE: 3.858
#> sCE: 15.832%; Asymmetry: 90.6%; sMAE: 1.372%; sMSE: 0.029%
#> MASE: 2.618; RMSSE: 2.515; rMAE: 1.006; rRMSE: 1.007

However, this way the function will just use ARIMA(2,0,0) and fit it together with ETS(A,A,N). If you want it to select the most appropriate ARIMA orders from the provided ones (e.g. up to AR(2), I(2) and MA(2)), you need to add the parameter select=TRUE to the list in orders:

testModel <- auto.adam(BJsales, "XXN", orders=list(ar=2,i=2,ma=2,select=TRUE),
                       distribution="default", silent=FALSE,
                       h=12, holdout=TRUE)
#> Evaluating models with different distributions... default ,  Selecting ARIMA orders... 
#> Selecting differences... 
#> Selecting ARMA... |
#> The best ARIMA is selected. Done!
testModel
#> Time elapsed: 0.16 seconds
#> Model estimated using auto.adam() function: ETS(AAdN) with drift
#> With optimal initialisation
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 236.8501
#> Intercept/Drift value: 0.6397
#> Persistence vector g:
#>  alpha   beta 
#> 0.9550 0.2839 
#> Damping parameter: 0.8551
#> Sample size: 138
#> Number of estimated parameters: 7
#> Number of degrees of freedom: 131
#> Information criteria:
#>      AIC     AICc      BIC     BICc 
#> 487.7002 488.5617 508.1910 510.3135 
#> 
#> Forecast errors:
#> ME: 0.297; MAE: 1.051; RMSE: 1.323
#> sCE: 1.57%; Asymmetry: 8.8%; sMAE: 0.462%; sMSE: 0.003%
#> MASE: 0.882; RMSSE: 0.862; rMAE: 0.339; rRMSE: 0.345

Knowing how to work with adam(), you can use similar principles when dealing with auto.adam(). Just keep in mind that the provided persistence, phi, initial, arma and B will not work, because this contradicts the idea of model selection.

Finally, there is also a mechanism for automatic outlier detection, which extracts residuals from the best model, flags observations that lie outside the in-sample prediction interval of the width defined by level, and then refits auto.adam() with dummy variables for the outliers. Here is how it works:

testModel <- auto.adam(AirPassengers, "PPP", silent=FALSE, outliers="use",
                       distribution="default",
                       h=12, holdout=TRUE)
#> Evaluating models with different distributions... default ,  Selecting ARIMA orders... 
#> Selecting differences... 
#> Selecting ARMA... |-
#> The best ARIMA is selected. 
#> Dealing with outliers...
testModel
#> Time elapsed: 5.78 seconds
#> Model estimated using auto.adam() function: ETSX(MMdM)
#> With optimal initialisation
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 462.9664
#> Persistence vector g (excluding xreg):
#>  alpha   beta  gamma 
#> 0.4110 0.0002 0.5889 
#> Damping parameter: 0.997
#> Sample size: 132
#> Number of estimated parameters: 21
#> Number of degrees of freedom: 111
#> Information criteria:
#>       AIC      AICc       BIC      BICc 
#>  967.9327  976.3327 1028.4716 1048.9793 
#> 
#> Forecast errors:
#> ME: -18.157; MAE: 18.545; RMSE: 23.881
#> sCE: -83.005%; Asymmetry: -95.7%; sMAE: 7.065%; sMSE: 0.828%
#> MASE: 0.77; RMSSE: 0.762; rMAE: 0.244; rRMSE: 0.232

If you specify outliers="select", the function will create leads and lags 1 of the outliers and then select the most appropriate ones via the regressors parameter of adam().
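
A minimal sketch of that option:

testModel <- auto.adam(AirPassengers, "PPP", silent=TRUE, outliers="select",
                       distribution="default", h=12, holdout=TRUE)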

If you want to know more about ADAM, you are welcome to visit the online textbook (this is a work in progress at the moment).

Hyndman, Rob J., Anne B. Koehler, J. Keith Ord, and Ralph D. Snyder. 2008. Forecasting with Exponential Smoothing: The State Space Approach. Springer Berlin Heidelberg.