
Posted: Sat May 24, 2025 8:37 am
by rochona
Deeper Dive: Moirai
As illustrated in Figure 2, Moirai follows a (non-overlapping) patch-based approach to modeling time series with a masked encoder architecture. One of our proposed modifications to extend the architecture to the any-variate setting is to “flatten” multivariate time series, considering all variates as a single sequence. Patches are subsequently projected into vector representations via a multi-patch size input projection layer. The [mask] token is a learnable embedding that replaces patches falling within the forecast horizon. The output tokens are then decoded via the multi-patch size output projection into the parameters of the mixture distribution. While not visualized, (non-learnable) instance normalization is applied to inputs/outputs, in line with current standard practice for deep forecasting models.
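
To make the patch-and-mask pipeline concrete, here is a minimal PyTorch-style sketch. It is an illustration under simplifying assumptions, not the official Moirai code: the class and argument names (PatchEmbed, d_model, horizon) are made up for the example, and a single fixed patch size stands in for the multi-patch size projection.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Project non-overlapping patches to tokens and mask the forecast horizon."""

    def __init__(self, patch_size: int, d_model: int):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Linear(patch_size, d_model)            # input projection
        self.mask_token = nn.Parameter(torch.zeros(d_model))  # learnable [mask] embedding

    def forward(self, series: torch.Tensor, horizon: int) -> torch.Tensor:
        # series: (batch, variates, time); time and horizon are multiples of patch_size
        b, v, t = series.shape
        p = t // self.patch_size
        # split each variate into non-overlapping patches: (b, v, p, patch_size)
        patches = series.reshape(b, v, p, self.patch_size)
        tokens = self.proj(patches)                           # (b, v, p, d_model)
        # replace patches that fall within the forecast horizon by the [mask] embedding
        h = horizon // self.patch_size
        tokens[:, :, -h:, :] = self.mask_token
        # "flatten" all variates into a single token sequence for the encoder
        return tokens.reshape(b, v * p, -1)

# example: 2 variates, 64 time steps, patch size 16, forecast the last 32 steps
tokens = PatchEmbed(patch_size=16, d_model=128)(torch.randn(4, 2, 64), horizon=32)  # (4, 8, 128)
```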

In our pre-training task, we formulate the objective to optimize the mixture distribution log-likelihood. The data distribution and the task distribution are two critical aspects of the pre-training pipeline; their design imparts versatile capabilities to our Large Time Series Model (LTM), enabling it to adapt to a range of downstream tasks. This flexibility stands in contrast to the prevailing deep forecasting paradigm, where models are typically specialized for specific datasets and settings.
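As a rough sketch of that objective, the loss below computes the negative log-likelihood of targets under a per-step mixture using torch.distributions. A Gaussian mixture stands in for Moirai's actual mixture of distributions, and the function and parameter names (mixture_nll, logits, means, scales) are assumptions for illustration rather than the paper's implementation.

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

def mixture_nll(logits: torch.Tensor, means: torch.Tensor,
                scales: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of targets under a per-step mixture.

    logits / means / scales: (batch, horizon, num_components), from the output projection
    target:                  (batch, horizon)
    """
    components = Normal(loc=means, scale=scales)
    mixture = MixtureSameFamily(Categorical(logits=logits), components)
    # the pre-training objective maximizes the log-likelihood, i.e. minimizes this loss
    return -mixture.log_prob(target).mean()

# example: 8 series, 32 forecast steps, a 4-component mixture per step
B, H, K = 8, 32, 4
loss = mixture_nll(torch.randn(B, H, K), torch.randn(B, H, K),
                   torch.rand(B, H, K) + 0.1, torch.randn(B, H))
```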

Results
We train Moirai in three sizes – small, base, and large – with 14M, 91M, and 311M parameters respectively. On in-distribution evaluations using the Monash Time Series Forecasting Benchmark, Moirai displays phenomenal performance, beating all baselines.