Time series manipulation in F#
In this section, we look at F# data frame library features that are useful when working with time series data or, more generally, any ordered series. Although we mainly look at operations on the Series type, many of the operations can be applied to a data frame (the Frame type) containing multiple series. Furthermore, the data frame provides an elegant way of aligning and joining series.
You can also get this page as an F# script file from GitHub and run the samples interactively.
Generating input data
For the purpose of this tutorial, we'll need some input data. For simplicity, we use the following function, which generates random prices using geometric Brownian motion. The code is adapted from the financial tutorial on Try F#.
// Use Math.NET for probability distributions
#r "MathNet.Numerics.dll"
open MathNet.Numerics.Distributions

/// Generates price using geometric Brownian motion
///  - 'seed' specifies the seed for random number generator
///  - 'drift' and 'volatility' set properties of the price movement
///  - 'initial' and 'start' specify the initial price and date
///  - 'span' specifies time span between individual observations
///  - 'count' is the number of required values to generate
let randomPrice seed drift volatility initial start span count =
  (Implementation omitted)

// 12:00 AM today, in current time zone
let today = DateTimeOffset(DateTime.Today)
let stock1 = randomPrice 1 0.1 3.0 20.0 today
let stock2 = randomPrice 2 0.2 1.5 22.0 today
The implementation of the function is not particularly important for the purpose of this page, but you can find it in the script file with full source. Once we have the function, we define a date today (representing today's midnight) and two helper functions that set basic properties for the randomPrice function.

To get random prices, we now only need to call stock1 or stock2 with a TimeSpan and the required number of prices:
Chart.Combine
  [ stock1 (TimeSpan(0, 1, 0)) 1000 |> Chart.FastLine
    stock2 (TimeSpan(0, 1, 0)) 1000 |> Chart.FastLine ]
The above snippet generates 1,000 prices at one-minute intervals and plots them using the F# Charting library. When you run the code and tweak the chart appearance, you should see something like this:
Data alignment and zipping
One of the key features of the data frame library for working with time series data is automatic alignment based on the keys. When we have multiple time series with date as the key (here, we use DateTimeOffset, but any type of date will do), we can combine multiple series and align them automatically to specified date keys.
To demonstrate this feature, we generate random prices in 60 minute, 30 minute and 65 minute intervals:
let s1 = series <| stock1 (TimeSpan(1, 0, 0)) 6

val s1 : Series<DateTimeOffset,float> =
  series [ 12:00:00 AM => 20.76; 1:00:00 AM => 21.11; 2:00:00 AM => 22.51
           3:00:00 AM => 23.88; 4:00:00 AM => 23.23; 5:00:00 AM => 22.68 ]

let s2 = series <| stock2 (TimeSpan(0, 30, 0)) 12

val s2 : Series<DateTimeOffset,float> =
  series [ 12:00:00 AM => 21.61; 12:30:00 AM => 21.64; 1:00:00 AM => 21.86
           1:30:00 AM => 22.22; 2:00:00 AM => 22.35; 2:30:00 AM => 22.76
           3:00:00 AM => 22.68; 3:30:00 AM => 22.64; 4:00:00 AM => 22.90
           4:30:00 AM => 23.40; 5:00:00 AM => 23.33; 5:30:00 AM => 23.43 ]

let s3 = series <| stock1 (TimeSpan(1, 5, 0)) 6

val s3 : Series<DateTimeOffset,float> =
  series [ 12:00:00 AM => 21.37; 1:05:00 AM => 22.73; 2:10:00 AM => 22.08
           3:15:00 AM => 23.92; 4:20:00 AM => 22.72; 5:25:00 AM => 22.79 ]
Zipping time series
Let's first look at operations that are available on the Series<K, V> type. A series exposes a Zip operation that can combine multiple series into a single series of pairs. This is not as convenient as working with data frames (which we'll see later), but it is useful if you only need to work with one or two columns without missing values:
// Match values from right series to keys of the left one
// (this creates series with no missing values)
s1.Zip(s2, JoinKind.Left)

val it : Series<DateTimeOffset,float opt * float opt> =
  12:00:00 AM -> (21.32, 21.61)
  1:00:00 AM  -> (22.62, 21.86)
  2:00:00 AM  -> (22.00, 22.35)
  (...)

// Match values from the left series to keys of the right one
// (right has higher resolution, so half of left values are missing)
s1.Zip(s2, JoinKind.Right)

val it : Series<DateTimeOffset,float opt * float opt> =
  12:00:00 AM -> (21.32, 21.61)
  12:30:00 AM -> (<missing>, 21.64)
  1:00:00 AM  -> (22.62, 21.86)
  (...)

// Use left series key and find the nearest previous
// (smaller) value from the right series
s1.Zip(s2, JoinKind.Left, Lookup.NearestSmaller)

val it : Series<DateTimeOffset,float opt * float opt> =
  12:00:00 AM -04:00 -> (21.32, 21.61)
  1:00:00 AM -04:00  -> (22.62, 21.86)
  2:00:00 AM -04:00  -> (22.00, 22.35)
  (...)
Using Zip on series is somewhat complicated. The result is a series of tuples, but each component of the tuple may be missing. To represent this, the library uses the T opt type (a type alias for OptionalValue<T>). This is not necessary when we use a data frame to work with multiple columns.
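If you do stay with the series of tuples, you can unwrap the optional values yourself. The following is a minimal sketch using the s1 and s2 series defined above; it assumes the HasValue and Value members of OptionalValue<T> for reading the components:

// Difference between the two prices at the keys of 's1'
// (with the NearestSmaller lookup, both tuple components have values here;
// OptionalValue.HasValue can be used as a guard if that is not guaranteed)
s1.Zip(s2, JoinKind.Left, Lookup.NearestSmaller)
|> Series.mapValues (fun (l, r) -> l.Value - r.Value)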
Joining data frames
When we store data in data frames, we do not need to use tuples to represent combined values. Instead, we can simply use a data frame with multiple columns. To see how this works, let's first create three data frames containing the three series from the previous section:
// Contains value for each hour
let f1 = Frame.ofColumns ["S1" => s1]
// Contains value every 30 minutes
let f2 = Frame.ofColumns ["S2" => s2]
// Contains values with 65 minute offsets
let f3 = Frame.ofColumns ["S3" => s3]
Similarly to Series<K, V>, the type Frame<R, C> has an instance method Join that can be used for joining (for unordered data) or aligning (for ordered data). The same operation is also exposed as the Frame.join and Frame.joinAlign functions, but it is usually more convenient to use the member syntax in this case:
// Union keys from both frames and align corresponding values
f1.Join(f2, JoinKind.Outer)

val it : Frame<DateTimeOffset,string> =
                 S1         S2
  12:00:00 AM -> 21.32      21.61
  12:30:00 AM -> <missing>  21.64
  1:00:00 AM  -> 22.62      21.86
  (...)

// Take only keys where both frames contain all values
// (We get only a single row, because 'f3' is off by 5 minutes)
f2.Join(f3, JoinKind.Inner)

val it : Frame<DateTimeOffset,string> =
                 S2     S3
  12:00:00 AM -> 21.61  21.37

// Take keys from the left frame and find corresponding values
// from the right frame, or value for a nearest smaller date
// ($21.37 is repeated for all values between 12:00 and 1:05)
f2.Join(f3, JoinKind.Left, Lookup.NearestSmaller)

val it : Frame<DateTimeOffset,string> =
                 S2     S3
  12:00:00 AM -> 21.61  21.37
  12:30:00 AM -> 21.64  21.37
  1:00:00 AM  -> 21.86  21.37
  1:30:00 AM  -> 22.22  22.73
  (...)

// If we perform left join as previously, but specify exact
// matching, then most of the values are missing
f2.Join(f3, JoinKind.Left, Lookup.Exact)

val it : Frame<DateTimeOffset,string> =
                 S2     S3
  12:00:00 AM -> 21.61  21.37
  12:30:00 AM -> 21.64  <missing>
  1:00:00 AM  -> 21.86  <missing>
  (...)

// Equivalent to the outer join above, using function syntax
Frame.join JoinKind.Outer f1 f2

// Equivalent to the left join with Lookup.NearestSmaller, using function syntax
Frame.joinAlign JoinKind.Left Lookup.NearestSmaller f1 f2
The automatic alignment is extremely useful when you have multiple data series with different offsets between individual observations. You can choose your set of keys (dates) and then easily align other data to match the keys. Another alternative to using Join explicitly is to create a new frame with just the keys that you are interested in (using Frame.ofRowKeys) and then use the AddSeries member (or the df?New <- s syntax) to add series, as sketched below. This will automatically left join the new series to match the current row keys.
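For example, the following minimal sketch (the two-hourly keys and the column names are made up for illustration) builds a frame from a chosen set of row keys and then adds the existing series to it:

// Frame with chosen row keys and no columns yet
let everyTwoHours = [ for h in 0.0 .. 2.0 .. 10.0 -> today.AddHours(h) ]
let aligned = Frame.ofRowKeys everyTwoHours
// Adding a series left-joins it to the row keys chosen above
aligned?Stock1 <- s1
aligned?Stock2 <- s2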
When aligning data, you may or may not want to create a data frame with missing values. If your observations do not happen at exact times, then using Lookup.NearestSmaller or Lookup.NearestGreater is a great way to avoid mismatches. If you have observations that happen, say, at twice the rate (one series is hourly and another is half-hourly), then you can create a data frame with missing values using Lookup.Exact (the default value) and then handle the missing values explicitly (as discussed here).
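The second approach could look roughly like this (a sketch that assumes the standard Series.dropMissing and Series.fillMissingWith helpers for handling missing values):

// Join with exact matching and then deal with the missing values in 'S3'
let exact = f2.Join(f3, JoinKind.Left, Lookup.Exact)
// Drop the rows where 'S3' has no observation...
exact?S3 |> Series.dropMissing
// ...or replace the missing values with a constant
exact?S3 |> Series.fillMissingWith 0.0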
Windowing, chunking and pairwise
Windowing and chunking are two operations on ordered series that allow aggregating the values of a series into groups. Both of these operations work on consecutive elements, which contrasts with grouping, which does not use the ordering.
Sliding windows
A sliding window creates windows of a certain size (or windows determined by a certain condition). The window "slides" over the input series and provides a view on a part of the series. The key thing is that a single element will typically appear in multiple windows.
// Create input series with 6 observations
let lf = series <| stock1 (TimeSpan(0, 1, 0)) 6

// Create series of series representing individual windows
lf |> Series.window 4
// Aggregate each window using 'Series.mean'
lf |> Series.windowInto 4 Series.mean
// Get first value in each window
lf |> Series.windowInto 4 Series.firstValue
The functions used above create windows of size 4 that move from left to right. Given the input [1,2,3,4,5,6], this produces the following three windows: [1,2,3,4], [2,3,4,5] and [3,4,5,6]. By default, the Series.window function automatically chooses the key of the last element of the window as the key for the whole window (we'll see how to change this soon):
// Calculate means for sliding windows
let lfm1 = lf |> Series.windowInto 4 Series.mean
// Construct dataframe to show aligned results
Frame.ofColumns [ "Orig" => lf; "Means" => lfm1 ]

val it : Frame<DateTimeOffset,string> =
                 Means      Orig
  12:00:00 AM -> <missing>  20.16
  12:01:00 AM -> <missing>  20.32
  12:02:00 AM -> <missing>  20.25
  12:03:00 AM -> 20.30      20.45
  12:04:00 AM -> 20.34      20.32
  12:05:00 AM -> 20.34      20.33
What if we want to avoid creating <missing> values? One approach is to specify that we want to generate windows of smaller sizes at the beginning or at the end of the series. This way, we get incomplete windows that look like [1], [1,2], [1,2,3], followed by the three complete windows shown above:
let lfm2 =
  // Create sliding windows with incomplete windows at the beginning
  lf |> Series.windowSizeInto (4, Boundary.AtBeginning) (fun ds ->
    Series.mean ds.Data)

Frame.ofColumns [ "Orig" => lf; "Means" => lfm2 ]

val it : Frame<DateTimeOffset,string> =
                 Means  Orig
  12:00:00 AM -> 20.16  20.16
  12:01:00 AM -> 20.24  20.32
  12:02:00 AM -> 20.24  20.25
  12:03:00 AM -> 20.30  20.45
  12:04:00 AM -> 20.34  20.32
  12:05:00 AM -> 20.34  20.33
As you can see, the two values in the first row are equal, because the first Mean value is just the average of a singleton series.
When you specify Boundary.AtBeginning (this example) or Boundary.Skip (the default value used in the previous example), the function uses the last key of the window as the key of the aggregated value. When you specify Boundary.AtEnding, the first key is used, so the values can be nicely aligned with the original values. When you want to specify a custom key selector, you can use the more general function Series.aggregate.
In the previous sample, the code that performs the aggregation is no longer just a simple function like Series.mean, but a lambda that takes ds, which is of type DataSegment<T>. This type informs us whether the window is complete or not. For example:
// Simple series with characters
let st = Series.ofValues [ 'a' .. 'e' ]
st |> Series.windowSizeInto (3, Boundary.AtEnding) (function
  | DataSegment.Complete(ser) ->
      // Return complete windows as uppercase strings
      String(ser |> Series.values |> Array.ofSeq).ToUpper()
  | DataSegment.Incomplete(ser) ->
      // Return incomplete windows as padded lowercase strings
      String(ser |> Series.values |> Array.ofSeq).PadRight(3, '-') )

val it : Series<int,string> =
  0 -> ABC
  1 -> BCD
  2 -> CDE
  3 -> de-
  4 -> e--
Window size conditions
The previous examples generated windows of fixed size. However, there are two other options for specifying when a window ends.
- The first option is to specify the maximal distance between the first and the last key
- The second option is to specify a function that is called with the first and the last key; a window ends when the function returns false.
The two functions are Series.windowDist and Series.windowWhile (together with versions suffixed with Into that call a provided function to aggregate each window):
// Generate prices for each hour over 30 days
let hourly = series <| stock1 (TimeSpan(1, 0, 0)) (30*24)

// Generate windows of size 1 day (if the source was
// irregular, windows would have varying size)
hourly |> Series.windowDist (TimeSpan(24, 0, 0))

// Generate windows such that date in each window is the same
// (windows start every hour and end at the end of the day)
hourly |> Series.windowWhile (fun d1 d2 -> d1.Date = d2.Date)
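The Into-suffixed variants mentioned above can be sketched as follows, assuming that (like Series.chunkDistInto used later on this page) they take a plain aggregation function:

// Mean price over each 24-hour window
hourly |> Series.windowDistInto (TimeSpan(24, 0, 0)) Series.mean
// First observation of each same-day window
hourly |> Series.windowWhileInto (fun d1 d2 -> d1.Date = d2.Date) Series.firstValue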
Chunking series
Chunking is similar to windowing, but it creates non-overlapping chunks, rather than (overlapping) sliding windows. The size of a chunk can be specified in the same three ways as for sliding windows (fixed size, distance between keys and a condition):
// Generate per-second observations over 10 minutes
let hf = series <| stock1 (TimeSpan(0, 0, 1)) 600

// Create 10 second chunks with (possible) incomplete
// chunk of smaller size at the end.
hf |> Series.chunkSize (10, Boundary.AtEnding)

// Create 10 second chunks using time span and get
// the first observation for each chunk (downsample)
hf |> Series.chunkDistInto (TimeSpan(0, 0, 10)) Series.firstValue

// Create chunks where hh:mm component is the same
// (containing observations for all seconds in the minute)
hf |> Series.chunkWhile (fun k1 k2 ->
  (k1.Hour, k1.Minute) = (k2.Hour, k2.Minute))
The above examples use various chunking functions in a very similar way, mainly because the randomly generated input is very uniform. However, they all behave differently for inputs with non-uniform keys.
Using chunkSize means that the chunks have the same size, but may correspond to time series of different time spans. Using chunkDist guarantees that there is a maximal time span over each chunk, but it does not guarantee when a chunk starts. That is something which can be achieved using chunkWhile.
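To sketch the difference, consider a small, hand-made irregular series (the keys below are chosen purely for illustration):

// Observations at 0, 1, 2 and 10 minutes past midnight
let irregular =
  series [ for m in [ 0.0; 1.0; 2.0; 10.0 ] -> today.AddMinutes(m), float m ]

// Two chunks of two observations each: {0,1} and {2,10}
irregular |> Series.chunkSize (2, Boundary.AtEnding)
// Keys in a chunk span at most 5 minutes: {0,1,2} and {10}
irregular |> Series.chunkDist (TimeSpan(0, 5, 0))
// A chunk grows while its keys stay within 5 minutes of its first key: {0,1,2} and {10}
irregular |> Series.chunkWhile (fun k1 k2 -> k2 - k1 < TimeSpan(0, 5, 0))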
Finally, all of the aggregations discussed so far are just special cases of Series.aggregate, which takes a discriminated union that specifies the kind of aggregation (see the API reference). However, in practice it is more convenient to use the helpers presented here - in some rare cases, you might need to use Series.aggregate as it provides a few other options.
Pairwise
A special form of windowing is building a series of pairs containing a current and previous value from the input series (in other words, the key for each pair is the key of the later element). For example:
// Create a series of pairs from earlier 'hf' input
hf |> Series.pairwise

// Calculate differences between the current and previous values
hf |> Series.pairwiseWith (fun k (v1, v2) -> v2 - v1)
The pairwise operation always returns a series that has no value for the first key in the input series. If you want more complex behavior, you will usually need to replace pairwise with window. For example, you might want to get a series that contains the first value as the first element, followed by differences. This has the nice property that summing the rows, starting from the first one, gives you the current price:
// Sliding window with incomplete segment at the beginning
hf |> Series.windowSizeInto (2, Boundary.AtBeginning) (function
  // Return the first value for the first segment
  | DataSegment.Incomplete s -> s.GetAt(0)
  // Calculate difference for all later segments
  | DataSegment.Complete s -> s.GetAt(1) - s.GetAt(0))
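As a quick check of that property, the running sums of the resulting values reproduce the original prices (a small sketch; Seq.scan is the standard F# running fold):

// First value followed by differences, exactly as above
let firstAndDiffs =
  hf |> Series.windowSizeInto (2, Boundary.AtBeginning) (function
    | DataSegment.Incomplete s -> s.GetAt(0)
    | DataSegment.Complete s -> s.GetAt(1) - s.GetAt(0))

// Running sums of the values give back the original prices in 'hf'
firstAndDiffs |> Series.values |> Seq.scan (+) 0.0 |> Seq.skip 1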
Sampling and resampling time series
Given a time series with high-frequency prices, sampling or resampling makes it possible to get time series with representative values at lower frequency. The library uses the following terminology:
- Lookup means that we find a value at a specified key; if the key is not available, we can look for the value associated with the nearest smaller or the nearest greater key.
- Resampling means that we aggregate values into chunks based on a specified collection of keys (e.g. explicitly provided times), or based on some relation between keys (e.g. date times having the same date).
- Uniform resampling is similar to resampling, but we specify keys by providing functions that generate a uniform sequence of keys (e.g. days); the operation also fills in values for keys that have no corresponding observations in the input sequence.
Finally, the library also provides a few helper functions that are specifically designed for series with keys of type DateTime or DateTimeOffset.
Lookup
Given a series hf, you can get a value at a specified key using hf.Get(key) or using hf |> Series.get key. However, it is also possible to find values for a larger number of keys at once. The instance member for doing this is hf.GetItems(..). Moreover, both Get and GetItems take an optional parameter that specifies the behavior when the exact key is not found. Using the function syntax, you can use Series.getAll for exact key lookup and Series.lookupAll when you want more flexible lookup:
// Generate a bit less than 24 hours of data with 13.7sec offsets
let mf = series <| stock1 (TimeSpan.FromSeconds(13.7)) 6300
// Generate keys for all minutes in 24 hours
let keys = [ for m in 0.0 .. 24.0*60.0-1.0 -> today.AddMinutes(m) ]

// Find value for a given key, or nearest greater key with value
mf |> Series.lookupAll keys Lookup.NearestGreater

val it : Series<DateTimeOffset,float> =
  12:00:00 AM -> 20.07
  12:01:00 AM -> 19.98
  ...         -> ...
  11:58:00 PM -> 19.03
  11:59:00 PM -> <missing>

// Find value for nearest smaller key
// (This returns value for 11:59:00 PM as well)
mf |> Series.lookupAll keys Lookup.NearestSmaller

// Find values for exact key
// (This only works for the first key)
mf |> Series.lookupAll keys Lookup.Exact
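The single-key variants mentioned above look like this (a sketch; Series.lookup is assumed to be the single-key counterpart of Series.lookupAll):

// Get the value for an exact key (midnight is the first key of 'mf')
mf |> Series.get today
// Find the value for the nearest greater key when the exact key has no value
mf |> Series.lookup (today.AddMinutes(1.0)) Lookup.NearestGreater
// The same lookup using the member syntax
mf.Get(today.AddMinutes(1.0), Lookup.NearestGreater)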
Lookup operations only return one value for each key, so they are useful for quick sampling of large (or high-frequency) data. When we want to calculate a new value based on multiple values, we need to use resampling.
Resampling
Series supports two kinds of resampling. The first kind is similar to lookup in that we have to explicitly specify the keys. The difference is that resampling does not find just the nearest key, but all smaller or greater keys. For example:
// For each key, collect values for greater keys until the
// next one (chunk for 11:59:00 PM is empty)
mf |> Series.resample keys Direction.Forward

// For each key, collect values for smaller keys until the
// previous one (the first chunk will be singleton series)
mf |> Series.resample keys Direction.Backward

// Aggregate each chunk of preceding values using mean
mf |> Series.resampleInto keys Direction.Backward
  (fun k s -> Series.mean s)

// Resampling is also available via the member syntax
mf.Resample(keys, Direction.Forward)
The second kind of resampling is based on a projection from existing keys in the series. The operation then collects chunks such that the projection returns equal keys. This is very similar to Series.groupBy, but resampling assumes that the projection preserves the ordering of the keys, and so it only aggregates consecutive keys. The typical scenario is when you have a time series with date time information (here DateTimeOffset) and want to get information for each day (we use DateTime with an empty time to represent dates):
// Generate 2.5 months of data in 1.7 hour offsets
let ds = series <| stock1 (TimeSpan.FromHours(1.7)) 1000

// Sample by day (of type 'DateTime')
ds |> Series.resampleEquiv (fun d -> d.Date)

// Sample by day (of type 'DateTime')
ds.ResampleEquivalence(fun d -> d.Date)
The same operation can be easily implemented using Series.chunkWhile, but as it is often used in the context of sampling, it is included in the library as a primitive. Moreover, we'll see that it is closely related to uniform resampling.
Note that the resulting series has a different type of keys than the source. The source has keys of type DateTimeOffset (representing a date with time) while the resulting keys are of the type returned by the projection (here, DateTime representing just dates).
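For comparison, a rough chunkWhile-based equivalent might look like this (a sketch; the Into-suffixed variant is assumed to exist analogously to Series.chunkDistInto, and, unlike resampleEquiv, the chunk keys stay of type DateTimeOffset):

// Group consecutive observations that fall on the same date
ds |> Series.chunkWhile (fun d1 d2 -> d1.Date = d2.Date)
// Aggregate each daily chunk, e.g. into its mean
ds |> Series.chunkWhileInto (fun d1 d2 -> d1.Date = d2.Date) Series.mean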
Uniform resampling
In the previous section, we looked at resampleEquiv, which is useful if you want to sample time series by keys with a "lower resolution" - for example, sample date time observations by date. However, the function discussed in the previous section only generates values for which there are keys in the input sequence - if there is no observation for an entire day, then the day will not be included in the result.
If you want to create sampling that assigns value to each key in the range specified by the input sequence, then you can use uniform resampling.
The idea is that uniform resampling applies the key projection to the smallest and greatest key of the input (e.g. gets the date of the first and last observation) and then generates all keys in the projected space (e.g. all dates). Then it picks the best value for each of the generated keys.
// Create input data with non-uniformly distributed keys
// (1 value for 10/3, three for 10/4 and two for 10/6)
let days =
  [ "10/3/2013 12:00:00"; "10/4/2013 15:00:00"
    "10/4/2013 18:00:00"; "10/4/2013 19:00:00"
    "10/6/2013 15:00:00"; "10/6/2013 21:00:00" ]
let nu =
  stock1 (TimeSpan(24,0,0)) 10 |> series
  |> Series.indexWith days |> Series.mapKeys DateTimeOffset.Parse

// Generate uniform resampling based on dates. Fill
// missing chunks with nearest smaller observations.
let sampled =
  nu |> Series.resampleUniform Lookup.NearestSmaller
    (fun dt -> dt.Date) (fun dt -> dt.AddDays(1.0))

// Same thing using the C#-friendly member syntax
// (Lookup.NearestSmaller is the default value)
nu.ResampleUniform((fun dt -> dt.Date), (fun dt -> dt.AddDays(1.0)))

// Turn into frame with multiple columns for each day
// (to format the result in a readable way)
sampled
|> Series.mapValues Series.indexOrdinally
|> Frame.ofRows

val it : Frame<DateTime,int> =
               0      1          2
  10/3/2013 -> 21.45  <missing>  <missing>
  10/4/2013 -> 21.63  19.83      17.51
  10/5/2013 -> 17.51  <missing>  <missing>
  10/6/2013 -> 18.80  20.93      <missing>
To perform the uniform resampling, we need to specify how to project the (resampled) keys from the original keys (we return the Date), how to calculate the next key (add 1 day) and how to fill missing values.
After performing the resampling, we turn the data into a data frame, so that we can nicely see the results. The individual chunks have the actual observation times as keys, so we replace those with just integers (using Series.indexOrdinally). The result contains a simple ordered row of observations for each day.
The important thing is that there is an observation for each day - even for 10/5/2013, which does not have any corresponding observations in the input. We call the resampling function with Lookup.NearestSmaller, so the value 17.51 is picked from the last observation of the previous day (Lookup.NearestGreater would pick 18.80 and Lookup.Exact would give us an empty series for that date).
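You can see this by re-running the resampling with the alternative fill behaviour (a one-line sketch reusing the nu series from above):

// Fill a day with no observations from the first observation of the next day
nu |> Series.resampleUniform Lookup.NearestGreater (fun dt -> dt.Date) (fun dt -> dt.AddDays(1.0))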
Sampling time series
Perhaps the most common sampling operation that you might want to do is to sample a time series by a specified TimeSpan. Although this can be easily done using some of the functions above, the library provides helper functions exactly for this purpose:
// Generate 1k observations with 1.7 hour offsets
let pr = series <| stock1 (TimeSpan.FromHours(1.7)) 1000

// Sample at 2 hour intervals; 'Backward' specifies that
// we collect all previous values into a chunk.
pr |> Series.sampleTime (TimeSpan(2, 0, 0)) Direction.Backward

// Same thing using member syntax - 'Backward' is the default
pr.Sample(TimeSpan(2, 0, 0))

// Get the most recent value, sampled at 2 hour intervals
pr |> Series.sampleTimeInto (TimeSpan(2, 0, 0)) Direction.Backward Series.lastValue
Calculations and statistics
In the final section of this tutorial, we look at writing some calculations over time series. Many of the functions demonstrated here can also be used on unordered data frames and series.
Shifting and differences
First of all, let's look at the functions that we need when comparing subsequent values in a series. We already demonstrated how to do this using Series.pairwise. In many cases, the same thing can be done using an operation that works over the entire series. The two useful functions here are:

- Series.diff calculates the difference between the current and the n-th previous element
- Series.shift shifts the values of a series by a specified offset
The following snippet illustrates how both functions work:
// Generate sample data with 1.7 hour offsets
let sample = series <| stock1 (TimeSpan.FromHours(1.7)) 6

// Calculates: new[i] = s[i] - s[i-1]
let diff1 = sample |> Series.diff 1
// Diff in the opposite direction
let diffM1 = sample |> Series.diff -1
// Shift series values by 1
let shift1 = sample |> Series.shift 1

// Align all results in a frame to see the results
let df =
  [ "Shift +1" => shift1
    "Diff +1" => diff1
    "Diff" => sample - shift1
    "Orig" => sample ] |> Frame.ofColumns

val it : Frame<DateTimeOffset,string> =
                 Diff       Diff +1    Orig   Shift +1
  12:00:00 AM -> <missing>  <missing>  21.73  <missing>
  1:42:00 AM  -> 1.73       1.73       23.47  21.73
  3:24:00 AM  -> -0.83      -0.83      22.63  23.47
  5:06:00 AM  -> 2.37       2.37       25.01  22.63
  6:48:00 AM  -> -1.57      -1.57      23.43  25.01
  8:30:00 AM  -> 0.09       0.09       23.52  23.43
In the above snippet, we first calculate the difference using the Series.diff function. Then we also show how to do the same thing using Series.shift and a binary operator applied to two series (sample - shift1). The following section provides more details. So far, we have used the functional notation (e.g. sample |> Series.diff 1), but all operations can also be called using the member syntax - very often, this gives you shorter syntax. This is also shown in the next few snippets.
Operators and functions
Time series also support a large number of standard F# functions such as log and abs. You can also use standard numerical operators to apply an operation to all elements of the series.
Because series are indexed, we can also apply binary operators to two series. This automatically aligns the series and then applies the operation on corresponding elements.
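For instance, the operators can also be applied to two different series, in which case the keys are aligned first (a brief sketch using the s1 and s2 series defined earlier; keys present in only one of the series produce missing values):

// Difference between the two stock prices at aligned keys
s1 - s2
// Average of the two prices
(s1 + s2) / 2.0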
// Subtract previous value from the current value
sample - sample.Shift(1)

// Calculate logarithm of such differences
log (sample - sample.Shift(1))

// Calculate square of differences
sample.Diff(1) ** 2.0

// Calculate average of value and two immediate neighbors
(sample.Shift(-1) + sample + sample.Shift(1)) / 3.0

// Get absolute value of differences
abs (sample - sample.Shift(1))

// Get absolute value of distance from the mean
abs (sample - (Series.mean sample))
The time series library provides a large number of functions that can be applied in this way. These include trigonometric functions (sin, cos, ...), rounding functions (round, floor, ceil), exponentials and logarithms (exp, log, log10) and more. In general, whenever there is a built-in numerical F# function that can be used on standard types, the time series library should support it too.
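For example, the following all work directly on the sample series (a brief sketch of a few of the functions listed above):

// Element-wise application of built-in numerical functions
sin sample
round sample
log10 sample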
However, what can you do when you write a custom function to do some calculation and want to apply it to all series elements? Let's have a look:
// Truncate value to interval [-1.0, +1.0]
let adjust v = min 1.0 (max -1.0 v)

// Apply the adjustment to all values of the series
adjust $ sample.Diff(1)

// The $ operator is a shorthand for
sample.Diff(1) |> Series.mapValues adjust
In general, the best way to apply custom functions to all values in a series is to align the series (using either Series.join or Series.joinAlign) into a single series containing tuples and then apply Series.mapValues. The library also provides the $ operator that simplifies the last step - f $ s applies the function f to all values of the series s.
Data frame operations
Finally, many of the time series operations demonstrated above can be applied to entire data frames as well. This is particularly useful if you have a data frame that contains multiple aligned time series of similar structure (for example, if you have multiple stock prices or open-high-low-close values for a given stock).
The following snippet is a quick overview of what you can do:
// Multiply all numeric columns by a given constant
df * 0.65

// Apply function to all columns in all series
let conv x = min x 20.0
df |> Frame.mapRowValues (fun os -> conv $ os.As<float>())
   |> Frame.ofRows

// Sum each column and divide results by a constant
Frame.sum df / 6.0

// Divide sum by mean of each frame column
Frame.sum df / Frame.mean df