Creating lazily loaded series
When loading data from an external data source (such as a database), you might want to create a virtual time series that represents the data source, but does not actually load the data until needed. If you apply some range restriction (like slicing) to the data series before using the values, then it is not necessary to load the entire data set into memory.
Deedle supports lazy loading through the DelayedSeries.FromValueLoader method. It returns an ordinary data series of type Series<K, V> which has a delayed internal representation.
Creating lazy series
We will not use a real database in this tutorial, but let's say that you have the following function which loads data for a given day range:
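A minimal sketch of such a function, assuming the name generate and one random integer per day. It stands in for a real data source, and the `#r "nuget: Deedle"` line assumes the code is run as an F# script:

```fsharp
#r "nuget: Deedle"
open System
open Deedle

/// Given a time range, generates random values for dates (at 12:00 AM)
/// starting with the day of the first date time and ending with the
/// day after the second date time (to make sure they are in range)
let generate (low: DateTime) (high: DateTime) =
    let rnd = Random()
    // Number of days to step through; the extra day covers the upper end
    let days = int ((high.Date - low.Date).TotalDays) + 1
    seq { for d in 0 .. days ->
            KeyValue.Create(low.Date.AddDays(float d), rnd.Next()) }
```

The function deliberately returns slightly more than the requested range, so it never misses a value at the boundaries.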
Using random numbers as the source in this example is not entirely correct, because it means that we will get different values each time a new sub-range of the series is required - but it will suffice for the demonstration.
Now, to create a lazily loaded series, we need to open the Indices namespace, specify the minimal and maximal value of the series and use DelayedSeries.FromValueLoader:
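The steps just listed could look roughly as follows. This is a sketch, not a definitive listing: the key range 2010 to 2013, the names min, max and ls, and the stand-in generate function are assumptions made for illustration.

```fsharp
#r "nuget: Deedle"
open System
open Deedle
open Deedle.Indices

// Stand-in data source (assumption): one random integer per day
let generate (low: DateTime) (high: DateTime) =
    let rnd = Random()
    let days = int ((high.Date - low.Date).TotalDays) + 1
    seq { for d in 0 .. days -> KeyValue.Create(low.Date.AddDays(float d), rnd.Next()) }

// Minimal and maximal keys that the series can contain
let min, max = DateTime(2010, 1, 1), DateTime(2013, 1, 1)

// Create the lazy series; the loader runs only when values are needed
let ls = DelayedSeries.FromValueLoader(min, max, fun (lo, lob) (hi, hib) -> async {
    // Print each request so that we can see when data is actually loaded
    printfn "Query: %A - %A" (lo, lob) (hi, hib)
    return generate lo hi })
```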
To make the diagnostics easier, we print the requested range whenever a request is made. After running this code, you should not see any output yet.
The parameter to DelayedSeries.FromValueLoader is a function that takes 4 arguments (in F#, they are passed as two tuples, (lo, lob) and (hi, hib)):
- lo and hi specify the low and high boundaries of the range. Their type is the type of the key (e.g. DateTime in our example).
- lob and hib are values of type BoundaryBehavior and can be either Inclusive or Exclusive. They specify whether the boundary value should be included or not.
Our sample function does not handle boundaries correctly - it always includes the boundary (and possibly more values). This is not a problem, because the lazy loader automatically skips over such values. But if you want, you can use the lob and hib parameters to build a more optimal SQL query.
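As an illustration of how lob and hib might feed into a query, here is a small sketch of a helper that picks the comparison operators. The helper name rangeCondition, the column name Date and the @lo / @hi parameter names are all hypothetical, and the code assumes that BoundaryBehavior lives in the Deedle.Indices namespace:

```fsharp
#r "nuget: Deedle"
open Deedle.Indices

// Hypothetical helper: builds a SQL condition for the requested range.
// Inclusive boundaries use >= / <=, exclusive boundaries use > / <.
let rangeCondition (lob: BoundaryBehavior) (hib: BoundaryBehavior) =
    let loOp =
        match lob with
        | BoundaryBehavior.Inclusive -> ">="
        | _ -> ">"
    let hiOp =
        match hib with
        | BoundaryBehavior.Inclusive -> "<="
        | _ -> "<"
    sprintf "Date %s @lo AND Date %s @hi" loOp hiOp
```

A loader could pass lo and hi as the query parameters and use this condition in its WHERE clause, so that the database returns exactly the requested rows.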
Using un-evaluated series
Let's now have a look at the operations that we can perform on un-evaluated series.
Any operation that actually accesses values or keys of the series (such as Series.observations or lookup for a specific key) will force the evaluation of the series.
However, we can use range restrictions before accessing the data:
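A sketch of such restrictions, assuming a lazy series ls over the years 2010 to 2013 that is backed by a stand-in random generator. The names jan12 and janHalf and the concrete dates are chosen for illustration:

```fsharp
#r "nuget: Deedle"
open System
open Deedle
open Deedle.Indices

// Stand-in data source and lazy series (assumptions for this sketch)
let generate (low: DateTime) (high: DateTime) =
    let rnd = Random()
    let days = int ((high.Date - low.Date).TotalDays) + 1
    seq { for d in 0 .. days -> KeyValue.Create(low.Date.AddDays(float d), rnd.Next()) }
let min, max = DateTime(2010, 1, 1), DateTime(2013, 1, 1)
let ls = DelayedSeries.FromValueLoader(min, max, fun (lo, lob) (hi, hib) -> async {
    printfn "Query: %A - %A" (lo, lob) (hi, hib)
    return generate lo hi })

// Series representing January 2012; nothing is loaded yet
let jan12 = ls.[DateTime(2012, 1, 1) .. DateTime(2012, 2, 1)]

// Further restriction to the first half of the month; still nothing loaded
let janHalf = jan12.[.. DateTime(2012, 1, 15)]

// The first lookup runs the loader once, for the restricted range
let first = janHalf.[DateTime(2012, 1, 1)]

// A second lookup in the same range needs no further query
let second = janHalf.[DateTime(2012, 1, 2)]
```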
As the output of the first lookup shows, the series obtained data for the 15-day range that we created by restricting the original series. When we requested another value within the specified range, it was already available and it was returned immediately. Note that janHalf is restricted to the specified 15-day range, so we cannot access values outside of the range. Also, when you access a single value, the entire restricted series is loaded. The motivation is that you probably need to access multiple values, so it is likely cheaper to load the whole series.
Another operation that can be performed on an unevaluated series is to add it to a data frame with some existing key range:
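A sketch of this, assuming a data frame keyed by the days of December 2011 and a column named Values (both chosen for illustration), with the same stand-in lazy series as before:

```fsharp
#r "nuget: Deedle"
open System
open Deedle
open Deedle.Indices

// Stand-in data source and lazy series (assumptions for this sketch)
let generate (low: DateTime) (high: DateTime) =
    let rnd = Random()
    let days = int ((high.Date - low.Date).TotalDays) + 1
    seq { for d in 0 .. days -> KeyValue.Create(low.Date.AddDays(float d), rnd.Next()) }
let min, max = DateTime(2010, 1, 1), DateTime(2013, 1, 1)
let ls = DelayedSeries.FromValueLoader(min, max, fun (lo, lob) (hi, hib) -> async {
    printfn "Query: %A - %A" (lo, lob) (hi, hib)
    return generate lo hi })

// Create an empty data frame with one row per day of December 2011
let dec11 = Frame.ofRowKeys [ for d in 1 .. 31 -> DateTime(2011, 12, d) ]

// Add the lazy series as the 'Values' column of the data frame
dec11?Values <- ls
```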
When adding a lazy series to a data frame, the series has to be evaluated (so that the values can be properly aligned), but it is first restricted to the key range of the data frame. In the above example, only one month of data is loaded.