Deedle


Creating lazily loaded series

When loading data from an external data source (such as a database), you might want to create a virtual time series that represents the data source, but does not actually load the data until needed. If you apply some range restriction (like slicing) to the data series before using the values, then it is not necessary to load the entire data set into memory.

Deedle supports lazy loading through the DelayedSeries.FromValueLoader method. It returns an ordinary data series of type Series<K, V> which has a delayed internal representation.

Creating lazy series

We will not use a real database in this tutorial, but let's say that you have the following function which loads data for a given day range:

open System
open Deedle

/// Given a time range, generates random values for dates (at 12:00 AM)
/// starting with the day of the first date time and ending with the 
/// day after the second date time (to make sure they are in range)
let generate (low:DateTime) (high:DateTime) =
  let rnd = Random()
  let days = int (high.Date - low.Date).TotalDays + 1
  seq { for d in 0 .. days -> 
          KeyValue.Create(low.Date.AddDays(float d), rnd.Next()) }

Using random numbers as the source in this example is not entirely correct, because it means that we will get different values each time a new sub-range of the series is required - but it will suffice for the demonstration.
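If repeatable values matter for your experiments, one option is to derive each value from its date. The following is only a sketch: generateStable is a hypothetical name, and seeding a fresh Random with the day number is an assumption made for illustration, not something Deedle requires. It uses KeyValuePair directly, so it does not depend on Deedle at all.

```fsharp
open System
open System.Collections.Generic

/// Deterministic variant of 'generate' (hypothetical helper): the value for
/// each day depends only on the date, so reloading a sub-range of the series
/// yields the same numbers within one run
let generateStable (low:DateTime) (high:DateTime) =
  let days = int (high.Date - low.Date).TotalDays + 1
  seq { for d in 0 .. days do
          let date = low.Date.AddDays(float d)
          // Seed a fresh generator with the number of whole days in the date
          let seed = int (date.Ticks / TimeSpan.TicksPerDay)
          yield KeyValuePair(date, Random(seed).Next()) }
```

The function has the same shape as generate, so it can be passed to the loader in the same way.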

Now, to create a lazily loaded series, we need to open the Indices namespace, specify the minimal and maximal value of the series and use DelayedSeries.FromValueLoader:

open Deedle.Indices

// Minimal and maximal values that can be loaded from the series
let min, max = DateTime(2010, 1, 1), DateTime(2013, 1, 1)

// Create a lazy series for the given range
let ls = DelayedSeries.FromValueLoader(min, max, fun (lo, lob) (hi, hib) -> async { 
    printfn "Query: %A - %A" (lo, lob) (hi, hib)
    return generate lo hi })

To make the diagnostics easier, we print the requested range whenever a request is made. After running this code, you should not see any output yet. The last argument of DelayedSeries.FromValueLoader is a function that takes two pairs, (lo, lob) and (hi, hib), so four values in total:

  • lo and hi specify the low and high boundaries of the range. Their type is the type of the key (e.g. DateTime in our example)
  • lob and hib are values of type BoundaryBehavior and can be either Inclusive or Exclusive. They specify whether the boundary value should be included or not.

Our sample function does not handle boundaries correctly: it always includes the boundary values (and possibly more). This is not a problem, because the lazy loader automatically skips over such values. But if you want, you can use the lob and hib parameters to build a more precise SQL query that returns only the rows that are actually needed.
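As a rough illustration of how lob and hib could be used, here is a sketch of a loader that returns only keys inside the requested range. The name generateExact is hypothetical, and the code assumes that all keys fall on midnight, as in the generate function; a real loader would translate the same conditions into its query instead.

```fsharp
open System
open Deedle
open Deedle.Indices

/// Hypothetical stand-in for a database query: one random value per day
/// (keys at 12:00 AM), honouring the boundary flags on both ends
let generateExact (lo:DateTime, lob:BoundaryBehavior) (hi:DateTime, hib:BoundaryBehavior) =
  let rnd = Random()
  // First midnight inside the range: 'lo' itself only if it is a midnight
  // and the lower boundary is inclusive, otherwise the following midnight
  let first =
    if lo = lo.Date && lob = BoundaryBehavior.Inclusive then lo
    else lo.Date.AddDays(1.0)
  // Last midnight inside the range: step back one day only if 'hi' is a
  // midnight and the upper boundary is exclusive
  let last =
    if hi = hi.Date && hib = BoundaryBehavior.Exclusive then hi.AddDays(-1.0)
    else hi.Date
  // An empty range (last < first) produces an empty sequence
  seq { for d in 0 .. int (last - first).TotalDays ->
          KeyValue.Create(first.AddDays(float d), rnd.Next()) }

// Lazy series that forwards both boundary pairs to the loader
let exact =
  DelayedSeries.FromValueLoader(DateTime(2010, 1, 1), DateTime(2013, 1, 1), fun (lo, lob) (hi, hib) ->
    async { return generateExact (lo, lob) (hi, hib) })
```

Because the lazy loader skips surplus values anyway, this is purely an optimization: it reduces the amount of data transferred, not the correctness of the result.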

Using un-evaluated series

Let's now have a look at the operations that we can perform on un-evaluated series. Any operation that actually accesses values or keys of the series (such as Series.observations or lookup for a specific key) will force the evaluation of the series.

However, we can use range restrictions before accessing the data:

// Get series representing January 2012
let jan12 = ls.[DateTime(2012, 1, 1) .. DateTime(2012, 1, 31)]

// Further restriction - only first half of the month
let janHalf = jan12.[.. DateTime(2012, 1, 15)]

// Get value for a specific date
janHalf.[DateTime(2012, 1, 1)]
// Query: (1/1/2012, Inclusive) - (1/15/2012, Inclusive)
// val it : int = 1127670994

janHalf.[DateTime(2012, 1, 2)]
// val it : int = 560920727

As the Query output printed after the first lookup shows, the series loaded data for the 15-day range that we created by restricting the original series. When we requested another value within that range, it was already available and was returned immediately, with no further query. Note that janHalf is restricted to the specified 15-day range, so we cannot access values outside of it. Also, when you access a single value, the entire restricted series is loaded. The motivation is that you probably need to access multiple values, so it is likely cheaper to load the whole range at once.
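If you are not sure whether a key lies inside the restricted range, a lookup that returns an option is safer than the indexer. The sketch below is self-contained and uses a hypothetical deterministic loader (loadDays, which returns the day of the year) in place of the random one; Series.tryGet is expected to return None for a key outside the range instead of raising an exception.

```fsharp
open System
open Deedle
open Deedle.Indices

/// Hypothetical deterministic loader: the value is the day of the year
let loadDays (lo:DateTime) (hi:DateTime) =
  seq { for d in 0 .. int (hi.Date - lo.Date).TotalDays do
          let date = lo.Date.AddDays(float d)
          yield KeyValue.Create(date, date.DayOfYear) }

let source =
  DelayedSeries.FromValueLoader(DateTime(2010, 1, 1), DateTime(2013, 1, 1), fun (lo, lob) (hi, hib) ->
    async { return loadDays lo hi })

// Restricting the range does not load anything yet
let firstHalf = source.[DateTime(2012, 1, 1) .. DateTime(2012, 1, 15)]

// A key inside the restricted range: forces the load and returns the value
let inside = firstHalf.[DateTime(2012, 1, 2)]

// A key outside the restricted range: option-returning lookup
let outside = firstHalf |> Series.tryGet (DateTime(2012, 1, 20))
```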

Another operation that can be performed on an unevaluated series is to add it to a data frame with some existing key range:

// Create empty data frame for days of December 2011
let dec11 = Frame.ofRowKeys [ for d in 1 .. 31 -> DateTime(2011, 12, d) ]

// Add series as the 'Values' column to the data frame
dec11?Values <- ls
// Query: (12/1/2011, Inclusive) - (12/31/2011, Inclusive)

When adding a lazy series to a data frame, the series has to be evaluated (so that the values can be properly aligned), but it is first restricted to the key range of the data frame. In the above example, only one month of data is loaded.
