Deedle


Deedle: Exploratory data library for .NET

Deedle is an easy to use library for data and time series manipulation and for scientific programming. It supports working with structured data frames, ordered and unordered data, as well as time series. Deedle is designed to work well for exploratory programming using F# and C# interactive console, but can be also used in efficient compiled .NET code.

The library implements a wide range of operations for data manipulation including advanced indexing and slicing, joining and aligning data, handling of missing values, grouping and aggregation, statistics and more.

Titanic survivor analysis in 20 lines

Assume we loaded Titanic data set into a data frame called titanic (the data frame has numerous columns including int Pclass and Boolean Survived). Now we can calculate the survival rates for three different classes of tickets:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
// Read Titanic data & group rows by 'Sex'
let titanic = Frame.ReadCsv(root + "titanic.csv").GroupRowsBy<int>("Pclass")

// Get 'Survived' column and count survival count per clsas
let byClass =
  titanic.GetColumn<bool>("Survived")
  |> Series.applyLevel fst (fun s ->
      // Get counts for 'True' and 'False' values of 'Survived'
      series (Seq.countBy id s.Values))
  // Create frame with 'Pclass' as row and 'Died' & 'Survived' columns
  |> Frame.ofRows 
  |> Frame.sortRowsByKey
  |> Frame.indexColsWith ["Died"; "Survived"]

// Add column with Total number of males/females on Titanic
byClass?Total <- byClass?Died + byClass?Survived

// Build a data frame with nice summary of rates in percents
frame [ "Died (%)" => round (byClass?Died / byClass?Total * 100.0)
        "Survived (%)" => round (byClass?Survived / byClass?Total * 100.0) ]

Died (%)

Survived (%)

1

37

63

2

53

47

3

76

24

We first group data by the Pclass and get the Survived column as a series of Boolean values. Then we reduce each group using applyLevel. This calls a specified function for each passenger class. We count the number of survivors and casualties. Then we add nice namings, sort the frame and build a new data frame with a nice summary:

How to get Deedle

Samples & documentation

The library comes with comprehensible documentation. The tutorials and articles are automatically generated from *.fsx files in the docs folder. The API reference is automatically generated from Markdown comments in the library implementation.

  • Quick start tutorial shows how to use the most important features of F# DataFrame library. Start here to learn how to use the library in 10 minutes.

  • Data frame features provides more examples that use general data frame features. These includes slicing, joining, grouping, aggregation.

  • Series features provides more details on operations that are relevant when working with time series data (such as stock prices). This includes sliding windows, chunking, sampling and statistics.

  • Calculating frame and series statistics shows how to calculate statistical indicators such as mean, variance, skweness and other. The tutorial also covers moving window and expanding window statistics.

  • The Deedle library can be used from both F# and C#. We aim to provide idiomatic API for both of the languages. Read the using Deedle from C# page for more information about the C#-friednly API.

Automatically generated documentation for all types, modules and functions in the library is available in the API Reference. The three most important modules that are fully documented are the following:

  • Series module provides functions for working with individual data series and time-series values.
  • Frame module provides functions that are similar to those in the Series module, but operate on entire data frames.
  • Stats module implements standard statistical functions, moving windows and a lot more. It contains functions for both series and frames.

Contributing and copyright

The project is hosted on GitHub where you can report issues, fork the project and submit pull requests. If you're adding new public API, please also consider adding samples that can be turned into a documentation. You might also want to read library design notes to understand how it works.

If you are interested in F# and data science more generally, consider also joining the F# data science and machine learning working group, which coordinates work on data science projects for F#.

The library has been developed by BlueMountain Capital and contributors. It is available under the BSD license, which allows modification and redistribution for both commercial and non-commercial purposes. For more information see the License file in the GitHub repository.

namespace System
namespace Deedle
val root : string

Full name: Index.root
val titanic : Frame<(int * int),string>

Full name: Index.titanic
Multiple items
module Frame

from Deedle

--------------------
type Frame =
  static member CreateEmpty : unit -> Frame<'R,'C> (requires equality and equality)
  static member FromArray2D : array:'T [,] -> Frame<int,int>
  static member FromColumns : cols:Series<'TColKey,Series<'TRowKey,'V>> -> Frame<'TRowKey,'TColKey> (requires equality and equality)
  static member FromColumns : cols:Series<'TColKey,ObjectSeries<'TRowKey>> -> Frame<'TRowKey,'TColKey> (requires equality and equality)
  static member FromColumns : columns:seq<KeyValuePair<'ColKey,ObjectSeries<'RowKey>>> -> Frame<'RowKey,'ColKey> (requires equality and equality)
  static member FromColumns : columns:seq<KeyValuePair<'ColKey,Series<'RowKey,'V>>> -> Frame<'RowKey,'ColKey> (requires equality and equality)
  static member FromColumns : cols:seq<Series<'ColKey,'V>> -> Frame<'ColKey,int> (requires equality)
  static member FromRecords : values:seq<'T> -> Frame<int,string>
  static member FromRecords : series:Series<'K,'R> -> Frame<'K,string> (requires equality)
  static member FromRowKeys : keys:seq<'K> -> Frame<'K,string> (requires equality)
  ...

Full name: Deedle.Frame

--------------------
type Frame<'TRowKey,'TColumnKey (requires equality and equality)> =
  interface IDynamicMetaObjectProvider
  interface INotifyCollectionChanged
  interface IFsiFormattable
  interface IFrame
  new : names:seq<'TColumnKey> * columns:seq<ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey>
  new : rowIndex:IIndex<'TRowKey> * columnIndex:IIndex<'TColumnKey> * data:IVector<IVector> * indexBuilder:IIndexBuilder * vectorBuilder:IVectorBuilder -> Frame<'TRowKey,'TColumnKey>
  member AddColumn : column:'TColumnKey * series:ISeries<'TRowKey> -> unit
  member AddColumn : column:'TColumnKey * series:seq<'V> -> unit
  member AddColumn : column:'TColumnKey * series:ISeries<'TRowKey> * lookup:Lookup -> unit
  member AddColumn : column:'TColumnKey * series:seq<'V> * lookup:Lookup -> unit
  ...

Full name: Deedle.Frame<_,_>

--------------------
new : names:seq<'TColumnKey> * columns:seq<ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey>
new : rowIndex:Indices.IIndex<'TRowKey> * columnIndex:Indices.IIndex<'TColumnKey> * data:IVector<IVector> * indexBuilder:Indices.IIndexBuilder * vectorBuilder:Vectors.IVectorBuilder -> Frame<'TRowKey,'TColumnKey>
static member Frame.ReadCsv : path:string * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int -> Frame<int,string>
static member Frame.ReadCsv : stream:IO.Stream * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int -> Frame<int,string>
static member Frame.ReadCsv : reader:IO.TextReader * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int -> Frame<int,string>
static member Frame.ReadCsv : path:string * indexCol:string * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int -> Frame<'R,string> (requires equality)
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
val byClass : Frame<int,string>

Full name: Index.byClass
member Frame.GetColumn : column:'TColumnKey -> Series<'TRowKey,'R>
member Frame.GetColumn : column:'TColumnKey * lookup:Lookup -> Series<'TRowKey,'R>
type bool = Boolean

Full name: Microsoft.FSharp.Core.bool
Multiple items
module Series

from Deedle

--------------------
type Series =
  static member ofNullables : values:seq<Nullable<'a0>> -> Series<int,'a0> (requires default constructor and value type and 'a0 :> ValueType)
  static member ofObservations : observations:seq<'a0 * 'a1> -> Series<'a0,'a1> (requires equality)
  static member ofOptionalObservations : observations:seq<'K * 'a1 option> -> Series<'K,'a1> (requires equality)
  static member ofValues : values:seq<'a0> -> Series<int,'a0>

Full name: Deedle.FSharpSeriesExtensions.Series

--------------------
type Series<'K,'V (requires equality)> =
  interface IFsiFormattable
  interface ISeries<'K>
  new : pairs:seq<KeyValuePair<'K,'V>> -> Series<'K,'V>
  new : keys:seq<'K> * values:seq<'V> -> Series<'K,'V>
  new : index:IIndex<'K> * vector:IVector<'V> * vectorBuilder:IVectorBuilder * indexBuilder:IIndexBuilder -> Series<'K,'V>
  member After : lowerExclusive:'K -> Series<'K,'V>
  member Aggregate : aggregation:Aggregation<'K> * observationSelector:Func<DataSegment<Series<'K,'V>>,KeyValuePair<'TNewKey,OptionalValue<'R>>> -> Series<'TNewKey,'R> (requires equality)
  member Aggregate : aggregation:Aggregation<'K> * keySelector:Func<DataSegment<Series<'K,'V>>,'TNewKey> * valueSelector:Func<DataSegment<Series<'K,'V>>,OptionalValue<'R>> -> Series<'TNewKey,'R> (requires equality)
  member AsyncMaterialize : unit -> Async<Series<'K,'V>>
  member Before : upperExclusive:'K -> Series<'K,'V>
  ...

Full name: Deedle.Series<_,_>

--------------------
new : pairs:seq<Collections.Generic.KeyValuePair<'K,'V>> -> Series<'K,'V>
new : keys:seq<'K> * values:seq<'V> -> Series<'K,'V>
new : index:Indices.IIndex<'K> * vector:IVector<'V> * vectorBuilder:Vectors.IVectorBuilder * indexBuilder:Indices.IIndexBuilder -> Series<'K,'V>
val applyLevel : level:('K1 -> 'K2) -> op:(Series<'K1,'V> -> 'R) -> series:Series<'K1,'V> -> Series<'K2,'R> (requires equality and equality)

Full name: Deedle.Series.applyLevel
val fst : tuple:('T1 * 'T2) -> 'T1

Full name: Microsoft.FSharp.Core.Operators.fst
val s : Series<(int * int),bool>
val series : observations:seq<'a * 'b> -> Series<'a,'b> (requires equality)

Full name: Deedle.FSharpSeriesExtensions.series
module Seq

from Microsoft.FSharp.Collections
val countBy : projection:('T -> 'Key) -> source:seq<'T> -> seq<'Key * int> (requires equality)

Full name: Microsoft.FSharp.Collections.Seq.countBy
val id : x:'T -> 'T

Full name: Microsoft.FSharp.Core.Operators.id
property Series.Values: seq<bool>
static member Frame.ofRows : rows:seq<'a0 * #ISeries<'a2>> -> Frame<'a0,'a2> (requires equality and equality)
static member Frame.ofRows : rows:Series<'a0,#ISeries<'a2>> -> Frame<'a0,'a2> (requires equality and equality)
val sortRowsByKey : frame:Frame<'R,'C> -> Frame<'R,'C> (requires equality and equality)

Full name: Deedle.Frame.sortRowsByKey
val indexColsWith : keys:seq<'C2> -> frame:Frame<'R,'C1> -> Frame<'R,'C2> (requires equality and equality and equality)

Full name: Deedle.Frame.indexColsWith
val frame : columns:seq<'a * #ISeries<'c>> -> Frame<'c,'a> (requires equality and equality)

Full name: Deedle.FSharpFrameExtensions.frame
val round : value:'T -> 'T (requires member Round)

Full name: Microsoft.FSharp.Core.Operators.round
Fork me on GitHub