Deedle: Exploratory data library for .NET

Deedle is an easy to use library for data and time series manipulation and for scientific programming. It supports working with structured data frames, ordered and unordered data, as well as time series. Deedle is designed to work well for exploratory programming using F# and C# interactive console, but can be also used in efficient compiled .NET code.

The library implements a wide range of operations for data manipulation including advanced indexing and slicing, joining and aligning data, handling of missing values, grouping and aggregation, statistics and more.

The F# DataFrame library can be installed from NuGet:
PM> Install-Package Deedle

Titanic survivor analysis in 20 lines

Assume we loaded Titanic data set into a data frame called titanic (the data frame has numerous columns including string Sex and Boolean Survived). Now we can calculate the survival rates for males and females:

// Group the data frame by sex 
let grouped = titanic |> Frame.groupRowsByString "Sex" |> Frame.nest

// For each group, calculate the total number of survived & died
let bySex =
  |> (fun sex df -> 
      // Group by the 'Survived' column & count by Boolean values
      df.GetSeries<bool>("Survived") |> Series.groupBy (fun k v -> v) 
      |> Frame.ofColumns |> Frame.countValues )
  |> Frame.ofRows
  |> Frame.indexColsWith ["Died"; "Survived"]

// Add column with Total number of males/females on Titanic
bySex?Total <- Frame.countRows $ grouped

// Build a data frame with nice summary of rates in percents
[ "Died (%)" => bySex?Died / bySex?Total * 100.0
  "Survived (%)" => bySex?Survived / bySex?Total * 100.0 ]
|> Frame.ofColumns

We first group data by the Sex column and get a series (with rows male and female) containing data frames for individual groups. Then we count the number of survivors in each group, add the total number of males and females and, finally, build a new data frame with the summary:

Died (%)Survived (%)

Samples & documentation

The library comes with comprehensible documentation. The tutorials and articles are automatically generated from *.fsx files in the samples folder. The API reference is automatically generated from Markdown comments in the library implementation.

  • Quick start tutorial shows how to use the most important features of F# DataFrame library. Start here to learn how to use the library in 10 minutes.

  • Data frame features provides more examples that use general data frame features. These includes slicing, joining, grouping, aggregation.

  • Time series features provides more details on operations that are relevant when working with time series data (such as stock prices). This includes sliding windows, chunking, sampling and statistics.

  • API Reference contains automatically generated documentation for all types, modules and functions in the library. This includes additional brief samples on using most of the functions.

Contributing and copyright

The project is hosted on GitHub where you can report issues, fork the project and submit pull requests. If you're adding new public API, please also consider adding samples that can be turned into a documentation. You might also want to read library design notes to understand how it works.

If you are interested in F# and data science more generally, consider also joining the F# data science and machine learning working group, which coordinates work on data science projects for F#.

The library has been developed by BlueMountain Capital and contributors. It is available under the BSD license, which allows modification and redistribution for both commercial and non-commercial purposes. For more information see the License file in the GitHub repository.

val grouped : Series<string,Frame<int,string>>

Full name: Index.grouped
val titanic : Frame<int,string>

Full name: Index.titanic

 Titanic data set loaded from a CSV file
Multiple items
module Frame

from Deedle

type Frame =
  static member CreateEmpty : unit -> Frame<'R,'C> (requires equality and equality)
  static member FromColumns : cols:Series<'TColKey,Series<'TRowKey,'V>> -> Frame<'TRowKey,'TColKey> (requires equality and equality)
  static member FromColumns : cols:Series<'TColKey,ObjectSeries<'TRowKey>> -> Frame<'TRowKey,'TColKey> (requires equality and equality)
  static member FromColumns : columns:seq<KeyValuePair<'ColKey,ObjectSeries<'RowKey>>> -> Frame<'RowKey,'ColKey> (requires equality and equality)
  static member FromColumns : columns:seq<KeyValuePair<'ColKey,Series<'RowKey,'V>>> -> Frame<'RowKey,'ColKey> (requires equality and equality)
  static member FromColumns : rows:seq<Series<'ColKey,'V>> -> Frame<'ColKey,int> (requires equality)
  static member FromRecords : values:seq<'T> -> Frame<int,string>
  static member FromRecords : series:Series<'K,'R> -> Frame<'K,string> (requires equality)
  static member FromRowKeys : keys:seq<'K> -> Frame<'K,string> (requires equality)
  static member FromRows : rows:Series<'TColKey,Series<'TRowKey,'V>> -> Frame<'TColKey,'TRowKey> (requires equality and equality)

Full name: Deedle.Frame

type Frame<'TRowKey,'TColumnKey (requires equality and equality)> =
  interface IDynamicMetaObjectProvider
  interface INotifyCollectionChanged
  interface IFsiFormattable
  interface IFrame
  new : names:seq<'TColumnKey> * columns:seq<ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey>
  private new : rowIndex:IIndex<'TRowKey> * columnIndex:IIndex<'TColumnKey> * data:IVector<IVector> -> Frame<'TRowKey,'TColumnKey>
  member AddSeries : column:'TColumnKey * series:ISeries<'TRowKey> -> unit
  member AddSeries : column:'TColumnKey * series:seq<'V> -> unit
  member AddSeries : column:'TColumnKey * series:ISeries<'TRowKey> * lookup:Lookup -> unit
  member AddSeries : column:'TColumnKey * series:seq<'V> * lookup:Lookup -> unit

Full name: Deedle.Frame<_,_>

new : names:seq<'TColumnKey> * columns:seq<ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey>
val groupRowsByString : column:'a -> frame:Frame<'b,'a> -> Frame<(string * 'b),'a> (requires equality and equality)

Full name: Deedle.Frame.groupRowsByString
val nest : frame:Frame<('R1 * 'R2),'C> -> Series<'R1,Frame<'R2,'C>> (requires equality and equality and equality)

Full name: Deedle.Frame.nest
val bySex : Frame<string,string>

Full name: Index.bySex
Multiple items
module Series

from Deedle

type Series =
  static member ofNullables : values:seq<Nullable<'a0>> -> Series<int,'a0> (requires default constructor and value type and 'a0 :> ValueType)
  static member ofObservations : observations:seq<'a0 * 'a1> -> Series<'a0,'a1> (requires equality)
  static member ofOptionalObservations : observations:seq<'K * OptionalValue<'a1>> -> Series<'K,'a1> (requires equality)
  static member ofValues : values:seq<'a0> -> Series<int,'a0>

Full name: Deedle.FSharpSeriesExtensions.Series

type Series<'K,'V (requires equality)> =
  interface IFsiFormattable
  interface ISeries<'K>
  new : pairs:seq<KeyValuePair<'K,'V>> -> Series<'K,'V>
  new : keys:seq<'K> * values:seq<'V> -> Series<'K,'V>
  new : index:IIndex<'K> * vector:IVector<'V> * vectorBuilder:IVectorBuilder * indexBuilder:IIndexBuilder -> Series<'K,'V>
  member After : lowerExclusive:'K -> Series<'K,'V>
  member Aggregate : aggregation:Aggregation<'K> * observationSelector:Func<DataSegment<Series<'K,'V>>,KeyValuePair<'TNewKey,OptionalValue<'R>>> -> Series<'TNewKey,'R> (requires equality)
  member Aggregate : aggregation:Aggregation<'K> * keySelector:Func<DataSegment<Series<'K,'V>>,'TNewKey> * valueSelector:Func<DataSegment<Series<'K,'V>>,OptionalValue<'R>> -> Series<'TNewKey,'R> (requires equality)
  member Append : otherSeries:Series<'K,'V> -> Series<'K,'V>
  member AsyncMaterialize : unit -> Async<Series<'K,'V>>

Full name: Deedle.Series<_,_>

new : pairs:seq<Collections.Generic.KeyValuePair<'K,'V>> -> Series<'K,'V>
new : keys:seq<'K> * values:seq<'V> -> Series<'K,'V>
new : index:Indices.IIndex<'K> * vector:IVector<'V> * vectorBuilder:Vectors.IVectorBuilder * indexBuilder:Indices.IIndexBuilder -> Series<'K,'V>
val map : f:('K -> 'T -> 'R) -> series:Series<'K,'T> -> Series<'K,'R> (requires equality)

Full name:
val sex : string
val df : Frame<int,string>
member Frame.GetSeries : column:'TColumnKey -> Series<'TRowKey,'R>
member Frame.GetSeries : column:'TColumnKey * lookup:Lookup -> Series<'TRowKey,'R>
type bool = Boolean

Full name: Microsoft.FSharp.Core.bool
val groupBy : keySelector:('K -> 'T -> 'TNewKey) -> series:Series<'K,'T> -> Series<'TNewKey,Series<'K,'T>> (requires equality and equality)

Full name: Deedle.Series.groupBy
val k : int
val v : bool
static member Frame.ofColumns : cols:Series<'a0,#ISeries<'a2>> -> Frame<'a2,'a0> (requires equality and equality)
static member Frame.ofColumns : cols:seq<'a0 * #ISeries<'K>> -> Frame<'K,'a0> (requires equality and equality)
val countValues : frame:Frame<'R,'C> -> Series<'C,int> (requires equality and equality)

Full name: Deedle.Frame.countValues
static member Frame.ofRows : rows:seq<'a0 * #ISeries<'a2>> -> Frame<'a0,'a2> (requires equality and equality)
static member Frame.ofRows : rows:Series<'a0,#ISeries<'a2>> -> Frame<'a0,'a2> (requires equality and equality)
val indexColsWith : keys:seq<'C2> -> frame:Frame<'R,'C1> -> Frame<'R,'C2> (requires equality and equality and equality)

Full name: Deedle.Frame.indexColsWith
val countRows : frame:Frame<'R,'C> -> int (requires equality and equality)

Full name: Deedle.Frame.countRows
Fork me on GitHub