Deedle


Deedle Namespace

Core frame and series types

Type | Description
Frame<'TRowKey, 'TColumnKey>

A frame is the key Deedle data structure (together with series). It represents a data table (think spreadsheet or CSV file) with multiple rows and columns. A frame consists of a row index, a column index and data. The indices are used for efficient lookup when accessing data by the row key 'TRowKey or by the column key 'TColumnKey. Deedle frames are optimized for the scenario where all values in a given column are of the same type (but types of different columns can differ).

FrameData

Represents the underlying (raw) data of the frame in a format that can be used for exporting data frame to other formats etc. (DataTable, CSV, Excel)

ISeries<'K>

Represents an untyped series with keys of type K and values of some unknown type (This type should not generally be used directly, but it can be used when you need to write code that works on a sequence of series of heterogeneous types).

Series<'K, 'V>

The type Series<K, V> represents a data series consisting of values V indexed by keys K. The keys of a series may or may not be ordered.

Module | Description
F# Series extensions

Contains extensions for creating values of type Series<'K, 'V> including a type with functions such as Series.ofValues and the series function. The module is automatically opened for all F# code that references Deedle.
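As a minimal sketch of the creation functions mentioned above, the following F# script builds an ordinal series and two keyed series. It assumes the Deedle NuGet package can be referenced from F# Interactive; the sample values are made up for illustration.

```fsharp
#r "nuget: Deedle"
open Deedle

// Ordinal series: keys are generated as 0, 1, 2
let ordinal = Series.ofValues [ 10.0; 20.0; 30.0 ]

// Keyed series from key/value pairs; the => operator simply builds a tuple
let keyed = series [ "a" => 1.0; "b" => 2.0; "c" => 3.0 ]

// The same series spelled out with explicit tuples
let keyed2 = Series.ofObservations [ "a", 1.0; "b", 2.0; "c", 3.0 ]

printfn "%d keys, b = %f" keyed.KeyCount keyed.["b"]
```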

Frame and series operations

Type | Description
EnumerableExtensions

Contains C#-friendly extension methods for various instances of IEnumerable that can be used for creating Series<'K, 'V> from the IEnumerable value. You can create an ordinal series from IEnumerable<'T> or an indexed series from IEnumerable<KeyValuePair<'K, 'V>> or from IEnumerable<KeyValuePair<'K, OptionalValue<'V>>>.

Frame

Provides static methods for creating frames, reading frame data from CSV files and database (via IDataReader). The type also provides global configuration for reflection-based expansion.

FrameExtensions

Contains C# and F# extension methods for the Frame<'R, 'C> type. The members are automatically available when you import the Deedle namespace. The type contains object-oriented counterparts to most of the functionality from the Frame module.

SeriesExtensions

The type implements C# and F# extension methods for the Series<'K, 'V> type. The members are automatically available when you import the Deedle namespace. The type contains object-oriented counterparts to most of the functionality from the Series module.

SeriesStatsExtensions

The type implements C# and F# extension methods that add numerical operations to Deedle series. With a few exceptions, the methods are only available for series containing floating-point values, that is Series<'K, float>.

Stats

The Stats type contains functions for fast calculation of statistics over series and frames as well as over a moving and an expanding window in a series.

The resulting series has the same keys as the input series. When there are no values, or missing values, different functions behave in different ways. Statistics (e.g. mean) return missing value when any value is missing, while min/max functions return the minimal/maximal element (skipping over missing values).
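The difference between a whole-series statistic, a moving window and an expanding window can be sketched as follows. The script assumes the Deedle NuGet package is available; the input data is illustrative only.

```fsharp
#r "nuget: Deedle"
open Deedle

let s = series [ for i in 1 .. 6 -> i => float i ]

// A single number for the whole series
let overall = Stats.mean s

// Mean over a sliding window of 3 values; the result keeps the input keys
let moving = s |> Stats.movingMean 3

// Mean of all values seen so far at each key
let expanding = s |> Stats.expandingMean

printfn "%f" overall
printfn "%A" (Series.observations moving |> List.ofSeq)
printfn "%A" (Series.observations expanding |> List.ofSeq)
```

Series.observations skips missing values, so keys for which a window has no result do not appear in the printed list.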

Module | Description
F# Frame extensions

This module contains F# functions and extensions for working with frames. This includes operations for creating frames such as the frame function, => operator and Frame.ofRows, Frame.ofColumns and Frame.ofRowKeys functions. The module also provides additional F# extension methods including ReadCsv, SaveCsv and PivotTable.
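The creation functions listed above can be tried out with a short script. This is a sketch under the assumption that the Deedle NuGet package is referenced; the series and keys are invented for the example.

```fsharp
#r "nuget: Deedle"
open Deedle

let a = series [ 1 => 1.0; 2 => 2.0; 3 => 3.0 ]
let b = series [ 1 => 10.0; 2 => 20.0 ]

// Columns given as key/series pairs; row keys are aligned, so B has no value at row 3
let byCols = frame [ "A" => a; "B" => b ]
let byCols2 = Frame.ofColumns [ "A" => a; "B" => b ]

// Rows given as key/series pairs
let byRows =
    Frame.ofRows [ "r1" => series [ "x" => 1.0; "y" => 2.0 ];
                   "r2" => series [ "x" => 3.0; "y" => 4.0 ] ]

printfn "%d x %d" byCols.RowCount byCols.ColumnCount
printfn "%d x %d" byRows.RowCount byRows.ColumnCount
```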

Frame

The Frame module provides an F#-friendly API for working with data frames. The module follows the usual design for collection-processing in F#, so the functions work well with the pipelining operator (|>). For example, given a frame with two columns representing prices, we can use Frame.diff and numerical operators to calculate daily returns like this:

let df = frame [ "MSFT" => prices1; "AAPL" => prices2 ]
let past = df |> Frame.diff 1
let rets = past / df * 100.0
rets |> Stats.mean

Note that the Stats.mean operation is overloaded and works both on series (returning a number) and on frames (returning a series).

The functions in this module are designed to be used from F#. For a C#-friendly API, see the FrameExtensions type. For working with individual series, see the Series module. The functions in the Frame module are grouped in a number of categories and documented below.
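For readers who want to run the returns calculation, here is a self-contained variant. The price series are made-up sample data (the names prices1 and prices2 come from the snippet above, the values do not), and the script assumes the Deedle NuGet package is referenced.

```fsharp
#r "nuget: Deedle"
open Deedle

// Illustrative prices keyed by day number (not real market data)
let prices1 = series [ 1 => 100.0; 2 => 102.0; 3 => 101.0 ]
let prices2 = series [ 1 => 50.0; 2 => 49.0; 3 => 51.0 ]

let df = frame [ "MSFT" => prices1; "AAPL" => prices2 ]
let rets = (df |> Frame.diff 1) / df * 100.0

// On a frame, Stats.mean yields one value per column
let means : Series<string, float> = Stats.mean rets
printfn "%A" (Series.observations means |> List.ofSeq)
```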

FrameBuilder

Type that can be used for creating frames using the C# collection initializer syntax. You can use new FrameBuilder.Columns<...> to create a new frame from columns or you can use new FrameBuilder.Rows<...> to create a new frame from rows.

Series

The Series module provides an F#-friendly API for working with data and time series. The API follows the usual design for collection-processing in F#, so the functions work well with the pipelining (|>) operator. For example, given a series with ages, we can use Series.filterValues to filter outliers and then Stats.mean to calculate the mean:

ages
|> Series.filterValues (fun v -> v > 0.0 && v < 120.0)
|> Stats.mean

The module provides a comprehensive set of functions for working with series. The same API is also exposed using C#-friendly extension methods. In C#, the above snippet could be written as:

ages
  .Where(kvp => kvp.Value > 0.0 && kvp.Value < 120.0)
  .Mean()

For more information about similar frame-manipulation functions, see the Frame module. For more information about C#-friendly extensions, see SeriesExtensions. The functions in the Series module are grouped in a number of categories and documented below.
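A runnable version of the F# pipeline from the Series module description might look like the following. The ages series is invented sample data, and the script assumes the Deedle NuGet package is referenced.

```fsharp
#r "nuget: Deedle"
open Deedle

// Sample data with two implausible values (-1 and 250)
let ages = series [ "Ann" => 34.0; "Bob" => -1.0; "Cy" => 58.0; "Di" => 250.0 ]

// Keep only plausible ages, then average what is left
let plausible = ages |> Series.filterValues (fun v -> v > 0.0 && v < 120.0)
let meanAge = Stats.mean plausible

printfn "%d values kept, mean = %f" plausible.KeyCount meanAge
```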

Parameters and results of various operations

Type | Description
Aggregation<'K>

Represents a strategy for aggregating data in an ordered series into data segments. To create a value of this type from C#, use the non-generic Aggregation type. Data can be aggregated using floating windows or chunks of a specified size or by specifying a condition on two keys (i.e. end a window/chunk when the condition no longer holds).

Aggregation

A non-generic type that simplifies the construction of Aggregation<K> values from C#. It provides methods for constructing different kinds of aggregation strategies for ordered series.

Boundary

Represents boundary behaviour for operations such as floating window. The type specifies whether incomplete windows (of smaller than required length) should be produced at the beginning (AtBeginning) or at the end (AtEnding) or skipped (Skip). For chunking, combinations are allowed too - to skip incomplete chunk at the beginning, use Boundary.Skip ||| Boundary.AtBeginning.

ConversionKind

Represents different kinds of type conversions that can be used by Deedle internally. This is used, for example, when converting ObjectSeries<'K> to Series<'K, 'T>, where the conversion kind can be specified as an argument to allow certain conversions.

DataSegment<'T>

Represents a segment of a series or sequence. The value is returned from various functions that aggregate data into chunks or floating windows. The Complete case represents complete segment (e.g. of the specified size) and Boundary represents segment at the boundary (e.g. smaller than the required size).
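The interplay between Boundary and DataSegment can be sketched with floating windows. This example assumes the Deedle NuGet package is referenced, that Series.windowSizeInto takes a (size, boundary) pair followed by a function over the segment, and that a segment exposes Kind and Data properties; treat these as assumptions to check against the Series module documentation.

```fsharp
#r "nuget: Deedle"
open Deedle

let s = series [ for i in 1 .. 5 -> i => float i ]

// Skip incomplete windows: only full windows of 3 values are produced
let full =
    s |> Series.windowSizeInto (3, Boundary.Skip)
           (fun (seg: DataSegment<Series<int, float>>) -> Stats.sum seg.Data)

// Keep the shorter windows at the beginning and record each segment's kind
let all =
    s |> Series.windowSizeInto (3, Boundary.AtBeginning)
           (fun (seg: DataSegment<Series<int, float>>) -> seg.Kind, Stats.sum seg.Data)

printfn "%A" (Series.observations full |> List.ofSeq)
printfn "%A" (Series.observations all |> List.ofSeq)
```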

DataSegmentKind

Represents a kind of DataSegment<T>. See that type for more information.

Direction

Specifies the direction in which to look when performing operations such as Series.Pairwise.

ICustomLookup<'K>

Represents a special lookup. This can be used to support hierarchical or duplicate keys in an index. A key type K can come with an associated ICustomLookup<K> to provide customized pattern matching (equality testing).

JoinKind

This enumeration specifies joining behavior for the Join method provided by Series and Frame. Outer join unions the keys (and may introduce missing values), inner join takes the intersection of keys; left and right joins take the keys of the first or the second series/frame.
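The effect of the different join kinds on row keys can be seen with two small frames. The frames below are illustrative, and the script assumes the Deedle NuGet package is referenced.

```fsharp
#r "nuget: Deedle"
open Deedle

let left = frame [ "A" => series [ 1 => 1.0; 2 => 2.0 ] ]
let right = frame [ "B" => series [ 2 => 20.0; 3 => 30.0 ] ]

let outer = left.Join(right, JoinKind.Outer)  // union of row keys: 1, 2, 3
let inner = left.Join(right, JoinKind.Inner)  // intersection of row keys: 2
let leftJ = left.Join(right, JoinKind.Left)   // row keys of the first frame: 1, 2

printfn "outer=%d inner=%d left=%d" outer.RowCount inner.RowCount leftJ.RowCount
```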

Lookup

Represents different behaviors of key lookup in series. For unordered series, the only available option is Lookup.Exact which finds the exact key - methods fail or return missing value if the key is not available in the index. For ordered series Lookup.Greater finds the first greater key (e.g. later date) with a value. Lookup.Smaller searches for the first smaller key. The options Lookup.ExactOrGreater and Lookup.ExactOrSmaller find the exact key (if it is present) and otherwise search for the nearest larger or smaller key, respectively.
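A short sketch of the lookup behaviors on an ordered series follows. The keys and values are invented, and the script assumes the Deedle NuGet package is referenced.

```fsharp
#r "nuget: Deedle"
open Deedle

// Ordered integer keys with a gap between 20 and 30
let s = series [ 10 => "a"; 20 => "b"; 30 => "c" ]

// Exact lookup of a key that is not in the index yields no value
let exact = s.TryGet(25, Lookup.Exact)

// Nearest smaller or greater key when the exact key is absent
let smaller = s.Get(25, Lookup.ExactOrSmaller)
let greater = s.Get(25, Lookup.ExactOrGreater)

printfn "exact found: %b, smaller: %s, greater: %s" exact.HasValue smaller greater
```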

UnionBehavior

This enumeration specifies the behavior of the Union operation on series when there are overlapping keys in two series that are being unioned. The options include preferring values from the left/right series or throwing an exception when both values are available.

Module | Description
DataSegment

Provides helper functions and active patterns for working with DataSegment values

MultiKeyExtensions

F#-friendly functions for creating multi-level keys and lookups

Primitive types and values

Type | Description
KeyValue

A type with extension method for KeyValuePair<'K, 'V> that makes it possible to create values using just KeyValue.Create.

MissingValueException

Thrown when a value at the specified index does not exist in the data frame or series. This exception is thrown only when the key is defined but the value is not available; in other situations, KeyNotFoundException is thrown.

OptionalValue<'T>

Value type that represents a potentially missing value. This is similar to System.Nullable<T>, but does not restrict the contained value to be a value type, so it can be used for storing values of any type. When obtained from DataFrame<R, C> or Series<K, T>, the Value will never be Double.NaN or null (but this is not, in general, checked when constructing the value).

The type is only used in the C#-friendly API. F# operations generally expose the standard F# option<T> type instead. However, the OptionalValue module contains helper functions for using this type from F#, as well as Missing and Present active patterns.
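The two styles of handling a missing value can be compared side by side. In this sketch (assuming the Deedle NuGet package is referenced), a NaN in the input is treated as a missing value; the member TryGet returns an OptionalValue, while the module function Series.tryGet returns an F# option.

```fsharp
#r "nuget: Deedle"
open Deedle

// The value for key "b" is NaN, which the series treats as missing
let s = series [ "a" => 1.0; "b" => nan; "c" => 3.0 ]

let viaMember = s.TryGet("b")            // OptionalValue<float>
let viaModule = s |> Series.tryGet "b"   // float option

printfn "HasValue = %b, option = %A" viaMember.HasValue viaModule
printfn "%d keys, %d values" s.KeyCount s.ValueCount
```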

OptionalValue

Non-generic type that makes it easier to create OptionalValue<T> values from C# by benefiting from type inference for generic method invocations.

OptionalValueExtensions

Extension methods for working with optional values from C#. These make it easier to provide default values and convert optional values to Nullable (when the contained value is a value type).

TryValue<'T>

Represents a value or an exception. This type is used by functions such as Series.tryMap and Frame.tryMap to capture the result of a lambda function, which may be either a value or an exception. The type is a discriminated union, so it can be processed using F# pattern matching, or using the Value, HasValue and Exception properties.
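As an illustration, the sketch below parses strings with Series.tryMap and inspects the HasValue property of each result. It assumes the Deedle NuGet package is referenced and that the function passed to Series.tryMap receives both the key and the value; the input strings are made up.

```fsharp
#r "nuget: Deedle"
open Deedle

let raw = series [ "a" => "1"; "b" => "oops"; "c" => "3" ]

// int "oops" throws; tryMap captures the exception instead of failing
let parsed = raw |> Series.tryMap (fun _ v -> int v)

// true where parsing succeeded, false where an exception was captured
let flags = parsed |> Series.mapValues (fun (t: TryValue<int>) -> t.HasValue)

printfn "%A" (Series.observations flags |> List.ofSeq)
```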

opt<'T>

A type alias for the OptionalValue<T> type. The type alias can be used to make F# type definitions that use optional values directly more succinct.

tryval<'T>

A type alias for the TryValue<T> type. The type alias can be used to make F# type declarations that explicitly handle exceptions more succinct.

Module | Description
OptionalValue

Provides various helper functions for using the OptionalValue<T> type from F# (The functions are similar to those in the standard Option module).

Pair

Module with helper functions for extracting values from hierarchical tuples

Specialized frame and series types

Type | Description
ColumnSeries<'TRowKey, 'TColumnKey>

Represents a series of columns from a frame. The type inherits from a series of series representing individual columns (Series<'TColumnKey, ObjectSeries<'TRowKey>>) but hides slicing operations with new versions that return frames.

DelayedSeries

This type exposes a single static method DelayedSeries.Create that can be used for constructing data series (of type Series<K, V>) with lazily loaded data. You can use this functionality to create series that represents e.g. an entire price history in a database, but only loads data that are actually needed. For more information see the lazy data loading tutorial.

IFrame

An empty interface that is implemented by Frame<'R, 'C>. The purpose of the interface is to allow writing code that works on arbitrary data frames (you need to provide an implementation of the IFrameOperation<'V> which contains a generic method Invoke that will be called with the typed data frame).

IFrameOperation<'V>

Represents an operation that can be invoked on Frame<'R, 'C>. The operation is generic in the type of row and column keys.

ObjectSeries<'K>

Represents a series containing boxed values. This type is inherited from Series<'K, obj> and it adds additional operations for accessing values with unboxing. This includes operations such as os.GetAs<'T>, os.TryGetAs<'T> and os.TryAs<'T> which (attempt to) convert values to the specified type 'T.
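A row of a frame is a typical place where an ObjectSeries shows up. The sketch below assumes the Deedle NuGet package is referenced; the Quote record and its values are hypothetical and exist only for the example.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical record type used to build a frame with mixed column types
type Quote = { Name: string; Price: float }

let df = Frame.ofRecords [ { Name = "MSFT"; Price = 10.5 }; { Name = "AAPL"; Price = 11.0 } ]

// A row holds boxed values, so it is exposed as ObjectSeries<string>
let row : ObjectSeries<string> = df.Rows.[0]

// GetAs unboxes (and converts) the value to the requested type
let price = row.GetAs<float>("Price")
let name = row.GetAs<string>("Name")

printfn "%s: %f" name price
```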

RowSeries<'TRowKey, 'TColumnKey>

Represents a series of rows from a frame. The type inherits from a series of series representing individual rows (Series<'TRowKey, ObjectSeries<'TColumnKey>>) but hides slicing operations with new versions that return frames.

SeriesBuilder<'K, 'V>

The type can be used for creating series using mutation. You can add items using Add and get the resulting series using the Series property.

SeriesBuilder<'K>

A simple class that inherits from SeriesBuilder<'K, obj> and can be used instead of writing SeriesBuilder<'K, obj> with two type arguments.
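A minimal sketch of building a series by mutation, assuming the Deedle NuGet package is referenced:

```fsharp
#r "nuget: Deedle"
open Deedle

// Collect items one at a time, then take an immutable series
let sb = SeriesBuilder<string, float>()
sb.Add("a", 1.0)
sb.Add("b", 2.0)

let s = sb.Series
printfn "%d keys, b = %f" s.KeyCount s.["b"]
```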

Vectors and indices

Type | Description
IVector<'T>

A generic, typed vector. Represents mapping from addresses to values of type T. The vector provides a minimal interface that is required by series and can be implemented in a number of ways to provide vector backed by database or an alternative representation of data.

IVector

Represents an (untyped) vector that stores some values and provides access to the values via a generic address. This type should be only used directly when extending the DataFrame library and adding a new way of storing or loading data. To allow invocation via Reflection, the vector exposes type of elements as System.Type.

IVectorLocation

Represents a location in a vector. In general, we always know the address, but sometimes (BigDeedle) it is hard to get the offset (requires some data lookups), so we use this interface to delay the calculation of the Offset (which is mainly needed in one of the series.Select overloads).

Index

Type that provides access to creating indices (represented as LinearIndex values)

VectorCallSite<'R>

Represents a generic function forall 'T. (IVector<'T> -> 'R). The function can be generically invoked on an argument of type IVector using IVector.Invoke.

Module | Description
Addressing

An Address value is used as an interface between vectors and indices. The index maps keys of various types to address, which is then used to get a value from the vector.

Here is a brief summary of what we assume (and don't assume) about addresses:

  • Address is int64 (although we might need to generalize this in the future)
  • Different data sources can use different addressing schemes (as long as both index and vector use the same scheme)
  • Addresses don't have to be continuous (e.g. if the source is partitioned, it can use 32bit partition index + 32bit offset in the partition)
  • In the in-memory representation, address is just index into an array
  • In the BigDeedle representation, address is abstracted and comes with AddressOperations that specifies how to use it (tests use linear offset and partitioned representation)
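The partitioned scheme mentioned in the list (32-bit partition index plus 32-bit offset) can be illustrated with plain F# bit operations. This is only a sketch of the idea; the pack and unpack helpers are hypothetical names and are not part of Deedle.

```fsharp
// Hypothetical helpers (not Deedle API): pack a partition index and an
// offset within the partition into a single int64 address and back.
let pack (partition: int32) (offset: int32) : int64 =
    (int64 partition <<< 32) ||| int64 (uint32 offset)

let unpack (address: int64) : int32 * int32 =
    int32 (address >>> 32), int32 (address &&& 0xFFFFFFFFL)

// Addresses of neighbouring partitions are far apart, i.e. not continuous
let a = pack 0 1
let b = pack 1 0
printfn "%d %d %A" a b (unpack (pack 2 5))
```

With such a scheme the index and the vector must agree on how an address is decoded, which is why both have to use the same addressing scheme.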
F# Index extensions

Defines non-generic Index type that provides functions for building indices (hard-bound to LinearIndexBuilder type). In F#, the module is automatically opened using AutoOpen. The methods are not designed for the use from C#.

F# IndexBuilder implementation

Set concrete IIndexBuilder implementation

F# Vector extensions

Defines non-generic Vector type that provides functions for building vectors (hard-bound to ArrayVectorBuilder type). In F#, the module is automatically opened using AutoOpen. The methods are not designed for the use from C#.

F# Vector extensions (core)

Module with extensions for generic vector type. Given vec of type IVector<T>, the extension property vec.DataSequence returns all data of the vector converted to the "least common denominator" data structure - IEnumerable<T>.

F# VectorBuilder implementation

Set concrete IVectorBuilder implementation

Other namespace members

Type | Description
IRangeRestriction<'TAddress>

A sequence of indices together with the total number. Use RangeRestriction.ofSeq to create one from a sequence. This can be implemented by concrete vector/index builders to allow further optimizations (e.g. when the underlying source directly supports range operations).

For example, if your source has an optimised way for getting every 10th address, you can create your own IRangeRestriction and then check for it in LookupRange and use optimised implementation rather than actually iterating over the sequence of indices.

RangeRestriction<'TAddress>

Specifies a sub-range within index that can be accessed via slicing (see the GetAddressRange method). For in-memory data structures, accessing range via known addresses is typically sufficient, but for virtual Big Deedle sources, Start and End let us avoid fully evaluating addresses. Custom range can be used for optimizations.

Module | Description
RangeRestriction

Provides additional operations for working with the RangeRestriction<'TAddress> type

Deedle.Indices Namespace

Type | Description
AsyncSeriesConstruction<'K>

Asynchronous version of SeriesConstruction<'K>. Returns a workflow that evaluates the index, together with a construction to apply (asynchronously) on vectors

BoundaryBehavior

Specifies the boundary behavior for the IIndexBuilder.GetRange operation (whether the boundary elements should be included or not)

IIndex<'K>

An interface that represents an index mapping keys of some generic type K to locations of type Address. The IIndex<K> contains a minimal set of operations that have to be supported by an index. This type should be only used directly when extending the DataFrame library and adding a new way of storing or loading data. Values of this type are constructed using the associated IIndexBuilder type.

IIndexBuilder

A builder represents various ways of constructing index, either from keys or from other indices. The operations that build a new index from an existing index also build VectorConstruction which specifies how to transform vectors aligned with the previous index to match the new index. The methods generally take VectorConstruction as an input, apply necessary transformations to it and return a new VectorConstruction.

SeriesConstruction<'K>

Represents a pair of index and vector construction (many of the index operations take/return an index together with a construction command that builds a vector matching with the index, so this type alias makes this more obvious)

Deedle.Indices.Linear Namespace

Type | Description
LinearAddressOperations

Implements address operations for linear addressing

LinearIndex<'K>

An index that maps keys K to offsets Address. The keys cannot be duplicated. The construction checks if the keys are ordered (using the provided or the default comparer for K) and disallows certain operations on unordered indices.

LinearIndexBuilder

Index builder object that is associated with LinearIndex<K> type. The builder provides operations for manipulating linear indices (and the associated vectors).

LinearRangeIndex<'K>

A virtual index that represents a subrange of a specified index. This is useful for windowing operations where we do not want to allocate a new index for each window. This index can be cheaply constructed and it implements many of the standard functions without actually allocating the index (e.g. KeyCount, KeyAt, IsEmpty). For more complex index manipulations (including lookup), an actual index is constructed lazily.

Deedle.Indices.Virtual Namespace

Type | Description
VirtualIndexBuilder

Implements IIndexBuilder interface for BigDeedle. This directly implements operations that can be implemented on virtual vectors (mostly merging, slicing) and for other operations, it calls ordinary LinearIndexBuilder. The resulting VectorConstruction corresponds to the addressing scheme of the returned index (i.e. if we return virtual, we expect to build virtual vector; if we materialize, the vector builder also has to materialize).

VirtualOrderedIndex<'K>

Represents an ordered index based on data provided by a virtual source. The index can be used by BigDeedle virtual frames and series, without accessing all data from the data source.

The index only evaluates the full key collection when needed. Most of the actual work is delegated to the IVirtualVectorSource<'K> value passed in the constructor.

VirtualOrdinalIndex

Represents an ordinal index based on addressing provided by a virtual source. The index can be used by BigDeedle virtual frames and series, without accessing all data from the data source.

Deedle.Internal Namespace

Module | Description
Array

This module contains additional functions for working with arrays. When Deedle.Internal is opened, it extends the standard Array module.

List

This module contains additional functions for working with lists.

MissingValues

Utility functions for identifying missing values. The isNA function can be used to test whether a value represents a missing value - this includes the null value, Nullable<T> value with HasValue = false and Single.NaN as well as Double.NaN.

The functions in this module are not intended to be called directly.

ReadOnlyCollection

Provides helper functions for working with ReadOnlyCollection<T> similar to those in the Array module. Most importantly, F# 3.0 does not know that array implements IList<T>.

Seq

This module contains additional functions for working with sequences. When Deedle.Internal is opened, it extends the standard Seq module.

Deedle.Keys Namespace

Type | Description
CustomKey

Helper type that can be used to get ICustomKey for any object (including objects that actually implement the interface and tuples)

ICustomKey

Represents a special hierarchical key. This is mainly used in pretty printing (where we want to get parts of the keys based on levels). CustomKey.Get provides a way of getting ICustomKey.

SimpleLookup<'T>

Implements a simple lookup that matches any multi-level key against a specified array of optional objects (that represent missing/set parts of a key)

Deedle.Ranges Namespace

Type | Description
IRangeKeyOperations<'TKey>

A set of operations on keys that you need to implement in order to use the Ranges<'TKey> type. The 'TKey type is typically the key of a BigDeedle series. It can represent different things, such as:

  • int64 - if you have ordinally indexed series
  • Date (of some sort) - if you have daily time series
  • DateTimeOffset - if you have time series with DTO keys

The operations need to implement the right thing based on the logic of the keys. So for example if you have one data point every hour, IncrementBy should add the appropriate number of hours. Or if you have keys as business days, the IncrementBy operation should add a number of business days (that is, the operations may be simple numerical addition, but may contain more logic).

Ranges<'T>

Represents a sub-range of an ordinal index. The range can consist of multiple blocks, e.g. [ 0..9; 20..29 ]. The pairs represent indices of the first and last element (inclusive) and we also keep the size so that we do not have to recalculate it.

For more information, see also the documentation for the Ranges module.

Module | Description
Ranges

Provides F# functions for working with the Ranges<'T> type. Note that most of the functions are also exposed as members. The terminology in the functions below is:

  • offset refers to an absolute int64 offset of a key in the range
  • key refers to a key value of type 'T

Say, you have daily range [ (2015-01-01, 2015-01-10); (2015-02-01, 2015-02-10) ]. Then, the keys are the dates and the offsets are 0 .. 9 for the first part and 10 .. 19 for the second part.
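The relationship between offsets and keys in the daily-range example can be modeled in a few lines of plain F#. This is a sketch of the terminology only; blocks and keyAtOffset are hypothetical names and do not use the Ranges API.

```fsharp
open System

// The two blocks from the example, as (first, last) inclusive dates
let blocks =
    [ DateTime(2015, 1, 1), DateTime(2015, 1, 10)
      DateTime(2015, 2, 1), DateTime(2015, 2, 10) ]

// Hypothetical helper: translate an absolute offset into the key at that offset
let keyAtOffset (offset: int64) : DateTime =
    let rec loop (remaining: int64) (rest: (DateTime * DateTime) list) =
        match rest with
        | [] -> invalidArg "offset" "offset is outside of the range"
        | (first, last) :: tail ->
            let span = last - first
            let size = int64 span.Days + 1L
            if remaining < size then first.AddDays(float remaining)
            else loop (remaining - size) tail
    loop offset blocks

printfn "%O" (keyAtOffset 9L)   // last day of the first block
printfn "%O" (keyAtOffset 10L)  // first day of the second block
```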

Deedle.Vectors Namespace

Type | Description
IBinaryTransform

Represents a transformation that is applied when combining two vectors (because we are combining untyped IVector values, the transformation is also untyped).

INaryTransform

Represents a transformation that is applied when combining N vectors (this follows exactly the same pattern as IBinaryTransform).

IRowReaderTransform

When an INaryTransform implements this interface, it is a special well-known transformation that creates a row reader vector to be used in frame.Rows. (See the implementation in the Build operation in ArrayVector.fs)

IVectorBuilder

Represents an object that can construct vector values by processing the "mini-DSL" representation VectorConstruction.

KnownLocation

An IVectorLocation created from a known address and offset (typically used in LinearIndex/ArrayVector where both are the same)

Vector

Type that provides access to creating vectors (represented as arrays)

VectorConstruction

A "mini-DSL" that describes construction of a vector. Vector can be constructed from various range operations (relocate, drop, slicing, appending), by combination of two vectors or by taking a vector from a list of variables.

Notably, vectors can only be constructed from other vectors of the same type (the Combine operation requires this - even though that one could be made more general). This is an intentional choice to make the representation simpler.

Logically, when we apply some index operation, we should get back a polymorphic vector construction (forall T. VectorConstruction<T>) that can be applied to various different vector types. That would mean adding some more types, so we just model vector construction as an untyped operation and the typing is required by the Build method of the vector builder.

VectorData<'T>

Provides a way to get the data of an arbitrary vector. This is a concrete type used by functions that operate on vectors (like Series.sum, etc.). The vector may choose to return the data as ReadOnlyCollection (with or without N/A values) which is more efficient to use or as a lazy sequence (slower, but more general).

VectorFillMissing

Specifies how to fill missing values in a vector (when using the VectorConstruction.FillMissing command). This can only fill missing values using strategy that does not require access to index keys - either using constant or by propagating values.

VectorHole

Represents a "variable" in the VectorConstruction mini-DSL.

VectorListTransform

A transformation on vector(s) can be specified as binary or as N-ary. A binary transformation can be applied to N elements using List.reduce, but allows optimizations.

Deedle.Vectors.ArrayVector Namespace

Type | Description
ArrayVector<'T>

Vector that stores data in an array. The data is stored using the ArrayVectorData<'T> type (discriminated union)

ArrayVectorBuilder

Implements a builder object (IVectorBuilder) for creating vectors of type ArrayVector<'T>. This includes operations such as appending, relocating values, creating vectors from arrays etc. The vector builder automatically switches between the two possible representations of the vector - when a missing value is present, it uses ArrayVectorData.VectorOptional, otherwise it uses ArrayVectorData.VectorNonOptional.

Deedle.Vectors.Virtual Namespace

Type | Description
DelayedLocation

Represents a vector location that calculates the offset using address operations as needed (typically, we want to avoid this because it might be slow)

IVirtualVectorSource

Non-generic part of the IVirtualVectorSource<'V> interface, which provides some basic information about the virtualized data source

IVirtualVectorSource<'V>

Represents a data source for Big Deedle. The interface is used both as a representation of data source for VirtualVector (this file) and VirtualIndex (another file). The index uses Length and ValueAt to perform binary search when looking for a key; the vector simply provides an access to values using ValueAt.

IVirtualVectorSourceOperation<'R>

A helper type used by non-generic IVirtualVectorSource to invoke generic operations that require generic IVirtualVectorSource<'T> as an argument.

RangesAddressOperations<'TKey>

In BigDeedle, we often use Ranges<'T> to represent the address range obtained as a result of slicing and merging frames & series. This implements IAddressOperations for Ranges<'T>.

VirtualAddressingScheme

Represents an addressing scheme associated to virtual vectors. The addresses may be partitioned differently (for different data sources), so this carries an "id" of the data source (to make sure we don't try to mix mismatching data sources)

VirtualVector<'V>

Creates an IVector<'T> implementation that provides operations for accessing data in IVirtualVectorSource. This mostly just calls ValueAt to read data.

VirtualVectorBuilder

Implements a builder object (IVectorBuilder) for creating vectors of type VirtualVector<'T>. This can do a few things without evaluating vectors (merging, slicing). For other operations the builder needs to materialize the vector and call ArrayVectorBuilder.

Module | Description
VirtualVectorSource

Module that implements various helper operations over IVirtualVectorSource type

Deedle.Virtual Namespace

Type | Description
IndexUtils

Helpers that can be used when implementing Lookup

Virtual

Provides static methods for creating virtual series and virtual frames. Those provide necessary wrapping around IVirtualVectorSource values

Module | Description
IndexUtilsModule