Deedle


F# Frame extensions

Namespace: Deedle

This module contains F# functions and extensions for working with frames. This includes operations for creating frames, such as the frame function, the => operator, and the Frame.ofRows, Frame.ofColumns and Frame.ofRowKeys functions. The module also provides additional F# extension methods, including ReadCsv, SaveCsv and PivotTable.


Frame construction 

The functions and methods in this group can be used to create frames. If you are creating a frame from a number of sample values, you can use frame and the => operator (or the =?> operator, which is useful if you have multiple series of distinct types):

frame [ "Column 1" => series [ 1 => 1.0; 2 => 2.0 ]
        "Column 2" => series [ 3 => 3.0 ] ]

Aside from this, the various type extensions let you write Frame.ofXyz to construct frames from data in various formats: Frame.ofRows and Frame.ofColumns create a frame from a series or a sequence of rows or columns; Frame.ofRecords creates a frame from .NET objects using reflection; and Frame.ofRowKeys creates an empty frame with the specified keys.

Functions and values

Function or value | Description
( =?> ) a b
Signature: a:'a -> b:ISeries<'b> -> 'a * ISeries<'b>
Type parameters: 'a, 'b

Custom operator that can be used when constructing a frame from observations of series. The operator simply returns a tuple, but it upcasts the series argument so you don't have to do manual casting. For example:

frame [ "k1" =?> series [ 0 => "a" ]; "k2" =?> series [ 0 => 1.0 ] ]

( => ) a b
Signature: a:'a -> b:'b -> 'a * 'b
Type parameters: 'a, 'b

Custom operator that can be used when constructing series from observations or frames from key-row or key-column pairs. The operator simply returns a tuple, but it provides a more convenient syntax. For example:

series [ "k1" => 1; "k2" => 15 ]

frame columns
Signature: columns:seq<'C * 'S> -> Frame<'R,'C>
Type parameters: 'C, 'S, 'R

A function for constructing a data frame from a sequence of name-column pairs. This provides nicer syntactic sugar for Frame.ofColumns.

Example

To create a simple frame with two columns, you can write:

frame [ "A" => series [ 1 => 30.0; 2 => 35.0 ]
        "B" => series [ 1 => 30.0; 3 => 40.0 ] ]

Type extensions

Type extension | Description
ofArray2D(array)
Signature: (array:'T [,]) -> Frame<int,int>
Type parameters: 'T

Creates a data frame from a 2D array of values. The first dimension of the array is used as rows and the second dimension as columns. Rows and columns of the returned frame are indexed by the element's offset in the array.

Parameters

  • array - A two-dimensional array to be converted into a data frame
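A minimal sketch of Frame.ofArray2D, assuming the code runs as an F# script in dotnet fsi, where #r "nuget: Deedle" fetches the package; the array values are illustrative. It shows that the first array dimension becomes the rows and that both keys are plain integer offsets.

```fsharp
#r "nuget: Deedle"
open Deedle

// A 2 x 3 array: the first dimension becomes rows, the second becomes columns
let values = array2D [ [ 1.0; 2.0; 3.0 ]
                       [ 4.0; 5.0; 6.0 ] ]
let df = Frame.ofArray2D values

// Row and column keys are the 0-based offsets into the array
let col2 = df.GetColumn<float>(2)
printfn "rows = %d, columns = %d, value at (1, 2) = %g" df.RowCount df.ColumnCount col2.[1]
```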
ofColumns(cols)
Signature: cols:Series<'C,'S> -> Frame<'R,'C>
Type parameters: 'C, 'S, 'R

Creates a frame from a series that maps column keys to a nested series containing values for each column.
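A minimal sketch of the series overload, assuming dotnet fsi with #r "nuget: Deedle"; the column names and values are illustrative. The outer series maps each column key to the series holding that column's values.

```fsharp
#r "nuget: Deedle"
open Deedle

// Outer series: column key -> column series (row key -> value)
let cols =
  series [ "A" => series [ 1 => 1.0; 2 => 2.0 ]
           "B" => series [ 1 => 10.0; 2 => 20.0 ] ]
let df = Frame.ofColumns(cols)
printfn "%A" (df.GetColumn<float>("B"))
```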

ofColumns(cols)
Signature: (cols:seq<'C * 'S>) -> Frame<'R,'C>
Type parameters: 'C, 'S, 'R

Creates a frame from a sequence of column keys and column series pairs. The column series can contain values of any type, but the type has to be the same for all the series; if you have heterogeneously typed series, use =?>.
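A minimal sketch of the sequence overload with columns of different value types, assuming dotnet fsi with #r "nuget: Deedle"; the data is illustrative. The =?> operator upcasts each column to ISeries<int>, so a string column and an int column can share one list.

```fsharp
#r "nuget: Deedle"
open Deedle

// Both series share int row keys but hold different value types
let df =
  Frame.ofColumns [ "Name" =?> series [ 1 => "Alice"; 2 => "Bob" ]
                    "Age"  =?> series [ 1 => 30; 2 => 25 ] ]
printfn "%d rows, %d columns" df.RowCount df.ColumnCount
```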

ofRecords(series)
Signature: series:Series<'K,'R> -> Frame<'K,string>
Type parameters: 'K, 'R

Creates a data frame from a series containing any .NET objects. The keys of the series become the row keys. The method uses reflection over the type parameter 'R (the type of the series values) and turns its properties into columns.
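A minimal sketch, assuming dotnet fsi with #r "nuget: Deedle"; the Point record type is hypothetical. Each property of the record becomes a column, and the series keys become the row keys.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical record type; its properties X and Y become columns
type Point = { X : float; Y : float }

let points = series [ "p1" => { X = 1.0; Y = 2.0 }; "p2" => { X = 3.0; Y = 4.0 } ]
let df = Frame.ofRecords(points)
printfn "%A" (Seq.toList df.ColumnKeys)
```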

ofRecords(values)
Signature: values:seq<'T> -> Frame<int,string>
Type parameters: 'T

Creates a data frame from a sequence of any .NET objects. The method uses reflection over the specified type parameter 'T and turns its properties into columns.
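A minimal sketch, assuming dotnet fsi with #r "nuget: Deedle"; the Person record type and its values are hypothetical. Because the signature returns Frame<int,string>, the rows get ordinal integer keys.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical record type; Name and Age become columns
type Person = { Name : string; Age : int }

let people = [ { Name = "Alice"; Age = 30 }; { Name = "Bob"; Age = 25 } ]
let df = Frame.ofRecords(people)
printfn "%A" (Seq.toList df.RowKeys)
```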

ofRecords(values, indexCol)
Signature: (values:IEnumerable * indexCol:string) -> Frame<'R,string>
Type parameters: 'R

Creates a data frame from a sequence of any .NET objects. The method uses reflection over the type of the objects and turns their properties into columns. The indexCol parameter names the column whose values are used as row keys of type 'R.
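A minimal sketch, assuming dotnet fsi with #r "nuget: Deedle" and assuming, as the parameter name suggests, that the indexCol property supplies the row keys; the Reading record type is hypothetical. The row key type 'R is fixed here through a type annotation.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical record type; Day is used as the row index
type Reading = { Day : int; Temp : float }

let readings = [ { Day = 10; Temp = 20.5 }; { Day = 11; Temp = 18.0 } ]

// The annotation fixes 'R = int for the row keys taken from "Day"
let df : Frame<int, string> = Frame.ofRecords(readings, "Day")
printfn "%A" (Seq.toList df.RowKeys)
```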

ofRowKeys(keys)
Signature: keys:seq<'R> -> Frame<'R,string>
Type parameters: 'R

Creates a frame with the specified row keys, but no columns (and no data). This is useful if you want to build a frame gradually and restrict all the later added data to a sequence of row keys known in advance.
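A minimal sketch of building a frame gradually, assuming dotnet fsi with #r "nuget: Deedle"; the keys and values are illustrative. It relies on AddColumn aligning the new series to the existing row keys, so a key outside the index is not added and a key without a value is left missing.

```fsharp
#r "nuget: Deedle"
open Deedle

// Start with three known row keys and no columns
let df = Frame.ofRowKeys([ "a"; "b"; "c" ])

// "z" is not one of the row keys; "b" has no value, so it becomes missing
df.AddColumn("X", series [ "a" => 1.0; "c" => 3.0; "z" => 9.0 ])
printfn "%d rows, %d columns" df.RowCount df.ColumnCount
```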

ofRows(rows)
Signature: (rows:seq<'R * 'S>) -> Frame<'R,'C>
Type parameters: 'R, 'S, 'C

Creates a frame from a sequence of row keys and row series pairs. The row series can contain values of any type, but the type has to be the same for all the series; if you have heterogeneously typed series, use =?>.
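A minimal sketch, assuming dotnet fsi with #r "nuget: Deedle"; the data is illustrative. The second row lacks column "B", so the resulting frame has a missing value there.

```fsharp
#r "nuget: Deedle"
open Deedle

// Each pair is (row key, series mapping column key -> value)
let rows =
  [ "r1" => series [ "A" => 1.0; "B" => 2.0 ]
    "r2" => series [ "A" => 3.0 ] ]
let df = Frame.ofRows(rows)
printfn "%d rows, %d columns" df.RowCount df.ColumnCount
```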

ofRows(rows)
Signature: rows:Series<'R,'S> -> Frame<'R,'C>
Type parameters: 'R, 'S, 'C

Creates a frame from a series that maps row keys to a nested series containing values for each row.
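A minimal sketch of the series overload, assuming dotnet fsi with #r "nuget: Deedle"; the data is illustrative. The outer series maps each row key to the series holding that row's values.

```fsharp
#r "nuget: Deedle"
open Deedle

// Outer series: row key -> row series (column key -> value)
let rows =
  series [ 1 => series [ "X" => 1.0; "Y" => 2.0 ]
           2 => series [ "X" => 3.0; "Y" => 4.0 ] ]
let df = Frame.ofRows(rows)
printfn "%A" (df.GetColumn<float>("Y"))
```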

ofRowsOrdinal(rows)
Signature: rows:seq<'S> -> Frame<int,'K>
Type parameters: 'S, 'K, 'V

Creates a frame with an ordinal integer index from a sequence of rows. The column keys of the individual rows are unioned, so a row with fewer columns is still added, but it has missing values in the columns it lacks.
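A minimal sketch, assuming dotnet fsi with #r "nuget: Deedle"; the data is illustrative. The rows receive ordinal keys, and the shorter second row ends up with a missing value in column "B".

```fsharp
#r "nuget: Deedle"
open Deedle

// Rows without keys; the column keys of both rows are unioned
let rows =
  [ series [ "A" => 1.0; "B" => 2.0 ]
    series [ "A" => 3.0 ] ]
let df = Frame.ofRowsOrdinal(rows)
printfn "%A" (Seq.toList df.RowKeys)
```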

ofValues(values)
Signature: (values:seq<'R * 'C * 'V>) -> Frame<'R,'C>
Type parameters: 'R, 'C, 'V

Creates a data frame from a sequence of tuples containing a row key, a column key and a value.
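A minimal sketch, assuming dotnet fsi with #r "nuget: Deedle"; the observations are illustrative. Every (row, column, value) triple fills one cell, and combinations that never occur stay missing.

```fsharp
#r "nuget: Deedle"
open Deedle

// (row key, column key, value) triples
let observations =
  [ ("Mon", "Temp", 20.5)
    ("Mon", "Rain", 1.2)
    ("Tue", "Temp", 18.0) ]
let df = Frame.ofValues(observations)
printfn "%d rows, %d columns" df.RowCount df.ColumnCount
```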

Frame operations 

The group contains two overloads of the F#-friendly version of the PivotTable method.

Type extensions

Type extension | Description
PivotTable(r, c, op)
Signature: (r:'TColumnKey * c:'TColumnKey * op:(Frame<'TRowKey,'TColumnKey> -> 'T)) -> Frame<'R,'C>
Type parameters: 'R, 'C, 'T

Creates a new data frame resulting from a 'pivot' operation. Consider a denormalized data frame representing a table: column labels are field names and table values are observations of those fields. PivotTable buckets the rows along two axes, according to the values of the columns r and c, and then computes a value for the frame of rows that land in each bucket.

Parameters

  • r - A column key to group on for the resulting row index
  • c - A column key to group on for the resulting column index
  • op - A function computing a value from the corresponding bucket frame
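A minimal sketch of PivotTable, assuming dotnet fsi with #r "nuget: Deedle"; the Sale record type and the sales data are hypothetical. Rows are grouped by the Region column and columns by the Product column, and each cell sums Amount over its bucket. The type annotation on pivot supplies 'R and 'C (the types of the Region and Product values).

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical denormalized table: one row per sale
type Sale = { Region : string; Product : string; Amount : float }

let records =
  [ { Region = "North"; Product = "Tea"; Amount = 10.0 }
    { Region = "North"; Product = "Tea"; Amount = 5.0 }
    { Region = "South"; Product = "Coffee"; Amount = 7.0 } ]
let sales = Frame.ofRecords(records)

// op: computes one value from the frame of rows in a bucket
let sumAmount (bucket : Frame<int, string>) =
  bucket.GetColumn<float>("Amount").Values |> Seq.sum

// Row keys come from Region, column keys from Product
let pivot : Frame<string, string> = sales.PivotTable("Region", "Product", sumAmount)
printfn "%d rows, %d columns" pivot.RowCount pivot.ColumnCount
```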

Input and output 

This group of extensions includes a number of overloads for the ReadCsv and SaveCsv methods. The methods here are designed to be used from F#, so they are F#-style extensions and they use F#-style optional arguments. In general, the overloads take either a path or a TextReader/TextWriter (ReadCsv also accepts a Stream). Also note that ReadCsv<'R>(path, indexCol, ...) lets you specify the column to be used as the index.

Type extensions

Type extension | Description
ReadCsv(...)
Signature: (path:string * indexCol:string * hasHeaders:bool option * inferTypes:bool option * inferRows:int option * schema:string option * separators:string option * culture:string option * maxRows:int option * missingValues:string [] option) -> Frame<'R,string>
Type parameters: 'R

Loads a data frame from a CSV file. The operation automatically reads column names from the CSV file (if they are present) and infers the type of values for each column. Columns of primitive types (int, float, etc.) are converted to the right type. Columns of other types (such as dates) are not converted automatically.

Parameters

  • path - Specifies a file name or a web location of the resource.
  • indexCol - Specifies the column that should be used as an index in the resulting frame. The type is specified via a type parameter, e.g. use Frame.ReadCsv<int>("file.csv", indexCol="Day").
  • hasHeaders - Specifies whether the input CSV file has a header row
  • inferTypes - Specifies whether the method should attempt to infer types of columns automatically (set this to false if you want to specify a schema)
  • inferRows - If inferTypes=true, this parameter specifies the number of rows to use for type inference. The default value is 0, meaning all rows.
  • schema - A string that specifies CSV schema. See the documentation for information about the schema format.
  • separators - A string that specifies one or more (single character) separators that are used to separate columns in the CSV file. Use for example ";" to parse semicolon separated files.
  • culture - Specifies the name of the culture that is used when parsing values in the CSV file (such as "en-US"). The default is invariant culture.
  • maxRows - The maximal number of rows that should be read from the CSV file.
  • missingValues - An array of strings that contains values which should be treated as missing when reading the file. The default value is: "NaN"; "NA"; "#N/A"; ":"; "-"; "TBA"; "TBD".
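A minimal sketch of the indexCol overload, assuming dotnet fsi with #r "nuget: Deedle" and write access to the temporary directory; the file name deedle-days.csv and its contents are hypothetical. The type argument sets the row key type, and "NA" is one of the default missing-value markers listed above.

```fsharp
#r "nuget: Deedle"
open System.IO
open Deedle

// Hypothetical input file written to the temp directory
let path = Path.Combine(Path.GetTempPath(), "deedle-days.csv")
File.WriteAllText(path, "Day,Temp\n1,20.5\n2,18.0\n3,NA\n")

// Row keys of type int are taken from the "Day" column
let df = Frame.ReadCsv<int>(path, indexCol = "Day")
printfn "Row keys: %A" (Seq.toList df.RowKeys)
```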
ReadCsv(...)
Signature: (path:string * hasHeaders:bool option * inferTypes:bool option * inferRows:int option * schema:string option * separators:string option * culture:string option * maxRows:int option * missingValues:string [] option) -> Frame<int,string>

Loads a data frame from a CSV file. The operation automatically reads column names from the CSV file (if they are present) and infers the type of values for each column. Columns of primitive types (int, float, etc.) are converted to the right type. Columns of other types (such as dates) are not converted automatically.

Parameters

  • path - Specifies a file name or a web location of the resource.
  • hasHeaders - Specifies whether the input CSV file has a header row
  • inferTypes - Specifies whether the method should attempt to infer types of columns automatically (set this to false if you want to specify a schema)
  • inferRows - If inferTypes=true, this parameter specifies the number of rows to use for type inference. The default value is 100.
  • schema - A string that specifies CSV schema. See the documentation for information about the schema format.
  • separators - A string that specifies one or more (single character) separators that are used to separate columns in the CSV file. Use for example ";" to parse semicolon separated files.
  • culture - Specifies the name of the culture that is used when parsing values in the CSV file (such as "en-US"). The default is invariant culture.
  • maxRows - The maximal number of rows that should be read from the CSV file.
  • missingValues - An array of strings that contains values which should be treated as missing when reading the file. The default value is: "NaN"; "NA"; "#N/A"; ":"; "-"; "TBA"; "TBD".
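A minimal sketch of the path overload with a custom separator and missing-value marker, assuming dotnet fsi with #r "nuget: Deedle"; the file name deedle-scores.csv and its contents are hypothetical. Without indexCol, rows get ordinal integer keys.

```fsharp
#r "nuget: Deedle"
open System.IO
open Deedle

// Hypothetical semicolon-separated file where "?" marks a missing score
let path = Path.Combine(Path.GetTempPath(), "deedle-scores.csv")
File.WriteAllText(path, "Name;Score\nAlice;10\nBob;?\n")

let df = Frame.ReadCsv(path, separators = ";", missingValues = [| "?" |])
printfn "%A" (Seq.toList df.ColumnKeys)
```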
ReadCsv(...)
Signature: (stream:Stream * hasHeaders:bool option * inferTypes:bool option * inferRows:int option * schema:string option * separators:string option * culture:string option * maxRows:int option * missingValues:string [] option) -> Frame<int,string>

Loads a data frame from CSV data read from a stream. The operation automatically reads column names from the CSV data (if they are present) and infers the type of values for each column. Columns of primitive types (int, float, etc.) are converted to the right type. Columns of other types (such as dates) are not converted automatically.

Parameters

  • stream - Specifies the input stream, positioned at the beginning of the CSV data
  • hasHeaders - Specifies whether the CSV data has a header row
  • inferTypes - Specifies whether the method should attempt to infer types of columns automatically (set this to false if you want to specify a schema)
  • inferRows - If inferTypes=true, this parameter specifies the number of rows to use for type inference. The default value is 100.
  • schema - A string that specifies CSV schema. See the documentation for information about the schema format.
  • separators - A string that specifies one or more (single character) separators that are used to separate columns in the CSV file. Use for example ";" to parse semicolon separated files.
  • culture - Specifies the name of the culture that is used when parsing values in the CSV file (such as "en-US"). The default is invariant culture.
  • maxRows - The maximal number of rows that should be read from the CSV file.
  • missingValues - An array of strings that contains values which should be treated as missing when reading the file. The default value is: "NaN"; "NA"; "#N/A"; ":"; "-"; "TBA"; "TBD".
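A minimal sketch of the Stream overload, assuming dotnet fsi with #r "nuget: Deedle"; the CSV text is illustrative. The in-memory stream starts at the first byte of the CSV data, and it is disposed once the frame has been built.

```fsharp
#r "nuget: Deedle"
open System.IO
open System.Text
open Deedle

let csvText = "A,B\n1,x\n2,y\n"

// The stream is positioned at the start of the CSV data
let readFromStream () =
  use stream = new MemoryStream(Encoding.UTF8.GetBytes(csvText))
  Frame.ReadCsv(stream)

let df = readFromStream ()
printfn "%d rows, %d columns" df.RowCount df.ColumnCount
```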
ReadCsv(...)
Signature: (reader:TextReader * hasHeaders:bool option * inferTypes:bool option * inferRows:int option * schema:string option * separators:string option * culture:string option * maxRows:int option * missingValues:string [] option) -> Frame<int,string>

Loads a data frame from CSV data read from a TextReader. The operation automatically reads column names from the CSV data (if they are present) and infers the type of values for each column. Columns of primitive types (int, float, etc.) are converted to the right type. Columns of other types (such as dates) are not converted automatically.

Parameters

  • reader - Specifies the TextReader, positioned at the beginning of CSV data
  • hasHeaders - Specifies whether the CSV data has a header row
  • inferTypes - Specifies whether the method should attempt to infer types of columns automatically (set this to false if you want to specify a schema)
  • inferRows - If inferTypes=true, this parameter specifies the number of rows to use for type inference. The default value is 100.
  • schema - A string that specifies CSV schema. See the documentation for information about the schema format.
  • separators - A string that specifies one or more (single character) separators that are used to separate columns in the CSV file. Use for example ";" to parse semicolon separated files.
  • culture - Specifies the name of the culture that is used when parsing values in the CSV file (such as "en-US"). The default is invariant culture.
  • maxRows - The maximal number of rows that should be read from the CSV file.
  • missingValues - An array of strings that contains values which should be treated as missing when reading the file. The default value is: "NaN"; "NA"; "#N/A"; ":"; "-"; "TBA"; "TBD".
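A minimal sketch of the TextReader overload, assuming dotnet fsi with #r "nuget: Deedle"; the CSV text is illustrative. A StringReader is one kind of TextReader, which makes it handy for parsing CSV held in a string.

```fsharp
#r "nuget: Deedle"
open System.IO
open Deedle

// CSV text held in memory and exposed through a TextReader
let reader = new StringReader("Name,Value\na,1\nb,2\n")
let df = Frame.ReadCsv(reader, hasHeaders = true, inferRows = 10)
printfn "%A" (Seq.toList df.ColumnKeys)
```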
SaveCsv(...)
Signature: (writer:TextWriter * includeRowKeys:bool option * keyNames:seq<string> option * separator:char option * culture:CultureInfo option) -> unit

Saves a data frame to a CSV file or a TextWriter. When calling the operation, you can specify whether you want to save the row keys (and headers for the keys), and you can also specify the separator (use \t for writing TSV files). When the file name ends with .tsv, the \t separator is used automatically.

Parameters

  • writer - Specifies the TextWriter to which the CSV data should be written
  • includeRowKeys - When set to true, the row key is also written to the output file
  • keyNames - Specifies the CSV headers for the row key (or keys, for a multi-level index)
  • separator - Specifies the column separator in the file (the default is \t for TSV files and , for CSV files)
  • culture - Specifies the CultureInfo object used for formatting numerical data
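A minimal sketch of the TextWriter overload, assuming dotnet fsi with #r "nuget: Deedle"; the frame contents and the key header "Id" are illustrative. The invariant culture is passed explicitly so that numbers are written with a dot as the decimal separator.

```fsharp
#r "nuget: Deedle"
open System.Globalization
open System.IO
open Deedle

let df = frame [ "A" => series [ 1 => 1.5; 2 => 2.5 ] ]

// Write to an in-memory writer, including row keys under the header "Id"
let writer = new StringWriter()
df.SaveCsv(writer, includeRowKeys = true, keyNames = Seq.singleton "Id", culture = CultureInfo.InvariantCulture)
let csv = writer.ToString()
printfn "%s" csv
```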
SaveCsv(...)
Signature: (path:string * includeRowKeys:bool option * keyNames:seq<string> option * separator:char option * culture:CultureInfo option) -> unit

Saves a data frame to a CSV file or a TextWriter. When calling the operation, you can specify whether you want to save the row keys (and headers for the keys), and you can also specify the separator (use \t for writing TSV files). When the file name ends with .tsv, the \t separator is used automatically.

Parameters

  • path - Specifies the output file name where the CSV data should be written
  • includeRowKeys - When set to true, the row key is also written to the output file
  • keyNames - Specifies the CSV headers for the row key (or keys, for a multi-level index)
  • separator - Specifies the column separator in the file (the default is \t for TSV files and , for CSV files)
  • culture - Specifies the CultureInfo object used for formatting numerical data
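A minimal sketch of the path overload, assuming dotnet fsi with #r "nuget: Deedle" and a writable temporary directory; the file name deedle-out.tsv is hypothetical. The .tsv extension is what selects the tab separator here, and includeRowKeys is set explicitly so that only the data columns appear in the header.

```fsharp
#r "nuget: Deedle"
open System.IO
open Deedle

let df = frame [ "A" => series [ 1 => 1.0 ]; "B" => series [ 1 => 2.0 ] ]

// The .tsv extension selects the \t separator automatically
let path = Path.Combine(Path.GetTempPath(), "deedle-out.tsv")
df.SaveCsv(path, includeRowKeys = false)

let header = File.ReadLines(path) |> Seq.head
printfn "%s" header
```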
SaveCsv(path, keyNames)
Signature: (path:string * keyNames:seq<string>) -> unit

Saves a data frame to a CSV file, writing the row keys under the specified headers. When the file name ends with .tsv, the \t separator is used automatically.

Parameters

  • path - Specifies the output file name where the CSV data should be written
  • keyNames - Specifies the CSV headers for the row key (or keys, for a multi-level index)
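A minimal sketch of the (path, keyNames) overload, assuming dotnet fsi with #r "nuget: Deedle"; the file name deedle-keys.csv and the header "Key" are hypothetical. The first line of the output is expected to start with the row key header.

```fsharp
#r "nuget: Deedle"
open System.IO
open Deedle

let df = frame [ "Value" => series [ "a" => 1.0; "b" => 2.0 ] ]

// Row keys are written under the header "Key"
let path = Path.Combine(Path.GetTempPath(), "deedle-keys.csv")
df.SaveCsv(path, [ "Key" ])

let header = File.ReadLines(path) |> Seq.head
printfn "%s" header
```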
ToDataTable(rowKeyNames)
Signature: rowKeyNames:seq<string> -> DataTable

Returns the data of the frame as a .NET DataTable object. The column keys are automatically converted to strings that are used as column names. The row index is turned into an additional column with the specified name (the method takes the name as a sequence to support hierarchical keys, but typically you can write just frame.ToDataTable(["KeyName"])).

Parameters

  • rowKeyNames - Specifies the names of the row key components (or just a single row key name if the row index is not hierarchical).
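A minimal sketch, assuming dotnet fsi with #r "nuget: Deedle" on a .NET runtime that ships System.Data; the column and key names are illustrative. The row keys end up in an extra column named "Key" next to the data column.

```fsharp
#r "nuget: Deedle"
open Deedle

let df = frame [ "Value" => series [ "a" => 1.0; "b" => 2.0 ] ]

// A single-level row index, so one key name is enough
let table = df.ToDataTable([ "Key" ])
printfn "%d rows, %d columns" table.Rows.Count table.Columns.Count
```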