Deedle


Working with data frames in F#

In this section, we look at various features of the F# data frame library (using both Series and Frame types and modules). Feel free to jump to the section you are interested in, but note that some sections refer back to values built in "Creating frames & loading data".

You can also get this page as an F# script file from GitHub and run the samples interactively.

Creating frames & loading data

Loading and saving CSV files

The easiest way to get data into a data frame is to use a CSV file. The Frame.ReadCsv function exposes this functionality:

// Assuming 'root' is a directory containing the file
let titanic = Frame.ReadCsv(root + "Titanic.csv")

// Read data and set the index column & order rows
let msft = 
  Frame.ReadCsv(root + "stocks/msft.csv") 
  |> Frame.indexRowsDate "Date"
  |> Frame.sortRowsByKey

// Specify column separator
let air = Frame.ReadCsv(root + "AirQuality.csv", separators=";")

In the second example, we call indexRowsDate to use the "Date" column as a row index of the resulting data frame. This is a very common scenario and so Deedle provides an easier option using a generic overload of the ReadCsv method:

let msftSimpler = 
  Frame.ReadCsv<DateTime>(root + "stocks/msft.csv", indexCol="Date") 
  |> Frame.sortRowsByKey

The ReadCsv method has a number of optional arguments that you can use to control the loading. It supports CSV files, TSV files and other formats. If the file name ends with tsv, the Tab is used automatically, but you can set the separator explicitly. The following parameters can be used:

  • path - Specifies a file name or a web location of the resource.
  • indexCol - Specifies the column that should be used as an index in the resulting frame. The type is specified via a type parameter.
  • inferTypes - Specifies whether the method should attempt to infer types of columns automatically (set this to false if you want to specify schema)
  • inferRows - If inferTypes=true, this parameter specifies the number of rows to use for type inference. The default value is 0, meaning all rows.
  • schema - A string that specifies CSV schema. See the documentation for information about the schema format.
  • separators - A string that specifies one or more (single character) separators that are used to separate columns in the CSV file. Use for example ";" to parse semicolon separated files.
  • culture - Specifies the name of the culture that is used when parsing values in the CSV file (such as "en-US"). The default is invariant culture.

The parameters are the same as those used by the CSV type provider in F# Data, so you can find additional documentation there.
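As a rough illustration of the optional arguments, the following sketch writes a small semicolon-separated file to a temporary location and reads it back. The file contents and the names path and sample are made up for this example, and only the separators and inferRows arguments from the list above are used:

```fsharp
#r "nuget: Deedle"
open System.IO
open Deedle

// Hypothetical sample data: two columns separated by semicolons
let path = Path.GetTempFileName()
File.WriteAllText(path, "Day;Ozone\n1;41\n2;36\n")

// Split columns on ';' and infer column types from the first 10 rows
let sample = Frame.ReadCsv(path, separators=";", inferRows=10)

printfn "%A" (List.ofSeq sample.ColumnKeys)
printfn "%d rows" sample.RowCount
```

The remaining arguments (schema, culture and so on) are passed the same way, as named optional arguments.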

Once you have a data frame, you can also save it to a CSV file using the SaveCsv method. For example:

// Save CSV with semicolon separator
air.SaveCsv(Path.GetTempFileName(), separator=';')
// Save as CSV and include row key as "Date" column
msft.SaveCsv(Path.GetTempFileName(), keyNames=["Date"], separator='\t')

By default, the SaveCsv method does not include the key from the data frame. This can be overridden by calling SaveCsv with the optional argument includeRowKeys=true, or with an additional argument keyNames (demonstrated above) which sets the headers for the key column(s) in the CSV file. Usually, there is just a single row key, but there may be multiple when hierarchical indexing is used.

Loading F# records or .NET objects

If you have another .NET or F# component that returns data as a sequence of F# records, C# anonymous types or other .NET objects, you can use Frame.ofRecords to turn them into a data frame. Assume we have:

type Person = 
  { Name:string; Age:int; Countries:string list; }

let peopleRecds = 
  [ { Name = "Joe"; Age = 51; Countries = [ "UK"; "US"; "UK"] }
    { Name = "Tomas"; Age = 28; Countries = [ "CZ"; "UK"; "US"; "CZ" ] }
    { Name = "Eve"; Age = 2; Countries = [ "FR" ] }
    { Name = "Suzanne"; Age = 15; Countries = [ "US" ] } ]

Now we can easily create a data frame that contains three columns (Name, Age and Countries) containing data of the same type as the properties of Person:

// Turn the list of records into data frame 
let peopleList = Frame.ofRecords peopleRecds
// Use the 'Name' column as a key (of type string)
let people = peopleList |> Frame.indexRowsString "Name"

Note that this does not perform any conversion on the column data. Numerical series can be accessed using the ? operator. For other types, we need to explicitly call GetColumn with the right type arguments:

people?Age
people.GetColumn<string list>("Countries")

F# Data providers

In general, you can use any data source that exposes data as a series of tuples. This means that we can easily load data using, for example, the World Bank type provider from the F# Data library.

// Connect to the World Bank
let wb = WorldBankData.GetDataContext()

/// Given a region, load GDP in current US$ and return data as 
/// a frame with two-level column key (region and country name)
let loadRegion (region:WorldBankData.ServiceTypes.Region) =
  [ for country in region.Countries -> 
      // Create two-level column key using tuple
      (region.Name, country.Name) => 
        // Create series from tuples returned by WorldBank
        Series.ofObservations country.Indicators.``GDP (current US$)`` ]
  |> frame

To make data manipulation more convenient, we read country information per region and create a data frame with a hierarchical index (for more information, see the advanced indexing section). Now we can easily read data for the OECD and the Euro area:

// Load Euro and OECD regions
let eu = loadRegion wb.Regions.``Euro area``
let oecd = loadRegion wb.Regions.``OECD members``

// Join and convert to billions of USD
let world = eu.Join(oecd) / 1e9

      (Euro area, Austria)  (Euro area, Belgium)  (Euro area, Cyprus)  ...  (OECD members, Sweden)  (OECD members, Turkey)  (OECD members, United States)
1960  6.59                  11.66                 N/A                  ...  14.84                   13.98                   543.3
1961  7.31                  12.4                  N/A                  ...  16.15                   8.02                    563.3
1962  7.76                  13.26                 N/A                  ...  17.51                   8.92                    605.1
1963  8.37                  14.26                 N/A                  ...  18.95                   10.36                   638.6
1964  9.17                  15.96                 N/A                  ...  21.14                   11.18                   685.8
1965  9.99                  17.37                 N/A                  ...  23.26                   11.94                   743.7
1966  10.89                 18.65                 N/A                  ...  25.3                    14.12                   815
1967  11.58                 19.99                 N/A                  ...  27.46                   15.67                   861.7
...   ...                   ...                   ...                  ...  ...                     ...                     ...
2011  429.07                528.1                 27.09                ...  563.11                  774.75                  15517.93
2012  407.58                498.75                24.94                ...  543.88                  788.86                  16163.16
2013  428.32                524.78                24.06                ...  579.53                  823.24                  16768.05
2014  436.34                533.38                23.23                ...  570.59                  799.53                  17419

The loaded data look something like the sample above. As you can see, the columns are grouped by the region and some data are not available.

Expanding objects in columns

It is possible to create data frames that contain other .NET objects as members in a series. This might be useful, for example, when you get multiple data sources producing objects and you want to align or join them before working with them. However, working with frames that contain complex .NET objects is less convenient.

For this reason, the data frame supports expansion. Given a data frame with some object in a column, you can use Frame.expandCols to create a new frame that contains properties of the object as new columns. For example:

// Create frame with single column 'People'
let peopleNested = 
  [ "People" => Series.ofValues peopleRecds ] |> frame

// Expand the 'People' column
peopleNested |> Frame.expandCols ["People"]

   People.Name  People.Age  People.Countries
0  Joe          51          [UK; US; UK]
1  Tomas        28          [CZ; UK; US; ... ]
2  Eve          2           [FR]
3  Suzanne      15          [US]

As you can see, the operation generates columns based on the properties of the original column type and generates new names by prefixing the property names with the name of the original column.

Aside from properties of .NET objects, the expansion can also handle values of type IDictionary<K, V> and series that contain nested series with string keys (i.e. Series<string, T>). If you have a more complex structure, you can use Frame.expandAllCols to expand columns to a specified level recursively:

// Series that contains dictionaries, containing tuples
let tuples = 
  [ dict ["A", box 1; "C", box (2, 3)]
    dict ["B", box 1; "C", box (3, 4)] ] 
  |> Series.ofValues

// Expand dictionary keys (level 1) and tuple items (level 2)
frame ["Tuples" => tuples]
|> Frame.expandAllCols 2

Here, the resulting data frame will have 4 columns including Tuples.A and Tuples.B (for the first keys) and Tuples.C.Item1 together with Tuples.C.Item2 representing the two items of the tuple nested in a dictionary.

Manipulating data frames

The series type Series<K, V> represents a series with keys of type K and values of type V. This means that when working with series, the type of values is known statically. When working with data frames, this is not the case - a frame is represented as Frame<R, C> where R and C are the types of row and column indices, respectively (typically, R will be an int or DateTime and C will be a string representing different column/series names).

A frame can contain heterogeneous data. One column may contain integers, another may contain floating point values and yet another can contain strings, dates or other objects like lists of strings. This information is not captured statically - and so when working with frames, you may need to specify the type explicitly, for example, when reading a series from a frame.

Getting data from a frame

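We'll use the data frame people, which we created from F# records in the previous section. It is indexed by Name (of type string) and contains the columns Age of type int and Countries of type string list. The listing below also shows the AgePlusOne and Siblings columns, which are added later in this section: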

         Age  Countries           AgePlusOne  Siblings
Joe      51   [UK; US; UK]        52          3
Tomas    28   [CZ; UK; US; ... ]  29          2
Eve      2    [FR]                3           1
Suzanne  15   [US]                16          0

To get a column (series) from a frame df, you can use operations that are exposed directly by the data frame, or you can use df.Columns which returns all columns of the frame as a series of series.

// Get the 'Age' column as a series of 'float' values
// (the '?' operator converts values automatically)
people?Age
// Get the 'Countries' column as a series of 'string list' values
people.GetColumn<string list>("Countries")
// Get all frame columns as a series of series
people.Columns

A series s of type Series<string, V> supports the question mark operator s?Foo to get a value of type V associated with the key Foo. For other key types, you can use the Get method. Note that, unlike with frames, there is no implicit conversion:

// Get Series<string, float> 
let numAges = people?Age

// Get value using question mark
numAges?Tomas
// Get value using 'Get' method
numAges.Get("Tomas")
// Returns missing when key is not found
numAges.TryGet("Fridrich")
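The same lookups can be tried on a stand-alone series. The following is a minimal sketch with made-up values (sampleAges is a stand-in for people?Age) that also shows one way to handle the OptionalValue returned by TryGet:

```fsharp
#r "nuget: Deedle"
open Deedle

// Small string-keyed series standing in for 'people?Age'
let sampleAges = series [ "Joe" => 51.0; "Tomas" => 28.0 ]

// Both forms return the float stored under the key
let viaOperator = sampleAges?Tomas
let viaGet = sampleAges.Get("Tomas")

// TryGet returns an OptionalValue<float> instead of throwing
let missing = sampleAges.TryGet("Fridrich")
let fallback = if missing.HasValue then missing.Value else nan

// The Series module exposes the same lookup as an F# option
let asOption = sampleAges |> Series.tryGet "Fridrich"
```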

The question mark operator and Get method can be used on the Columns property of a data frame. The return type of df.Columns is ColumnSeries<R, C> (here, ColumnSeries<string, string>), which is just a thin wrapper over Series<C, ObjectSeries<R>>. This means that you get back a series indexed by column names where the values are ObjectSeries<R> representing individual columns. The type ObjectSeries<R> is a thin wrapper over Series<R, obj> which adds several functions for getting the values as values of a specified type.

In our case, the returned values are individual columns represented as ObjectSeries<string>:

// Get column as an object series
people.Columns?Age
people.Columns?Countries
val it : ObjectSeries<string> =
  Joe     -> [UK; US; UK]       
  Tomas   -> [CZ; UK; US; ... ] 
  Eve     -> [FR]               
  Suzanne -> [US]

// Get column & try get column using members
people.Columns.Get("Countries")
people.Columns.TryGet("CreditCard")
// Get column at a specified offset
people.Columns.GetAt(0)

// Get column as object series and convert it
// to a typed Series<string, int>
people.Columns?Age.As<int>()
// Try converting column to Series<string, string>
people.Columns?Age.TryAs<string>()

The type ObjectSeries<string> has a few methods in addition to those of the ordinary Series<K, V> type. In the last two expressions, we use As<T> and TryAs<T>, which convert an object series to a series with a statically known type of values. The expression people.Columns?Age.As<int>() is equivalent to people.GetColumn<int>("Age"), but it is not specific to frame columns - you can use the same approach to work with frame rows (using people.Rows) if your data set has rows of homogeneous types.

Another case where you'll need to work with ObjectSeries<T> is when mapping over rows:

// Iterate over rows and get the length of country list
people.Rows |> Series.mapValues (fun row ->
  row.GetAs<string list>("Countries").Length)

The rows that you get as a result of people.Rows are heterogeneous (they contain values of different types), so we cannot use row.As<T>() to convert all values of the series to some type. Instead, we use GetAs<T>(...) which is similar to Get(...) but converts the value to a given type. You could also achieve the same thing by writing row?Countries and then casting the result to string list, but the GetAs method provides a more convenient syntax.

Typed access to rows

Accessing columns using ObjectSeries<T> is fine for simple tasks, but it has two problems. First, it is not type-safe and you can easily get a runtime exception if you specify the wrong type. Second, it involves boxing and unboxing and so it may be inefficient.

To address these two issues, Deedle provides another alternative. You can specify an interface that defines the types of columns once and then use this interface to get a series of rows where every row is an instance of the interface:

/// Expected columns & their types in a row
type IPerson = 
  abstract Age : int
  abstract Countries : string list

// Get rows as series of 'IPerson' values
let rows = people.GetRowsAs<IPerson>()
rows.["Tomas"].Countries 

You still need to be careful and define the types in the IPerson interface correctly, but once the GetRowsAs<IPerson> call returns a value, you will be able to access the rows in a nice typed way. Alternatively, you can also specify the type with OptionalValue<T>, in case you want to explicitly handle missing values.

/// Alternative that lets us handle missing 'Age' values
type IPersonOpt = 
  abstract Age : OptionalValue<int>
  abstract Countries : string list

Adding rows and columns

The series type is immutable and so it is not possible to add new values to a series or change the values stored in an existing series. However, you can use operations that return a new series as the result such as Merge.

// Create a series with one more value
let more = series [ "John" => 48.0 ]
// Create a new, concatenated series
people?Age.Merge(more)

A data frame allows a very limited form of mutation. It is possible to add new series (as a column) to an existing data frame, drop a series or replace a series. However, individual series are still immutable.

// Calculate age + 1 for all people
let add1 = people?Age |> Series.mapValues ((+) 1.0)

// Add as a new series to the frame
people?AgePlusOne <- add1

// Add new series from a list of values
people?Siblings <- [0; 2; 1; 3]

// Replace existing series with new values
// (Equivalent to people?Siblings <- ...)
people.ReplaceColumn("Siblings", [3; 2; 1; 0])

Finally, it is also possible to append one data frame or a single row to an existing data frame. The operation is immutable, so the result is a new data frame with the added rows. To create a new row for the data frame, we can use standard ways of constructing series from key-value pairs, or we can use the SeriesBuilder type:

// Create new object series with values for required columns
let newRow = 
  [ "Name" => box "Jim"; "Age" => box 51;
    "Countries" => box ["US"]; "Siblings" => box 5 ]
  |> series
// Create a new data frame, containing the new series
people.Merge("Jim", newRow)

// Another option is to use mutable SeriesBuilder
let otherRow = SeriesBuilder<string>()
otherRow?Name <- "Jim"
otherRow?Age <- 51
otherRow?Countries <- ["US"]
otherRow?Siblings <- 5
// The Series property returns the built series
people.Merge("Jim", otherRow.Series)

Advanced slicing and lookup

Given a series, we have a number of options for getting one or more values or observations (keys and their associated values) from the series. First, let's look at different lookup operations that are available on any series (even an unordered one).

// Sample series with different keys & values
let nums = series [ 1 => 10.0; 2 => 20.0 ]
let strs = series [ "en" => "Hi"; "cz" => "Ahoj" ]

// Lookup values using keys
nums.[1]
strs.["en"]
// Supported when key is string
strs?en      

For more examples, we use the Age column from the earlier data set:

// Get an unordered sample series 
let ages = people?Age

// Returns value for a given key
ages.["Tomas"]
// Returns series with two keys from the source
ages.[ ["Tomas"; "Joe"] ]

The Series module provides another set of useful functions (many of those are also available as members, for example via ages.TryGet):

// Fails when key is not present
try ages |> Series.get "John" with _ -> nan
// Returns 'None' when key is not present
ages |> Series.tryGet "John"
// Returns series with missing value for 'John'
// (equivalent to 'ages.[ ["Tomas"; "John"] ]')
ages |> Series.getAll [ "Tomas"; "John" ]

We can also obtain all data from the series. The data frame library uses the term observations for all key-value pairs:

// Get all observations as a sequence of 'KeyValuePair'
ages.Observations
// Get all observations as a sequence of tuples
ages |> Series.observations
// Get all observations, with 'None' for missing values
ages |> Series.observationsAll

The previous examples were always looking for an exact key. If we have an ordered series, we can search for a nearest available key and we can also perform slicing. We use MSFT stock prices from earlier example:

// Get series with opening prices
let opens = msft?Open

// Fails. The key is not available in the series
try opens.[DateTime(2013, 1, 1)] with e -> nan
// Works. Find value for the nearest greater key
opens.Get(DateTime(2013, 1, 1), Lookup.ExactOrGreater)
// Works. Find value for the nearest smaller key
opens.Get(DateTime(2013, 1, 1), Lookup.ExactOrSmaller)

When using instance members, we can use Get which has an overload taking Lookup. The same functionality is exposed using Series.lookup. We can also obtain values for a sequence of keys:

// Find value for the nearest greater key
opens |> Series.lookup (DateTime(2013, 1, 1)) Lookup.ExactOrGreater

// Get first price for each month in 2012
let dates = [ for m in 1 .. 12 -> DateTime(2012, m, 1) ]
opens |> Series.lookupAll dates Lookup.ExactOrGreater

With ordered series, we can use slicing to get a sub-range of a series:

opens.[DateTime(2013, 1, 1) .. DateTime(2013, 1, 31)]
|> Series.mapKeys (fun k -> k.ToShortDateString())

Keys    1/2/2013  1/3/2013  1/4/2013  1/7/2013  1/8/2013  ...  1/29/2013  1/30/2013  1/31/2013
Values  27.25     27.63     27.27     26.77     26.75     ...  27.82      28.01      27.79

The slicing works even if the bounds are not available as keys in the series. The slice automatically starts at the nearest greater key for the lower bound and ends at the nearest smaller key for the upper bound (here, there is no value for January 1, so the result starts on January 2).
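To see the lookup and slicing behavior without the stock data, here is a minimal sketch on a small ordered series. The series gaps and its values are made up for illustration:

```fsharp
#r "nuget: Deedle"
open Deedle

// Ordered series with gaps between its integer keys (made-up values)
let gaps = series [ 2 => 20.0; 4 => 40.0; 6 => 60.0 ]

// Key 3 is absent, so an exact lookup would fail;
// a Lookup argument falls back to a neighbouring key instead
let fromSmaller = gaps.Get(3, Lookup.ExactOrSmaller)  // value at key 2
let fromGreater = gaps.Get(3, Lookup.ExactOrGreater)  // value at key 4

// Neither 3 nor 5 is a key; the slice keeps the keys between them
let middle = gaps.[3 .. 5]

printfn "%g %g %A" fromSmaller fromGreater (List.ofSeq middle.Keys)
```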

Several other options - discussed in a later section - are available when using hierarchical (or multi-level) indices. But first, we need to look at grouping.

Grouping data

Grouping of data can be performed on both unordered and ordered series and frames. For ordered series, more options (such as floating window or grouping of consecutive elements) are available - these can be found in the time series tutorial. There are essentially two options:

  • You can group series of any values and get a series of series (representing individual groups). The result can easily be turned into a data frame using Frame.ofColumns or Frame.ofRows, but this is not done automatically.

  • You can group the rows of a frame using values in a specified column, or using a function. The result is a frame with a multi-level (hierarchical) index. Hierarchical indexing is discussed later.

Keep in mind that you can easily get a series of rows or a series of columns from a frame using df.Rows and df.Columns, so the first option is also useful on data frames.

Grouping series

In the following sample, we use the data frame people loaded from F# records in an earlier section. Let's first get the data:

let travels = people.GetColumn<string list>("Countries")
val travels : Series<string,string list> =
  Joe     -> [UK; US; UK]       
  Tomas   -> [CZ; UK; US; ... ] 
  Eve     -> [FR] 
  Suzanne -> [US]

Now we can group the elements using both key (e.g. length of a name) and using the value (e.g. the number of visited countries):

// Group by name length (ignoring visited countries)
travels |> Series.groupBy (fun k v -> k.Length)
// Group by visited countries (people visited/not visited US)
travels |> Series.groupBy (fun k v -> List.exists ((=) "US") v)

// Group by name length and get number of values in each group
travels |> Series.groupInto 
  (fun k v -> k.Length) 
  (fun len people -> Series.countKeys people)

The groupBy function returns a series of series (series with new keys, containing series with all values for a given new key). You can then transform the values using Series.mapValues. However, if you want to avoid allocating all intermediate series, you can also use Series.groupInto which takes a projection function as a second argument. In the above examples, we count the number of keys in each group.
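The two approaches can be compared side by side. In this sketch, trips is a made-up stand-in for the travels series, and both pipelines count how many names have each length:

```fsharp
#r "nuget: Deedle"
open Deedle

// Made-up stand-in for the 'travels' series
let trips =
  series [ "Joe" => ["UK"; "US"]; "Eve" => ["FR"]; "Tomas" => ["CZ"; "UK"] ]

// Two steps: build the nested series, then count keys in each group
let viaGroupBy =
  trips
  |> Series.groupBy (fun name _ -> name.Length)
  |> Series.mapValues Series.countKeys

// One step: the projection receives each group key and its sub-series
let viaGroupInto =
  trips
  |> Series.groupInto
      (fun name _ -> name.Length)
      (fun _ group -> Series.countKeys group)
```

Both results map a name length to the number of people in that group.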

As a final example, let's say that we want to build a data frame that contains individual people (as rows) and all countries that appear in someone's travel list (as columns). The frame contains the number of visits to each country by each person:

travels
|> Series.mapValues (Seq.countBy id >> series)
|> Frame.ofRows
|> Frame.fillMissingWith 0

         UK  US  CZ  FR
Joe      2   1   0   0
Tomas    1   1   2   0
Eve      0   0   0   1
Suzanne  0   1   0   0

The problem can be solved just using Series.mapValues, together with standard F# Seq functions. We iterate over all rows (people and their countries). For each country list, we generate a series that contains individual countries and the count of visits (this is done by composing Seq.countBy and a function series to build a series of observations). Then we turn the result to a data frame and fill missing values with the constant zero (see a section about handling missing values).

Grouping data frames

So far, we worked with series and series of series (which can be turned into data frames using Frame.ofRows and Frame.ofColumns). Next, we look at working with data frames.

Assume we loaded Titanic data set that is also used on the project home page. First, let's look at basic grouping (also used in the home page demo):

// Group using column 'Sex' of type 'string'
titanic |> Frame.groupRowsByString "Sex"

// Grouping using column converted to 'decimal'
let byDecimal : Frame<decimal * _, _> = 
  titanic |> Frame.groupRowsBy "Fare"

// This is easier using member syntax
titanic.GroupRowsBy<decimal>("Fare")

// Group using calculated value - length of name
titanic |> Frame.groupRowsUsing (fun k row -> 
  row.GetAs<string>("Name").Length)

When working with frames, you can group data using both rows and columns. For most functions there are groupRows and groupCols equivalents. The easiest functions to use are Frame.groupRowsByXyz where Xyz specifies the type of the column that we're using for grouping. For example, we can easily group rows using the "Sex" column.

When using a less common type, you need to specify the type of the column. You can see this in the second and third examples (the byDecimal value and the GroupRowsBy member call), where we use decimal as the key. Finally, you can also specify the key selector as a function. The function gets the original key and the row as a value of ObjectSeries<K>. The type has various members for getting individual values (columns) such as GetAs which allows us to get a column of a specified type.

Grouping by single key

A grouped data frame uses multi-level index. This means that the index is a tuple of keys that represent multiple levels. For example:

titanic |> Frame.groupRowsByString "Sex"
val it : Frame<(string * int),string> =
                Survive   Name                    
  female 2   -> True      Heikkinen, Miss. Laina  
         11  -> True      Bonnell, Miss. Elizabeth
         19  -> True      Masselmani, Mrs. Fatima 
                ...       ...                     
  male   870 -> False     Balkic, Mr. Cerin       
         878 -> False     Laleff, Mr. Kristo      

As you can see, the pretty printer understands multi-level indices and outputs the first level (sex) followed by the second level (passenger id). You can turn a frame with a two-level index into a series of data frames (and vice versa) using Frame.nest and Frame.unnest:

let bySex = titanic |> Frame.groupRowsByString "Sex" 
// Returns series with two frames as values
let bySex1 = bySex |> Frame.nest
// Converts unstacked data back to a single frame
let bySex2 = bySex |> Frame.nest |> Frame.unnest

Grouping by multiple keys

Finally, we can also apply the grouping operation repeatedly to group data using multiple keys (and get a frame indexed by more than 2 levels). For example, we can group passengers by their class and the port where they embarked:

// Group by passenger class and port
let byClassAndPort = 
  titanic
  |> Frame.groupRowsByInt "Pclass"
  |> Frame.groupRowsByString "Embarked"
  |> Frame.mapRowKeys Pair.flatten3

// Get just the Age series with the same row index
let ageByClassAndPort = byClassAndPort?Age

If you look at the type of byClassAndPort, you can see that it is Frame<(string * int * int),string>. The row key is a triple consisting of the port identifier (string), the passenger class (int between 1 and 3) and the passenger id. The multi-level indexing is preserved when we get a single series from the frame.

As our last example, we look at various ways of aggregating the groups:

// Get average ages in each group
byClassAndPort?Age
|> Stats.levelMean Pair.get1And2Of3

// Averages for all numeric columns
byClassAndPort
|> Frame.getNumericCols
|> Series.dropMissing
|> Series.mapValues (Stats.levelMean Pair.get1And2Of3)
|> Frame.ofColumns

// Count number of survivors in each group
byClassAndPort.GetColumn<bool>("Survived")
|> Series.applyLevel Pair.get1And2Of3 (Series.values >> Seq.countBy id >> series)
|> Frame.ofRows

The second snippet combines a number of useful functions. It uses Frame.getNumericCols to obtain just numerical columns from a data frame. Then it drops the non-numerical columns using Series.dropMissing. Then we use Series.mapValues to apply the averaging operation to all columns.

The last snippet is also interesting. We get the "Survived" column (which contains Boolean values) and we aggregate each group using a specified function. The function is composed from three components - it first gets the values in the group, counts them (to get a number of true and false values) and then creates a series with the results. The result looks like the following table (some values were omitted):

         True  False     
C 1  ->  59    26        
  2  ->  9     8         
  3  ->  25    41        
S 1  ->  74    53        
  2  ->  76    88        
  3  ->  67    286      

Summarizing data with pivot table

In the previous section, we looked at grouping, which is a very general data manipulation operation. However, very often we want to perform two operations at the same time - group the data by certain keys and produce an aggregate. This combination is captured by the concept of a pivot table.

A pivot table is a useful tool if you want to summarize data in the frame based on two keys that are available in the rows of the data frame.

For example, given the titanic data set that we loaded earlier and explored in the previous section, we might want to compare the survival rate for males and females. The pivot table makes this possible using just a single call:

titanic 
|> Frame.pivotTable 
    // Returns a new row key
    (fun k r -> r.GetAs<string>("Sex")) 
    // Returns a new column key
    (fun k r -> r.GetAs<bool>("Survived")) 
    // Specifies aggregation for sub-frames
    Frame.countRows 

The pivotTable function (and the corresponding PivotTable method) take three arguments. The first two specify functions that, given a row in the original frame, return a new row key and column key, respectively. In the above example, the new row key is the Sex value and the new column key is whether a person survived or not. As a result we get the following two by two table:

        False  True
male    468    109
female  81     233

Note, we could also use the PivotTable member method along with a type annotation on the result for readability:

let table : Frame<string,bool> = 
  titanic.PivotTable("Sex", "Survived", Frame.countRows)

The pivot table operation takes the source frame, partitions the data (rows) based on the new row and column keys and then aggregates each sub-frame using the specified aggregation. In the above example, we used Frame.countRows to simply return the number of people in each sub-group. However, we could easily calculate other statistics - such as average age:

titanic 
|> Frame.pivotTable 
    (fun k r -> r.GetAs<string>("Sex")) 
    (fun k r -> r.GetAs<bool>("Survived")) 
    (fun frame -> frame?Age |> Stats.mean)
|> round

The results suggest that older males were less likely to survive than younger males, but older females were more likely to survive than younger females:

        False  True
male    32     27
female  25     29

Hierarchical indexing

For some data sets, the index is not a simple sequence of keys, but instead a more complex hierarchy. This can be captured using hierarchical indices. They also provide a convenient way of dealing with multi-dimensional data. The most common source of multi-level indices is grouping (the previous section has a number of examples).

Lookup in the World Bank data set

In this section, we start by looking at the World Bank data set from earlier. It is a data frame with two-level hierarchy of columns, where the first level is the name of region and the second level is the name of country.

Basic lookup can be performed using slicing operators. The following require F# 3.1 or later:

// Get all countries in Euro area
world.Columns.["Euro area", *]
// Get Belgium data from Euro area group
world.Columns.[("Euro area", "Belgium")]
// Belgium is returned twice - from both Euro and OECD
world.Columns.[*, "Belgium"]

In F# 3.0, you can use a family of helper functions LookupXOfY as follows:

// Get all countries in Euro area
world.Columns.[Lookup1Of2 "Euro area"]
// Belgium is returned twice - from both Euro and OECD
world.Columns.[Lookup2Of2 "Belgium"]

The lookup operations always return a data frame of the same type as the original frame. This means that even if you select one sub-group, you get back a frame with the same multi-level hierarchy of keys. This can be easily changed using a projection on keys:

// Drop the first level of keys (and get just countries)
let euro = 
  world.Columns.["Euro area", *]
  |> Frame.mapColKeys snd
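
As a rough mental model of these lookups, a two-level column key can be thought of as a tuple, and each slice as a filter on one of its components. The sketch below uses plain F# lists rather than Deedle; the keys (including the region name "OECD members") are illustrative assumptions, and the list only models key selection, not the frame data.

```fsharp
// Hypothetical two-level column keys: (region, country)
let columnKeys =
  [ "Euro area", "Austria"
    "Euro area", "Belgium"
    "OECD members", "Belgium"
    "OECD members", "Canada" ]

// Analogue of world.Columns.["Euro area", *]: fix the first level
let euroKeys =
  columnKeys |> List.filter (fun (region, _) -> region = "Euro area")

// Analogue of world.Columns.[*, "Belgium"]: fix the second level
let belgiumKeys =
  columnKeys |> List.filter (fun (_, country) -> country = "Belgium")

// Analogue of Frame.mapColKeys snd: keep only the country name
let euroCountries = euroKeys |> List.map snd

printfn "%A" euroKeys
printfn "%A" belgiumKeys
printfn "%A" euroCountries
```

Note that filtering on one level keeps the full tuple keys, which mirrors why the lookups above return a frame with the same two-level key type until the keys are projected.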

Grouping and aggregating World Bank data

Hierarchical keys are often created as a result of grouping. For example, we can group the rows (representing individual years) in the Euro zone data set by decade (for more information, see also the grouping section of this document).

let decades = euro |> Frame.groupRowsUsing (fun k _ -> 
  sprintf "%d0s" (k / 10))
 
val decades : Frame<(string * int),string> =
                Austria  Estonia   ...      
  1960s 1960 -> 6.592    <missing> 
        1961 -> 7.311    <missing> 
        ...  
  2010s 2010 -> 376.8    18.84 
        2011 -> 417.6    22.15 
        2012 -> 399.6    21.85 
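
The key function passed to groupRowsUsing relies on integer division to drop the last digit of the year. A small stand-alone check of that logic in plain F# (the name decadeOf is introduced here only for illustration):

```fsharp
// Same formatting logic as in the groupRowsUsing call above
let decadeOf (year: int) = sprintf "%d0s" (year / 10)

[ 1960; 1969; 1995; 2012 ]
|> List.iter (fun y -> printfn "%d -> %s" y (decadeOf y))
// prints:
// 1960 -> 1960s
// 1969 -> 1960s
// 1995 -> 1990s
// 2012 -> 2010s
```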

Now that we have a data frame with a hierarchical index, we can select the data in a single group, such as the 1990s. The result is a data frame of the same type. We can also multiply the values to get the original GDP in USD (rather than in billions of USD):

decades.Rows.["1990s", *] * 1e9

The Frame and Series modules provide a number of functions for aggregating the groups. We can access a specific country and aggregate GDP for a country, or we can apply aggregation to the entire data set:

// Calculate means per decades for Slovakia
decades?``Slovak Republic`` |> Stats.levelMean fst

// Calculate means per decades for all countries
decades
|> Frame.getNumericCols
|> Series.mapValues (Stats.levelMean fst)
|> Frame.ofColumns

// Calculate standard deviation per decades in USD
decades?Belgium * 1.0e9 
|> Stats.levelStdDev fst
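
To see what a level aggregation does on a small input, the following sketch applies Stats.levelMean to a hand-written series keyed by (decade, year). The values are hypothetical, and the #r "nuget: Deedle" directive assumes the script is run with dotnet fsi (F# 5 or later); adjust the reference to however Deedle is loaded in your environment.

```fsharp
#r "nuget: Deedle"   // assumption: running as a script in dotnet fsi
open Deedle

// Hypothetical values keyed by (decade, year)
let gdp =
  series [ ("1990s", 1998), 10.0
           ("1990s", 1999), 20.0
           ("2000s", 2000), 30.0
           ("2000s", 2001), 50.0 ]

// Group by the first component of the key and average each group
let perDecade = gdp |> Stats.levelMean fst

perDecade |> Series.observations |> Seq.iter (printfn "%A")
```

The level function (here fst) decides which part of the hierarchical key identifies a group, so the resulting series is keyed by decade only.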

So far, we were working with data frames that only had one hierarchical index. However, it is perfectly possible to have hierarchical index for both rows and columns. The following snippet groups countries by their average GDP (in addition to grouping rows by decades):

// Group countries by comparing average GDP with $500bn
let byGDP = 
  decades |> Frame.transpose |> Frame.groupRowsUsing (fun k v -> 
    v.As<float>() |> Stats.mean > 500.0)

You can see (by hovering over byGDP) that the two hierarchies are captured in the type. Because the frame was transposed before grouping, the row key is bool * string (rich? and name) and the column key is string * int (decade, year). This creates two groups of countries: one containing France, Germany and Italy and the other containing the remaining countries.

The aggregations are only (directly) supported on rows, but we can use Frame.transpose to switch between rows and columns.

Handling missing values

The support for missing values is built-in, which means that any series or frame can contain missing values. When constructing series or frames from data, certain values are automatically treated as "missing values". This includes Double.NaN, null values of reference types and empty values of nullable types:

Series.ofValues [ Double.NaN; 1.0; 3.14 ]

Keys    0    1  2
Values  N/A  1  3.14

[ Nullable(1); Nullable(); Nullable(3) ]
|> Series.ofValues

Keys    0  1    2
Values  1  N/A  3

Missing values are automatically skipped when performing statistical computations such as Stats.mean. They are also ignored by projections and filtering, including Series.mapValues. When you want to handle missing values explicitly, you can use Series.mapAll, which passes each value to your function as option<T> (we use the air quality data set loaded earlier):

// Get column with missing values
let ozone = air?Ozone 

// Replace missing values with zeros
ozone |> Series.mapAll (fun k v -> 
  match v with None -> Some 0.0 | v -> v)

In practice, you will not need to use Series.mapAll very often, because the Series module provides functions that fill missing values more easily:

// Fill missing values with constant
ozone |> Series.fillMissingWith 0.0

// Available values are copied in backward 
// direction to fill missing values
ozone |> Series.fillMissing Direction.Backward

// Available values are propagated forward
// (if the first value is missing, it is not filled!)
ozone |> Series.fillMissing Direction.Forward

// Fill values and drop those that could not be filled
ozone |> Series.fillMissing Direction.Forward
      |> Series.dropMissing
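
The effect of each strategy is easier to see on a short series. The sketch below uses a hypothetical four-element series instead of the ozone column; the show helper is introduced only for printing, and the #r "nuget: Deedle" directive assumes a script run with dotnet fsi.

```fsharp
#r "nuget: Deedle"   // assumption: running as a script in dotnet fsi
open Deedle

// Hypothetical series; nan is treated as a missing value
let s = Series.ofValues [ nan; 1.0; nan; 3.0 ]

// Print all keys together with their optional values
let show label data =
  data |> Series.observationsAll |> List.ofSeq |> printfn "%s: %A" label

s |> Series.fillMissingWith 0.0 |> show "constant"
s |> Series.fillMissing Direction.Backward |> show "backward"
// The first value has no predecessor, so it stays missing here
s |> Series.fillMissing Direction.Forward |> show "forward"
s |> Series.fillMissing Direction.Forward
  |> Series.dropMissing
  |> show "forward + drop"
```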

Various other strategies for handling missing values are not currently directly supported by the library, but can be easily added using Series.fillMissingUsing. It takes a function and calls it on all missing values. If we have an interpolation function, then we can pass it to fillMissingUsing and perform any interpolation needed.

For example, the following snippet gets the previous and next values and averages them (if they are available) or returns one of them (or zero if there are no values at all):

// Fill missing values using interpolation function
ozone |> Series.fillMissingUsing (fun k -> 
  // Get previous and next values
  let prev = ozone.TryGet(k, Lookup.ExactOrSmaller)
  let next = ozone.TryGet(k, Lookup.ExactOrGreater)
  // Pattern match to check which values were available
  match prev, next with 
  | OptionalValue.Present(p), OptionalValue.Present(n) -> 
      (p + n) / 2.0
  | OptionalValue.Present(v), _ 
  | _, OptionalValue.Present(v) -> v
  | _ -> 0.0)
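
The combination rule itself can be factored out and checked independently of Deedle. The following sketch expresses the same pattern match over plain F# option values; the function name interpolate is an illustrative choice, not part of the library.

```fsharp
// Combine the nearest available neighbours of a missing value
let interpolate (prev: float option) (next: float option) =
  match prev, next with
  | Some p, Some n -> (p + n) / 2.0   // both sides available: average
  | Some v, None
  | None, Some v -> v                 // only one side available
  | None, None -> 0.0                 // no values at all

printfn "%A" (interpolate (Some 2.0) (Some 4.0))   // 3.0
printfn "%A" (interpolate None (Some 4.0))         // 4.0
printfn "%A" (interpolate None None)               // 0.0
```

Keeping the rule in a separate function like this makes it straightforward to swap in a different strategy without touching the lookup code.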
val ignore : value:'T -> unit

Full name: Microsoft.FSharp.Core.Operators.ignore
namespace FSharp
namespace FSharp.Data
type WorldBankData =
  static member GetDataContext : unit -> WorldBankDataService
  nested type ServiceTypes

Full name: FSharp.Data.WorldBankData


<summary>Typed representation of WorldBank data. See http://www.worldbank.org for terms and conditions.</summary>
FSharp.Data.WorldBankData.GetDataContext() : FSharp.Data.WorldBankData.ServiceTypes.WorldBankDataService
namespace System
namespace System.IO
namespace Deedle
val root : string

Full name: Frame.root
val titanic : Frame<int,string>

Full name: Frame.titanic
Multiple items
module Frame

from Deedle

--------------------
type Frame =
  static member CreateEmpty : unit -> Frame<'R,'C> (requires equality and equality)
  static member FromArray2D : array:'T [,] -> Frame<int,int>
  static member FromColumns : cols:Series<'TColKey,Series<'TRowKey,'V>> -> Frame<'TRowKey,'TColKey> (requires equality and equality)
  static member FromColumns : cols:Series<'TColKey,ObjectSeries<'TRowKey>> -> Frame<'TRowKey,'TColKey> (requires equality and equality)
  static member FromColumns : columns:seq<KeyValuePair<'ColKey,ObjectSeries<'RowKey>>> -> Frame<'RowKey,'ColKey> (requires equality and equality)
  static member FromColumns : columns:seq<KeyValuePair<'ColKey,Series<'RowKey,'V>>> -> Frame<'RowKey,'ColKey> (requires equality and equality)
  static member FromColumns : cols:seq<Series<'ColKey,'V>> -> Frame<'ColKey,int> (requires equality)
  static member FromRecords : values:seq<'T> -> Frame<int,string>
  static member FromRecords : series:Series<'K,'R> -> Frame<'K,string> (requires equality)
  static member FromRowKeys : keys:seq<'K> -> Frame<'K,string> (requires equality)
  ...

Full name: Deedle.Frame

--------------------
type Frame<'TRowKey,'TColumnKey (requires equality and equality)> =
  interface IDynamicMetaObjectProvider
  interface INotifyCollectionChanged
  interface IFsiFormattable
  interface IFrame
  new : names:seq<'TColumnKey> * columns:seq<ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey>
  new : rowIndex:IIndex<'TRowKey> * columnIndex:IIndex<'TColumnKey> * data:IVector<IVector> * indexBuilder:IIndexBuilder * vectorBuilder:IVectorBuilder -> Frame<'TRowKey,'TColumnKey>
  member AddColumn : column:'TColumnKey * series:ISeries<'TRowKey> -> unit
  member AddColumn : column:'TColumnKey * series:seq<'V> -> unit
  member AddColumn : column:'TColumnKey * series:ISeries<'TRowKey> * lookup:Lookup -> unit
  member AddColumn : column:'TColumnKey * series:seq<'V> * lookup:Lookup -> unit
  ...

Full name: Deedle.Frame<_,_>

--------------------
new : names:seq<'TColumnKey> * columns:seq<ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey>
new : rowIndex:Indices.IIndex<'TRowKey> * columnIndex:Indices.IIndex<'TColumnKey> * data:IVector<IVector> * indexBuilder:Indices.IIndexBuilder * vectorBuilder:Vectors.IVectorBuilder -> Frame<'TRowKey,'TColumnKey>
static member Frame.ReadCsv : path:string * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int * ?missingValues:string [] -> Frame<int,string>
static member Frame.ReadCsv : stream:Stream * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int * ?missingValues:string [] -> Frame<int,string>
static member Frame.ReadCsv : reader:TextReader * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int * ?missingValues:string [] -> Frame<int,string>
static member Frame.ReadCsv : path:string * indexCol:string * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int * ?missingValues:string [] -> Frame<'R,string> (requires equality)
val msft : Frame<DateTime,string>

Full name: Frame.msft
val indexRowsDate : column:'C -> frame:Frame<'R1,'C> -> Frame<DateTime,'C> (requires equality and equality)

Full name: Deedle.Frame.indexRowsDate
val sortRowsByKey : frame:Frame<'R,'C> -> Frame<'R,'C> (requires equality and equality)

Full name: Deedle.Frame.sortRowsByKey
val air : Frame<int,string>

Full name: Frame.air
val msftSimpler : Frame<DateTime,string>

Full name: Frame.msftSimpler
Multiple items
type DateTime =
  struct
    new : ticks:int64 -> DateTime + 10 overloads
    member Add : value:TimeSpan -> DateTime
    member AddDays : value:float -> DateTime
    member AddHours : value:float -> DateTime
    member AddMilliseconds : value:float -> DateTime
    member AddMinutes : value:float -> DateTime
    member AddMonths : months:int -> DateTime
    member AddSeconds : value:float -> DateTime
    member AddTicks : value:int64 -> DateTime
    member AddYears : value:int -> DateTime
    ...
  end

Full name: System.DateTime

--------------------
DateTime()
   (+0 other overloads)
DateTime(ticks: int64) : unit
   (+0 other overloads)
DateTime(ticks: int64, kind: DateTimeKind) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, calendar: Globalization.Calendar) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int, kind: DateTimeKind) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int, calendar: Globalization.Calendar) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int, millisecond: int) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int, millisecond: int, kind: DateTimeKind) : unit
   (+0 other overloads)
member Frame.SaveCsv : path:string * keyNames:seq<string> -> unit
static member FrameExtensions.SaveCsv : frame:Frame<'R,'C> * path:string * keyNames:seq<string> * separator:char * culture:Globalization.CultureInfo -> unit (requires equality and equality)
static member FrameExtensions.SaveCsv : frame:Frame<'R,'C> * writer:TextWriter * includeRowKeys:bool * keyNames:seq<string> * separator:char * culture:Globalization.CultureInfo -> unit (requires equality and equality)
static member FrameExtensions.SaveCsv : frame:Frame<'R,'C> * path:string * includeRowKeys:bool * keyNames:seq<string> * separator:char * culture:Globalization.CultureInfo -> unit (requires equality and equality)
member Frame.SaveCsv : writer:TextWriter * ?includeRowKeys:bool * ?keyNames:seq<string> * ?separator:char * ?culture:Globalization.CultureInfo -> unit
member Frame.SaveCsv : path:string * ?includeRowKeys:bool * ?keyNames:seq<string> * ?separator:char * ?culture:Globalization.CultureInfo -> unit
type Path =
  static val DirectorySeparatorChar : char
  static val AltDirectorySeparatorChar : char
  static val VolumeSeparatorChar : char
  static val InvalidPathChars : char[]
  static val PathSeparator : char
  static member ChangeExtension : path:string * extension:string -> string
  static member Combine : params paths:string[] -> string + 3 overloads
  static member GetDirectoryName : path:string -> string
  static member GetExtension : path:string -> string
  static member GetFileName : path:string -> string
  ...

Full name: System.IO.Path
Path.GetTempFileName() : string
type Person =
  {Name: string;
   Age: int;
   Countries: string list;}

Full name: Frame.Person
Person.Name: string
Multiple items
val string : value:'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

--------------------
type string = String

Full name: Microsoft.FSharp.Core.string
Person.Age: int
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
Person.Countries: string list
type 'T list = List<'T>

Full name: Microsoft.FSharp.Collections.list<_>
val peopleRecds : Person list

Full name: Frame.peopleRecds
val peopleList : Frame<int,string>

Full name: Frame.peopleList
static member Frame.ofRecords : series:Series<'K,'R> -> Frame<'K,string> (requires equality)
static member Frame.ofRecords : values:seq<'T> -> Frame<int,string>
static member Frame.ofRecords : values:Collections.IEnumerable * indexCol:string -> Frame<'R,string> (requires equality)
val people : Frame<string,string>

Full name: Frame.people
val indexRowsString : column:'C -> frame:Frame<'R1,'C> -> Frame<string,'C> (requires equality and equality)

Full name: Deedle.Frame.indexRowsString
member Frame.GetColumn : column:'TColumnKey -> Series<'TRowKey,'R>
member Frame.GetColumn : column:'TColumnKey * lookup:Lookup -> Series<'TRowKey,'R>
val wb : WorldBankData.ServiceTypes.WorldBankDataService

Full name: Frame.wb
WorldBankData.GetDataContext() : WorldBankData.ServiceTypes.WorldBankDataService
val loadRegion : region:WorldBankData.ServiceTypes.Region -> Frame<int,(string * string)>

Full name: Frame.loadRegion


 Given a region, load GDP in current US$ and return data as
 a frame with two-level column key (region and country name)
val region : WorldBankData.ServiceTypes.Region
type ServiceTypes =
  nested type Countries
  nested type Country
  nested type Indicators
  nested type IndicatorsDescriptions
  nested type Region
  nested type Regions
  nested type Topic
  nested type Topics
  nested type WorldBankDataService

Full name: FSharp.Data.WorldBankData.ServiceTypes


<summary>Contains the types that describe the data service</summary>
type Region =
  inherit Region
  member Countries : Countries
  member Indicators : Indicators
  member Name : string
  member RegionCode : string

Full name: FSharp.Data.WorldBankData.ServiceTypes.Region
val country : WorldBankData.ServiceTypes.Country
property WorldBankData.ServiceTypes.Region.Countries: WorldBankData.ServiceTypes.Countries


<summary>The indicators for the region</summary>
property Runtime.WorldBank.Region.Name: string
property Runtime.WorldBank.Country.Name: string
Multiple items
module Series

from Deedle

--------------------
type Series =
  static member ofNullables : values:seq<Nullable<'a0>> -> Series<int,'a0> (requires default constructor and value type and 'a0 :> ValueType)
  static member ofObservations : observations:seq<'a0 * 'a1> -> Series<'a0,'a1> (requires equality)
  static member ofOptionalObservations : observations:seq<'K * 'a1 option> -> Series<'K,'a1> (requires equality)
  static member ofValues : values:seq<'a0> -> Series<int,'a0>

Full name: Deedle.F# Series extensions.Series

--------------------
type Series<'K,'V (requires equality)> =
  interface IFsiFormattable
  interface ISeries<'K>
  new : pairs:seq<KeyValuePair<'K,'V>> -> Series<'K,'V>
  new : keys:'K [] * values:'V [] -> Series<'K,'V>
  new : keys:seq<'K> * values:seq<'V> -> Series<'K,'V>
  new : index:IIndex<'K> * vector:IVector<'V> * vectorBuilder:IVectorBuilder * indexBuilder:IIndexBuilder -> Series<'K,'V>
  member After : lowerExclusive:'K -> Series<'K,'V>
  member Aggregate : aggregation:Aggregation<'K> * observationSelector:Func<DataSegment<Series<'K,'V>>,KeyValuePair<'TNewKey,OptionalValue<'R>>> -> Series<'TNewKey,'R> (requires equality)
  member Aggregate : aggregation:Aggregation<'K> * keySelector:Func<DataSegment<Series<'K,'V>>,'TNewKey> * valueSelector:Func<DataSegment<Series<'K,'V>>,OptionalValue<'R>> -> Series<'TNewKey,'R> (requires equality)
  member AsyncMaterialize : unit -> Async<Series<'K,'V>>
  ...

Full name: Deedle.Series<_,_>

--------------------
new : pairs:seq<Collections.Generic.KeyValuePair<'K,'V>> -> Series<'K,'V>
new : keys:seq<'K> * values:seq<'V> -> Series<'K,'V>
new : keys:'K [] * values:'V [] -> Series<'K,'V>
new : index:Indices.IIndex<'K> * vector:IVector<'V> * vectorBuilder:Vectors.IVectorBuilder * indexBuilder:Indices.IIndexBuilder -> Series<'K,'V>
static member Series.ofObservations : observations:seq<'a0 * 'a1> -> Series<'a0,'a1> (requires equality)
property WorldBankData.ServiceTypes.Country.Indicators: WorldBankData.ServiceTypes.Indicators


<summary>The indicators for the country</summary>
val frame : columns:seq<'a * #ISeries<'c>> -> Frame<'c,'a> (requires equality and equality)

Full name: Deedle.F# Frame extensions.frame
val eu : Frame<int,(string * string)>

Full name: Frame.eu
property WorldBankData.ServiceTypes.WorldBankDataService.Regions: WorldBankData.ServiceTypes.Regions
val oecd : Frame<int,(string * string)>

Full name: Frame.oecd
val world : Frame<int,(string * string)>

Full name: Frame.world
member Frame.Join : otherFrame:Frame<'TRowKey,'TColumnKey> -> Frame<'TRowKey,'TColumnKey>
member Frame.Join : colKey:'TColumnKey * series:Series<'TRowKey,'V> -> Frame<'TRowKey,'TColumnKey>
member Frame.Join : otherFrame:Frame<'TRowKey,'TColumnKey> * kind:JoinKind -> Frame<'TRowKey,'TColumnKey>
member Frame.Join : colKey:'TColumnKey * series:Series<'TRowKey,'V> * kind:JoinKind -> Frame<'TRowKey,'TColumnKey>
member Frame.Join : otherFrame:Frame<'TRowKey,'TColumnKey> * kind:JoinKind * lookup:Lookup -> Frame<'TRowKey,'TColumnKey>
member Frame.Join : colKey:'TColumnKey * series:Series<'TRowKey,'V> * kind:JoinKind * lookup:Lookup -> Frame<'TRowKey,'TColumnKey>
val peopleNested : Frame<int,string>

Full name: Frame.peopleNested
static member Series.ofValues : values:seq<'a0> -> Series<int,'a0>
val expandCols : names:seq<string> -> frame:Frame<'R,string> -> Frame<'R,string> (requires equality)

Full name: Deedle.Frame.expandCols
val tuples : Series<int,Collections.Generic.IDictionary<string,obj>>

Full name: Frame.tuples
val dict : keyValuePairs:seq<'Key * 'Value> -> Collections.Generic.IDictionary<'Key,'Value> (requires equality)

Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.dict
val box : value:'T -> obj

Full name: Microsoft.FSharp.Core.Operators.box
val expandAllCols : nesting:int -> frame:Frame<'R,string> -> Frame<'R,string> (requires equality)

Full name: Deedle.Frame.expandAllCols
property Frame.Columns: ColumnSeries<string,string>
val numAges : Series<string,float>

Full name: Frame.numAges
member Series.Get : key:'K -> 'V
member Series.Get : key:'K * lookup:Lookup -> 'V
member Series.TryGet : key:'K -> OptionalValue<'V>
member Series.TryGet : key:'K * lookup:Lookup -> OptionalValue<'V>
member Series.GetAt : index:int -> 'V
member ObjectSeries.As : unit -> Series<'K,'R>
property Frame.Rows: RowSeries<string,string>
val mapValues : f:('T -> 'R) -> series:Series<'K,'T> -> Series<'K,'R> (requires equality)

Full name: Deedle.Series.mapValues
val row : ObjectSeries<string>
member ObjectSeries.GetAs : column:'K -> 'R
member ObjectSeries.GetAs : column:'K * fallback:'R -> 'R
type IPerson =
  interface
    abstract member Age : int
    abstract member Countries : string list
  end

Full name: Frame.IPerson


 Expected columns & their types in a row
abstract member IPerson.Age : int

Full name: Frame.IPerson.Age
abstract member IPerson.Countries : string list

Full name: Frame.IPerson.Countries
val rows : Series<string,IPerson>

Full name: Frame.rows
member Frame.GetRowsAs : unit -> Series<'TRowKey,'TRow>
type IPersonOpt =
  interface
    abstract member Age : OptionalValue<int>
    abstract member Countries : string list
  end

Full name: Frame.IPersonOpt


 Alternative that lets us handle missing 'Age' values
abstract member IPersonOpt.Age : OptionalValue<int>

Full name: Frame.IPersonOpt.Age
Multiple items
module OptionalValue

from Deedle

--------------------
type OptionalValue =
  static member Create : v:'a0 -> OptionalValue<'a0>
  static member Empty : unit -> OptionalValue<'T>
  static member OfNullable : v:Nullable<'T> -> OptionalValue<'T> (requires default constructor and value type and 'T :> ValueType)

Full name: Deedle.OptionalValue

--------------------
type OptionalValue<'T> =
  struct
    new : value:'T -> OptionalValue<'T>
    private new : hasValue:bool * value:'T -> OptionalValue<'T>
    override Equals : y:obj -> bool
    override GetHashCode : unit -> int
    override ToString : unit -> string
    member HasValue : bool
    member Value : 'T
    member ValueOrDefault : 'T
    static member Missing : OptionalValue<'T>
  end

Full name: Deedle.OptionalValue<_>

--------------------
OptionalValue()
new : value:'T -> OptionalValue<'T>
abstract member IPersonOpt.Countries : string list

Full name: Frame.IPersonOpt.Countries
val more : Series<string,float>

Full name: Frame.more
val series : observations:seq<'a * 'b> -> Series<'a,'b> (requires equality)

Full name: Deedle.F# Series extensions.series
val add1 : Series<string,float>

Full name: Frame.add1
member Frame.ReplaceColumn : column:'TColumnKey * data:seq<'V> -> unit
member Frame.ReplaceColumn : column:'TColumnKey * series:ISeries<'TRowKey> -> unit
member Frame.ReplaceColumn : column:'TColumnKey * data:seq<'V> * lookup:Lookup -> unit
member Frame.ReplaceColumn : column:'TColumnKey * series:ISeries<'TRowKey> * lookup:Lookup -> unit
val newRow : Series<string,obj>

Full name: Frame.newRow
member Frame.Merge : params otherFrames:Frame<'TRowKey,'TColumnKey> [] -> Frame<'TRowKey,'TColumnKey>
member Frame.Merge : otherFrames:seq<Frame<'TRowKey,'TColumnKey>> -> Frame<'TRowKey,'TColumnKey>
member Frame.Merge : otherFrame:Frame<'TRowKey,'TColumnKey> -> Frame<'TRowKey,'TColumnKey>
static member FrameExtensions.Merge : frame:Frame<'TRowKey,'TColumnKey> * rowKey:'TRowKey * row:ISeries<'TColumnKey> -> Frame<'TRowKey,'TColumnKey> (requires equality and equality)
val otherRow : SeriesBuilder<string>

Full name: Frame.otherRow
Multiple items
type SeriesBuilder<'K (requires equality)> =
  inherit SeriesBuilder<'K,obj>
  new : unit -> SeriesBuilder<'K>

Full name: Deedle.SeriesBuilder<_>

--------------------
type SeriesBuilder<'K,'V (requires equality and equality)> =
  interface IDynamicMetaObjectProvider
  interface IDictionary<'K,'V>
  interface seq<KeyValuePair<'K,'V>>
  interface IEnumerable
  new : unit -> SeriesBuilder<'K,'V>
  member Add : key:'K * value:'V -> unit
  member Series : Series<'K,'V>
  static member ( ?<- ) : builder:SeriesBuilder<string,'V> * name:string * value:'V -> unit

Full name: Deedle.SeriesBuilder<_,_>

--------------------
new : unit -> SeriesBuilder<'K>

--------------------
new : unit -> SeriesBuilder<'K,'V>
property SeriesBuilder.Series: Series<string,obj>
val nums : Series<int,float>

Full name: Frame.nums
val strs : Series<string,string>

Full name: Frame.strs
val ages : Series<string,float>

Full name: Frame.ages
val get : key:'K -> series:Series<'K,'T> -> 'T (requires equality)

Full name: Deedle.Series.get
val nan : float

Full name: Microsoft.FSharp.Core.Operators.nan
val tryGet : key:'K -> series:Series<'K,'T> -> 'T option (requires equality)

Full name: Deedle.Series.tryGet
val getAll : keys:seq<'K> -> series:Series<'K,'T> -> Series<'K,'T> (requires equality)

Full name: Deedle.Series.getAll
property Series.Observations: seq<Collections.Generic.KeyValuePair<string,float>>
val observations : series:Series<'K,'T> -> seq<'K * 'T> (requires equality)

Full name: Deedle.Series.observations
val observationsAll : series:Series<'K,'T> -> seq<'K * 'T option> (requires equality)

Full name: Deedle.Series.observationsAll
val opens : Series<DateTime,float>

Full name: Frame.opens
val e : exn
type Lookup =
  | Exact = 1
  | ExactOrGreater = 3
  | ExactOrSmaller = 5
  | Greater = 2
  | Smaller = 4

Full name: Deedle.Lookup
Lookup.ExactOrSmaller: Lookup = 5
val lookup : key:'K -> lookup:Lookup -> series:Series<'K,'T> -> 'T (requires equality)

Full name: Deedle.Series.lookup
Lookup.ExactOrGreater: Lookup = 3
val dates : DateTime list

Full name: Frame.dates
val m : int
val lookupAll : keys:seq<'K> -> lookup:Lookup -> series:Series<'K,'T> -> Series<'K,'T> (requires equality)

Full name: Deedle.Series.lookupAll
val mapKeys : f:('K -> 'R) -> series:Series<'K,'T> -> Series<'R,'T> (requires equality and equality)

Full name: Deedle.Series.mapKeys
val k : DateTime
DateTime.ToShortDateString() : string
val travels : Series<string,string list>

Full name: Frame.travels
val groupBy : keySelector:('K -> 'T -> 'TNewKey) -> series:Series<'K,'T> -> Series<'TNewKey,Series<'K,'T>> (requires equality and equality)

Full name: Deedle.Series.groupBy
val k : string
val v : string list
property String.Length: int
Multiple items
module List

from Microsoft.FSharp.Collections

--------------------
type List<'T> =
  | ( [] )
  | ( :: ) of Head: 'T * Tail: 'T list
  interface IEnumerable
  interface IEnumerable<'T>
  member Head : 'T
  member IsEmpty : bool
  member Item : index:int -> 'T with get
  member Length : int
  member Tail : 'T list
  static member Cons : head:'T * tail:'T list -> 'T list
  static member Empty : 'T list

Full name: Microsoft.FSharp.Collections.List<_>
val exists : predicate:('T -> bool) -> list:'T list -> bool

Full name: Microsoft.FSharp.Collections.List.exists
val groupInto : keySelector:('K -> 'T -> 'TNewKey) -> f:('TNewKey -> Series<'K,'T> -> 'TNewValue) -> series:Series<'K,'T> -> Series<'TNewKey,'TNewValue> (requires equality and equality)

Full name: Deedle.Series.groupInto
val len : int
val people : Series<string,string list>
val countKeys : series:Series<'K,'T> -> int (requires equality)

Full name: Deedle.Series.countKeys
module Seq

from Microsoft.FSharp.Collections
val countBy : projection:('T -> 'Key) -> source:seq<'T> -> seq<'Key * int> (requires equality)

Full name: Microsoft.FSharp.Collections.Seq.countBy
val id : x:'T -> 'T

Full name: Microsoft.FSharp.Core.Operators.id
static member Frame.ofRows : rows:seq<'R * #ISeries<'C>> -> Frame<'R,'C> (requires equality and equality)
static member Frame.ofRows : rows:Series<'R,#ISeries<'C>> -> Frame<'R,'C> (requires equality and equality)
val fillMissingWith : value:'T -> frame:Frame<'R,'C> -> Frame<'R,'C> (requires equality and equality)

Full name: Deedle.Frame.fillMissingWith
val groupRowsByString : column:'C -> frame:Frame<'R,'C> -> Frame<(string * 'R),'C> (requires equality and equality)

Full name: Deedle.Frame.groupRowsByString
val byDecimal : Frame<(decimal * int),string>

Full name: Frame.byDecimal
Multiple items
val decimal : value:'T -> decimal (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.decimal

--------------------
type decimal = Decimal

Full name: Microsoft.FSharp.Core.decimal

--------------------
type decimal<'Measure> = decimal

Full name: Microsoft.FSharp.Core.decimal<_>
val groupRowsBy : column:'C -> frame:Frame<'R,'C> -> Frame<('K * 'R),'C> (requires equality and equality and equality)

Full name: Deedle.Frame.groupRowsBy
member Frame.GroupRowsBy : colKey:'TColumnKey -> Frame<('TGroup * 'TRowKey),'TColumnKey> (requires equality)
val groupRowsUsing : selector:('R -> ObjectSeries<'C> -> 'K) -> frame:Frame<'R,'C> -> Frame<('K * 'R),'C> (requires equality and equality and equality)

Full name: Deedle.Frame.groupRowsUsing
val k : int
val bySex : Frame<(string * int),string>

Full name: Frame.bySex
val bySex1 : Series<string,Frame<int,string>>

Full name: Frame.bySex1
val nest : frame:Frame<('R1 * 'R2),'C> -> Series<'R1,Frame<'R2,'C>> (requires equality and equality and equality)

Full name: Deedle.Frame.nest
val bySex2 : Frame<(string * int),string>

Full name: Frame.bySex2
val unnest : series:Series<'R1,Frame<'R2,'C>> -> Frame<('R1 * 'R2),'C> (requires equality and equality and equality)

Full name: Deedle.Frame.unnest
val byClassAndPort : Frame<(string * int * int),string>

Full name: Frame.byClassAndPort
val groupRowsByInt : column:'C -> frame:Frame<'R,'C> -> Frame<(int * 'R),'C> (requires equality and equality)

Full name: Deedle.Frame.groupRowsByInt
val mapRowKeys : f:('R1 -> 'R2) -> frame:Frame<'R1,'C> -> Frame<'R2,'C> (requires equality and equality and equality)

Full name: Deedle.Frame.mapRowKeys
module Pair

from Deedle
val flatten3 : v1:'a * ('b * 'c) -> 'a * 'b * 'c

Full name: Deedle.Pair.flatten3
val ageByClassAndPort : Series<(string * int * int),float>

Full name: Frame.ageByClassAndPort
type Stats =
  static member count : frame:Frame<'R,'C> -> Series<'C,int> (requires equality and equality)
  static member count : series:Series<'K,'V> -> int (requires equality)
  static member expandingCount : series:Series<'K,float> -> Series<'K,float> (requires equality)
  static member expandingKurt : series:Series<'K,float> -> Series<'K,float> (requires equality)
  static member expandingMax : series:Series<'K,float> -> Series<'K,float> (requires equality)
  static member expandingMean : series:Series<'K,float> -> Series<'K,float> (requires equality)
  static member expandingMin : series:Series<'K,float> -> Series<'K,float> (requires equality)
  static member expandingSkew : series:Series<'K,float> -> Series<'K,float> (requires equality)
  static member expandingStdDev : series:Series<'K,float> -> Series<'K,float> (requires equality)
  static member expandingSum : series:Series<'K,float> -> Series<'K,float> (requires equality)
  ...

Full name: Deedle.Stats