
R counterpart of the Python function morie.profile_dataset(). Returns a list containing dataset-level metadata and one profile per column.

Usage

morie_profile_dataset(df)

Arguments

df

A data.frame.

Value

A list with components:

n_rows, n_cols

Dataset dimensions.

columns

A named list with one entry per column. Each entry contains name, dtype, measurement_level, n_missing, and n_unique; entries for numeric columns additionally contain mean, sd, min, max, q25, q50, and q75.
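Because the numeric summaries are present only for numeric columns, the entries of columns do not all have the same fields. The following is a minimal sketch of flattening such a list into a single data frame, assuming the structure documented above. The helper columns_to_df() is hypothetical and not part of morie, and p is built by hand from two iris columns so that the snippet runs without the package installed.

```r
# Hand-built stand-in for the `columns` component of a profile, using the
# documented field names. Only two columns are filled in for brevity.
p <- list(
  columns = list(
    Species = list(
      name = "Species", dtype = "factor", measurement_level = "nominal",
      n_missing = 0L, n_unique = 3L
    ),
    Sepal.Length = list(
      name = "Sepal.Length", dtype = "numeric", measurement_level = "ratio",
      n_missing = 0L, n_unique = 35L,
      mean = 5.843333, sd = 0.8280661, min = 4.3, max = 7.9,
      q25 = 5.1, q50 = 5.8, q75 = 6.4
    )
  )
)

# Hypothetical helper: one row per column profile, NA where a field is absent.
columns_to_df <- function(columns) {
  # Union of all field names, in order of first appearance.
  fields <- unique(unlist(lapply(columns, names)))
  # Build each output column by pulling one field from every profile.
  out <- lapply(fields, function(f) {
    vals <- lapply(columns, function(col) {
      if (is.null(col[[f]])) NA else col[[f]]
    })
    unlist(vals, use.names = FALSE)
  })
  names(out) <- fields
  as.data.frame(out, stringsAsFactors = FALSE)
}

profile_df <- columns_to_df(p$columns)
profile_df[, c("name", "dtype", "n_unique", "mean")]
```

With a real profile, the same helper would be called as columns_to_df(morie_profile_dataset(df)$columns), provided every field is a length-one value as in the examples on this page.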

Examples

p <- morie_profile_dataset(iris)
p$columns$Species
#> $name
#> [1] "Species"
#> 
#> $dtype
#> [1] "factor"
#> 
#> $measurement_level
#> [1] "nominal"
#> 
#> $n_missing
#> [1] 0
#> 
#> $n_unique
#> [1] 3
#> 
p$columns$Sepal.Length
#> $name
#> [1] "Sepal.Length"
#> 
#> $dtype
#> [1] "numeric"
#> 
#> $measurement_level
#> [1] "ratio"
#> 
#> $n_missing
#> [1] 0
#> 
#> $n_unique
#> [1] 35
#> 
#> $mean
#> [1] 5.843333
#> 
#> $sd
#> [1] 0.8280661
#> 
#> $min
#> [1] 4.3
#> 
#> $max
#> [1] 7.9
#> 
#> $q25
#> [1] 5.1
#> 
#> $q50
#> [1] 5.8
#> 
#> $q75
#> [1] 6.4
#>