Profile a data.frame: per-column types, missingness, summary statistics
Source: R/dataset_profile.R (help page: morie_profile_dataset.Rd)

Mirrors the Python function morie.profile_dataset(). Returns a list of per-column profiles plus dataset-level metadata (row and column counts).
Value
A list with components:
n_rows, n_cols
  Dataset dimensions (number of rows and number of columns).
columns
  A named list with one entry per column. Each entry contains name, dtype, measurement_level, n_missing and n_unique; numeric columns additionally contain mean, sd, min, max, q25, q50 and q75.
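To make the shape of the return value concrete, here is a minimal base-R sketch that builds a list with the same components. It is hypothetical and is not the morie implementation: the helper names profile_column_sketch() and profile_dataset_sketch() are invented for illustration, and the rules for dtype (first class of the column), measurement_level (ordered factor to "ordinal", numeric to "ratio", everything else to "nominal") and n_unique (distinct non-missing values) are assumptions.

```r
# Hypothetical sketch, NOT the morie implementation: reproduces the documented
# shape of the return value using base R only.
profile_column_sketch <- function(x, name) {
  # Assumed heuristic for measurement_level (not stated by the help page).
  level <- if (is.ordered(x)) {
    "ordinal"
  } else if (is.numeric(x)) {
    "ratio"
  } else {
    "nominal"
  }
  out <- list(
    name = name,
    dtype = class(x)[1],                     # assumed: first class, e.g. "factor"
    measurement_level = level,
    n_missing = sum(is.na(x)),
    n_unique = length(unique(x[!is.na(x)]))  # assumed: NA is not counted
  )
  if (is.numeric(x)) {
    v <- x[!is.na(x)]
    q <- unname(stats::quantile(v, c(0.25, 0.5, 0.75)))  # R default type 7
    out <- c(out, list(
      mean = mean(v), sd = stats::sd(v), min = min(v), max = max(v),
      q25 = q[1], q50 = q[2], q75 = q[3]
    ))
  }
  out
}

profile_dataset_sketch <- function(df) {
  stopifnot(is.data.frame(df))
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    # One entry per column, named by the column name.
    columns = stats::setNames(
      lapply(names(df), function(nm) profile_column_sketch(df[[nm]], nm)),
      names(df)
    )
  )
}

p <- profile_dataset_sketch(iris)
p$columns$Sepal.Length$q75
```

Non-numeric columns simply omit the summary statistics, matching the documented difference between the Species and Sepal.Length entries in the examples.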
Examples
p <- morie_profile_dataset(iris)
p$columns$Species
#> $name
#> [1] "Species"
#>
#> $dtype
#> [1] "factor"
#>
#> $measurement_level
#> [1] "nominal"
#>
#> $n_missing
#> [1] 0
#>
#> $n_unique
#> [1] 3
#>
p$columns$Sepal.Length
#> $name
#> [1] "Sepal.Length"
#>
#> $dtype
#> [1] "numeric"
#>
#> $measurement_level
#> [1] "ratio"
#>
#> $n_missing
#> [1] 0
#>
#> $n_unique
#> [1] 35
#>
#> $mean
#> [1] 5.843333
#>
#> $sd
#> [1] 0.8280661
#>
#> $min
#> [1] 4.3
#>
#> $max
#> [1] 7.9
#>
#> $q25
#> [1] 5.1
#>
#> $q50
#> [1] 5.8
#>
#> $q75
#> [1] 6.4
#>
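The printed values above agree with what base R's defaults give for iris, which suggests (but the help page does not confirm) that sd uses the n - 1 denominator and the quartiles use quantile()'s default type 7. The check below recomputes them; it also shows one ambiguity the page leaves open: for a factor with unused levels, the count of observed values and nlevels() differ, and which of the two morie reports as n_unique is not stated.

```r
# Base-R cross-check of the example output above. Assumes morie uses these
# base-R defaults; the help page does not say so.
species <- iris$Species
sepal <- iris$Sepal.Length

# Species: dtype, n_missing, n_unique
class(species)[1]
sum(is.na(species))
length(unique(species))

# Sepal.Length: mean, sample sd (n - 1 denominator), range, type-7 quartiles
num_checks <- c(
  mean = mean(sepal),
  sd = sd(sepal),
  min = min(sepal),
  max = max(sepal),
  quantile(sepal, c(0.25, 0.5, 0.75))
)
num_checks
length(unique(sepal))

# Observed values vs declared levels: subsetting a factor keeps unused levels.
no_setosa <- species[species != "setosa"]
length(unique(no_setosa))  # observed values
nlevels(no_setosa)         # declared levels, including the unused "setosa"
```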