Generate metadata for a dataset in R

Looking at the data is a crucial first step in any analysis in R: before doing anything else, we want to understand the data at hand. Sometimes a dataset is too big to inspect column by column, so here is an easy way to generate metadata for it. Although there are many aspects and features you may want to look at, a few common ones can be summarized with a simple utility package called metadata, available on GitHub. You can simply install it, and it works like a charm.

Installation

As of now, this package is available only on GitHub, so you need the devtools package to install it.

Install devtools (if you don’t have it)

install.packages("devtools")

Load devtools and install metadata

library(devtools)
devtools::install_github("ankitkatiyar91/metadata")

Yup! You are ready to use it.

At the time of writing, metadata defines only two functions:

  • getmode
  • generateMeta
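The post does not show how getmode works. Base R has no built-in function for the statistical mode (base R's mode() returns an object's storage mode, such as "numeric"), so as an illustration, here is one common base-R way to compute it. The name my_mode is hypothetical, and this is not necessarily how the package implements getmode.

```r
# Hypothetical helper, not taken from the metadata package:
# returns the most frequent value of a vector.
my_mode <- function(v) {
  uniq_vals <- unique(v)                   # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniq_vals))  # number of occurrences of each distinct value
  uniq_vals[which.max(counts)]             # which.max picks the first maximum on ties
}

my_mode(c(1, 2, 2, 3, 3))   # 2 (tied with 3, but 2 appears first)
my_mode(iris$Sepal.Width)   # 3
```

Because which.max returns the first maximum, ties are broken in favour of the value that appears earliest in the vector.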

generateMeta() is a general-purpose function for generating metadata. It returns a data frame with one row per column of the input, containing the column's name and some key properties of that column.
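To make the idea behind generateMeta() concrete, here is a minimal base-R sketch of a similar per-column summary. It is not the package's code: the function name summarise_columns, its output column names, and the definition of a "blank" (an empty or whitespace-only string) are assumptions for illustration.

```r
# Hypothetical sketch, not the metadata package's implementation:
# builds a data frame with one row per column of `df`.
summarise_columns <- function(df) {
  rows <- lapply(names(df), function(col) {
    x <- df[[col]]
    is_num <- is.numeric(x)
    data.frame(
      name     = col,
      na_count = sum(is.na(x)),
      # Assumption: a "blank" is an empty or whitespace-only string
      blanks   = if (is.character(x)) sum(trimws(x) == "", na.rm = TRUE) else 0L,
      unique   = length(unique(x)),
      # Numeric statistics only make sense for numeric columns
      min      = if (is_num) min(x, na.rm = TRUE) else NA_real_,
      max      = if (is_num) max(x, na.rm = TRUE) else NA_real_,
      range    = if (is_num) diff(range(x, na.rm = TRUE)) else NA_real_,
      median   = if (is_num) median(x, na.rm = TRUE) else NA_real_,
      mean     = if (is_num) mean(x, na.rm = TRUE) else NA_real_
    )
  })
  do.call(rbind, rows)  # stack the one-row data frames
}

summarise_columns(iris)
```

One design choice differs from the package output: for a non-numeric column such as Species, this sketch reports NA for the numeric statistics rather than 0, so a missing value is not mistaken for a real minimum or mean.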

Let's try this on the built-in iris dataset:

metadata::generateMeta(iris)

Output:

          name na_Count blanks unique min max range medians mean mode
1 Sepal.Length        0      0     35 4.3 7.9   3.6    5.80 5.84  5.0
2  Sepal.Width        0      0     23 2.0 4.4   2.4    3.00 3.06  3.0
3 Petal.Length        0      0     43 1.0 6.9   5.9    4.35 3.76  1.4
4  Petal.Width        0      0     22 0.1 2.5   2.4    1.30 1.20  0.2
5      Species        0      0      3 0.0 0.0   0.0    0.00 0.00  0.0
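If you want to sanity-check the output, a few of the values can be recomputed with base R alone. For the four numeric columns, these should agree with the unique, range, medians and mean columns above (the means rounded to two decimals).

```r
# Cross-check some of the generateMeta() output using base R only
num_cols <- iris[, 1:4]                          # the four numeric columns

sapply(num_cols, function(x) length(unique(x)))  # compare with "unique"
sapply(num_cols, function(x) diff(range(x)))     # compare with "range"
sapply(num_cols, median)                         # compare with "medians"
round(sapply(num_cols, mean), 2)                 # compare with "mean"
```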

Reference: https://github.com/ankitkatiyar91/metadata
