Tools

The Data Quality Label Tool

In collaboration with NDQF, and supported by the Population Council, a team of researchers at IIIT-Delhi has developed a tool that uses ML techniques to measure data quality. The domain-agnostic tool takes a query dataset and its codebook and derives a composite score that combines provenance, metadata coupling, anomalous features, and statistical properties, among other factors. The underlying model was trained on data and metadata from more than 250 publicly available datasets and validated on multiple rounds of NFHS datasets. The data quality assessment tool is publicly available.

Outlier Detection Tool

The NDQF data science lab has developed an outlier detection tool that uses machine learning techniques to identify potential outliers in a dataset. The tool works on any survey dataset and applies multiple data science approaches, such as silhouette score calculations, k-means clustering, and isolation forests, to flag observations that are potential outliers. Unlike most outlier detection methods, which identify outliers within a single variable (one-dimensional), this tool addresses the harder problem of finding outliers in multidimensional space. The outlier detection tool is available in the public domain.
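To give a feel for how the techniques named above can work together, here is a minimal sketch in Python. It is not the NDQF tool's code. It assumes scikit-learn and NumPy are installed, uses synthetic data in place of a real survey dataset, and combines the three techniques in one plausible way: the silhouette score picks the number of k-means clusters, the distance to the nearest cluster centre gives a soft ranking, and an isolation forest produces the actual flags. All names (`best_k`, `injected`, `flagged`) and parameter values (such as the 2% contamination rate) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a survey dataset (assumption): two groups of
# respondents measured on two numeric variables.
inliers = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[6.0, 6.0], scale=1.0, size=(100, 2)),
])
# Three hand-placed records that lie far from both groups.
injected = np.array([[-12.0, 14.0], [16.0, -10.0], [20.0, 22.0]])

# Standardize so that no single variable dominates the distances.
X = StandardScaler().fit_transform(np.vstack([inliers, injected]))
injected_idx = np.arange(len(inliers), len(X))


def best_k(data, candidates=(2, 3, 4, 5)):
    """Return the candidate cluster count with the highest mean silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
        scores[k] = silhouette_score(data, labels)
    return max(scores, key=scores.get)


# Cluster with the silhouette-selected k, then measure how far each
# record sits from the centre of its own cluster (a soft ranking).
k = best_k(X)
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# The isolation forest scores every record using all variables jointly;
# predict() returns -1 for records it considers outliers.
iso = IsolationForest(n_estimators=200, contamination=0.02, random_state=0).fit(X)
flagged = np.flatnonzero(iso.predict(X) == -1)

print("clusters chosen by silhouette score:", k)
print("rows flagged by the isolation forest:", flagged.tolist())
print("rows farthest from their cluster centre:", np.argsort(dist)[-5:].tolist())
```

Flagged rows are candidates for review rather than confirmed errors. The contamination rate controls how many records are flagged and would need to be tuned for a real survey.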