First principles thinking in Data Science.

Venkat Raman
Sep 15, 2022

First principles thinking is defined as “boiling problems down to their most fundamental truths”.

So when it comes to Data Science, what are the first principles?

In my opinion, they are:

  • Measures of central tendency — Mean, Median, Mode.
  • Measures of dispersion — Variance, Standard Deviation, Interquartile Range.
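
As a minimal NumPy sketch (the toy data below is made up purely for illustration), all of these quantities can be computed in a few lines:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # toy data, purely illustrative

# Measures of central tendency
mean = np.mean(data)
median = np.median(data)
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]            # most frequent value

# Measures of dispersion
variance = np.var(data, ddof=1)             # sample variance
std_dev = np.std(data, ddof=1)              # sample standard deviation
iqr = np.percentile(data, 75) - np.percentile(data, 25)  # interquartile range

print(mean, median, mode, variance, std_dev, iqr)
```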

Most of the topics in Data Science somehow boil down to central tendency or dispersion. Let me explain through some examples; short, illustrative code sketches for each point follow the list:

  1. Linear regression:
    Generally, one models the expected value (the conditional mean) of the dependent variable, not its raw values.
    Note that one can also model any quantile of the dependent variable, as in quantile regression.
  2. Probability distributions:
    The famous normal distribution is characterized by a location parameter (the mean) and a scale parameter (the standard deviation).
    Similarly, other distributions too are characterized by location and scale parameters.
  3. Machine learning:
    Model drift: when we say a model has drifted, it actually means that the existing model has drifted away from the real, underlying model in terms of its location parameter, its scale parameter, or both.
  4. Accuracy metrics: A metric like F1 is nothing but the harmonic mean of precision and recall.
  5. Outlier detection or anomaly detection: We classify a data point as an outlier if it lies 2, 3, or even 6 standard deviations away from the mean.
  6. Time series forecasting:
    One of the key concepts in time series forecasting is stationarity. A stationary time series is one whose properties, such as the mean, variance, and autocorrelation structure, stay constant over time. Stationarity is important because it is easier and more accurate to estimate the parameters of a series whose properties do not change over time; if the mean and variance of the series keep changing, the accuracy of the estimates will vary over time as well.
  7. Hypothesis testing:
    We have hypothesis tests for a single mean and for differences in means, for example the t-test and ANOVA.
  8. Information theory:
    Many algorithms, like decision trees, and model comparison techniques, like AIC, use information theory at their core. Even probability-distribution comparison techniques such as KL divergence rely on information-theoretic concepts like entropy and information gain. Entropy, in turn, is the expected value (average) of the self-information of a variable; equivalently, it is the smallest possible average size of a lossless encoding of the messages sent from a source to a destination.
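
To make point 1 concrete, here is a minimal sketch using statsmodels; the synthetic data, seed, and quantile choice are my own illustration, not from the article. Ordinary least squares targets the conditional mean, while quantile regression targets a chosen quantile:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 2, 200)   # synthetic data, illustration only

X = sm.add_constant(x)

# OLS models the conditional mean E[y | x]
ols_fit = sm.OLS(y, X).fit()

# Quantile regression models a chosen quantile, here the median (q=0.5)
median_fit = sm.QuantReg(y, X).fit(q=0.5)

print("OLS (mean) coefficients:   ", ols_fit.params)
print("Median (q=0.5) coefficients:", median_fit.params)
```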
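
For point 2, a small SciPy sketch (the parameter values are arbitrary): the distribution objects share a common loc/scale convention, and for the normal distribution these are exactly the mean and standard deviation:

```python
from scipy import stats

# Normal distribution: loc = mean, scale = standard deviation
normal = stats.norm(loc=5.0, scale=2.0)       # values chosen only for illustration
print(normal.mean(), normal.std())            # 5.0, 2.0

# Many other SciPy distributions follow the same loc/scale convention,
# e.g. a shifted and scaled logistic distribution
logistic = stats.logistic(loc=5.0, scale=2.0)
print(logistic.mean(), logistic.std())
```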
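
For point 3, one rough way to see location/scale drift (my own simplified sketch, not a full drift-detection setup; the thresholds are hypothetical) is to compare the mean and standard deviation of a feature between the training data and newly arriving data:

```python
import numpy as np

rng = np.random.default_rng(1)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)   # data seen at training time
new_feature = rng.normal(loc=0.5, scale=1.5, size=1000)     # data seen in production (shifted)

# Drift in location (mean) and scale (standard deviation)
mean_shift = abs(np.mean(new_feature) - np.mean(train_feature))
scale_shift = abs(np.std(new_feature, ddof=1) - np.std(train_feature, ddof=1))

# Hypothetical thresholds, purely for illustration
if mean_shift > 0.3 or scale_shift > 0.3:
    print("Possible drift: mean shift =", round(mean_shift, 2),
          "scale shift =", round(scale_shift, 2))
```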
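
For point 4, a tiny worked sketch (the precision and recall values are made up):

```python
# F1 is the harmonic mean of precision and recall
precision = 0.8   # illustrative values
recall = 0.6

f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ~0.686, lower than the arithmetic mean of 0.7
```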
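
For point 5, a minimal sketch of the k-standard-deviation rule; the 3 SD cutoff and the toy data are my own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
data = np.append(rng.normal(0, 1, 100), 8.0)   # toy data with one obvious outlier

mean, std = np.mean(data), np.std(data, ddof=1)
z_scores = (data - mean) / std

# Flag points more than 3 standard deviations away from the mean
outliers = data[np.abs(z_scores) > 3]
print(outliers)
```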
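
For point 6, a rough sketch (the window size and data are arbitrary): one simple first check is whether the rolling mean and variance stay roughly constant over time:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# A random walk: its mean and variance change over time, so it is non-stationary
walk = pd.Series(np.cumsum(rng.normal(size=500)))

rolling_mean = walk.rolling(window=50).mean()
rolling_var = walk.rolling(window=50).var()

# For a stationary series these would hover around constant values
print(rolling_mean.dropna().describe())
print(rolling_var.dropna().describe())
```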
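
For point 7, a quick SciPy sketch with simulated samples: a two-sample t-test compares two means, and one-way ANOVA extends the comparison to more than two groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_a = rng.normal(loc=10.0, scale=2.0, size=50)   # simulated samples, illustration only
group_b = rng.normal(loc=11.0, scale=2.0, size=50)
group_c = rng.normal(loc=10.5, scale=2.0, size=50)

# Two-sample t-test: difference in means between two groups
t_stat, t_pvalue = stats.ttest_ind(group_a, group_b)

# One-way ANOVA: differences in means across three groups
f_stat, f_pvalue = stats.f_oneway(group_a, group_b, group_c)

print(t_stat, t_pvalue)
print(f_stat, f_pvalue)
```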
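
For point 8, a short sketch (the probability distribution is arbitrary): computing entropy as the average self-information and checking it against SciPy's entropy function:

```python
import numpy as np
from scipy import stats

p = np.array([0.5, 0.25, 0.125, 0.125])   # an arbitrary probability distribution

# Entropy as the expected value (average) of self-information -log2(p)
self_information = -np.log2(p)
entropy_manual = np.sum(p * self_information)

# Same value from SciPy
entropy_scipy = stats.entropy(p, base=2)

print(entropy_manual, entropy_scipy)   # both 1.75 bits
```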

For Data Science Consulting and Solutions,

Get in touch with us at:

Website: https://www.arymalabs.com/

LinkedIn: http://www.linkedin.com/in/venkat-raman-Analytics


Venkat Raman

Co-Founder of Aryma Labs. Data scientist/Statistician with business acumen. Hoping to amass knowledge and share it throughout my life. Rafa Nadal Fan.