In this Persian tutorial, we work with the prostate cancer dataset and fit a simple linear regression model to it. Along the way, we handle ordinal values, binary values, and feature-scale issues. The video has been uploaded to both YouTube and Aparat.
Normalization vs standardization:
Normalization (min-max scaling) rescales the data to the range 0 to 1, where 0 corresponds to the minimum value in the data and 1 to the maximum. It is useful when the values span a very wide range and we want to bring them into a small, bounded range. Standardization, on the other hand, transforms the data so that it has zero mean and unit variance. It is useful when features have very different ranges and we want them centered around 0 on a comparable scale. As a rule of thumb, normalization is used when the distribution of the data is unknown or not Gaussian, while standardization is used when the distribution is Gaussian, or when the distribution is unknown but we want all features to carry equal importance.
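To make the difference concrete, here is a minimal sketch of both transformations using NumPy. The function names `min_max_normalize` and `standardize` and the toy array are illustrative assumptions, not code from the video; the tutorial itself may use a library scaler instead.

```python
import numpy as np


def min_max_normalize(x):
    """Rescale each column to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    # Guard against constant columns, where max == min.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span


def standardize(x):
    """Shift each column to zero mean and scale it to unit variance."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(axis=0), x.std(axis=0)
    # Guard against constant columns, where the standard deviation is 0.
    std = np.where(std > 0, std, 1.0)
    return (x - mean) / std


# Hypothetical toy data: two features on very different scales.
data = np.array([[1.0, 100.0],
                 [2.0, 300.0],
                 [3.0, 500.0]])

normalized = min_max_normalize(data)
standardized = standardize(data)
```

After either transformation, the two columns are on a comparable scale, so neither feature dominates the regression simply because of its units.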
