# Zero Mean Là Gì – Forum Machine Learning Cơ Bản

I”m trying to reproduce an algorithm designed in a paper. And everything is going well except one thing:

It says we considered the lengths zero-meaned accelerometer vectors and created a feature for the mean and standard deviation of this value. and I do not understand what is it zero-meaned vectors?

Example dataset:

Can any body help me?

I found only this information https://www.quora.com/What-does-it-mean-when-a-vector-is-zero-mean but I”m not sure about it.

Thank you.

machine-learning pandas
Share
Improve this question
Follow
asked Sep 25 “16 at 13:59

ArsenArsen
\$endgroup\$

3
\$egingroup\$
“Zero-meaned” means the vector has been transformed so that its mean is 0.

Typically, you would do this by subtracting the mean of each column from that column. (This is for dimensional as well as algorithmic reasons; you don”t want to subtract a person”s weight from their height.)

It sounds like here they”re actually talking about the row mean–that is, \$(-0.6946377, 12.680544, 0.50395286)\$ would be transformed to \$(-4.857924, 8.5172577, -3.65933344, 4.1632863, 7.40047)\$, where the first three are the original features minus the row mean, the fourth is the row mean, and the fifth is the standard deviation of the original features.

This would make sense if the three have the same units (if they”re all accelerations at the same scale, this works), and so you want a separate measure of how much it”s being accelerated at all and how much it”s being accelerated in a particular direction.

Share
Follow
edited Sep 25 “16 at 16:12
answered Sep 25 “16 at 16:10

Matthew GravesMatthew Graves
\$endgroup\$
2
2
\$egingroup\$
Mean centering is one of many related techniques to preprocess data for downstream analysis in multivariate methods.

It might sound odd at first, but it means exactly what it says: the vector has a mean of zero. In pseudocode, (sum(vector) / len(vector)) == 0.

In multivariate data, this typically is applied along each column in a dataset so each column can be more easily compared to another within a similar range of data. After mean centering, each row only includes how it differs from the average sample from that variable in the original data. Typically, samples are also scaled to have unit variance as well, allowing you to more readily compare the data across continuous variables with different ranges.

For example, if you had a dataset of patients with variables height, weight, age, household_income, despite each variable being a continuous value each of these variables will be in different range. Height might be between 60 — 75 inches, weight between 100 — 300 lbs, and so on.

Why do all this? Removing the mean and standardizing the variance will help downstream methods not “learn” the mean and variance of your data, making it easier to find relationships between variables. Many assume that your data is centered / scaled / normalized in some way and will behave poorly if you don”t do so.