**Precision**

After running this snippet:

```python
import numpy as np

a = np.array([0.112233445566778899], dtype=np.float32)
b = np.array([0.112233445566778899], dtype=np.float64)

print(a, b)
```

It prints:

```
[0.11223345] [0.11223345]
```

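Why do np.float32 and np.float64 produce the same output? Because NumPy rounds array elements to 8 digits of precision when printing by default, so the difference between the two types stays hidden until we change the print options.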
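As a quick check of this explanation, the sketch below reads the current print precision with `np.get_printoptions()` and compares the stored values directly instead of their printed form. The variable names `default_precision` and `values_differ` are only illustrative.

```python
import numpy as np

# Precision used by NumPy when printing arrays (8 unless it has been changed).
default_precision = np.get_printoptions()["precision"]
print(default_precision)

a = np.array([0.112233445566778899], dtype=np.float32)
b = np.array([0.112233445566778899], dtype=np.float64)

# Compare the stored values themselves rather than their printed form.
# The float32 value is widened to a Python float (a double) before comparing.
values_differ = float(a[0]) != float(b[0])
print(values_differ)
```

So the two arrays hold different values; only the default display makes them look alike.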

Let’s set the option before printing:

```python
import numpy as np

a = np.array([0.112233445566778899], dtype=np.float32)
b = np.array([0.112233445566778899], dtype=np.float64)

np.set_printoptions(precision=18)
print(a, b)
```

The result becomes:

```
[0.112233445] [0.1122334455667789]
```

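which looks much more reasonable.

Furthermore, why does it print ‘0.1122334455667789’, which has only 16 digits instead of 18? Because float64 can only represent about 15 to 17 significant decimal digits, as this reference explains. The `precision` option is an upper bound: NumPy prints only as many digits as are needed to identify the stored value uniquely.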
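The limits of each type can be inspected with `np.finfo`. The sketch below also uses `np.format_float_positional` with `unique=True` to get the shortest decimal string that still identifies the stored float64 value; the variable names are illustrative only.

```python
import numpy as np

f32 = np.finfo(np.float32)
f64 = np.finfo(np.float64)

# Approximate number of decimal digits each type is precise to.
print(f32.precision, f64.precision)  # 6 15

x = np.float64(0.112233445566778899)

# Shortest decimal representation that uniquely identifies x.
shortest = np.format_float_positional(x, unique=True)
print(shortest)

# Parsing that string gives back exactly the same stored value,
# so the extra digits requested with precision=18 would carry no information.
round_trips = float(shortest) == float(x)
print(round_trips)
```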

**Hidden metadata**

There are two Parquet files whose checksums differ when compared with ‘cksum’. But after we export them as CSV files:

```python
import pandas as pd

df = pd.read_parquet("my.parquet")
df.to_csv("my.csv")
...
```

The two output CSV files are exactly the same.

So what is different between the two Parquet files? Does a Parquet file have some hidden metadata in it?

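As a matter of fact, a Parquet file written by pandas stores the DataFrame’s index together with pandas-specific metadata describing it, while a CSV file holds only plain-text values. Two DataFrames with the same values but differently stored indexes can therefore produce identical CSV files yet different Parquet files. If we reset the index before writing the Parquet file (note that `reset_index` returns a new DataFrame, so the result must be assigned back):

```python
df = df.reset_index(drop=True)
df.to_parquet("my.parquet")
...
```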

the two Parquet files become identical.
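One way such a difference can arise is sketched below with hypothetical data (the frames `df_range` and `df_explicit` are not from the original files): two DataFrames hold the same values, but one has the default `RangeIndex` and the other an explicitly materialized integer index. Their CSV output is identical even though the index objects differ. The Parquet comparison at the end assumes the `pyarrow` engine is installed and is skipped otherwise; its result may depend on the engine and library versions.

```python
import io

import pandas as pd

# Hypothetical data: same values, different kinds of index.
df_range = pd.DataFrame({"x": [1.0, 2.0, 3.0]})                      # default RangeIndex
df_explicit = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[0, 1, 2])  # explicit integer index

# The CSV text only contains the index values, so both frames look the same.
same_csv = df_range.to_csv() == df_explicit.to_csv()
same_index_type = type(df_range.index) is type(df_explicit.index)
print(same_csv, same_index_type)

# Resetting the index gives the explicit frame a default RangeIndex again.
df_reset = df_explicit.reset_index(drop=True)
reset_is_range = isinstance(df_reset.index, pd.RangeIndex)
print(reset_is_range)

try:
    import pyarrow  # noqa: F401  (assumed Parquet engine; optional here)
except ImportError:
    pyarrow = None


def parquet_bytes(df):
    """Serialize a DataFrame to Parquet in memory and return the raw bytes."""
    buf = io.BytesIO()
    df.to_parquet(buf, engine="pyarrow")
    return buf.getvalue()


if pyarrow is not None:
    # Compare the serialized bytes before and after resetting the index.
    print(parquet_bytes(df_range) == parquet_bytes(df_explicit))
    print(parquet_bytes(df_range) == parquet_bytes(df_reset))
```

If the index carries no useful information, `df.to_parquet("my.parquet", index=False)` is another way to keep it out of the file.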