How to Read Parquet Files on macOS

Overview

This article introduces a simple way to read Parquet files on macOS via the command line.

Reading with parquet-cli

We will use the Parquet file provided as a sample by Feast.

Install parquet-cli on macOS using Homebrew.

1
brew install parquet-cli

Check the meta information.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
$ parquet meta driver_stats.parquet

File path:  driver_stats.parquet
Created by: parquet-cpp-arrow version 18.1.0
Properties:
pandas: {"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1807, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "event_timestamp", "field_name": "event_timestamp", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns]", "metadata": {"timezone": "UTC"}}, {"name": "driver_id", "field_name": "driver_id", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "conv_rate", "field_name": "conv_rate", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "acc_rate", "field_name": "acc_rate", "pandas_type": "float32", "numpy_type": "float32", "metadata": null}, {"name": "avg_daily_trips", "field_name": "avg_daily_trips", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}, {"name": "created", "field_name": "created", "pandas_type": "datetime", "numpy_type": "datetime64[us]", "metadata": null}], "creator": {"library": "pyarrow", "version": "18.1.0"}, "pandas_version": "2.2.3"}
ARROW:schema: /////xgGAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAHAEAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAABIBAAABAAAADsEAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDE4MDcsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZXZlbnRfdGltZXN0YW1wIiwgImZpZWxkX25hbWUiOiAiZXZlbnRfdGltZXN0YW1wIiwgInBhbmRhc190eXBlIjogImRhdGV0aW1ldHoiLCAibnVtcHlfdHlwZSI6ICJkYXRldGltZTY0W25zXSIsICJtZXRhZGF0YSI6IHsidGltZXpvbmUiOiAiVVRDIn19LCB7Im5hbWUiOiAiZHJpdmVyX2lkIiwgImZpZWxkX25hbWUiOiAiZHJpdmVyX2lkIiwgInBhbmRhc190eXBlIjogImludDY0IiwgIm51bXB5X3R5cGUiOiAiaW50NjQiLCAibWV0YWRhdGEiOiBudWxsfSwgeyJuYW1lIjogImNvbnZfcmF0ZSIsICJmaWVsZF9uYW1lIjogImNvbnZfcmF0ZSIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAiYWNjX3JhdGUiLCAiZmllbGRfbmFtZSI6ICJhY2NfcmF0ZSIsICJwYW5kYXNfdHlwZSI6ICJmbG9hdDMyIiwgIm51bXB5X3R5cGUiOiAiZmxvYXQzMiIsICJtZXRhZGF0YSI6IG51bGx9LCB7Im5hbWUiOiAiYXZnX2RhaWx5X3RyaXBzIiwgImZpZWxkX25hbWUiOiAiYXZnX2RhaWx5X3RyaXBzIiwgInBhbmRhc190eXBlIjogImludDMyIiwgIm51bXB5X3R5cGUiOiAiaW50MzIiLCAibWV0YWRhdGEiOiBudWxsfSwgeyJuYW1lIjogImNyZWF0ZWQiLCAiZmllbGRfbmFtZSI6ICJjcmVhdGVkIiwgInBhbmRhc190eXBlIjogImRhdGV0aW1lIiwgIm51bXB5X3R5cGUiOiAiZGF0ZXRpbWU2NFt1c10iLCAibWV0YWRhdGEiOiBudWxsfV0sICJjcmVhdG9yIjogeyJsaWJyYXJ5IjogInB5YXJyb3ciLCAidmVyc2lvbiI6ICIxOC4xLjAifSwgInBhbmRhc192ZXJzaW9uIjogIjIuMi4zIn0ABgAAAHBhbmRhcwAABgAAACwBAADcAAAApAAAAHAAAAA0AAAABAAAAPz+//8AAAEKEAAAABgAAAAEAAAAAAAAAAcAAABjcmVhdGVkAGr///8AAAIAKP///wAAAQIQAAAAIAAAAAQAAAAAAAAADwAAAGF2Z19kYWlseV90cmlwcwBo////AAAAASAAAABg////AAABAxAAAAAcAAAABAAAAAAAAAAIAAAAYWNjX3JhdGUAAAAA0v///wAAAQCQ////AAABAxAAAAAgAAAABAAAAAAAAAAJAAAAY29udl9yYXRlAAYACAAGAAYAAAAAAAEAxP///wAAAQIQAAAAJAAAAAQAAAAAAAAACQAAAGRyaXZlcl9pZAAAAAgADAAIAAcACAAAAAAAAAFAAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAACgAAAAEAAAAAAAAAA8AAABldmVudF90aW1lc3RhbXAACAAMAAYACAAIAAAAAAADAAQAAAADAAAAVVRDAAAAAAA=
Schema:
message schema {
optional int64 event_timestamp (TIMESTAMP(NANOS,true));
optional int64 driver_id;
optional float conv_rate;
optional float acc_rate;
optional int32 avg_daily_trips;
optional int64 created (TIMESTAMP(MICROS,false));
}


Row group 0:  count: 1807  16.88 B records  start: 4  total(compressed): 29.796 kB total(uncompressed):29.760 kB
--------------------------------------------------------------------------------
                 type      encodings count     avg size   nulls   min / max
event_timestamp  INT64     S _ R     1807      2.78 B     0       "2021-04-12T07:00:00.00000..." / "2024-12-28T14:00:00.00000..."
driver_id        INT64     S _ R     1807      0.07 B     0       "1001" / "1005"
conv_rate        FLOAT     S _ R     1807      5.42 B     0       "1.9221554E-4" / "0.9998668"
acc_rate         FLOAT     S _ R     1807      5.42 B     0       "2.1329636E-4" / "0.99993944"
avg_daily_trips  INT32     S _ R     1807      3.14 B     0       "0" / "999"
created          INT64     S _ R     1807      0.05 B     0       "2024-12-28T15:20:28.266000" / "2024-12-28T15:20:28.266000"

View the data using head.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
$ parquet head driver_stats.parquet

{"event_timestamp": 1734102000000000000, "driver_id": 1005, "conv_rate": 0.27734742, "acc_rate": 0.7152132, "avg_daily_trips": 823, "created": 1735399228266000}
{"event_timestamp": 1734105600000000000, "driver_id": 1005, "conv_rate": 0.57354224, "acc_rate": 0.9831811, "avg_daily_trips": 851, "created": 1735399228266000}
{"event_timestamp": 1734109200000000000, "driver_id": 1005, "conv_rate": 0.3287562, "acc_rate": 0.6172164, "avg_daily_trips": 116, "created": 1735399228266000}
{"event_timestamp": 1734112800000000000, "driver_id": 1005, "conv_rate": 0.045716193, "acc_rate": 0.032996926, "avg_daily_trips": 741, "created": 1735399228266000}
{"event_timestamp": 1734116400000000000, "driver_id": 1005, "conv_rate": 0.12863782, "acc_rate": 0.8951942, "avg_daily_trips": 534, "created": 1735399228266000}
{"event_timestamp": 1734120000000000000, "driver_id": 1005, "conv_rate": 0.9555806, "acc_rate": 0.62216556, "avg_daily_trips": 216, "created": 1735399228266000}
{"event_timestamp": 1734123600000000000, "driver_id": 1005, "conv_rate": 0.75297666, "acc_rate": 0.37602386, "avg_daily_trips": 954, "created": 1735399228266000}
{"event_timestamp": 1734127200000000000, "driver_id": 1005, "conv_rate": 0.46957988, "acc_rate": 0.6454945, "avg_daily_trips": 360, "created": 1735399228266000}
{"event_timestamp": 1734130800000000000, "driver_id": 1005, "conv_rate": 0.6702387, "acc_rate": 0.36532214, "avg_daily_trips": 396, "created": 1735399228266000}
{"event_timestamp": 1734134400000000000, "driver_id": 1005, "conv_rate": 0.019627139, "acc_rate": 0.528229, "avg_daily_trips": 833, "created": 1735399228266000}

Check the schema.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
$ parquet schema driver_stats.parquet
{
"type" : "record",
"name" : "schema",
"fields" : [ {
"name" : "event_timestamp",
"type" : [ "null", "long" ],
"default" : null
}, {
"name" : "driver_id",
"type" : [ "null", "long" ],
"default" : null
}, {
"name" : "conv_rate",
"type" : [ "null", "float" ],
"default" : null
}, {
"name" : "acc_rate",
"type" : [ "null", "float" ],
"default" : null
}, {
"name" : "avg_daily_trips",
"type" : [ "null", "int" ],
"default" : null
}, {
"name" : "created",
"type" : [ "null", {
"type" : "long",
"logicalType" : "local-timestamp-micros"
} ],
"default" : null
} ]
}

Conclusion

This article explained how to easily read Parquet files on macOS.