
File Formats [Row based vs Columnar Format] #parquet #avro #orc

7.2K views
Apr 4, 2023
12:59

If you need any guidance you can book time here: https://topmate.io/bhawna_bedi56743
Follow me on LinkedIn: https://www.linkedin.com/in/bhawna-bedi-540398102/
Instagram: https://www.instagram.com/bedi_forever16/?next=%2F
You can support my channel at: bhawnabedi15@okicici

Row Based Format
The data is stored row by row. Examples: SequenceFile, MapFile, Avro data file. With this layout, even if only a small part of a row is needed, the entire row has to be read into memory. Delaying deserialization until a field is actually used can ease the problem to some extent, but the cost of reading the whole row from disk cannot be avoided. Row-oriented storage suits workloads where the entire row is processed at once.

Column Based Format
The data is split by column, and the values of each column are stored together. Examples: Parquet, ORCFile. A column-oriented format makes it possible to skip unneeded columns when reading, so it suits workloads that need only a small subset of the fields in each row. However, reading and writing this format requires more memory, because a batch of rows has to be buffered in memory to assemble each column across multiple rows. It is also a poor fit for streaming writes: if a write fails, the current file cannot be recovered, whereas row-oriented data can be resynchronized to the last sync point after a failed write. This is why Flume uses a row-oriented storage format.
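To make the row versus column trade-off concrete, here is a minimal sketch in plain Python. It does not use Avro, Parquet, or ORC; the record layout (user_id, age, score), the sample values, and all function names are assumptions made up for illustration. It packs the same records both ways and counts how many bytes must be scanned to read only the age field.

```python
import struct

# Hypothetical sample records: (user_id, age, score). Not from the video.
ROWS = [(1, 34, 88.5), (2, 27, 92.0), (3, 45, 79.25)]
ROW_FMT = "<iid"  # int32, int32, float64 with no padding: 16 bytes per row


def encode_row_based(rows):
    """Row layout: the fields of each record sit next to each other."""
    return b"".join(struct.pack(ROW_FMT, *row) for row in rows)


def encode_column_based(rows):
    """Column layout: all values of one field sit next to each other."""
    ids, ages, scores = zip(*rows)
    n = len(rows)
    return {
        "user_id": struct.pack(f"<{n}i", *ids),
        "age": struct.pack(f"<{n}i", *ages),
        "score": struct.pack(f"<{n}d", *scores),
    }


def read_ages_row_based(blob):
    """Every full row is scanned even though only one field is wanted."""
    row_size = struct.calcsize(ROW_FMT)
    ages, bytes_scanned = [], 0
    for offset in range(0, len(blob), row_size):
        _user_id, age, _score = struct.unpack_from(ROW_FMT, blob, offset)
        ages.append(age)
        bytes_scanned += row_size
    return ages, bytes_scanned


def read_ages_column_based(columns):
    """Only the 'age' chunk is touched; the other columns are skipped."""
    chunk = columns["age"]
    count = len(chunk) // struct.calcsize("<i")
    return list(struct.unpack(f"<{count}i", chunk)), len(chunk)


if __name__ == "__main__":
    print(read_ages_row_based(encode_row_based(ROWS)))
    print(read_ages_column_based(encode_column_based(ROWS)))
```

Both readers return the same ages, but the row-based reader scans every byte of every record while the column-based reader scans only the age values. Real columnar formats add row groups, encodings, and compression on top of this basic idea.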
Databricks hands-on tutorials: https://www.youtube.com/playlist?list=PLtlmylp_ZK5wr1lyq76h1V4ZuWZYThgy0
Azure Event Hubs: https://www.youtube.com/playlist?list=PLtlmylp_ZK5y_7ngCo3_9zB7UAauZNsFk
Azure Data Factory interview questions: https://www.youtube.com/playlist?list=PLtlmylp_ZK5zdGe7KLM0axsSb_4LimVRX
SQL LeetCode questions: https://www.youtube.com/playlist?list=PLtlmylp_ZK5xiosJ2eR2BooSSspe_7Ac4
Azure Synapse tutorials: https://www.youtube.com/playlist?list=PLtlmylp_ZK5ygJXScE4DakN2aqplYkckf
Azure Event Grid: https://www.youtube.com/playlist?list=PLtlmylp_ZK5xXqBnnBuBLOojJ11ZsNx-Y
Azure Data Factory CI/CD: https://www.youtube.com/playlist?list=PLtlmylp_ZK5yVc7dY_pLl4RaTi94y19Zz
Azure Basics: https://www.youtube.com/playlist?list=PLtlmylp_ZK5xTVdlmb5KJQDxWV5SfCWtq
Databricks interview questions: https://www.youtube.com/playlist?list=PLtlmylp_ZK5wV7mu3DrwjhLPxFJw6FS0V

