HPE Ezmeral Software platform

Connect Unified Analytics (Presto) to Ezmeral Data Fabric using Hive Parquet connector

 
RCCardoso
Advisor


Hi, I'm wondering what I'm doing wrong. I have a large CSV file that I converted to Parquet using Python (pandas and NumPy). I then uploaded it to Object Storage on Data Fabric and read it through the Hive connector from Unified Analytics. The problem is that a small sample of the file works fine, but a larger sample (not huge, around 100 MB) fails with this error:

Query failed (#20240612_164659_00147_ekmiu): can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@30b217e0

 

It is always this 'uncompressed_page_size' error. I'm wondering whether I need to customize something in the Presto pod configuration.
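
The error means the reader found bytes that do not decode as a valid Parquet page header. One thing that may be worth ruling out first (an assumption, not a confirmed diagnosis) is that the larger file was truncated or altered during upload. Below is a minimal sketch, using only the Python standard library, that checks the "PAR1" magic bytes and footer length of an unencrypted Parquet file and computes a SHA-256 digest, so the local file can be compared with a copy downloaded back from Object Storage. The function names and file names are hypothetical.

```python
import hashlib
import struct
from pathlib import Path

# Unencrypted Parquet files begin and end with these 4 bytes.
PARQUET_MAGIC = b"PAR1"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_parquet_envelope(path):
    """Check magic bytes and footer length; return a list of problems found."""
    problems = []
    size = Path(path).stat().st_size
    if size < 12:
        return [f"file is only {size} bytes, too small to be Parquet"]
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(-8, 2)  # last 8 bytes: 4-byte footer length + 4-byte magic
        tail = fh.read(8)
    if head != PARQUET_MAGIC:
        problems.append("missing PAR1 magic at start of file")
    if tail[4:] != PARQUET_MAGIC:
        problems.append("missing PAR1 magic at end of file")
    else:
        # Footer length is a little-endian unsigned 32-bit integer.
        (footer_len,) = struct.unpack("<I", tail[:4])
        if footer_len == 0 or footer_len + 12 > size:
            problems.append(
                f"footer length {footer_len} does not fit in a {size}-byte file"
            )
    return problems
```

For example, running `check_parquet_envelope` and `sha256_of` on the local file and on a copy downloaded back from the bucket (say `sample_local.parquet` and `sample_downloaded.parquet`, both placeholder names) should give an empty problem list and identical digests. If they do, the upload is probably intact and the cause is more likely in how the file was written or how the connector reads it. This check only covers the file envelope; it does not validate individual pages.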

2 REPLIES
support_s
System Recommended

Query: Connect Unified Analytics (Presto) to Ezmeral Data Fabric using Hive Parquet connector

System recommended content:

1. HPE Ezmeral Unified Analytics Software 1.2 Documentation | Hive Connection Parameters

2. HPE Ezmeral Unified Analytics Software 1.3 Documentation | Hive Connection Parameters

 


mitcheljohns
Occasional Visitor

Re: Connect Unified Analytics (Presto) to Ezmeral Data Fabric using Hive Parquet connector

Connecting Unified Analytics Presto to Ezmeral Data Fabric involves configuring Presto to access data stored in Ezmeral's distributed file system. This integration enables advanced analytics on large datasets, leveraging Presto's SQL querying capabilities with Ezmeral's robust data management. Follow the connection setup guide, configure the necessary connectors, and ensure proper authentication to facilitate seamless and efficient data analysis across the platforms.