Prompting Gemma 3n AI in Japanese (日本語) for Datalake Setup and KPI Reporting

anandtk · 4 weeks ago

Introduction:

In today’s digital landscape, data is more than a valuable asset — it’s the backbone of strategic advantage. Truly owning customer data means having complete control over its quality, accessibility, and purpose. It’s about dismantling silos, connecting disparate sources, and creating a unified, trusted view of information. To power that data is to put it to work — converting raw inputs into meaningful insights, performance indicators, and predictive outcomes that enable smarter decisions. From real-time dashboards to machine learning and self-service analytics, activated data becomes the engine driving innovation, speed, and adaptability. With the integration of Gemma 3n AI, organizations can now automate data understanding, accelerate data modeling, and generate intelligent prompts that streamline the creation of KPIs, dashboards, and decision-ready insights — all at scale and in natural language. This transforms not just how data is used, but who can use it.

"Owning the data is owning the throne"

Details:

Modern cloud data platforms like Snowflake, Google BigQuery, Amazon Redshift, Azure Synapse, Databricks Delta Lake, and Apache Iceberg serve as powerful replacements for traditional data warehouses and datamarts.
These platforms form the backbone of today's enterprise data stacks, handling massive data flows from diverse sources — including spreadsheets, CSV files, JSON, YAML, and XML.
Such data is typically ingested and processed through a staging layer before being transformed and loaded into analytics-ready formats.

Bronze Layer/Raw Zone/Landing zones:

These staging tables are called Data Lake Gen2 - Bronze Layer, S3 bucket - Raw Zone, Staging Table, Work Table, or Landing zones/tables. These data tables/sets are being used to create Dimension(dim_*), Fact Tables(fct_* -common name formats) and Measures table . These tables from staging tables is a standard step in building a modern data pipeline. This usually happens during the transformation phase of ETL/ELT(Extraxt, Transform and Load) — where raw data from staging is cleaned, joined, deduplicated, and reshaped into analytic-ready fact tables.

Raw Source Data (spreadsheets, CSV files, JSON, YAML, XML, etc.)
↓
Staging Tables (raw cleaned data)
↓
Transformations (ETL/ELT, joins, filters, aggregates)
↓
Fact Tables (dimensions, measurable events: sales, clicks, orders)

These Fact tables are subsets of staging data. They are called as Dimensions, Measures.

Star Schema and Snowflake Schemas are used to organize fact and dimension tables in relational databases, data warehouses (Snowflake, Redshift, BigQuery, Synapse), and even lakehouses (Delta Lake, Iceberg).

Schemas:

A Star Schema is a denormalized structure where a central fact table connects directly to multiple dimension tables — forming a star-like layout.

dim_table1
|
dim_table2 — fact_table — dim_table3
|
dim_table4

A Snowflake Schema is a normalized version of the star schema — where dimensions are broken into sub-dimensions, creating a snowflake-like pattern.

dim_table1
|
dim_table2
|
dim_table3 — fct_table1 — dim_table4 — dim_table5
|
dim_table6

Using schemas, tables such as facts, dimensions, and measures are structured and organized to prepare data for reporting and analytics.

Usage of these data:

Business Intelligence (BI) & Reporting: To build dashboards, scorecards, and reports
KPI Tracking & Performance Management: Use measures like revenue, churn rate, retention to track business goals
Data Science & Machine Learning: As inputs/features to ML models (e.g., churn prediction, customer segmentation)
Financial & Operational Reporting: Structured reporting for accounting, P&L, compliance, audits.
Automated Alerts & Monitoring: Set alerts or notifications based on measure thresholds.
Embedded Analytics: Used inside apps, customer portals, or SaaS platforms via embedded dashboards.

By simply describing required data goals in natural language, Gemma 3n can assist in modeling fact and dimension tables, defining measures, and even crafting queries or dashboards tailored to desired KPIs.

Gemma 3n:
Gemma 3n is the latest evolution(released on June 26, 2025) in the Gemma model series, engineered for speed, efficiency, and versatility. It’s ideal for users who want privacy, high performance, and offline capabilities for advanced AI tasks.

Key Features:

Optimized Local Performance: Runs approximately 1.5x faster than previous models with improved output quality.
Multimodal Support: Understands text, images, audio, and video.
Efficient Resource Use: Features PLE caching and conditional parameter loading to minimize memory and storage usage.
Privacy-First: 100% offline processing—no data leaves on device.
32K Token Context Window: Handles large inputs with ease.
Enhanced Multilingual Capabilities: Supports Japanese, German, Korean, Spanish, French, and more.

Use case:

Following is the prompt(provided in Japanese Language) used to create a report.
"ステージングスキーマのSERVERおよびAPPSテーブルを削除し、factスキーマにservernameとitemをTEXT型で持つSERVERテーブルを作成し、APPNAME、method、weekをTEXT型で持つAPPSテーブルもfactスキーマに作成します。その後、ステージングスキーマのinventory_rawからデータを取得し、factスキーマのSERVERおよびAPPSテーブルにLIMIT 100で挿入します。さらに、week列がNULLの行を削除し、factスキーマのAPPSテーブルからデータを読み取ります。"
"Drop SERVER, APPS table in staging schema then create table called SERVER with servername, item as text in fact schema, create table called APPS with columns APPNAME, method, week as text in fact schema and insert from staging.inventory_raw from staging schema to SERVER and APPS table with fact schema using limit 100 fact schema and DELETE rows where is week column is null and read data in APPS table in fact schema"

This prompt has been sent to Gemma3n to do the following activities:

Drop facts table SERVER, APPS tables.
Create table called SERVER with servername, item as text in staging schema, create table called APPS with columns APPNAME, method, week as text in staging schema.
Insert from staging.inventory_raw from staging schema to SERVER and APPS table with staging schema using limit 100 staging schema.
DELETE rows where is week column is null.
Read data in APPS table in staging schema.

Here is the Python code snippet:

import requests

import psycopg2

import pandas as pd

import matplotlib.pyplot as plt

import matplotlib.font_manager as fm

user_prompt = "ステージングスキーマのSERVERおよびAPPSテーブルを削除し、factスキーマにservernameとitemをTEXT型で持つSERVERテーブルを作成し、APPNAME、method、weekをTEXT型で持つAPPSテーブルもfactスキーマに作成します。その後、ステージングスキーマのinventory_rawからデータを取得し、factスキーマのSERVERおよびAPPSテーブルにLIMIT 100で挿入します。さらに、week列がNULLの行を削除し、factスキーマのAPPSテーブルからデータを読み取ります"

response = requests.post(

"http://localhost:11434/api/generate",

json={

"model": "gemma:3n",

"prompt": f"{user_prompt}\nOnly output the raw SQL code.",

"stream": False # ensure non-streaming for easier parsing

}

)

try:

sql_query = response.json()['response'].strip()

if sql_query.startswith("```"):

sql_query = sql_query.split("```")[1].strip()

sql_query = sql_query.strip()

if sql_query.lower().startswith("sql"):

sql_query = "\n".join(sql_query.splitlines()[1:]).strip()

except Exception as e:

print("Failed to parse JSON from Ollama:")

print(response.text)

raise e

[Connection to PostgreSQLDB here]

if 'SELECT' in sql_query.strip().upper():

df = pd.read_sql(sql_query, conn)

else:

with conn.cursor() as cur1:

cur1.execute("SET search_path TO staging;")

cur1.execute(sql_query)

conn.commit()

df['week'] = df['week'].astype(str)

pivot_df = df.pivot_table(index='week', columns='appname', values='usage_count', aggfunc='sum', fill_value=0)

pivot_df.plot(kind='bar', stacked=True, figsize=(12, 6))

font_path = r"\NotoSansCJKjp-Regular.otf"

jp_font = fm.FontProperties(fname=font_path)

plt.rcParams['font.family'] = 'Noto Sans CJK JP'

plt.title('App Usage per week: 週ごとのアプリ使用状況 (Grouped by APPNAME アプリ名)',fontproperties=jp_font)

plt.xlabel('week週',fontproperties=jp_font)

plt.ylabel('usage_count使用回数',fontproperties=jp_font)

plt.legend(title='appname', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()

plt.show()

The following bar chart was generated after completing all SQL table creations, schema definitions, data transformations, and deduplication.

Summary:

Gemma 3n drastically reduces the manual effort needed in data engineering and reporting by turning simple prompts into fully functional, production-ready KPI pipelines — all while supporting multi-format data and cloud-native storage layers.

Prompt-Driven Setup:
Using natural language prompts, Gemma 3n can automate the creation of schemas, fact and dimension tables, and data pipelines within modern datalake architectures (e.g., Delta Lake, BigQuery, Redshift).
Data Ingestion & Staging:
Structured and semi-structured data from sources like CSV, JSON, and databases is ingested into a staging layer. Gemma 3n interprets source metadata and creates necessary staging schemas.
Modeling & Transformation:
The model assists in generating SQL or PySpark code to transform raw data into cleaned, analytics-ready formats — defining measures, facts, dimensions, and time-series aggregations as needed for KPIs.
Data Quality & Filtering:
Deduplication, null checks, and joins are automatically scripted based on prompt instructions, ensuring data reliability before analysis.
KPI Report Generation:
Based on prompts like "Show weekly app usage trends," Gemma 3n generates SQL queries and reporting logic to extract metrics such as usage count, active users, or growth rates — often formatted for dashboards or BI tools.
Visualization (Optional):
It can even output code (e.g., using matplotlib or plotly) to render bar charts, trend lines, and heatmaps — completing the end-to-end KPI reporting pipeline.

Thanks and regards,
Anand Thirtha Korlahalli
Infrastructure & Integration Services
Remote Professional Services
HPE Operations – Services Experience Delivery
I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Prompting Gemma 3n AI in Japanese (日本語) for Datalake Setup and KPI Reporting

Prompting Gemma 3n AI in Japanese (日本語) for Datalake Setup and KPI Reporting