A production-ready framework for Microsoft Fabric and Power BI paginated report automation. This Python-based batch executor enables automated generation of paginated reports with dynamic parameters from multiple data sources including Lakehouse, Semantic Models, Warehouses, and JSON arrays. Built for enterprise data engineering workflows with features like retry logic, OneLake storage integration, and Azure pipeline orchestration via REST API.
✅ Four Flexible Parameter Sources
✅ Output Options
✅ Enterprise-Ready
✅ User-Friendly
The main notebook is `paginated_report_batch_executor.ipynb`. Select one of the four parameter sources based on your needs:
| Source | Best For | Setup Effort |
|---|---|---|
| Semantic Model | Report-driven parameters, business users | Low |
| Lakehouse | Large lists, data engineers | Low |
| JSON | Testing, static lists | Minimal |
| Warehouse | Enterprise, complex SQL | Medium |
The pipeline definition lives in `pipeline/pipeline_definition.json`.

Pipeline (Scheduled/Manual Trigger)
│
├─ Parameters: report_id, static_params, report_partitioning_source, etc.
│
▼
Fabric Notebook: Report Batch Executor
│
├─ Cell 1-2: Parameters & Imports
│
├─ Cell 3: Input Validation & ParameterLoader
│ ├─ InputValidator class
│ └─ ParameterLoader class (supports 4 sources)
│
├─ Cell 4: Token Manager & Validation
│ ├─ TokenManager class (automatic refresh)
│ └─ Parse & validate all parameters
│
├─ Cell 5: Power BI API Functions
│ ├─ handle_api_response()
│ ├─ initiate_report_export()
│ ├─ poll_export_status()
│ └─ download_report_file()
│
├─ Cell 6: File Storage Functions
│ ├─ sanitize_filename()
│ ├─ generate_file_name()
│ └─ save_to_onelake()
│
├─ Cell 7: Main Execution Function
│ └─ execute_report_with_retry()
│
├─ Cell 8: Load Special Parameter Values
│ └─ ParameterLoader.load()
│ ├─ Semantic Model (DAX query via sempy)
│ ├─ Lakehouse (Spark SQL)
│ ├─ JSON (Direct parsing)
│ └─ Warehouse (pyodbc + T-SQL)
│
├─ Cell 9: Main Execution Loop
│ └─ FOR EACH partitioning parameter value:
│ ├─ Merge static + special params
│ ├─ Execute report (Power BI API)
│ ├─ Poll & download
│ ├─ Save to OneLake
│ └─ Retry 3x on failure
│
└─ Cell 10: Results & Exit
└─ Return summary to pipeline
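The merge step in Cell 9 (static + special params) can be sketched as below. The merge rule and the `build_report_params` helper are assumptions for illustration; only the parameter names mirror the pipeline configuration.

```python
import json

# Assumed merge rule: the fixed parameters plus the current partitioning value.
static_params = json.loads('{"start_date": "2024-01-01", "end_date": "2024-12-31"}')
report_partitioning_column = "Customer"

def build_report_params(value):
    merged = dict(static_params)          # copy so each iteration starts clean
    merged[report_partitioning_column] = value
    return merged

params = build_report_params("Acme Corp")
```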
Query data from Power BI semantic models using DAX.
Advantages:
Configuration:
{
"report_partitioning_source": "semantic_model",
"semantic_model_workspace_id": "workspace-guid",
"semantic_model_dataset_id": "dataset-guid",
"semantic_model_dax_query": "EVALUATE FILTER(DISTINCT('DimCustomer'[CustomerName]), 'DimCustomer'[IsActive] = TRUE)"
}
Example DAX Queries:
-- Get all active customers
EVALUATE FILTER(
DISTINCT('DimCustomer'[CustomerName]),
'DimCustomer'[IsActive] = TRUE
)
-- Get top 10 customers by sales
EVALUATE TOPN(
10,
SUMMARIZECOLUMNS(
'DimCustomer'[CustomerName],
"TotalSales", SUM('FactSales'[SalesAmount])
),
[TotalSales], DESC
)
-- Get customers with sales in last 12 months
EVALUATE CALCULATETABLE(
DISTINCT('DimCustomer'[CustomerName]),
DATESINPERIOD('DimDate'[Date], TODAY(), -12, MONTH)
)
See setup/sample_semantic_model_queries.dax for more examples.
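sempy's `evaluate_dax` returns a DataFrame whose columns keep the DAX `'Table'[Column]` naming; the sketch below uses plain dicts to stand in for its rows so the flattening step the loader might perform can be shown without a Fabric session. The exact flattening logic is an assumption.

```python
# Hypothetical shape of a DAX query result, flattened to a plain value list
rows = [
    {"DimCustomer[CustomerName]": "Acme Corp"},
    {"DimCustomer[CustomerName]": "TechStart Inc"},
    {"DimCustomer[CustomerName]": None},
]
column = "DimCustomer[CustomerName]"
# Drop blanks and coerce to strings before use as report parameters
values = [str(r[column]) for r in rows if r[column] is not None]
```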
Query Delta tables in Lakehouse using Spark SQL.
Advantages:
Setup:
-- Run in Lakehouse notebook
CREATE TABLE parameter_config (
Category STRING,
ParameterValue STRING,
IsActive BOOLEAN,
SortOrder INT
) USING DELTA;
-- Insert sample data
INSERT INTO parameter_config VALUES
('MonthlyReportCustomers', 'Acme Corp', true, 1),
('MonthlyReportCustomers', 'TechStart Inc', true, 2),
('MonthlyReportCustomers', 'Global Solutions', true, 3);
Configuration:
{
"report_partitioning_source": "lakehouse",
"lakehouse_table": "parameter_config",
"lakehouse_category": "MonthlyReportCustomers",
"lakehouse_column": "ParameterValue",
"lakehouse_filter": ""
}
See setup/create_lakehouse_parameter_table.sql for complete schema.
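The Spark SQL the loader runs against this table might be assembled like this; the exact query shape (Category/IsActive filter, SortOrder ordering) is an assumption based on the schema above, and `build_lakehouse_query` is a hypothetical helper.

```python
def build_lakehouse_query(table, column, category, extra_filter=""):
    # Assumed query shape: filter by Category and IsActive, order by SortOrder
    query = (
        f"SELECT {column} FROM {table} "
        f"WHERE Category = '{category}' AND IsActive = true"
    )
    if extra_filter:
        query += f" AND {extra_filter}"   # optional lakehouse_filter clause
    return query + " ORDER BY SortOrder"

sql = build_lakehouse_query("parameter_config", "ParameterValue", "MonthlyReportCustomers")
```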
Provide parameters directly as JSON array.
Advantages:
Configuration:
{
"report_partitioning_source": "json",
"report_partitioning_values": "[\"Acme Corp\", \"TechStart Inc\", \"Global Solutions\"]"
}
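Parsing this source is plain `json.loads` plus a shape check; the validation shown is an assumed sketch of what the notebook's InputValidator might enforce.

```python
import json

raw = '["Acme Corp", "TechStart Inc", "Global Solutions"]'
values = json.loads(raw)
# Assumed validation: the value must be a JSON array of strings
if not (isinstance(values, list) and all(isinstance(v, str) for v in values)):
    raise ValueError("report_partitioning_values must be a JSON array of strings")
```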
Use Cases:
Query Warehouse tables using T-SQL.
Advantages:
Setup:
-- Run in Warehouse
CREATE TABLE dbo.ParameterConfig (
Category NVARCHAR(100),
ParameterValue NVARCHAR(500),
IsActive BIT,
SortOrder INT
);
-- Insert data
INSERT INTO dbo.ParameterConfig VALUES
('MonthlyReportCustomers', 'Acme Corp', 1, 1),
('MonthlyReportCustomers', 'TechStart Inc', 1, 2);
Configuration:
{
"report_partitioning_source": "warehouse",
"warehouse_name": "EnterpriseWarehouse",
"warehouse_table": "dbo.ParameterConfig",
"warehouse_column": "ParameterValue",
"warehouse_category": "MonthlyReportCustomers"
}
Import `paginated_report_batch_executor.ipynb` and choose one of the four options:

- Semantic Model: sample DAX queries in `setup/sample_semantic_model_queries.dax`
- Lakehouse: run `setup/create_lakehouse_parameter_table.sql` (plus `setup/sample_data.sql` for test data) and verify with `SELECT * FROM parameter_config`
- Then run the `paginated_report_batch_executor` notebook

Scenario: Generate a monthly sales report for each active customer using data from a Power BI semantic model.
Pipeline Parameters:
{
"report_id": "12345678-1234-1234-1234-123456789abc",
"workspace_id": "workspace-guid",
"output_format": "PDF",
"static_params": "{\"start_date\": \"2024-01-01\", \"end_date\": \"2024-12-31\"}",
"report_partitioning_column": "Customer",
"report_partitioning_source": "semantic_model",
"semantic_model_workspace_id": "workspace-guid",
"semantic_model_dataset_id": "dataset-guid",
"semantic_model_dax_query": "EVALUATE FILTER(DISTINCT('DimCustomer'[CustomerName]), 'DimCustomer'[IsActive] = TRUE)",
"archive_to_onelake": "true",
"max_retries": "3"
}
Result: PDF reports saved to OneLake at Files/reports/archive/2025/01/30/, one per customer.
Scenario: Generate quarterly report for each region using parameter list from Lakehouse.
Setup Lakehouse:
INSERT INTO parameter_config (Category, ParameterValue, IsActive, SortOrder)
VALUES
('QuarterlyRegions', 'North America', true, 1),
('QuarterlyRegions', 'Europe', true, 2),
('QuarterlyRegions', 'Asia Pacific', true, 3),
('QuarterlyRegions', 'Latin America', true, 4);
Pipeline Parameters:
{
"report_id": "report-guid",
"workspace_id": "workspace-guid",
"output_format": "XLSX",
"static_params": "{\"quarter\": \"Q1\", \"year\": \"2024\"}",
"report_partitioning_column": "Region",
"report_partitioning_source": "lakehouse",
"lakehouse_table": "parameter_config",
"lakehouse_category": "QuarterlyRegions",
"lakehouse_column": "ParameterValue"
}
Result: Excel reports for 4 regions saved to OneLake.
Scenario: Test report generation for 3 specific customers before rolling out to all.
Pipeline Parameters:
{
"report_id": "report-guid",
"workspace_id": "workspace-guid",
"output_format": "PDF",
"static_params": "{\"start_date\": \"2024-01-01\", \"end_date\": \"2024-01-31\"}",
"report_partitioning_column": "Customer",
"report_partitioning_source": "json",
"report_partitioning_values": "[\"Test Customer A\", \"Test Customer B\", \"Test Customer C\"]"
}
Result: 3 test reports generated quickly without setting up database tables, saved to OneLake.
Scenario: Generate reports for customers visible to current user based on Row-Level Security.
Warehouse Setup:
-- Create table with RLS
CREATE TABLE dbo.CustomerReporting (
CustomerName NVARCHAR(200),
AssignedTo NVARCHAR(100),
IsActive BIT
);
-- Create RLS policy
CREATE FUNCTION dbo.fn_CustomerSecurityPredicate(@AssignedTo NVARCHAR(100))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT 1 AS Result
WHERE @AssignedTo = USER_NAME() OR USER_NAME() IN (SELECT UserName FROM dbo.Admins);
CREATE SECURITY POLICY dbo.CustomerReportingPolicy
ADD FILTER PREDICATE dbo.fn_CustomerSecurityPredicate(AssignedTo)
ON dbo.CustomerReporting WITH (STATE = ON);
Pipeline Parameters:
{
"report_partitioning_source": "warehouse",
"warehouse_name": "EnterpriseWarehouse",
"warehouse_table": "dbo.CustomerReporting",
"warehouse_column": "CustomerName",
"warehouse_category": ""
}
Result: Each user only generates reports for their assigned customers (RLS enforced).
| Parameter | Description | Example |
|---|---|---|
| `workspace_id` | Fabric workspace GUID | `"12345678-1234-..."` |
| `report_id` | Paginated report GUID | `"87654321-4321-..."` |
| `output_format` | Export format | `"PDF"`, `"XLSX"`, `"DOCX"` |
| `static_params` | Fixed parameters (JSON) | `"{\"start_date\": \"2024-01-01\"}"` |
| `report_partitioning_column` | Parameter to loop through | `"Customer"` |
| `report_partitioning_source` | Parameter source type | `"semantic_model"`, `"lakehouse"`, `"json"`, `"warehouse"` |
Semantic Model:
- `semantic_model_workspace_id`
- `semantic_model_dataset_id`
- `semantic_model_dax_query`

Lakehouse:
- `lakehouse_table`
- `lakehouse_category`
- `lakehouse_column`
- `lakehouse_filter` (optional)

JSON:
- `report_partitioning_values`

Warehouse:
- `warehouse_name`
- `warehouse_table`
- `warehouse_column`
- `warehouse_category`

| Parameter | Description | Default |
|---|---|---|
| `archive_to_onelake` | Save to OneLake | `"true"` |
| `max_retries` | Retry attempts per report | `"3"` |
| `export_timeout_seconds` | Max seconds to wait for export | `"600"` |
| `poll_interval_seconds` | Seconds between status polls | `"5"` |
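The polling loop governed by `export_timeout_seconds` and `poll_interval_seconds` might look like this minimal sketch; the notebook's actual `poll_export_status` talks to the Power BI export API, whereas here a stub status function is injected so the control flow can be shown.

```python
import time

def poll_export_status(get_status, timeout_seconds=600, poll_interval_seconds=5,
                       clock=time.monotonic, sleep=time.sleep):
    # Poll until the export reaches a terminal state or the timeout elapses
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        status = get_status()
        if status in ("Succeeded", "Failed"):
            return status
        sleep(poll_interval_seconds)
    raise TimeoutError(f"Export timeout after {timeout_seconds} seconds")

# Stub status sequence standing in for the real API responses
statuses = iter(["Running", "Running", "Succeeded"])
final = poll_export_status(lambda: next(statuses), poll_interval_seconds=0)
```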
For complex scenarios with multiple loop parameters:
Option A: Use Semantic Model with SUMMARIZECOLUMNS
EVALUATE
SUMMARIZECOLUMNS(
'DimGeography'[Region],
'DimProduct'[Category]
)
This returns all combinations of Region × Category. The notebook will loop through each row.
Option B: Pre-compute combinations in Lakehouse
INSERT INTO parameter_config (Category, ParameterValue, IsActive, SortOrder)
SELECT
'RegionCategoryCombo',
CONCAT(Region, '|', Category),
true,
ROW_NUMBER() OVER (ORDER BY Region, Category)
FROM region_category_combinations;
Then parse the combined value in your report.
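Splitting the pre-computed combo back apart is a one-liner; the `|` delimiter matches the `CONCAT` above, and where this parsing happens (report side vs. notebook side) depends on your setup.

```python
# Split a "Region|Category" combo into its two parts
combo = "Europe|Bikes"
region, category = combo.split("|", 1)  # maxsplit=1 tolerates '|' inside Category
```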
Filter parameters based on date, status, or other criteria:
Semantic Model (DAX):
EVALUATE
FILTER(
DISTINCT('DimCustomer'[CustomerName]),
'DimCustomer'[IsActive] = TRUE
&& CALCULATE(SUM('FactSales'[Amount]), DATESINPERIOD('DimDate'[Date], TODAY(), -6, MONTH)) > 10000
)
Lakehouse (SQL):
{
"lakehouse_filter": "ValidFrom <= CURRENT_TIMESTAMP() AND (ValidTo IS NULL OR ValidTo >= CURRENT_TIMESTAMP())"
}
Files are automatically named with this pattern:
Report_{SpecialParamName}_{SanitizedValue}_{Timestamp}.{format}
Example:
Report_Customer_AcmeCorp_20250130_143022.pdf
Special characters are sanitized for filesystem compatibility.
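The naming rule can be sketched as follows; the exact character set the notebook's `sanitize_filename` strips is an assumption, chosen to reproduce the documented example.

```python
import re
from datetime import datetime

def sanitize_filename(value: str) -> str:
    # Assumed rule: drop characters unsafe on common filesystems, then remove whitespace
    cleaned = re.sub(r'[<>:"/\\|?*]', "", value)
    return re.sub(r"\s+", "", cleaned)

def generate_file_name(param_name, value, fmt, ts):
    return f"Report_{param_name}_{sanitize_filename(value)}_{ts:%Y%m%d_%H%M%S}.{fmt.lower()}"

name = generate_file_name("Customer", "Acme Corp", "PDF", datetime(2025, 1, 30, 14, 30, 22))
```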
Retry Logic:
Error Categories:
All errors are logged with detailed messages.
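The retry behavior of `execute_report_with_retry` can be sketched like this; the real function wraps the export/poll/download/save sequence and may only retry transient errors, so treat this as a simplified model.

```python
import time

def execute_with_retry(fn, max_retries=3, delay_seconds=0):
    # Retry a callable up to max_retries times; re-raise the last error on exhaustion
    last_err = None
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception as err:
            last_err = err
            if attempt < max_retries:
                time.sleep(delay_seconds)
    raise last_err

# A stub that fails twice, then succeeds on the third attempt
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = execute_with_retry(flaky, max_retries=3)
```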
Console Logging: Real-time output shows:
Pipeline Notifications: Configure webhook URL for notifications:
Example Webhook Payload (Success):
{
"status": "success",
"pipelineRunId": "run-guid",
"successCount": 5,
"failCount": 0,
"totalSize": "12.5 MB",
"files": [
"/reports/2025/01/30/Report_Customer_AcmeCorp_20250130_143022.pdf",
"/reports/2025/01/30/Report_Customer_TechStart_20250130_143035.pdf"
]
}
Fine-tune the notebook’s performance and behavior for specific scenarios:
| Parameter | Default | Range | Description |
|---|---|---|---|
| `download_chunk_size_mb` | 1 | 1-100 | Download chunk size in MB for large files |
| `file_size_warning_mb` | 500 | 10-5000 | File size threshold for warnings |
| `connection_timeout_seconds` | 30 | 5-300 | API connection timeout |
| `download_timeout_seconds` | 120 | 30-600 | File download timeout |
| `param_loader_retry_attempts` | 3 | 1-10 | Parameter loading retry count |
| `param_loader_retry_delay_seconds` | 5 | 1-60 | Delay between parameter loading retries |
| `token_refresh_interval_minutes` | 45 | 5-55 | Auto token refresh interval |
When to Adjust:
Large Files (Reports > 100MB):
{
"download_chunk_size_mb": "5",
"file_size_warning_mb": "1000",
"download_timeout_seconds": "300"
}
Slow Network Connections:
{
"connection_timeout_seconds": "60",
"download_timeout_seconds": "240"
}
Long-Running Batches (1000+ reports):
{
"token_refresh_interval_minutes": "30"
}
Note: Tokens expire after 60 minutes. Refresh interval should be less than 55 minutes.
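The refresh mechanism can be modeled as below; this is a hypothetical reduction of the notebook's TokenManager to the age check alone, with an injected clock and token source so it runs without Fabric credentials.

```python
import time

class TokenManager:
    # Minimal sketch: re-fetch the token once its age exceeds the interval
    def __init__(self, fetch_token, refresh_interval_minutes=45, clock=time.monotonic):
        self._fetch = fetch_token
        self._interval = refresh_interval_minutes * 60
        self._clock = clock
        self._token = None
        self._issued_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._issued_at >= self._interval:
            self._token, self._issued_at = self._fetch(), now
        return self._token

fakes = iter(["token-1", "token-2"])
fake_now = [0.0]
mgr = TokenManager(lambda: next(fakes), refresh_interval_minutes=45,
                   clock=lambda: fake_now[0])
first = mgr.get()        # initial fetch
fake_now[0] = 46 * 60    # advance the fake clock past the 45-minute interval
second = mgr.get()       # triggers a refresh
```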
Unreliable Parameter Source:
{
"param_loader_retry_attempts": "5",
"param_loader_retry_delay_seconds": "10"
}
Token Refresh Mechanism:
- `token_refresh_interval_minutes` controls the refresh cadence

Symptoms: "No parameter values loaded! Check your configuration."
Solutions:
Symptoms: 401 Unauthorized, token errors
Solutions:
Symptoms: “Export timeout after 600 seconds”
Solutions:
Symptoms: Reports take very long to generate
Solutions:
To run the notebook interactively for debugging:
| Scenario | Recommended Source |
|---|---|
| Business users maintain list | Semantic Model or Lakehouse |
| Need Row-Level Security | Semantic Model or Warehouse |
| Large lists (1000+) | Lakehouse |
| Testing/Development | JSON |
| Complex SQL logic | Warehouse |
| Highest performance | Lakehouse |
OneLake Structure:
Files/
/reports/
/archive/
/{YYYY}/{MM}/{DD}/
Report_Customer_Value_Timestamp.pdf
Example:
Files/reports/archive/2025/01/30/Report_Customer_AcmeCorp_20250130_143022.pdf
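The date-partitioned path can be built with a small helper; `build_archive_path` is a hypothetical function matching the documented layout, not the notebook's actual implementation.

```python
from datetime import date

def build_archive_path(file_name, d):
    # Date-partitioned path matching the documented OneLake layout
    return f"Files/reports/archive/{d:%Y}/{d:%m}/{d:%d}/{file_name}"

path = build_archive_path("Report_Customer_AcmeCorp_20250130_143022.pdf", date(2025, 1, 30))
```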
Example configurations are provided in the `config/` folder.

Project structure:

.
├── paginated_report_batch_executor.ipynb # Main notebook
├── README.md # This file
├── setup/
│ ├── create_lakehouse_parameter_table.sql
│ ├── sample_data.sql
│ └── sample_semantic_model_queries.dax
├── config/
│ ├── example_semantic_model.json
│ ├── example_lakehouse_mode.json
│ ├── example_json_mode.json
│ └── example_warehouse_mode.json
└── pipeline/
└── pipeline_definition.json
v1.0 (2025-11-03)
This project is provided as-is for use within Microsoft Fabric environments.
# Report configuration
workspace_id = ""
report_id = ""
output_format = "PDF"
static_params = "{}"
report_partitioning_column = "Customer"
# Source configuration
report_partitioning_source = "semantic_model" # or "lakehouse", "json", "warehouse"
# Semantic Model
semantic_model_workspace_id = ""
semantic_model_dataset_id = ""
semantic_model_dax_query = "EVALUATE DISTINCT('Table'[Column])"
# Lakehouse
lakehouse_table = "parameter_config"
lakehouse_category = "CustomerList"
lakehouse_column = "ParameterValue"
lakehouse_filter = ""
# JSON
report_partitioning_values = "[]"
# Warehouse
warehouse_name = ""
warehouse_table = "dbo.ParameterConfig"
warehouse_column = "ParameterValue"
warehouse_category = ""
# Options
archive_to_onelake = "true"
max_retries = "3"
export_timeout_seconds = "600"
poll_interval_seconds = "5"
{
"DailySchedule": "6:00 AM every day",
"WeeklySchedule": "8:00 AM every Monday",
"MonthlySchedule": "7:00 AM on 1st of month"
}
Ready to generate thousands of reports with ease! 🚀