Within the subdirectory `tpch` there are 22 examples that reproduce queries in
the TPC-H specification. These examples use realistic data that can be generated
at arbitrary scale, and they show use cases for a variety of data frame
operations.

In the list below we describe which new operations can be found in each example.
The queries are designed to be of increasing complexity, so it is recommended to
review them in order. For brevity, the list does not repeat operations already
covered in earlier examples.

- [Convert CSV to Parquet](./tpch/convert_data_to_parquet.py)
  - Read from CSV files where the delimiter is something other than a comma
  - Specify a schema while reading CSV
  - Write to a Parquet file
- [Pricing Summary Report](./tpch/q01_pricing_summary_report.py)
  - Aggregation computing the maximum value, average, sum, and number of entries
  - Filter data by date and interval
  - Sorting
- [Minimum Cost Supplier](./tpch/q02_minimum_cost_supplier.py)
  - Window operation to find the minimum
  - Sorting in descending order
- [Shipping Priority](./tpch/q03_shipping_priority.py)
- [Order Priority Checking](./tpch/q04_order_priority_checking.py)
  - Aggregating multiple times in one data frame
- [Local Supplier Volume](./tpch/q05_local_supplier_volume.py)
- [Forecasting Revenue Change](./tpch/q06_forecasting_revenue_change.py)
  - Using `collect` and extracting values as a Python object
- [Volume Shipping](./tpch/q07_volume_shipping.py)
  - Finding multiple distinct and mutually exclusive values within one data frame
  - Using `case` and `when` statements
- [Market Share](./tpch/q08_market_share.py)
  - The operations in this query are similar to those in the prior examples, but
    it is a more complex example of using filters, joins, and aggregates
  - Using left outer joins
- [Product Type Profit Measure](./tpch/q09_product_type_profit_measure.py)
  - Extract year from a date
- [Returned Item Reporting](./tpch/q10_returned_item_reporting.py)
- [Important Stock Identification](./tpch/q11_important_stock_identification.py)
- [Shipping Modes and Order Priority](./tpch/q12_ship_mode_order_priority.py)
  - Finding non-null values using a boolean operation in a filter
  - Case statement with a default value
- [Customer Distribution](./tpch/q13_customer_distribution.py)
- [Promotion Effect](./tpch/q14_promotion_effect.py)
- [Top Supplier](./tpch/q15_top_supplier.py)
- [Parts/Supplier Relationship](./tpch/q16_part_supplier_relationship.py)
  - Using anti joins
  - Using regular expressions (regex)
  - Creating arrays of literal values
  - Determining whether an element exists within an array
- [Small-Quantity-Order Revenue](./tpch/q17_small_quantity_order.py)
- [Large Volume Customer](./tpch/q18_large_volume_customer.py)
- [Discounted Revenue](./tpch/q19_discounted_revenue.py)
  - Creating a user-defined function (UDF)
  - Converting a PyArrow Array to Python values
  - Filtering based on a UDF
- [Potential Part Promotion](./tpch/q20_potential_part_promotion.py)
  - Extracting part of a string using `substr`
- [Suppliers Who Kept Orders Waiting](./tpch/q21_suppliers_kept_orders_waiting.py)
  - Using array aggregation
  - Determining the number of elements in an array
- [Global Sales Opportunity](./tpch/q22_global_sales_opportunity.py)