Commit 36cc531

Expand on readme to link to examples within tpch folder
1 parent 0b90a66 commit 36cc531

1 file changed: +57 -0 lines changed

examples/README.md

Lines changed: 57 additions & 0 deletions
@@ -59,3 +59,60 @@ Within the subdirectory `tpch` there are 22 examples that reproduce queries in
the TPC-H specification. These include realistic data that can be generated at
arbitrary scale and allow the user to see use cases for a variety of data frame
operations.

The list below describes which new operations can be found in each example.
The queries are designed to be of increasing complexity, so it is recommended to
review them in order. For brevity, the list does not repeat operations found in
previous examples.

- [Convert CSV to Parquet](./tpch/convert_data_to_parquet.py)
    - Read from a CSV file where the delimiter is something other than a comma
    - Specify a schema when reading a CSV file
    - Write to a Parquet file
- [Pricing Summary Report](./tpch/q01_pricing_summary_report.py)
    - Aggregation computing the maximum value, average, sum, and number of entries
    - Filter data by date and interval
    - Sorting
- [Minimum Cost Supplier](./tpch/q02_minimum_cost_supplier.py)
    - Window operation to find minimum
    - Sorting in descending order
- [Shipping Priority](./tpch/q03_shipping_priority.py)
- [Order Priority Checking](./tpch/q04_order_priority_checking.py)
    - Aggregating multiple times in one data frame
- [Local Supplier Volume](./tpch/q05_local_supplier_volume.py)
- [Forecasting Revenue Change](./tpch/q06_forecasting_revenue_change.py)
    - Using collect and extracting values as a Python object
- [Volume Shipping](./tpch/q07_volume_shipping.py)
    - Finding multiple distinct and mutually exclusive values within one data frame
    - Using `case` and `when` statements
- [Market Share](./tpch/q08_market_share.py)
    - The operations in this query are similar to those in the prior examples, but
      it is a more complex example of using filters, joins, and aggregates
    - Using left outer joins
- [Product Type Profit Measure](./tpch/q09_product_type_profit_measure.py)
    - Extract year from a date
- [Returned Item Reporting](./tpch/q10_returned_item_reporting.py)
- [Important Stock Identification](./tpch/q11_important_stock_identification.py)
- [Shipping Modes and Order](./tpch/q12_ship_mode_order_priority.py)
    - Finding non-null values using a boolean operation in a filter
    - Case statement with default value
- [Customer Distribution](./tpch/q13_customer_distribution.py)
- [Promotion Effect](./tpch/q14_promotion_effect.py)
- [Top Supplier](./tpch/q15_top_supplier.py)
- [Parts/Supplier Relationship](./tpch/q16_part_supplier_relationship.py)
    - Using anti joins
    - Using regular expressions (regex)
    - Creating arrays of literal values
    - Determine if an element exists within an array
- [Small-Quantity-Order Revenue](./tpch/q17_small_quantity_order.py)
- [Large Volume Customer](./tpch/q18_large_volume_customer.py)
- [Discounted Revenue](./tpch/q19_discounted_revenue.py)
    - Creating a user defined function (UDF)
    - Convert a pyarrow Array to Python values
    - Filtering based on a UDF
- [Potential Part Promotion](./tpch/q20_potential_part_promotion.py)
    - Extracting part of a string using substr
- [Suppliers Who Kept Orders Waiting](./tpch/q21_suppliers_kept_orders_waiting.py)
    - Using array aggregation
    - Determining the size of array elements
- [Global Sales Opportunity](./tpch/q22_global_sales_opportunity.py)
