Description
tispark reads a tidb table: tikv data type -> spark type
candidate spark types: date, timestamp, decimal (x), string, float, double, binary (tidb bytes) (x)
- doc: we don't support reading the tidb types marked with (x) (decimal, ...) into openmldb
tidb int: unsigned BIGINT will be decimal (x), TINYINT(1) will be boolean, the others will be long. @yht520100 gives the logic in tispark TypeMapping.java.
All int16/32/64 in tidb will be long, because tidb supports the UNSIGNED mark (https://docs.pingcap.com/zh/tidb/stable/data-type-numeric) but spark doesn't.
- see 'long problem' below
tidb TIME is just a time of day (24h, no date), but it'll be long in the spark df; should openmldb read it as an integer type? What about tidb DATETIME?
- check: does tidb DATETIME map to spark timestamp or some other type?
tidb DATETIME will be spark timestamp.
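The mapping described above can be summarized in a small sketch. This is only an illustration of the rules noted here, not TiSpark's actual TypeMapping.java; the function name and the fallback cases are assumptions.

```scala
import org.apache.spark.sql.types._

// Illustrative sketch of the mapping noted above (not TiSpark's TypeMapping.java):
// all tidb integer types arrive as LongType because tidb supports the UNSIGNED
// mark and spark does not; unsigned BIGINT doesn't fit in 64 signed bits, so it
// becomes a decimal; TINYINT(1) is treated as boolean; DATETIME becomes
// TimestampType; TIME has no date part and arrives as a long.
def tidbToSparkType(tidbType: String, unsigned: Boolean): DataType =
  (tidbType.toUpperCase, unsigned) match {
    case ("BIGINT", true)                               => DecimalType(20, 0) // unsigned 64-bit doesn't fit in LongType
    case ("TINYINT(1)", _)                              => BooleanType
    case ("TINYINT" | "SMALLINT" | "INT" | "BIGINT", _) => LongType           // widened to long
    case ("DATETIME" | "TIMESTAMP", _)                  => TimestampType
    case ("DATE", _)                                    => DateType
    case ("TIME", _)                                    => LongType           // time-of-day only, no date
    case _                                              => StringType         // placeholder for the remaining types
  }
```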
long problem
spark type -> openmldb type:
- soft copy: although it's ok to read the data again via `tidb://`, the types must already be correct when we load the df to run sql.
- deep copy: parquet; when we read it again we don't know it came from tidb, so the schema must equal the openmldb table schema (see the schema check sketch after this list).
- online import: openmldb-spark-connector gets InternalRow, needs test.
- test online import with mismatched schema (hw)
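For the deep-copy path, a minimal sketch of the schema check (an assumed helper, not the actual OpenMLDB code): since the parquet file carries no hint that it came from tidb, the df schema must already equal the openmldb table schema before saving.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Minimal sketch: compare column names and types (ignoring nullability)
// between the df we are about to save and the openmldb table schema.
def schemaMatches(df: DataFrame, openmldbSchema: StructType): Boolean = {
  val actual   = df.schema.fields.map(f => (f.name.toLowerCase, f.dataType))
  val expected = openmldbSchema.fields.map(f => (f.name.toLowerCase, f.dataType))
  actual.sameElements(expected)
}
```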
Online import can ignore the schema check: it's ok to write long to any openmldb integer type, as long as the cast doesn't overflow. Writing double to float is also ok when there is no cast overflow. Thus online data import can skip schema check and conversion, but the column count and names should still be checked.
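A sketch of that relaxed check (the helper name is an assumption): compare only column count and names, and leave the numeric widening/narrowing to the connector.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Relaxed check for online import: types are not compared (long -> int etc. is
// acceptable as long as there is no overflow), but column count and names must match.
def checkOnlineImport(df: DataFrame, tableSchema: StructType): Unit = {
  require(df.schema.length == tableSchema.length,
    s"column count mismatch: df=${df.schema.length}, table=${tableSchema.length}")
  df.schema.fields.zip(tableSchema.fields).foreach { case (src, dst) =>
    require(src.name.equalsIgnoreCase(dst.name),
      s"column name mismatch: df=${src.name}, table=${dst.name}")
  }
}
```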
Anyway, we should correct the schema of the df (`df = tispark.read(tidb)`) to support offline sql and saving to offline storage. Add a patch in catalogLoad.
@yht520100
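A hedged sketch of what that patch could do (the helper name and its hook point in catalogLoad are assumptions, not the actual OpenMLDB code): cast each widened column back to the type declared in the openmldb table schema.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// After df = tispark.read(tidb), cast each widened column (e.g. LongType) back
// to the type declared in the openmldb table schema so that offline sql and
// offline storage see the right types.
def correctSchema(df: DataFrame, openmldbSchema: StructType): DataFrame = {
  val casted = openmldbSchema.fields.map { f =>
    col(f.name).cast(f.dataType).as(f.name)
  }
  df.select(casted: _*)
}
```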
tispark writes openmldb data to tidb: openmldb df -> tispark -> tidb
- when df.schema != tidb.schema, will tispark fail? If the openmldb schema ⊆ the tidb schema, will it be fine?
- check schema mismatch, e.g. spark int written to tidb bigint, spark long written to tidb int (see the overflow check sketch after this list)
- check unsigned, e.g. spark long written to tidb unsigned int
- if int to tidb unsigned int fails and the user really wants to write to a mismatched tidb table, TODO
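Writing a spark long column into a narrower tidb column (int, unsigned int) risks overflow. A sketch of a pre-write range check (an assumed helper, not TiSpark's own validation):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Count rows outside the target tidb type's range and refuse the write if any
// exist, instead of silently truncating values.
def checkLongRange(df: DataFrame, column: String, min: Long, max: Long): Unit = {
  val outOfRange = df.filter(col(column) < min || col(column) > max).count()
  require(outOfRange == 0,
    s"$outOfRange rows in column '$column' overflow the target tidb range [$min, $max]")
}

// e.g. signed INT:   checkLongRange(df, "c1", Int.MinValue.toLong, Int.MaxValue.toLong)
// e.g. unsigned INT: checkLongRange(df, "c1", 0L, 4294967295L)
```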