tispark data type #3808

Open
@vagetablechicken

Description

tispark read tidb table: tikv data type -> spark type

candidates(spark):
date, timestamp, decimal(x), string, float, double, binary(tidb bytes)(x)

  • doc: reading tidb types that map to the candidates marked with x (decimal, ...) into openmldb is not supported

tidb int: unsigned bigint (ull) will be decimal(x), tinyint(1) (tiny1) will be bool, and all others will be long. @yht520100 gives the logic in tispark TypeMapping.java.
All int16/32/64 in tidb will be long, because tidb supports the unsigned attribute (https://docs.pingcap.com/zh/tidb/stable/data-type-numeric) but spark doesn't.

  • see 'long problem' below
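
The integer rules stated above can be sketched as a small lookup. This only restates the mapping described in this issue; it is not the code in tispark TypeMapping.java, and the function name `spark_type_for_tidb_int`, its parameters, and the string type names are hypothetical.

```python
from typing import Optional

# Integer families listed in the tidb numeric type docs.
TIDB_INT_TYPES = {"tinyint", "smallint", "mediumint", "int", "bigint"}


def spark_type_for_tidb_int(tidb_type: str, unsigned: bool = False,
                            display_width: Optional[int] = None) -> str:
    """Return the spark type name that this issue says tispark produces."""
    t = tidb_type.lower()
    if t not in TIDB_INT_TYPES:
        raise ValueError(f"not a tidb integer type: {tidb_type}")
    if t == "bigint" and unsigned:
        return "decimal"   # ull: may exceed a signed 64-bit long
    if t == "tinyint" and display_width == 1:
        return "boolean"   # tiny1
    return "long"          # every other integer, signed or unsigned


print(spark_type_for_tidb_int("smallint", unsigned=True))
```

Because every remaining integer width collapses to long, the spark df alone cannot tell whether the tidb column was int16, int32 or int64.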

tidb time holds only a time, no date part, but it will be long in the spark df; should openmldb read it as an integer type? What about tidb datetime?

  • check (tidb datetime -> spark timestamp or other type)?
    tidb datetime will be spark timestamp

long problem

spark type -> openmldb type:

  1. soft copy: reading tidb:// again is fine, but when we load the df to run sql, the types must be correct.
  2. deep copy: the data is saved as parquet, and when we read it back we no longer know it came from tidb, so the schema must equal the openmldb table schema.
  3. online import: goes through openmldb-spark-connector, which gets values from InternalRow; needs test.
  • test online import with mismatched schema hw
    Online import can ignore the schema check: it's ok to write long to any openmldb integer type, as long as the cast does not overflow.
    Writing double to float is also ok when the cast does not overflow.
    Thus online data import can skip schema check and conversion, but the column count and names should still be checked.

  • Anyway, we should correct the schema of df (df=tispark.read(tidb)) to support offline sql and saving to offline storage. Add patch in catalogLoad @yht520100
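
One way the schema correction could look, as a sketch only: build one cast expression per column from the df schema and the openmldb table schema. This is not the catalogLoad patch; `cast_exprs` and the `OPENMLDB_TO_SPARK_SQL` table are hypothetical, and df type names are assumed to use Spark SQL spelling (e.g. `bigint` for long).

```python
# Hypothetical mapping from openmldb column types to Spark SQL cast targets.
OPENMLDB_TO_SPARK_SQL = {
    "bool": "boolean", "smallint": "smallint", "int": "int",
    "bigint": "bigint", "float": "float", "double": "double",
    "date": "date", "timestamp": "timestamp", "string": "string",
}


def cast_exprs(df_schema, table_schema):
    """Build select expressions that cast df columns to the table's types.

    Both arguments are lists of (column name, type name) in column order.
    """
    if [n for n, _ in df_schema] != [n for n, _ in table_schema]:
        raise ValueError("column names or count differ")
    exprs = []
    for (name, df_type), (_, tbl_type) in zip(df_schema, table_schema):
        target = OPENMLDB_TO_SPARK_SQL[tbl_type]
        if df_type == target:
            exprs.append(f"`{name}`")  # already the right type
        else:
            exprs.append(f"CAST(`{name}` AS {target}) AS `{name}`")
    return exprs


print(cast_exprs([("id", "bigint"), ("score", "double")],
                 [("id", "int"), ("score", "double")]))
```

In PySpark the resulting list could be passed to `df.selectExpr(*exprs)`. The cast by itself does not guard against overflow, so values out of range for the target type still need a separate check.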

tispark write openmldb to tidb: openmldb df -> tispark -> tidb

  1. when df.schema != tidb.schema, will tispark fail? If openmldb schema ⊆ tidb schema, will it be fine?
  • check schema mismatch, e.g. spark int written to tidb long, spark long written to tidb int
    • check unsigned, e.g. spark long written to tidb uint
  2. If writing int to tidb unsigned int fails and the user really wants to write to a mismatched tidb table: TODO
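
For the write direction, the value ranges that matter can be sketched from the tidb integer widths (8/16/24/32/64 bits, signed or unsigned). The helper names are hypothetical, and the sketch only says whether a value is representable in the target column; how tispark itself behaves on a mismatch is exactly what the checks above still need to establish.

```python
# Bit widths of the tidb integer types.
TIDB_INT_BITS = {"tinyint": 8, "smallint": 16, "mediumint": 24,
                 "int": 32, "bigint": 64}


def tidb_int_range(tidb_type: str, unsigned: bool = False):
    """Return (min, max) for a tidb integer column type."""
    bits = TIDB_INT_BITS[tidb_type.lower()]
    if unsigned:
        return 0, 2**bits - 1
    return -(2**(bits - 1)), 2**(bits - 1) - 1


def value_fits_tidb(value: int, tidb_type: str, unsigned: bool = False) -> bool:
    """True if the value is representable in the target tidb column."""
    lo, hi = tidb_int_range(tidb_type, unsigned)
    return lo <= value <= hi


# A negative spark long cannot be stored in a tidb unsigned int column.
print(value_fits_tidb(-1, "int", unsigned=True))
```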

Labels: batch-engine (openmldb batch(offline) engine)
