Description
tispark reads a tidb table: tikv data type -> spark type
candidate spark types: date, timestamp, decimal (x), string, float, double, binary (tidb bytes) (x)
- doc: we don't support reading the tidb types marked with (x) (decimal, ...) into openmldb
tidb int: unsigned BIGINT will be decimal (x), TINYINT(1) will be boolean, the others will be long. @yht520100 gives the logic in tispark TypeMapping.java.
All int16/32/64 in tidb will be long, because tidb supports the UNSIGNED mark (https://docs.pingcap.com/zh/tidb/stable/data-type-numeric) but spark doesn't.
- see 'long problem' below
tidb TIME is just a time of day (24h, no date), but it'll be long in the spark df; should openmldb read it as an integer type? What about tidb DATETIME?
- check: does tidb DATETIME map to spark timestamp or some other type?
tidb DATETIME will be spark timestamp.
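The mapping described above can be summarized in a small sketch. This is only an illustration of the rules noted here, not TiSpark's actual TypeMapping.java; the function name and the fallback cases are assumptions.

```scala
import org.apache.spark.sql.types._

// Illustrative sketch of the mapping noted above (not TiSpark's TypeMapping.java):
// all tidb integer types arrive as LongType because tidb supports the UNSIGNED
// mark and spark does not; unsigned BIGINT doesn't fit in 64 signed bits, so it
// becomes a decimal; TINYINT(1) is treated as boolean; DATETIME becomes
// TimestampType; TIME has no date part and arrives as a long.
def tidbToSparkType(tidbType: String, unsigned: Boolean): DataType =
  (tidbType.toUpperCase, unsigned) match {
    case ("BIGINT", true)                               => DecimalType(20, 0) // unsigned 64-bit doesn't fit in LongType
    case ("TINYINT(1)", _)                              => BooleanType
    case ("TINYINT" | "SMALLINT" | "INT" | "BIGINT", _) => LongType           // widened to long
    case ("DATETIME" | "TIMESTAMP", _)                  => TimestampType
    case ("DATE", _)                                    => DateType
    case ("TIME", _)                                    => LongType           // time-of-day only, no date
    case _                                              => StringType         // placeholder for the remaining types
  }
```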
long problem
spark type -> openmldb type:
- soft copy: although it's ok to read the data again via `tidb://`, the types must already be correct when we load the df to run sql.
- deep copy: parquet; when we read it again we don't know it came from tidb, so the schema must equal the openmldb table schema (see the schema check sketch after this list).
- online import: openmldb-spark-connector gets InternalRow, needs test.
- test online import with mismatched schema (hw)
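For the deep-copy path, a minimal sketch of the schema check (an assumed helper, not the actual OpenMLDB code): since the parquet file carries no hint that it came from tidb, the df schema must already equal the openmldb table schema before saving.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Minimal sketch: compare column names and types (ignoring nullability)
// between the df we are about to save and the openmldb table schema.
def schemaMatches(df: DataFrame, openmldbSchema: StructType): Boolean = {
  val actual   = df.schema.fields.map(f => (f.name.toLowerCase, f.dataType))
  val expected = openmldbSchema.fields.map(f => (f.name.toLowerCase, f.dataType))
  actual.sameElements(expected)
}
```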
Online import can ignore the schema check: it's ok to write long to any openmldb integer type, as long as the cast doesn't overflow. Writing double to float is also ok when there is no cast overflow. Thus online data import can skip schema check and conversion, but the column count and names should still be checked.
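A sketch of that relaxed check (the helper name is an assumption): compare only column count and names, and leave the numeric widening/narrowing to the connector.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Relaxed check for online import: types are not compared (long -> int etc. is
// acceptable as long as there is no overflow), but column count and names must match.
def checkOnlineImport(df: DataFrame, tableSchema: StructType): Unit = {
  require(df.schema.length == tableSchema.length,
    s"column count mismatch: df=${df.schema.length}, table=${tableSchema.length}")
  df.schema.fields.zip(tableSchema.fields).foreach { case (src, dst) =>
    require(src.name.equalsIgnoreCase(dst.name),
      s"column name mismatch: df=${src.name}, table=${dst.name}")
  }
}
```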
Anyway, we should correct the schema of the df (`df = tispark.read(tidb)`) to support offline sql and saving to offline storage. Add a patch in catalogLoad.
@yht520100
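A hedged sketch of what that patch could do (the helper name and its hook point in catalogLoad are assumptions, not the actual OpenMLDB code): cast each widened column back to the type declared in the openmldb table schema.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// After df = tispark.read(tidb), cast each widened column (e.g. LongType) back
// to the type declared in the openmldb table schema so that offline sql and
// offline storage see the right types.
def correctSchema(df: DataFrame, openmldbSchema: StructType): DataFrame = {
  val casted = openmldbSchema.fields.map { f =>
    col(f.name).cast(f.dataType).as(f.name)
  }
  df.select(casted: _*)
}
```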
tispark writes openmldb data to tidb: openmldb df -> tispark -> tidb
- when df.schema != tidb.schema, will tispark fail? If the openmldb schema ⊆ the tidb schema, will it be fine?
- check schema mismatch, e.g. spark int written to tidb bigint, spark long written to tidb int (see the overflow check sketch after this list)
- check unsigned, e.g. spark long written to tidb unsigned int
- if int to tidb unsigned int fails and the user really wants to write to a mismatched tidb table, TODO
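Writing a spark long column into a narrower tidb column (int, unsigned int) risks overflow. A sketch of a pre-write range check (an assumed helper, not TiSpark's own validation):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Count rows outside the target tidb type's range and refuse the write if any
// exist, instead of silently truncating values.
def checkLongRange(df: DataFrame, column: String, min: Long, max: Long): Unit = {
  val outOfRange = df.filter(col(column) < min || col(column) > max).count()
  require(outOfRange == 0,
    s"$outOfRange rows in column '$column' overflow the target tidb range [$min, $max]")
}

// e.g. signed INT:   checkLongRange(df, "c1", Int.MinValue.toLong, Int.MaxValue.toLong)
// e.g. unsigned INT: checkLongRange(df, "c1", 0L, 4294967295L)
```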