The continuous feature values in the gcformat sample data generated by the OpenMLDB SQL feature extraction script are incorrect #3922

@yht520100

Bug Description
Service Version: 0.9.0
The gcformat sample data generated by the OpenMLDB SQL feature extraction script is incorrect for continuous features: in the example output below, the sign field of every continuous feature is 0.

Expected Behavior
Current (incorrect) format: label| slot:sign:origin-value
Expected format: label index| slot:sign:origin-value
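
To make the two layouts easier to compare, here is a minimal Python sketch that splits one gcformat line into its label, optional index, and features. The names `Feature`, `Instance`, and `parse_gcformat` are hypothetical, and the field layout is inferred only from the sample lines in this issue, not from a gcformat specification.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    slot: int
    sign: int
    value: Optional[str]  # raw text of the third field; None for two-field tokens


class Instance(NamedTuple):
    label: int
    index: Optional[int]  # None when the header carries only the label
    features: List[Feature]


def parse_gcformat(line: str) -> Instance:
    """Parse 'label [index]| slot:sign[:value] ...' (layout assumed from the samples)."""
    header, _, body = line.partition("|")
    head = header.split()
    label = int(head[0])
    index = int(head[1]) if len(head) > 1 else None
    features = []
    for token in body.split():
        parts = token.split(":")
        value = parts[2] if len(parts) > 2 else None
        features.append(Feature(int(parts[0]), int(parts[1]), value))
    return Instance(label, index, features)
```

Parsing a line from each example in this issue shows `index` as `None` for the OpenMLDB output and as an integer for the PICO output.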

Relation Case
OpenMLDB SQL Feature Extraction Example:

0| 1:0:1 2:4599670039981440374 3:6365000770384461703 4:0:93.200000
1| 1:0:2 2:5613161932270271752 3:-1384602352766124944 4:0:93.075000
0| 1:0:3 2:4599670039981440374 3:-6239076729344379818 4:0:92.893000

PICO Feature Extraction Example:

0 0| 2:-8773247204422130117:1 3:4042412524814531440 4:6048373541161169225 5:4681710344575317709:0x1.74ccccccccccdp6
1 1| 2:-8773247204422130117:2 3:6142047291687075953 4:1461111459061395210 5:4681710344575317709:0x1.744cccccccccdp6
0 2| 2:-8773247204422130117:3 3:4042412524814531440 4:3353218529862650678 5:4681710344575317709:0x1.73926e978d4fep6
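
The PICO output writes the continuous value as a hexadecimal floating-point literal, while the OpenMLDB output prints a six-digit decimal. A short Python check (the value strings are copied from the two examples in this issue) suggests that the origin values themselves agree, so the discrepancy is in the other fields:

```python
# Origin values of cons_price_idx, copied from the two examples above.
pico_values = [
    "0x1.74ccccccccccdp6",
    "0x1.744cccccccccdp6",
    "0x1.73926e978d4fep6",
]
openmldb_values = ["93.200000", "93.075000", "92.893000"]

# float.fromhex decodes the hexadecimal floating-point notation.
decoded = [float.fromhex(text) for text in pico_values]

for hex_text, dec_text, value in zip(pico_values, openmldb_values, decoded):
    print(f"{hex_text} -> {value:.6f} (OpenMLDB printed {dec_text})")
```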

Steps to Reproduce

  1. Data schema:
id[Int],age[Int],job[String],cons_price_idx[Double],y[Int]
  2. PICO Feature Extraction Script:
target_y = binary_label(y)
f_id = continuous(id)
f_age = discrete(age)
f_job = discrete(job)
f_cons_price_idx = continuous(cons_price_idx)
  3. OpenMLDB SQL Feature Extraction Script:
select gcformat(
       binary_label(bool(y)),
       continuous(id),
       discrete(age),
       discrete(job),
       continuous(cons_price_idx)
) as instance from main_table
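
As a rough way to confirm the problem on the generated `instance` column, the following Python sketch flags the two differences visible in the examples. The function `check_line` is hypothetical. It assumes that three-field tokens (`slot:sign:value`) are continuous features and that a sign of 0 on such a token is wrong, which is inferred only from the PICO example above.

```python
from typing import List


def check_line(line: str) -> List[str]:
    """Return a list of suspected problems in one gcformat line."""
    problems = []
    header, _, body = line.partition("|")
    # Expected header is 'label index'; the OpenMLDB output has only 'label'.
    if len(header.split()) < 2:
        problems.append("header has no index after the label")
    for token in body.split():
        parts = token.split(":")
        # Assumption: a three-field token is a continuous feature.
        if len(parts) == 3 and parts[1] == "0":
            problems.append(f"slot {parts[0]}: sign is 0")
    return problems


openmldb_line = "0| 1:0:1 2:4599670039981440374 3:6365000770384461703 4:0:93.200000"
for problem in check_line(openmldb_line):
    print(problem)
```

Run against the PICO lines from this issue, the same check reports nothing.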

Labels: bug
