
The steprun feature not working as expected. #782

Open
greenhal opened this issue Apr 24, 2025 · 2 comments · May be fixed by #787
Labels
bug Something isn't working

Comments


greenhal commented Apr 24, 2025

Guidance
Bug reports are for when HammerDB is not behaving as expected.
Bug reports should not be submitted for help in understanding database performance related questions.
General questions on database performance or HammerDB usability should be submitted under Discussions.

Describe the bug
When using the steprun function to create a pyramid-shaped run, i.e. scaling users up and then down in a stepped manner, the replicas ignore the duration setting and run until the primary finishes.

To Reproduce
Create a steps.xml file in which each step's start time (from start_after_prev) plus its duration is less than the total duration of the primary run.

<steps>
        <replica1>
                <start_after_prev>5</start_after_prev>
                <duration>1</duration>
                <virtual_users>5</virtual_users>
        </replica1>
        <replica2>
                <start_after_prev>8</start_after_prev>
                <duration>1</duration>
                <virtual_users>5</virtual_users>
        </replica2>
</steps>

Then execute the steprun:

dbset db pg
dbset bm TPC-C
diset tpcc pg_count_ware 100
diset tpcc pg_rampup 0
diset tpcc pg_duration 15
diset tpcc pg_allwarehouse true
diset tpcc pg_raiseerror true
diset tpcc pg_total_iterations 2147483648
diset tpcc pg_partition true
diset tpcc pg_timeprofile false
diset tpcc pg_driver timed
set timeprofile_option false
diset connection pg_host x.x.x.x
diset tpcc pg_superuser xxxxx
diset tpcc pg_superuserpass xxxxxx
diset tpcc pg_storedprocs true
loadscript
vuset vu 6 
steprun

The replicas start the specified number of VUs at the specified delay, but they keep running until the primary finishes.

Expected behavior
I expect the VUs running on the replicas to stop after the duration defined in the steps.xml file.
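
For reference, the timeline these settings should produce can be worked out with a minimal Python sketch. It assumes the start semantics that steprun prints when it begins (the first replica starts start_after_prev minutes after rampup completes, and each later replica starts start_after_prev minutes after the previous replica starts). `expected_timeline` is a hypothetical helper for illustration, not part of HammerDB.

```python
import xml.etree.ElementTree as ET

# The steps.xml content from this report.
STEPS_XML = """
<steps>
    <replica1>
        <start_after_prev>5</start_after_prev>
        <duration>1</duration>
        <virtual_users>5</virtual_users>
    </replica1>
    <replica2>
        <start_after_prev>8</start_after_prev>
        <duration>1</duration>
        <virtual_users>5</virtual_users>
    </replica2>
</steps>
"""


def expected_timeline(steps_xml, rampup=0):
    """Return (name, start_minute, stop_minute, virtual_users) per replica.

    Assumption: start offsets accumulate, beginning when rampup completes.
    Minutes are counted from the moment the primary starts.
    """
    timeline = []
    start = rampup
    for replica in ET.fromstring(steps_xml.strip()):
        start += int(replica.findtext("start_after_prev"))
        stop = start + int(replica.findtext("duration"))
        vus = int(replica.findtext("virtual_users"))
        timeline.append((replica.tag, start, stop, vus))
    return timeline


if __name__ == "__main__":
    rampup, duration = 0, 15  # pg_rampup and pg_duration from the commands above
    primary_stop = rampup + duration
    for name, start, stop, vus in expected_timeline(STEPS_XML, rampup):
        within = "yes" if stop <= primary_stop else "no"
        print(f"{name}: {vus} VUs, minute {start} to {stop}, "
              f"ends before primary stops: {within}")
```

With pg_rampup 0 and pg_duration 15, this gives replica1 running from minute 5 to 6 and replica2 from minute 13 to 14, both ending before the primary stops at minute 15.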

Screenshots
CLI only

HammerDB Version (please complete the following information):

  • Version: 4.6
  • Build: build from source

HammerDB Interface (please complete the following information):

  • UI: CLI

Operating System (please complete the following information):

  • Server OS: Amazon Linux 2
  • Client OS: Amazon Linux 2

Database Server (please complete the following information):

  • Database name: PostgreSQL
  • Database Release Version: 16.4

Database Client (please complete the following information):

  • Database client name: postgres

Additional context

@sm-shaw sm-shaw self-assigned this Apr 25, 2025
@sm-shaw sm-shaw added the bug Something isn't working label Apr 25, 2025
Contributor

sm-shaw commented Apr 25, 2025

OK, I can reproduce the issue with SQL Server on Windows on v5.0 as well. Although the correct duration is being sent to the replicas, they overrun because it looks like they are using the primary's duration. I will investigate further to provide a fix.

Contributor

sm-shaw commented May 23, 2025

The cause of this issue was that the replicas were running in standard replica mode, which meant their respective monitors would disconnect immediately and not complete the timings. The solution is to run the replicas for steprun as a special case in Local mode, so that they time their own run and then disconnect. The example below shows the step effect working, and the listing shows the replicas completing and disconnecting before the primary.

[Two images, not reproduced here]

hammerdb>diset tpcc maria_duration 8
Changed tpcc:maria_duration from 1 to 8 for MariaDB

hammerdb>vuset vu 1

hammerdb>steprun
primary starts immediately, runs rampup for 0 minutes then runs test for 8 minutes with 1 Active VU
replica1 starts 1 minutes after rampup completes and runs test for 1 minutes with 1 Active VU
replica2 starts 1 minutes after previous replica starts and runs test for 1 minutes with 4 Active VU
replica3 starts 1 minutes after previous replica starts and runs test for 1 minutes with 1 Active VU
replica4 starts 1 minutes after previous replica starts and runs test for 1 minutes with 1 Active VU
Switch from Local
to Primary mode?
Enter yes or no: replied yes
Setting Primary Mode at id : 32060, hostname : ubuntu22
Primary Mode active at id : 32060, hostname : ubuntu22
Starting 1 replica HammerDB instance
Starting 2 replica HammerDB instance
Starting 3 replica HammerDB instance
Starting 4 replica HammerDB instance
HammerDB CLI v5.0
Copyright © HammerDB Ltd hosted by tpc.org 2019-2025
Type "help" for a list of commands
Doing wait to connect ....
HammerDB CLI v5.0
Copyright © HammerDB Ltd hosted by tpc.org 2019-2025
Type "help" for a list of commands
Primary waiting for all replicas to connect .... 0 out of 4 are connected
HammerDB CLI v5.0
Copyright © HammerDB Ltd hosted by tpc.org 2019-2025
Type "help" for a list of commands
HammerDB CLI v5.0
Copyright © HammerDB Ltd hosted by tpc.org 2019-2025
Type "help" for a list of commands
Initialized Jobs on-disk database /tmp/hammer.DB using existing tables (540,672 bytes)
Switch from Local
to Replica mode?
Enter yes or no: replied yes
Initialized Jobs on-disk database /tmp/hammer.DB using existing tables (540,672 bytes)
Switch from Local
to Replica mode?
Setting Replica Mode at id : 32115, hostname : ubuntu22
Enter yes or no: replied yes
Replica connecting to localhost 32060 : Connection succeeded
Received a new replica connection from host 127.0.0.1
Initialized Jobs on-disk database /tmp/hammer.DB using existing tables (540,672 bytes)
Switch from Local
to Replica mode?
Enter yes or no: replied yes
Setting Replica Mode at id : 32116, hostname : ubuntu22
Replica connecting to localhost 32060 : Connection succeeded
New replica joined : {32115 ubuntu22}
Initialized Jobs on-disk database /tmp/hammer.DB using existing tables (540,672 bytes)
Switch from Local
to Replica mode?
Setting Replica Mode at id : 32117, hostname : ubuntu22
Replica connecting to localhost 32060 : Connection succeeded
Enter yes or no: replied yes
Received a new replica connection from host 127.0.0.1
Primary call back successful
Switched to Replica mode via callback
Received a new replica connection from host 127.0.0.1
Setting Replica Mode at id : 32118, hostname : ubuntu22
Replica connecting to localhost 32060 : Connection succeeded
Received a new replica connection from host 127.0.0.1
New replica joined : {32115 ubuntu22} {32116 ubuntu22}
New replica joined : {32115 ubuntu22} {32116 ubuntu22} {32117 ubuntu22}
Primary call back successful
Switched to Replica mode via callback
New replica joined : {32115 ubuntu22} {32116 ubuntu22} {32117 ubuntu22} {32118 ubuntu22}
Primary call back successful
Switched to Replica mode via callback
Primary call back successful
Switched to Replica mode via callback
Primary waiting for all replicas to connect .... {32115 ubuntu22} {32116 ubuntu22} {32117 ubuntu22} {32118 ubuntu22} out of 4 are connected
Primary Received all replica connections {32115 ubuntu22} {32116 ubuntu22} {32117 ubuntu22} {32118 ubuntu22}
Database set to MariaDB
Database set to MariaDB
Database set to MariaDB
Database set to MariaDB
Database set to MariaDB
Setting primary to run 1 virtual users for 8 duration
Value 8 for tpcc:maria_duration is the same as existing value 8, no change made
Sending dbset all to 32115 ubuntu22
Setting replica1 to start after 1 duration 1 VU count 1, Replica instance is 32115 ubuntu22
Sending "set opmode Local" to 32115 ubuntu22
Sending "diset tpcc maria_timeprofile false" to 32115 ubuntu22
Value false for tpcc:maria_timeprofile is the same as existing value false, no change made
Sending "diset tpcc maria_rampup 0" to 32115 ubuntu22
Value 0 for tpcc:maria_rampup is the same as existing value 0, no change made
Sending "diset tpcc maria_duration 1" to 32115 ubuntu22
Changed tpcc:maria_duration from 8 to 1 for MariaDB
Sending "vuset vu 1" to 32115 ubuntu22
Sending dbset all to 32116 ubuntu22
Setting replica2 to start after 1 duration 1 VU count 4, Replica instance is 32116 ubuntu22
Sending "set opmode Local" to 32116 ubuntu22
Sending "diset tpcc maria_timeprofile false" to 32116 ubuntu22
Value false for tpcc:maria_timeprofile is the same as existing value false, no change made
Sending "diset tpcc maria_rampup 0" to 32116 ubuntu22
Value 0 for tpcc:maria_rampup is the same as existing value 0, no change made
Sending "diset tpcc maria_duration 1" to 32116 ubuntu22
Changed tpcc:maria_duration from 8 to 1 for MariaDB
Sending "vuset vu 4" to 32116 ubuntu22
Sending dbset all to 32117 ubuntu22
Setting replica3 to start after 1 duration 1 VU count 1, Replica instance is 32117 ubuntu22
Sending "set opmode Local" to 32117 ubuntu22
Sending "diset tpcc maria_timeprofile false" to 32117 ubuntu22
Value false for tpcc:maria_timeprofile is the same as existing value false, no change made
Sending "diset tpcc maria_rampup 0" to 32117 ubuntu22
Value 0 for tpcc:maria_rampup is the same as existing value 0, no change made
Sending "diset tpcc maria_duration 1" to 32117 ubuntu22
Changed tpcc:maria_duration from 8 to 1 for MariaDB
Sending "vuset vu 1" to 32117 ubuntu22
Sending dbset all to 32118 ubuntu22
Setting replica4 to start after 1 duration 1 VU count 1, Replica instance is 32118 ubuntu22
Sending "set opmode Local" to 32118 ubuntu22
Sending "diset tpcc maria_timeprofile false" to 32118 ubuntu22
Value false for tpcc:maria_timeprofile is the same as existing value false, no change made
Sending "diset tpcc maria_rampup 0" to 32118 ubuntu22
Value 0 for tpcc:maria_rampup is the same as existing value 0, no change made
Sending "diset tpcc maria_duration 1" to 32118 ubuntu22
Changed tpcc:maria_duration from 8 to 1 for MariaDB
Sending "vuset vu 1" to 32118 ubuntu22
Script loaded, Type "print script" to view
Script loaded, Type "print script" to view
Script loaded, Type "print script" to view
Script loaded, Type "print script" to view
Script loaded, Type "print script" to view
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
2 Virtual Users Created with Monitor VU
Starting Primary VUs
Vuser 1:RUNNING
Vuser 1:Ssl_cipher 
Vuser 1:DBVersion:11.8.1
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 8 in minutes
Vuser 1 created MONITOR - WAIT IDLE
Vuser 1 created MONITOR - WAIT IDLE
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 2 created - WAIT IDLE
2 Virtual Users Created with Monitor VU
2 Virtual Users Created with Monitor VU
2 Virtual Users Created with Monitor VU
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 3 created - WAIT IDLE
Vuser 4 created - WAIT IDLE
Vuser 5 created - WAIT IDLE
5 Virtual Users Created with Monitor VU
Vuser 2:RUNNING
Vuser 2:Ssl_cipher 
Vuser 2:Processing 10000000 transactions with output suppressed...
Delaying Start of Replicas to rampup 0 replica1 1 replica2 1 replica3 1 replica4 1
Delaying replica1 for 1 minutes.
Delaying replica2 for 2 minutes.
Delaying replica3 for 3 minutes.
Delaying replica4 for 4 minutes.
Primary entering loop waiting for vucomplete
Vuser 1:1 ...,
Sending "run_virtual" to 32115 ubuntu22
Vuser 1:RUNNING
Vuser 1:Ssl_cipher 
Vuser 1:DBVersion:11.8.1
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 1 in minutes
Vuser 2:RUNNING
Vuser 2:Ssl_cipher 
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 1:2 ...,
Sending "run_virtual" to 32116 ubuntu22
Vuser 1:RUNNING
Vuser 1:Ssl_cipher 
Vuser 1:DBVersion:11.8.1
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 1 in minutes
Vuser 1:1 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 104348 NOPM from 242415 MariaDB TPM
Vuser 1:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Vuser 2:RUNNING
Vuser 2:Ssl_cipher 
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:RUNNING
Vuser 3:Ssl_cipher 
Vuser 3:Processing 10000000 transactions with output suppressed...
Vuser 4:RUNNING
Vuser 4:Ssl_cipher 
Vuser 4:Processing 10000000 transactions with output suppressed...
Vuser 5:RUNNING
Vuser 5:Ssl_cipher 
Vuser 5:Processing 10000000 transactions with output suppressed...
Replica workload complete and calling exit from primary
Lost connection to : 32115 ubuntu22 because target application died or connection lost
Vuser 1:3 ...,
Sending "run_virtual" to 32117 ubuntu22
Vuser 1:RUNNING
Vuser 1:Ssl_cipher 
Vuser 1:DBVersion:11.8.1
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 1 in minutes
Vuser 1:1 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:4 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 239619 NOPM from 556451 MariaDB TPM
Vuser 1:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
Vuser 4:FINISHED SUCCESS
Vuser 3:FINISHED SUCCESS
Vuser 5:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Vuser 2:RUNNING
Vuser 2:Ssl_cipher 
Vuser 2:Processing 10000000 transactions with output suppressed...
Replica workload complete and calling exit from primary
Lost connection to : 32116 ubuntu22 because target application died or connection lost
Vuser 1:4 ...,
Sending "run_virtual" to 32118 ubuntu22
Vuser 1:RUNNING
Vuser 1:Ssl_cipher 
Vuser 1:DBVersion:11.8.1
Vuser 1:Beginning rampup time of 0 minutes
Vuser 1:Rampup complete, Taking start Transaction Count.
Vuser 1:Timing test period of 1 in minutes
Vuser 1:1 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 105003 NOPM from 244411 MariaDB TPM
Vuser 1:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Vuser 2:RUNNING
Vuser 2:Ssl_cipher 
Vuser 2:Processing 10000000 transactions with output suppressed...
Replica workload complete and calling exit from primary
Lost connection to : 32117 ubuntu22 because target application died or connection lost
Vuser 1:5 ...,
Vuser 1:1 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 104604 NOPM from 242562 MariaDB TPM
Vuser 1:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Replica workload complete and calling exit from primary
Lost connection to : 32118 ubuntu22 because target application died or connection lost
Vuser 1:6 ...,
Vuser 1:7 ...,
Vuser 1:8 ...,
Vuser 1:Test complete, Taking end Transaction Count.
Vuser 1:1 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 96747 NOPM from 224827 MariaDB TPM
Vuser 1:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Primary complete
deleting port_file /tmp/hdbcallback.tcl
Step workload complete
vudestroy success

hammerdb>
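
A listing like the one above can also be checked mechanically. The following Python sketch relies only on the message formats visible in this output (`Replica instance is <id>`, `Lost connection to : <id>`, `Primary complete`) and reports whether each replica disconnected before the primary completed. It is an illustration, not a HammerDB tool, and the function name is hypothetical.

```python
import re


def replicas_finished_before_primary(log_text):
    """Map each replica id to True if it disconnected before 'Primary complete'.

    Assumption: the log uses the message formats seen in the listing above.
    """
    replica_ids = set()
    disconnected = set()
    for line in log_text.splitlines():
        assigned = re.search(r"Replica instance is (\d+)", line)
        if assigned:
            replica_ids.add(assigned.group(1))
            continue
        lost = re.search(r"Lost connection to : (\d+)", line)
        if lost:
            disconnected.add(lost.group(1))
            continue
        if line.strip() == "Primary complete":
            # Only disconnects seen up to this point count.
            return {rid: rid in disconnected for rid in sorted(replica_ids)}
    raise ValueError("log does not contain 'Primary complete'")


if __name__ == "__main__":
    # A few lines copied from the listing above.
    excerpt = "\n".join([
        "Setting replica1 to start after 1 duration 1 VU count 1, "
        "Replica instance is 32115 ubuntu22",
        "Lost connection to : 32115 ubuntu22 because target application "
        "died or connection lost",
        "Primary complete",
    ])
    print(replicas_finished_before_primary(excerpt))
```

The check looks only at message ordering. It does not confirm that each replica ran for exactly its configured duration.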

@sm-shaw sm-shaw linked a pull request May 23, 2025 that will close this issue