Commit 6ca774f

paulmckrcu authored and urezki committed
torture: Make kvm-remote.sh give up on unresponsive system
Currently, a system that stops responding at the wrong time will hang kvm-remote.sh. This can happen when the system in question is forced offline for maintenance, and there is currently no way for the user to kick this script into moving ahead. This commit therefore causes kvm-remote.sh to wait at most 15 minutes for a non-responsive system, that is, a system for which ssh gives an exit code of 255.

Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
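The commit message describes a bounded retry: keep retrying while ssh reports that it could not reach the host (ssh exits with status 255 when it hits an error of its own, and otherwise with the remote command's status), but stop after a fixed number of such failures. The following is a minimal, generic sketch of that pattern, not the code from kvm-remote.sh; the helper name `retry_unless_unreachable` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kvm-remote.sh): run a command, retrying
# while it exits with status 255 (ssh's own "error, e.g. host unreachable"
# status), but give up after max_fails consecutive 255 exits instead of
# waiting forever.
# Usage: retry_unless_unreachable max_fails sleeptime command [args...]
retry_unless_unreachable () {
	local max_fails=$1
	local sleeptime=$2
	shift 2
	local nfails=0
	local ret
	while :
	do
		"$@"
		ret=$?
		if test "$ret" -ne 255
		then
			# The host answered: pass its status (0, 1, ...) to the caller.
			return "$ret"
		fi
		nfails=$((nfails+1))
		if ((nfails > max_fails))
		then
			echo "giving up after $nfails unreachable attempts" >&2
			return 255
		fi
		sleep "$sleeptime"
	done
}

# Placeholder usage ("host" and "/some/file" are illustrative only):
#   retry_unless_unreachable 15 60 ssh -o BatchMode=yes host "test -f /some/file"
```

Passing non-255 statuses through unchanged matters because, as in checkremotefile, 0 and 1 carry meaning (file present or absent) that the caller still needs.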
1 parent 1806b1f commit 6ca774f

1 file changed (+21, -4 lines)

tools/testing/selftests/rcutorture/bin/kvm-remote.sh

Lines changed: 21 additions & 4 deletions
```diff
@@ -181,10 +181,11 @@ done
 
 # Function to check for presence of a file on the specified system.
 # Complain if the system cannot be reached, and retry after a wait.
-# Currently just waits forever if a machine disappears.
+# Currently just waits 15 minutes if a machine disappears.
 #
 # Usage: checkremotefile system pathname
 checkremotefile () {
+	local nsshfails=0
 	local ret
 	local sleeptime=60
 
```
```diff
@@ -195,6 +196,11 @@ checkremotefile () {
 		if test "$ret" -eq 255
 		then
 			echo " ---" ssh failure to $1 checking for file $2, retry after $sleeptime seconds. `date` | tee -a "$oldrun/remote-log"
+			nsshfails=$((nsshfails+1))
+			if ((nsshfails > 15))
+			then
+				return 255
+			fi
 		elif test "$ret" -eq 0
 		then
 			return 0
```
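The cap added in checkremotefile is a failure count, not a timer. As a rough check on the 15-minute figure, the dry run below replays that counter with ssh stubbed out and the sleeps tallied instead of performed. It assumes, as the "retry after $sleeptime seconds" message suggests, that every ssh failure short of the limit is followed by one sleep of `$sleeptime` (60) seconds; the sleep itself is outside the diff context, and `dry_run_give_up` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Dry run of the give-up arithmetic added to checkremotefile(), with ssh
# replaced by "always failed with 255" and sleep replaced by a counter.
# Assumption: each failure that does not trigger the give-up is followed
# by one sleep of $sleeptime seconds.
dry_run_give_up () {
	local nsshfails=0
	local sleeptime=60
	local slept=0
	while :
	do
		nsshfails=$((nsshfails+1))      # ssh exited with 255 again
		if ((nsshfails > 15))
		then
			echo "$nsshfails $slept"      # attempts made, seconds slept
			return 255
		fi
		slept=$((slept+sleeptime))        # stands in for: sleep $sleeptime
	done
}

dry_run_give_up    # prints "16 900"
```

The return happens on the 16th failure, before any further sleep, so the sleeps total 15 × 60 = 900 seconds, or 15 minutes. Actual wall-clock time is somewhat longer, because each failing ssh attempt also takes time before it gives up.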
```diff
@@ -268,12 +274,23 @@ echo All batches started. `date` | tee -a "$oldrun/remote-log"
 for i in $systems
 do
 	echo " ---" Waiting for $i `date` | tee -a "$oldrun/remote-log"
-	while checkremotefile "$i" "$resdir/$ds/remote.run"
+	while :
 	do
+		checkremotefile "$i" "$resdir/$ds/remote.run"
+		ret=$?
+		if test "$ret" -eq 1
+		then
+			echo " ---" Collecting results from $i `date` | tee -a "$oldrun/remote-log"
+			( cd "$oldrun"; ssh -o BatchMode=yes $i "cd $rundir; tar -czf - kvm-remote-*.sh.out */console.log */kvm-test-1-run*.sh.out */qemu[_-]pid */qemu-retval */qemu-affinity; rm -rf $T > /dev/null 2>&1" | tar -xzf - )
+			break;
+		fi
+		if test "$ret" -eq 255
+		then
+			echo System $i persistent ssh failure, lost results `date` | tee -a "$oldrun/remote-log"
+			break;
+		fi
 		sleep 30
 	done
-	echo " ---" Collecting results from $i `date` | tee -a "$oldrun/remote-log"
-	( cd "$oldrun"; ssh -o BatchMode=yes $i "cd $rundir; tar -czf - kvm-remote-*.sh.out */console.log */kvm-test-1-run*.sh.out */qemu[_-]pid */qemu-retval */qemu-affinity; rm -rf $T > /dev/null 2>&1" | tar -xzf - )
 done
 
 ( kvm-end-run-stats.sh "$oldrun" "$starttime"; echo $? > $T/exitcode ) | tee -a "$oldrun/remote-log"
```
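The caller's loop now dispatches three ways on checkremotefile's return value: 1 means remote.run is gone and results can be collected, 255 means the system stopped answering, and 0 (remote.run still present) means keep waiting. Below is a sketch of the same control flow written with `case`. It uses a hypothetical stub `fake_check` in place of checkremotefile and shortens the 30-second sleep so the example runs instantly; `wait_for_system` is likewise a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for checkremotefile: reports "file still present"
# (0) on the first two calls, then "file gone" (1).
checks=0
fake_check () {
	checks=$((checks+1))
	if ((checks < 3)); then return 0; fi
	return 1
}

# Same three-way dispatch as the loop in the third hunk.
wait_for_system () {
	local ret
	while :
	do
		fake_check
		ret=$?
		case "$ret" in
		1) echo collect; return 0 ;;    # remote.run gone: fetch results
		255) echo lost; return 1 ;;     # persistent ssh failure: skip host
		esac
		sleep 0    # the real loop sleeps 30 seconds here
	done
}

wait_for_system    # prints "collect" after three checks
```

Breaking out of the inner loop on 255, rather than exiting the script, lets the outer `for i in $systems` loop continue, so results from the remaining systems are still collected and only the unresponsive system's results are lost.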
