When indexing the linux Git repository from scratch with annotation cache enabled, the 2nd phase of history used the CPU sub-optimally:
Observing the thread states, there are 20 blocked indexer threads (out of 48) with a stack like this one:
"OpenGrok-index-worker-230" #230 prio=5 os_prio=64 cpu=3011328.50ms elapsed=17757.81s tid=0x0000000009f69000 nid=0x19e waiting for monitor entry [0x00007fef738f0000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.eclipse.jgit.internal.storage.file.WindowCache.getOrLoad(WindowCache.java:592)
- waiting to lock <0x00007fef8217d960> (a org.eclipse.jgit.internal.storage.file.WindowCache$Lock)
at org.eclipse.jgit.internal.storage.file.WindowCache.get(WindowCache.java:385)
at org.eclipse.jgit.internal.storage.file.WindowCursor.pin(WindowCursor.java:335)
at org.eclipse.jgit.internal.storage.file.WindowCursor.copy(WindowCursor.java:234)
at org.eclipse.jgit.internal.storage.file.Pack.readFully(Pack.java:602)
at org.eclipse.jgit.internal.storage.file.Pack.load(Pack.java:785)
at org.eclipse.jgit.internal.storage.file.Pack.get(Pack.java:273)
at org.eclipse.jgit.internal.storage.file.PackDirectory.open(PackDirectory.java:223)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:423)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:386)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:376)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:361)
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:140)
at org.eclipse.jgit.treewalk.CanonicalTreeParser.reset(CanonicalTreeParser.java:191)
at org.eclipse.jgit.treewalk.TreeWalk.reset(TreeWalk.java:772)
at org.eclipse.jgit.revwalk.TreeRevFilter.include(TreeRevFilter.java:121)
at org.eclipse.jgit.revwalk.filter.AndRevFilter$Binary.include(AndRevFilter.java:104)
at org.eclipse.jgit.revwalk.PendingGenerator.next(PendingGenerator.java:108)
at org.eclipse.jgit.revwalk.RewriteGenerator.applyFilterToParents(RewriteGenerator.java:114)
at org.eclipse.jgit.revwalk.RewriteGenerator.next(RewriteGenerator.java:72)
at org.eclipse.jgit.revwalk.StartGenerator.next(StartGenerator.java:161)
at org.eclipse.jgit.revwalk.RevWalk.next(RevWalk.java:625)
at org.eclipse.jgit.revwalk.RevWalk.nextForIterator(RevWalk.java:1606)
at org.eclipse.jgit.revwalk.RevWalk.iterator(RevWalk.java:1630)
at org.opengrok.indexer.history.GitRepository.getFirstRevision(GitRepository.java:358)
at org.opengrok.indexer.history.GitRepository.annotate(GitRepository.java:334)
at org.opengrok.indexer.history.HistoryGuru.getAnnotationFromRepository(HistoryGuru.java:294)
at org.opengrok.indexer.history.HistoryGuru.createAnnotationCache(HistoryGuru.java:1189)
at org.opengrok.indexer.index.IndexDatabase.createAnnotationCache(IndexDatabase.java:1274)
at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:1253)
at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$8(IndexDatabase.java:1887)
at org.opengrok.indexer.index.IndexDatabase$$Lambda$751/0x00007fef76134168.call(Unknown Source)
at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
at java.lang.Thread.run([email protected]/Thread.java:834)
Another possibility is that the limitation is inherent to JGit. Exploring the implementation of org.eclipse.jgit.internal.storage.file.WindowCache might be in order, e.g. whether it could benefit from thread-local buffers.
This is basically the body of the function. The lock() function actually selects a lock from an array of locks, using a hash function based on the pack and location. The length of the array is initialized like so:
By default, the 48 indexer threads get to use 32 locks, which leaves roughly 20 threads blocked. Thus, bumping the packedGitOpenFiles option should help.
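To illustrate the lock selection described above, here is a minimal stand-alone sketch of lock striping. It is not JGit's code: the class `StripedLocks`, its methods, the hash mixing, and the 8 KiB window granularity are all hypothetical and chosen only for illustration. It shows how a (pack, position) pair is mapped onto a fixed array of locks, so that with more threads than locks some threads necessarily share a lock.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative lock striping; NOT the JGit WindowCache implementation. */
public class StripedLocks {
    private final ReentrantLock[] locks;

    public StripedLocks(int lockCount) {
        locks = new ReentrantLock[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    /** Number of stripes (locks) available. */
    public int size() {
        return locks.length;
    }

    /**
     * Map a (pack, position) pair onto a stripe index.
     * Assumption: positions are grouped into 8 KiB windows (shift by 13);
     * the mixing function is made up for this sketch.
     */
    public int indexFor(int packHash, long position) {
        int h = 31 * packHash + Long.hashCode(position >>> 13);
        return (h >>> 1) % locks.length; // h >>> 1 is never negative
    }

    /** Return the lock guarding the window that contains the position. */
    public ReentrantLock lockFor(int packHash, long position) {
        return locks[indexFor(packHash, position)];
    }

    public static void main(String[] args) {
        StripedLocks striped = new StripedLocks(32);
        Set<Integer> used = new HashSet<>();
        // 48 hypothetical requests, one per indexer thread, each in its own window.
        for (int t = 0; t < 48; t++) {
            used.add(striped.indexFor(1, t * 8192L));
        }
        // At most 32 distinct stripes exist, so at least 16 requests share a lock.
        System.out.println("distinct locks used by 48 requests: " + used.size());
    }
}
```

With a fixed number of stripes, raising the stripe count is the only way to reduce the chance that two threads loading different windows contend for the same lock, which is why a larger lock array would be expected to help here.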
Actually, the degradation was caused by a lack of RAM in the system. The ZFS ARC was reduced to a handful of percent and the system had already started swapping. After bumping the RAM to 256 GB this no longer happens:
There might still be some room for tuning, given that the number of indexing threads matches the number of CPUs in the system, which in turn matches the number of locks in JGit's WindowCache. Will retry with a bigger number of threads (will have to bump the heap/RAM for this).
The indexer is running with:
Perhaps JGit can be tuned:
Namely the packedGitLimit option, which is 10 MiB by default; given the size of the heap, this is disproportionate.