java.lang.ArrayIndexOutOfBoundsException when initializing VnCoreNLP with "wseg" annotator #51

@colossalpen12

I encountered an issue when using the VnCoreNLP wrapper in the py_vncorenlp package. The error occurs specifically when the annotators list contains "wseg". The initialization fails with a java.lang.ArrayIndexOutOfBoundsException.

Steps to Reproduce:

  1. Initialize a VnCoreNLP object with an annotators list containing "wseg":
    from py_vncorenlp import VnCoreNLP
    annotators = ["wseg"]
    model = VnCoreNLP(annotators=annotators)
  2. The error occurs during instantiation, at this line in the wrapper:
    self.model = javaclass_vncorenlp(annotators)
    This results in the following error:
    jnius.JavaException: JVM exception occurred: 1 java.lang.ArrayIndexOutOfBoundsException
    

Expected Behavior:

The VnCoreNLP object should initialize without errors, regardless of whether "wseg" is in the annotators list.

Actual Behavior:

When "wseg" is included in the annotators list, the following exception is raised:

jnius.JavaException: JVM exception occurred: 1 java.lang.ArrayIndexOutOfBoundsException

Environment:

  • OS: macOS Sequoia 15.1.1
  • JDK Version: 1.8.0

Additional Context:

  • The issue occurs only when "wseg" is included in the annotators list. I tried several different configurations to confirm this.
  • Other annotators such as "pos", "ner", and "parse" work as expected without throwing an error.
  • The main jar file and the models folder have the same sizes as described in README.md.
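
For anyone who wants to repeat the comparison across configurations, here is a minimal sketch of how the failing annotator lists could be isolated. The helper name `probe_annotators` and its `factory` parameter are hypothetical and not part of py_vncorenlp. The helper only calls whatever constructor it is given and records the error text, so it makes no assumptions about the library's internals.

```python
from typing import Callable, Dict, Iterable, List, Optional


def probe_annotators(
    factory: Callable[[List[str]], object],
    configs: Iterable[List[str]],
) -> Dict[str, Optional[str]]:
    """Try each annotator list; map it to an error string, or None on success.

    `factory` is any callable that builds a model from an annotator list
    (hypothetical parameter, supplied by the caller).
    """
    results: Dict[str, Optional[str]] = {}
    for annotators in configs:
        key = ",".join(annotators)
        try:
            factory(list(annotators))
            results[key] = None
        except Exception as exc:
            # Catch broadly so that errors raised through the wrapper
            # are recorded instead of aborting the whole comparison.
            results[key] = f"{type(exc).__name__}: {exc}"
    return results


# Hypothetical usage (needs py_vncorenlp, a JVM, and the downloaded models):
#
#   from py_vncorenlp import VnCoreNLP
#   report = probe_annotators(
#       lambda a: VnCoreNLP(annotators=a),
#       [["wseg"], ["pos"], ["ner"], ["parse"]],
#   )
#   for config, error in report.items():
#       print(config, "->", error or "ok")
```

Depending on how the wrapper starts the JVM, it may not be possible to construct several models in one Python process. In that case, running one configuration per process would give a cleaner comparison.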

The problem appears to lie in how the Java code handles the "wseg" annotator internally.

Possible Solutions:

  • Investigate how the Java class vn.pipeline.VnCoreNLP handles the "wseg" annotator, and verify that its indexing and initialization logic is correct.
  • Check if there are any known issues related to this annotator in the library.

Related Issues/PRs:

(None at the moment)