Write first, wrap elsewhere: the case of JPype

  • At the end of the day, the Python interpreter (CPython) is native software, written in C

  • Most commonly, efficient Python code is written in C or C++ and then wrapped into Python

  • Simply put, the JVM is native software too, written mostly in C++

  • Can Python code be written in Java and then wrapped into Python?

  • Apparently yes, via several ad-hoc bridging technologies:

    • JPype: a bridge for calling JVM code from Python as a native library
    • Jython: a JVM-based Python interpreter
    • Py4J: an RPC-based bridge for calling Java from Python
    • Pyjnius: similar to JPype, but based on Cython
    • and many others…
  • Why exactly JPype?

    • still maintained, works with vanilla CPython, good documentation, good interoperability with JVM types

JPype Overview:

  1. Ensure you have a JVM installed on your system

    • with the JAVA_HOME environment variable set to the JVM installation directory
  2. Ensure your compiled Java code is available as a .jar file

    • say, in /path/to/my.jar
  3. Install the JPype package via pip install JPype1

  4. You first need JPype to start a JVM instance in your Python process

    • by default, the JVM location is inferred from the JAVA_HOME environment variable
    import jpype
    
    # start the JVM
    jpype.startJVM(classpath=["/path/to/my.jar"])
    
  5. Once the JVM is started, one can import Java classes and call their methods as if they were Python objects

    import jpype.imports # this is necessary to import Java classes
    
    from java.lang import System # import the java.lang.System class
    
    System.out.println("Hello World!")
    

JPype’s Bridging Model (pt. 1)

Overview on the official documentation

  • Java classes are presented wherever possible similar to Python classes

  • the only major difference is that Java classes and objects are closed and cannot be modified

Java classes vs. Python classes in JPype

JPype’s Bridging Model (pt. 2)

Overview on the official documentation

  • Java exceptions extend from Python exceptions

  • Java exceptions can be dealt with in the same way as Python native exceptions

    • i.e. via try-except blocks
  • JException serves as the base class for all Java exceptions

Java exceptions vs. Python exceptions in JPype

JPype’s Bridging Model (pt. 3)

Overview on the official documentation

  • most Python primitives directly map into Java primitives

  • however, Python does not have the same primitive types…

  • … hence, explicit casts may be needed in some cases

  • each primitive Java type is exposed in JPype (jpype.JBoolean, .JByte, .JChar, .JShort, .JInt, .JLong, .JFloat, .JDouble).

Java primitives vs. Python primitives in JPype
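
Why explicit casts can matter: Python's int is unbounded, while each Java integral type has a fixed width. The sketch below uses plain Python (no JVM involved) to list which Java integral types can hold a given value. The helper name java_types_fitting is hypothetical and is not part of JPype.

```python
# Bit widths of Java's integral primitives (two's complement, fixed width).
JAVA_INTEGRAL_BITS = {"byte": 8, "short": 16, "int": 32, "long": 64}


def java_types_fitting(value: int) -> list:
    """Return the Java integral types wide enough to hold `value`.

    Hypothetical helper for illustration only: it is not part of JPype.
    """
    fitting = []
    for name, bits in JAVA_INTEGRAL_BITS.items():
        lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if lowest <= value <= highest:
            fitting.append(name)
    return fitting


print(java_types_fitting(100))      # ['byte', 'short', 'int', 'long']
print(java_types_fitting(2 ** 31))  # ['long']
print(java_types_fitting(2 ** 63))  # []
```

With JPype, a value could then be wrapped explicitly in one of the exposed types, e.g. jpype.JLong(2 ** 31), to state the intended Java type.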

JPype’s Bridging Model (pt. 4)

Overview on the official documentation

  • Java strings are similar to Python strings

  • they are both immutable and produce a new string when altered

  • most operations can use Java strings in place of Python strings

    • with minor exceptions, as Python strings are not completely duck typed
  • when comparing strings or using them as dictionary keys, JString objects should first be converted to Python strings (e.g. via str(...))

Java strings vs. Python strings in JPype

JPype’s Bridging Model (pt. 5)

Overview on the official documentation

  • Java arrays are mapped to Python lists

  • more precisely, they operate like Python lists, but they are fixed in size

  • reading a slice from a Java array returns a view of the array, not a copy

  • passing a slice of a Python list to Java will create a copy of the sub-list

Java arrays vs. Python lists in JPype
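
The difference between a view and a copy can be illustrated in pure Python. The sketch below is only an analogy for the slicing behaviour described above: it uses memoryview and list, not JPype arrays.

```python
# Analogy in pure Python (no JVM needed): a view shares storage, a copy does not.
buffer = bytearray([10, 20, 30, 40])

view = memoryview(buffer)[1:3]   # a slice of a memoryview is a view
view[0] = 99                     # writes through to the underlying buffer
print(list(buffer))              # [10, 99, 30, 40]

items = [10, 20, 30, 40]
copy = items[1:3]                # a slice of a Python list is a copy
copy[0] = 99                     # the original list is unaffected
print(items)                     # [10, 20, 30, 40]
```

In the same spirit, writing through a slice of a Java array is expected to affect the array itself, whereas a sliced Python list handed to Java is detached from the original list.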

JPype’s Bridging Model (pt. 6)

Overview on the official documentation

  • Java collections are overloaded with Python syntax where possible

    • to operate similarly to Python collections
  • Java’s Iterables are mapped to Python iterables by overriding the __iter__ method

  • Java’s Collections are mapped to Python containers by overriding __len__

  • Java’s Maps support Python’s dictionaries syntax by overriding __getitem__ and __setitem__

  • Java’s Lists support Python’s lists syntax by overriding __getitem__ and __setitem__

Java collections vs. Python collections in JPype

JPype’s Bridging Model (pt. 7)

Overview on the official documentation

  • Java interfaces can be implemented in Python, via JPype’s decorators

  • Java’s open / abstract classes cannot be extended in Python

  • Python lambda expressions can be cast to Java’s functional interfaces

Implementing Java interfaces in Python

JPype type conversion model (pt. 1)

Overview on the official documentation

Explicit and implicit type conversions in JPype

JPype type conversion model (pt. 2)

Legend

  • none, there is no way to convert

  • explicit (E), JPype can convert the desired type, but only explicitly via casting

  • implicit (I), JPype will convert as needed

  • exact (X), like implicit, but takes priority in overload selection

Overload selection

  • Consider the following example of Python code with JPype:

    import jpype.imports
    from java.lang import System
    
    System.out.println(1)
    System.out.println(2.0)
    System.out.println('A')
    
  • Which overload of System.out.println is called among the many admissible ones?

    • Python’s 1 is convertible to Java’s int, long, and short
      • but Java’s int is the exact match
    • Python’s 2.0 is convertible to Java’s float and double
      • but Java’s double is the exact match
    • Python’s 'A' is convertible to Java’s String and char
      • but Java’s String is the exact match
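
The role of the conversion levels in overload selection can be sketched with a toy model in plain Python. This is a deliberately simplified illustration, not JPype's actual algorithm: the conversion table is a made-up fragment, and the names Match, CONVERSIONS and select_overload are hypothetical.

```python
from enum import IntEnum


class Match(IntEnum):
    """Conversion levels from the legend, ordered by priority."""
    NONE = 0
    EXPLICIT = 1
    IMPLICIT = 2
    EXACT = 3


# Toy conversion table (an assumption for illustration, not JPype's real one).
CONVERSIONS = {
    (str, "java.lang.String"): Match.EXACT,
    (str, "char"): Match.IMPLICIT,
    (list, "java.lang.Iterable"): Match.IMPLICIT,
    (list, "java.lang.String[]"): Match.IMPLICIT,
}


def select_overload(argument, parameter_types):
    """Pick the single-parameter overload that best matches `argument`."""
    scores = {
        p: CONVERSIONS.get((type(argument), p), Match.NONE)
        for p in parameter_types
    }
    best = max(scores.values(), default=Match.NONE)
    if best < Match.IMPLICIT:
        # explicit-only conversions never take part in automatic selection
        raise TypeError("no matching overload")
    winners = [p for p, score in scores.items() if score == best]
    if len(winners) > 1:
        raise TypeError(f"ambiguous overloads: {winners}")
    return winners[0]


# An exact match wins over an implicit one.
print(select_overload("A", ["char", "java.lang.String"]))  # java.lang.String

# Two implicit matches and no exact one: the call is ambiguous.
try:
    select_overload(["field"], ["java.lang.Iterable", "java.lang.String[]"])
except TypeError as error:
    print(error)
```

In this model an explicit cast amounts to choosing the parameter type by hand, which removes the tie.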

Ambiguous overload selection

  • Consider the following example of Python code with JPype:

    import jpype
    
    csv = jpype.JClass("io.github.gciatto.csv.Csv")
    
    csv.headerOf(["field", "another field"])
    
  • This would raise the following error:

    TypeError: Ambiguous overloads found for io.github.gciatto.csv.Csv.headerOf(list) between:
        public static final io.github.gciatto.csv.Header io.github.gciatto.csv.Csv.headerOf(java.lang.Iterable)
        public static final io.github.gciatto.csv.Header io.github.gciatto.csv.Csv.headerOf(java.lang.String[])
    
    • because Python’s list is convertible to both Java’s Iterable and String[]
      • but neither is the exact match
  • To solve this issue, one can explicitly cast the Python list to the desired Java type:

    import jpype
    import jpype.imports
    from java.lang import Iterable as JIterable
    
    csv = jpype.JClass("io.github.gciatto.csv.Csv")
    
    csv.headerOf(JIterable@["field", "another field"])
    # returns Header("field", "another field")
    

Customising Java types in Python (pt. 1)

  • One may customise the behaviour of Java types in Python by providing custom implementations for them

    • by means of the @JImplementationFor decorator
  • In that case the special method __jclass_init__ is called on the custom implementation, just once, to configure the class

  • In type hierarchies, implementations provided for superclasses are inherited by subclasses

Customising Java types in Python (pt. 2)

Consider, for instance, the following customisations, which allow Java collections to be used with Python syntax:

import jpype
from collections.abc import Iterable, Sequence  # the ABCs that provide register()


@jpype.JImplementationFor("java.lang.Iterable")
class _JIterable:
    def __jclass_init__(self):
        Iterable.register(self) # makes this class a subtype of Iterable, to speed up isinstance checks 

    def __iter__(self):
        return self.iterator()


@jpype.JImplementationFor("java.util.Collection")
class _JCollection:
    def __len__(self):
        return self.size()      # supports "len(coll)" syntax

    def __delitem__(self, i):
        return self.remove(i)   # supports "del coll[i]" syntax

    def __contains__(self, i):
        return self.contains(i) # supports "i in coll" syntax

    # __iter__ is inherited from _JIterable
    # because in Java: Collection extends Iterable


@jpype.JImplementationFor('java.util.List')
class _JList(object):
    def __jclass_init__(self):
        Sequence.register(self) # makes this class a subtype of Sequence, to speed up isinstance checks

    def __getitem__(self, ndx):
        return self.get(ndx)   # supports "list[i]" syntax

    def append(self, obj):
        return self.add(obj)   # supports "list.append(obj)" syntax

    # __len__, __delitem__, __contains__, __iter__ are inherited from _JCollection

this is taken directly from JPype’s codebase
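
What the register calls achieve can be seen without any JVM. The sketch below uses a hypothetical stand-in class, JavaLikeList, which exposes a Java-style API plus the delegating magic methods, and then registers it as a virtual subclass of Sequence.

```python
from collections.abc import Sequence


class JavaLikeList:
    """Stand-in for a wrapped java.util.List (hypothetical, no JVM involved)."""

    def __init__(self, *items):
        self._items = list(items)

    def get(self, index):   # Java-style accessor
        return self._items[index]

    def size(self):         # Java-style size
        return len(self._items)

    # Pythonic protocol methods delegating to the Java-style API
    def __getitem__(self, index):
        return self.get(index)

    def __len__(self):
        return self.size()


print(isinstance(JavaLikeList(1, 2), Sequence))  # False: not registered yet
Sequence.register(JavaLikeList)                  # declare it a virtual subclass
print(isinstance(JavaLikeList(1, 2), Sequence))  # True
```

Registration only affects isinstance and issubclass checks: it adds no methods, which is why the customisers still define __getitem__, __len__ and the like themselves.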

Making wrapped code Pythonic (pt. 1)

  • The code wrapped via JPype is not Pythonic by default

    • it works in principle, but it is very hard to use for the average Python developer
      • we cannot assume the average Python developer is familiar with Java
      • nor with JPype
      • yet both are required to use the wrapped code as is
  • It is important to make the wrapped code as Pythonic as possible

    • factory methods for building instances of types
    • simplified package structure
      • e.g. io.github.gciatto.csv.Csv → jcsv.Csv
    • properties instead of getters and setters
    • snake_case instead of camelCase
    • magic methods implemented whenever possible
      • e.g. __len__ for java.util.Collection
      • e.g. __getitem__ for java.util.List
    • optional parameters in methods instead of overloads
  • All such refinements can be done in JPype via customisations of the Java types

    • unit tests should be written to ensure the customisations are not broken by future changes
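
As a small illustration of the naming refinement, the sketch below converts Java-style camelCase names into snake_case. The helper camel_to_snake is hypothetical: it is part of neither JPype nor jcsv, and only shows how Pythonic aliases could be derived.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a Java-style camelCase name into Python-style snake_case."""
    # insert an underscore before every upper-case letter that follows
    # a lower-case letter or a digit, then lower-case everything
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


print(camel_to_snake("indexOf"))       # index_of
print(camel_to_snake("parseCsvFile"))  # parse_csv_file
```

Runs of capitals are kept together, so a name such as parseAsCSV becomes parse_as_csv.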

Making wrapped code Pythonic (pt. 2)

Workflow

For all public types in the wrapped Java library:

  • decide their corresponding Python package
  • provide Pythonic factory methods
  • customise the Python class to make it Pythonic (possibly exploiting type hierarchies to save time)
    • add properties calling getters/setters
    • override Java methods to make them Pythonic
      • e.g. use magic methods where possible
      • e.g. use optional parameters where possible, removing the need for overloads
  • write unit tests for Pythonic API

Example: the jcsv package (pt. 1)

  • The jcsv package is a Pythonic wrapper for our JVM-based io.github.gciatto.csv library

  • Java’s type definitions are brought to Python in jcsv/__init__.py:

    import jpype
    import jpype.imports
    from java.lang import Iterable as JIterable
    
    _csv = jpype.JPackage("io.github.gciatto.csv")
    
    Table = _csv.Table
    Row = _csv.Row
    Record = _csv.Record
    Header = _csv.Header
    Formatter = _csv.Formatter
    Parser = _csv.Parser
    Configuration = _csv.Configuration
    Csv = _csv.Csv
    CsvJvm = _csv.CsvJvm
    

    making it possible to write the following code on the user side:

    from jcsv import Table, Record, Header
    

Example: the jcsv package (pt. 2)

  • Parsing and formatting operations are mapped straightforwardly to Python functions:

    # jcsv/__init__.py
    
    def parse_csv_string(string, separator = Csv.DEFAULT_SEPARATOR, delimiter = Csv.DEFAULT_DELIMITER, comment = Csv.DEFAULT_COMMENT):
        return Csv.parseAsCSV(string, separator, delimiter, comment)
    
    
    def parse_csv_file(path, separator = Csv.DEFAULT_SEPARATOR, delimiter = Csv.DEFAULT_DELIMITER, comment = Csv.DEFAULT_COMMENT):
        return CsvJvm.parseCsvFile(str(path), separator, delimiter, comment)
    
    
    def format_as_csv(rows, separator = Csv.DEFAULT_SEPARATOR, delimiter = Csv.DEFAULT_DELIMITER, comment = Csv.DEFAULT_COMMENT):
        return Csv.formatAsCSV(JIterable@rows, separator, delimiter, comment)
    

Example: the jcsv package (pt. 3)

  • An ad-hoc factory method is provided for building Header instances:

    # jcsv/__init__.py
    from jcsv.python import iterable_or_varargs
    
    def header(*args):
        if len(args) == 1 and isinstance(args[0], int):
            return Csv.anonymousHeader(args[0])
        return iterable_or_varargs(args, lambda xs: Csv.headerOf(JIterable@map(str, xs)))
    

    making it possible to write the following code on the user side:

    import jcsv
    
    header1 = jcsv.header("column1", "column2", "column3") 
    header2 = jcsv.header(3) # anonymous header with 3 columns
    columns = (f"column{i}" for i in range(1, 4)) # generator expression
    header3 = jcsv.header(columns) # same as header1, but passing an iterable
    
  • Function iterable_or_varargs aims at simulating multiple overloads:

    # jcsv/python.py
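    from collections.abc import Iterable
    
    def iterable_or_varargs(args, f):
        # args is the tuple (or list) of positional arguments received by the caller
        assert isinstance(args, Iterable)
        if len(args) == 1:
            item = args[0]
            # a lone string is a single value, not an iterable of characters
            if isinstance(item, Iterable) and not isinstance(item, str):
                return f(item)
            else:
                return f([item])
        else:
            return f(args)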
    

Example: the jcsv package (pt. 4)

  • An ad-hoc factory method is provided for building Record instances:

    # jcsv/__init__.py
    
    def record(header, *args):
        return iterable_or_varargs(args, lambda xs: Csv.recordOf(header, JIterable@map(str, xs)))
    
  • An ad-hoc factory method is provided for building Table instances:

    # jcsv/__init__.py
    
    def __ensure_header(h):
        return h if isinstance(h, Header) else header(h)
    def __ensure_record(r, h):
        return r if isinstance(r, Record) else record(h, r)
    
    def table(header, *args):
        header = __ensure_header(header)
        args = [__ensure_record(row, header) for row in args]
        return iterable_or_varargs(args, lambda xs: Csv.tableOf(header, JIterable@xs))
    

Example: the jcsv package (pt. 5)

  • The Row class is customised to make it more Pythonic:

    # jcsv/__init__.py
    
    @jpype.JImplementationFor("io.github.gciatto.csv.Row")
    class _Row:
        def __len__(self):
            return self.getSize()
    
        def __getitem__(self, item):
            if isinstance(item, int) and item < 0:
                item = len(self) + item
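            # _java is assumed to be bound to the java.lang package,
            # e.g. via "import java.lang as _java" once jpype.imports is loaded
            try:
                return self.get(item)
            except _java.IndexOutOfBoundsException as e: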
                raise IndexError(f"index {item} out of range") from e
    
        @property
        def size(self):
            return len(self)
    
    • supporting the syntax len(row) instead of row.getSize()
    • supporting the syntax row[i] instead of row.get(i)
    • supporting the syntax row[-i] instead of row.get(row.getSize() - i)
    • letting IndexError be raised instead of IndexOutOfBoundsException
    • supporting the syntax row.size instead of row.getSize()
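
The index handling can be tried out without a JVM. The sketch below applies the same __getitem__ logic to a hypothetical stand-in, JavaLikeRow, whose get method rejects out-of-range indexes with a custom exception playing the role of Java's IndexOutOfBoundsException.

```python
class JavaIndexOutOfBounds(Exception):
    """Stand-in for java.lang.IndexOutOfBoundsException (hypothetical)."""


class JavaLikeRow:
    """Stand-in for the Java Row type (hypothetical, no JVM involved)."""

    def __init__(self, *values):
        self._values = list(values)

    def getSize(self):
        return len(self._values)

    def get(self, index):
        # mimic Java: negative or too-large indexes are rejected
        if not 0 <= index < len(self._values):
            raise JavaIndexOutOfBounds(index)
        return self._values[index]

    # Pythonic layer, following the same logic as the customisation above
    def __len__(self):
        return self.getSize()

    def __getitem__(self, item):
        if isinstance(item, int) and item < 0:
            item = len(self) + item  # row[-i] becomes row.get(size - i)
        try:
            return self.get(item)
        except JavaIndexOutOfBounds as e:
            raise IndexError(f"index {item} out of range") from e


row = JavaLikeRow("a", "b", "c")
print(row[0])   # a
print(row[-1])  # c
```

Raising IndexError also matters for iteration: Python's fallback iteration over __getitem__ stops when IndexError is raised, whereas any other exception would propagate.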

Example: the jcsv package (pt. 6)

  • The Header class inherits all customisations of Row, plus the following ones:

    @jpype.JImplementationFor("io.github.gciatto.csv.Header")
    class _Header:
        @property
        def columns(self):
            return [str(c) for c in self.getColumns()]
    
        def __contains__(self, item):
            return self.contains(item)
    
        def index_of(self, column):
            return self.indexOf(column)
    
    • supporting the syntax header.columns instead of header.getColumns()
    • supporting the syntax column in header instead of header.contains(column)
    • supporting the syntax header.index_of(column) instead of header.indexOf(column)

Example: the jcsv package (pt. 7)

  • The Record class inherits all customisations of Row, plus the following ones:

    @jpype.JImplementationFor("io.github.gciatto.csv.Record")
    class _Record:
        @property
        def header(self):
            return self.getHeader()
    
        @property
        def values(self):
            return [str(v) for v in self.getValues()]
    
        def __contains__(self, item):
            return self.contains(item)
    
    • supporting the syntax record.header instead of record.getHeader()
    • supporting the syntax record.values instead of record.getValues()
    • supporting the syntax value in record instead of record.contains(value)

Example: the jcsv package (pt. 8)

  • The Table class is customised too, to make it more Pythonic:

    @jpype.JImplementationFor("io.github.gciatto.csv.Table")
    class _Table:
        @property
        def header(self):
            return self.getHeader()
    
        def __len__(self):
            return self.getSize()
    
        def __getitem__(self, item):
            if isinstance(item, int) and item < 0:
                item = len(self) + item
            # _java is assumed to be bound to the java.lang package (e.g. "import java.lang as _java")
            try:
                return self.get(item)
            except _java.IndexOutOfBoundsException as e:
                raise IndexError(f"index {item} out of range") from e
    
        def __contains__(self, item):
            return self.contains(item)  # supports "record in table" syntax
    
        @property
        def records(self):
            return self.getRecords()
    
        @property
        def size(self):
            return len(self)
    
    • supporting the syntax table.header instead of table.getHeader()
    • supporting the syntax len(table) instead of table.getSize()
    • supporting the syntax table[i] instead of table.get(i)
    • supporting the syntax table[-i] instead of table.get(table.getSize() - i)
    • supporting the syntax record in table instead of table.contains(record)
    • supporting the syntax table.records instead of table.getRecords()

Including .jars in JPype projects (pt. 1)

csv-python/
├── build.gradle.kts            # this is where the generation of csv.jar is automated
├── jcsv
│   ├── __init__.py
│   ├── jvm
│   │   ├── __init__.py         # this is where JPype is loaded
│   │   └── csv.jar             # this is the fat JAR of the JVM-based library
│   └── python.py
├── requirements.txt
└── test
    ├── __init__.py
    ├── test_parsing.py
    └── test_python_api.py
  1. We need to ensure that the JVM-based library is available on the system where jcsv is installed

    • why not include it in the Python package?
  2. The build.gradle.kts file automates the generation of the csv.jar file

    • it is a Fat-JAR containing all the dependencies of the JVM-based library
    • such JAR is placed in the jcsv/jvm directory
    • it is part of Python sources, so that it can be distributed with the Python library
  3. The jcsv/jvm/__init__.py file loads JPype and the csv.jar file

Including .jars in JPype projects (pt. 2)

  1. Snippet from the build.gradle.kts:

    tasks.create<Copy>("createCoreJar") {
        group = "Python"
        val shadowJar by project(":csv-core").tasks.getting(Jar::class)
        dependsOn(shadowJar)
        from(shadowJar.archiveFile) {
            rename(".*?\\.jar", "csv.jar")
        }
        into(projectDir.resolve("jcsv/jvm"))
    }
    
  2. Content of the jcsv/jvm/__init__.py file:

    import jpype
    from pathlib import Path
    
    # the directory where csv.jar is placed
    CLASSPATH = Path(__file__).parent
    
    # the list of all .jar files in CLASSPATH
    JARS = [str(j.resolve()) for j in CLASSPATH.glob('*.jar')]
    
    jpype.startJVM(classpath=JARS)
    
  3. Important line in jcsv/__init__.py:

    import jcsv.jvm
    

    this forces the JVM to start with the correct classpath whenever the jcsv module is imported

Including JVM in JPype projects

  • We need to ensure that some JVM is available on the system where jcsv is installed

  • Notice that the JVM is available as a Python dependency too, via the jdk4py package

  • This means that the JVM can be automatically downloaded and installed via pip:

    pip install jdk4py
    
  • … or added as a dependency to the requirements.txt file:

    JPype1==1.4.1
    jdk4py==17.0.7.0
    
  • So, one simply needs to configure JPype to use that JVM:

    # jcsv/jvm/__init__.py
    import jpype, sys
    from jdk4py import JAVA_HOME
    
    def jvm_lib_file_names():
        if sys.platform == "win32":
            return {"jvm.dll"}
        elif sys.platform == "darwin":
            return {"libjli.dylib"}
        else:
            return {"libjvm.so"}
    
    
    def jvmlib(): 
        for name in jvm_lib_file_names():
            for path in JAVA_HOME.glob(f"**/{name}"):
                if path.exists():
                    return str(path)
        return None
    
    jpype.startJVM(jvmpath=jvmlib())
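
The lookup logic can be exercised without jdk4py or a real JVM if the installation directory is passed as a parameter. The following is a minimal sketch under that assumption: the names find_jvm_library and JVM_LIBRARY_NAMES are hypothetical, and the demo builds a fake directory layout rather than a real JDK.

```python
import sys
import tempfile
from pathlib import Path
from typing import Optional

# Library file name per platform, mirroring the snippet above.
JVM_LIBRARY_NAMES = {"win32": "jvm.dll", "darwin": "libjli.dylib"}


def find_jvm_library(java_home: Path, platform: str = sys.platform) -> Optional[str]:
    """Return the path of the JVM shared library under `java_home`, if any."""
    name = JVM_LIBRARY_NAMES.get(platform, "libjvm.so")
    # sorted() makes the choice deterministic when several copies exist
    for path in sorted(java_home.glob(f"**/{name}")):
        if path.is_file():
            return str(path)
    return None


# Demo on a fake installation directory (not a real JDK).
with tempfile.TemporaryDirectory() as tmp:
    fake_library = Path(tmp) / "lib" / "server" / "libjvm.so"
    fake_library.parent.mkdir(parents=True)
    fake_library.touch()
    print(find_jvm_library(Path(tmp), platform="linux"))
```

With jdk4py installed, the result of find_jvm_library(JAVA_HOME) would be what gets passed to jpype.startJVM(jvmpath=...). Returning None when nothing is found lets the caller decide whether to fall back to JPype's default lookup.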
    

About unit testing

  • Unit tests are essential to ensure the correctness of the Pythonic API

    • they prevent corruption of the Pythonic API when the JAR is updated
  • Consider for instance tests in:

    • test/test_parsing.py
    • test/test_python_api.py
  • It is important to test all the customisations and factory methods

    • because these are not covered by the unit tests of the JVM-based library
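
As an illustration of what such tests may look like, here is a minimal sketch for the pure-Python helper iterable_or_varargs. It does not reproduce the actual content of the test files: the helper is copied locally so that the sketch runs without a JVM, and the test names are hypothetical.

```python
import unittest
from collections.abc import Iterable


def iterable_or_varargs(args, f):
    """Local copy of the helper from jcsv/python.py, so the sketch runs standalone."""
    if len(args) == 1:
        item = args[0]
        return f(item) if isinstance(item, Iterable) else f([item])
    return f(args)


class TestIterableOrVarargs(unittest.TestCase):
    def test_varargs(self):
        self.assertEqual(iterable_or_varargs((1, 2, 3), list), [1, 2, 3])

    def test_single_iterable(self):
        self.assertEqual(iterable_or_varargs(([1, 2, 3],), list), [1, 2, 3])

    def test_single_scalar(self):
        self.assertEqual(iterable_or_varargs((5,), list), [5])

    def test_generator(self):
        columns = (f"column{i}" for i in range(1, 4))
        self.assertEqual(
            iterable_or_varargs((columns,), list),
            ["column1", "column2", "column3"],
        )


# Run the suite programmatically, without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIterableOrVarargs)
result = unittest.TextTestRunner().run(suite)
```

In the real project the helper would be imported from jcsv.python instead of being copied, and similar test cases would cover the factory methods and the customised classes, which do require a running JVM.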

Multi-platform Programming for Research-Oriented Software


Giovanni Ciatto — giovanni.ciatto@unibo.it


Compiled on: 2024-02-20
