Write first, wrap elsewhere: the case of JPype

At the end of the day, the Python interpreter (CPython) is C software
Most commonly, efficient Python code is written in C or C++ and then wrapped into Python
Put simply, the JVM too is native (C++) software
Can Python code be written in Java and then wrapped into Python?
Apparently yes, via several ad-hoc bridging technologies:
- JPype: a bridge for calling JVM code from Python as a native library
- Jython: a JVM-based Python interpreter
- Py4J: an RPC-based bridge for calling Java from Python
- Pyjnius: similar to JPype, but based on Cython
- and many others…
Why exactly JPype?
- still maintained, works with vanilla CPython, good documentation, good interoperability with JVM types

JPype Overview:

Ensure you have a JVM installed on your system
- with the JAVA_HOME environment variable set to the JVM installation directory
Ensure your compiled Java code is available as a .jar file
- say, in /path/to/my.jar
Install the JPype package via pip install JPype1
You first need JPype to start a JVM instance in your Python process
- by default, the JVM location is inferred from the JAVA_HOME environment variable
```
import jpype

# start the JVM
jpype.startJVM(classpath=["/path/to/my.jar"])
```

Once the JVM is started, one can import Java classes and call their methods as if they were Python objects

import jpype.imports # this is necessary to import Java classes

from java.lang import System # import the java.lang.System class

System.out.println("Hello World!")

JPype’s Bridging Model (pt. 1)

Overview on the official documentation

Java classes are presented, wherever possible, similarly to Python classes
the only major difference is that Java classes and objects are closed and cannot be modified

Java classes vs. Python classes in JPype

JPype’s Bridging Model (pt. 2)

Overview on the official documentation

Java exceptions extend from Python exceptions
Java exceptions can be dealt with in the same way as Python native exceptions
- i.e. via try-except blocks
JException serves as the base class for all Java exceptions

Java exceptions vs. Python exceptions in JPype

JPype’s Bridging Model (pt. 3)

Overview on the official documentation

most Python primitives directly map into Java primitives
however, Python does not have the same primitive types…
… hence, explicit casts may be needed in some cases
each primitive Java type is exposed in JPype (jpype.JBoolean, .JByte, .JChar, .JShort, .JInt, .JLong, .JFloat, .JDouble).
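
To see why explicit casts are sometimes needed, it may help to compare value ranges. The sketch below does not use JPype at all: it only checks whether a Python int fits the signed range of a Java integral type. The helper name `fits_java_type` and the lookup table are illustrative, not part of any library.

```python
# Bit widths of Java's signed integral primitive types.
JAVA_INT_BITS = {"byte": 8, "short": 16, "int": 32, "long": 64}


def fits_java_type(value: int, java_type: str) -> bool:
    """Return True if a Python int fits the signed range of the given Java type."""
    bits = JAVA_INT_BITS[java_type]
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


print(fits_java_type(127, "byte"))      # True
print(fits_java_type(128, "byte"))      # False
print(fits_java_type(2 ** 31, "int"))   # False: this value needs a long
print(fits_java_type(2 ** 63, "long"))  # False: Python ints are unbounded
```

A single Python `int` can therefore correspond to several Java types, or to none, which is why the target type sometimes has to be stated explicitly.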

Java primitives vs. Python primitives in JPype

JPype’s Bridging Model (pt. 4)

Overview on the official documentation

Java strings are similar to Python strings
they are both immutable and produce a new string when altered
most operations can use Java strings in place of Python strings
- with minor exceptions, as Python strings are not completely duck typed
when comparing strings or using them as dictionary keys, all JString objects should first be converted to Python strings (e.g. via str())

Java strings vs. Python strings in JPype

JPype’s Bridging Model (pt. 5)

Overview on the official documentation

Java arrays are mapped to Python lists
more precisely, they operate like Python lists, but they are fixed in size
reading a slice from a Java array returns a view of the array, not a copy
passing a slice of a Python list to Java will create a copy of the sub-list
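
The difference between a view and a copy can be illustrated with a pure-Python analogy. No JPype is involved here: `memoryview` plays the role of a slice that shares storage with its source, while ordinary slicing plays the role of an independent copy.

```python
# Pure-Python analogy (no JPype involved): a view shares storage, a copy does not.
data = bytearray(b"abcdef")

view = memoryview(data)[1:4]   # a view over data[1:4], no bytes are copied
copy = data[1:4]               # slicing a bytearray yields an independent copy

data[1] = ord("X")             # mutate the original buffer

print(bytes(view))  # b'Xcd'  (the view sees the change)
print(bytes(copy))  # b'bcd'  (the copy does not)
```

According to the slide above, a slice read from a Java array behaves like `view`, whereas a Python list slice handed to Java behaves like `copy`.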

JPype’s Bridging Model (pt. 6)

Overview on the official documentation

Java collections are overloaded with Python syntax where possible
- to operate similarly to Python collections
Java’s Iterables are mapped to Python iterables by overriding the __iter__ method
Java’s Collections are mapped to Python containers by overriding __len__
Java’s Maps support Python’s dictionary syntax by overriding __getitem__ and __setitem__
Java’s Lists support Python’s list syntax by overriding __getitem__ and __setitem__

Java collections vs. Python collections in JPype

JPype’s Bridging Model (pt. 7)

Overview on the official documentation

Java interfaces can be implemented in Python, via JPype’s decorators
Java’s open / abstract classes cannot be extended in Python
Python lambda expressions can be cast to Java’s functional interfaces

JPype type conversion model (pt. 1)

Overview on the official documentation

Explicit and implicit type conversions in JPype

JPype type conversion model (pt. 2)

Legend

none, there is no way to convert
explicit (E), JPype can convert the desired type, but only explicitly via casting
implicit (I), JPype will convert as needed
exact (X), like implicit, but takes priority in overload selection

Overload selection

Consider the following example of Python code with JPype:

import jpype.imports
from java.lang import System

System.out.println(1)
System.out.println(2.0)
System.out.println('A')

Which overload of System.out.println is called among the many admissible ones?
- Python’s 1 is convertible to Java’s int, long, and short
  - but Java’s int is the exact match
- Python’s 2.0 is convertible to Java’s float and double
  - but Java’s double is the exact match
- Python’s 'A' is convertible to Java’s String and char
  - but Java’s String is the exact match

Ambiguous overload selection

Consider the following example of Python code with JPype:

import jpype

csv = jpype.JClass("io.github.gciatto.csv.Csv")

csv.headerOf(["field", "another field"])

This would raise the following error:

TypeError: Ambiguous overloads found for io.github.gciatto.csv.Csv.headerOf(list) between:
    public static final io.github.gciatto.csv.Header io.github.gciatto.csv.Csv.headerOf(java.lang.Iterable)
    public static final io.github.gciatto.csv.Header io.github.gciatto.csv.Csv.headerOf(java.lang.String[])

because Python’s list is convertible to both Java’s Iterable and String[]
- but neither is the exact match

To solve this issue, one can explicitly cast the Python list to the desired Java type:

import jpype
import jpype.imports
from java.lang import Iterable as JIterable

csv = jpype.JClass("io.github.gciatto.csv.Csv")

csv.headerOf(JIterable@["field", "another field"])
# returns Header("field", "another field")
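
The selection rule described in the last two slides can be mimicked with a toy model. This is only a sketch under stated assumptions: the `CONVERSIONS` table is hypothetical and simply encodes the claims made in the slides, the names `Match` and `select_overload` are invented for illustration, and JPype's real resolution algorithm is more involved (several arguments, inheritance, and so on).

```python
from enum import IntEnum


class Match(IntEnum):
    NONE = 0       # no conversion exists
    EXPLICIT = 1   # only via an explicit cast, never chosen automatically
    IMPLICIT = 2   # converted as needed
    EXACT = 3      # like implicit, but wins overload selection


# Hypothetical conversion table, following the claims in the slides above.
CONVERSIONS = {
    int: {"int": Match.EXACT, "long": Match.IMPLICIT, "short": Match.IMPLICIT},
    float: {"double": Match.EXACT, "float": Match.IMPLICIT},
    str: {"String": Match.EXACT, "char": Match.IMPLICIT},
    list: {"Iterable": Match.IMPLICIT, "String[]": Match.IMPLICIT},
}


def select_overload(arg, overloads, cast=None):
    """Pick the best-matching parameter type for a one-argument call."""
    if cast is not None:
        # An explicit cast pins the target type, bypassing the ranking.
        if cast not in overloads:
            raise TypeError(f"no overload accepts {cast}")
        return cast
    table = CONVERSIONS.get(type(arg), {})
    scored = [(table.get(o, Match.NONE), o) for o in overloads]
    usable = [(m, o) for m, o in scored if m >= Match.IMPLICIT]
    if not usable:
        raise TypeError(f"no matching overload for {type(arg).__name__}")
    best = max(m for m, _ in usable)
    winners = [o for m, o in usable if m == best]
    if len(winners) > 1:
        raise TypeError(f"ambiguous overloads: {winners}")
    return winners[0]


print(select_overload(1, ["int", "long", "String"]))   # int
print(select_overload(2.0, ["float", "double"]))       # double
print(select_overload("A", ["char", "String"]))        # String

try:
    select_overload(["a"], ["Iterable", "String[]"])
except TypeError as e:
    print(e)   # ambiguous overloads: ['Iterable', 'String[]']

print(select_overload(["a"], ["Iterable", "String[]"], cast="Iterable"))  # Iterable
```

In this model an exact match always beats implicit ones, two equally ranked implicit matches produce an error, and a cast removes the ambiguity by naming the target type.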

Customising Java types in Python (pt. 1)

One may customise the behaviour of Java types in Python by providing custom implementations for them
- by means of the @JImplementationFor decorator
In that case the special method __jclass_init__ is called on the custom implementation, just once, to configure the class
In type hierarchies, implementations provided for superclasses are inherited by subclasses

Customising Java types in Python (pt. 2)

Consider, for instance, the following customisations, which allow Java collections to be used with Python syntax

import jpype
from collections.abc import Iterable, Sequence


@jpype.JImplementationFor("java.lang.Iterable")
class _JIterable:
    def __jclass_init__(self):
        Iterable.register(self) # makes this class a subtype of Iterable, to speed up isinstance checks 

    def __iter__(self):
        return self.iterator()


@jpype.JImplementationFor("java.util.Collection")
class _JCollection:
    def __len__(self):
        return self.size()      # supports "len(coll)" syntax

    def __delitem__(self, i):
        return self.remove(i)   # supports "del coll[i]" syntax

    def __contains__(self, i):
        return self.contains(i) # supports "i in coll" syntax

    # __iter__ is inherited from _JIterable
    # because in Java: Collection extends Iterable


@jpype.JImplementationFor('java.util.List')
class _JList(object):
    def __jclass_init__(self):
        Sequence.register(self) # makes this class a subtype of Sequence, to speed up isinstance checks

    def __getitem__(self, ndx):
        return self.get(ndx)   # supports "list[i]" syntax

    def append(self, obj):
        return self.add(obj)   # supports "list.append(obj)" syntax

    # __len__, __delitem__, __contains__, __iter__ are inherited from _JCollection

This snippet is taken, in simplified form, from JPype’s codebase
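
What `Sequence.register(...)` does inside `__jclass_init__` can be seen in isolation with plain Python. The class `JavaLikeList` below is a hypothetical stand-in for a Java proxy type; no JVM is involved.

```python
from collections.abc import Sequence


class JavaLikeList:
    """Stand-in for a Java List proxy: exposes size() and get(i) only."""

    def __init__(self, *items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def get(self, index):
        return self._items[index]


print(isinstance(JavaLikeList(1, 2), Sequence))   # False

# Virtual subclassing: no inheritance, no method checks, just a registry entry.
Sequence.register(JavaLikeList)

print(isinstance(JavaLikeList(1, 2), Sequence))   # True
print(issubclass(JavaLikeList, Sequence))         # True
print(hasattr(JavaLikeList, "__len__"))           # False
```

Registration only affects `isinstance` and `issubclass` checks. It adds no methods, which is why the customiser still has to define `__len__`, `__getitem__`, and the like.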

Making wrapped code Pythonic (pt. 1)

The code wrapped via JPype is not Pythonic by default
- it works in principle, but it is very hard to use for the average Python developer
  - we cannot assume the average Python developer is familiar with Java…
  - nor with JPype
  - and the developer should know both aspects to use the wrapped code as is
It is important to make the wrapped code as Pythonic as possible
- factory methods for building instances of types
- simplified package structure
  - e.g. io.github.gciatto.csv.Csv → jcsv.Csv
- properties instead of getters and setters
- snake_case instead of camelCase
- magic methods implemented whenever possible
  - e.g. __len__ for java.util.Collection
  - e.g. __getitem__ for java.util.List
- optional parameters in methods instead of overloads
All such refinements can be done in JPype via customisations of the Java types
- unit tests should be written to ensure the customisations are not broken by future changes
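
A few of the refinements listed above (properties, snake_case, optional parameters) can be sketched without a JVM. Both classes below are hypothetical: `JavaStyleFormatter` imitates the shape of a wrapped Java class, and plain subclassing stands in for what would be done through JPype customisations in a real wrapper.

```python
class JavaStyleFormatter:
    """Hypothetical stand-in for a wrapped Java class with getters and overloads."""

    def __init__(self):
        self._separator = ","

    def getSeparator(self):
        return self._separator

    def setSeparator(self, value):
        self._separator = value

    # Java would overload formatRow; Python needs two names to imitate that.
    def formatRow(self, values):
        return self.formatRowWith(values, self._separator)

    def formatRowWith(self, values, separator):
        return separator.join(values)


class PythonicFormatter(JavaStyleFormatter):
    """Pythonic facade: a property, snake_case, and one method with a default."""

    @property
    def separator(self):
        return self.getSeparator()

    @separator.setter
    def separator(self, value):
        self.setSeparator(value)

    def format_row(self, values, separator=None):
        sep = self.separator if separator is None else separator
        return self.formatRowWith([str(v) for v in values], sep)


fmt = PythonicFormatter()
print(fmt.format_row(["a", "b"]))        # a,b
fmt.separator = ";"
print(fmt.format_row(["a", 1]))          # a;1
print(fmt.format_row(["a", "b"], "|"))   # a|b
```

The single `format_row` method with a default argument replaces two overloads, and callers never see `getSeparator` or `setSeparator`.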

Making wrapped code Pythonic (pt. 2)

Workflow

For all public types in the wrapped Java library:

decide their corresponding Python package
provide Pythonic factory methods
customise the Python class to make it Pythonic (possibly exploiting type hierarchies to save time)
- add properties calling getters/setters
- override Java methods to make them Pythonic
  - e.g. use magic methods where possible
  - e.g. use optional parameters where possible, removing the need for overloads
write unit tests for Pythonic API

Example: the `jcsv` package (pt. 1)

The jcsv package is a Pythonic wrapper for our JVM-based io.github.gciatto.csv library

Java’s type definitions are brought to Python in jcsv/__init__.py:

import jpype
import jpype.imports
from java.lang import Iterable as JIterable

_csv = jpype.JPackage("io.github.gciatto.csv")

Table = _csv.Table
Row = _csv.Row
Record = _csv.Record
Header = _csv.Header
Formatter = _csv.Formatter
Parser = _csv.Parser
Configuration = _csv.Configuration
Csv = _csv.Csv
CsvJvm = _csv.CsvJvm

making it possible to write the following code on the user side:

from jcsv import Table, Record, Header

Example: the `jcsv` package (pt. 2)

Parsing and formatting operations are mapped straightforwardly to Python functions:

# jcsv/__init__.py

def parse_csv_string(string, separator = Csv.DEFAULT_SEPARATOR, delimiter = Csv.DEFAULT_DELIMITER, comment = Csv.DEFAULT_COMMENT):
    return Csv.parseAsCSV(string, separator, delimiter, comment)


def parse_csv_file(path, separator = Csv.DEFAULT_SEPARATOR, delimiter = Csv.DEFAULT_DELIMITER, comment = Csv.DEFAULT_COMMENT):
    return CsvJvm.parseCsvFile(str(path), separator, delimiter, comment)


def format_as_csv(rows, separator = Csv.DEFAULT_SEPARATOR, delimiter = Csv.DEFAULT_DELIMITER, comment = Csv.DEFAULT_COMMENT):
    return Csv.formatAsCSV(JIterable@rows, separator, delimiter, comment)

Example: the `jcsv` package (pt. 3)

An ad-hoc factory method is provided for building Header instances:

# jcsv/__init__.py
from jcsv.python import iterable_or_varargs

def header(*args):
    if len(args) == 1 and isinstance(args[0], int):
        return Csv.anonymousHeader(args[0])
    return iterable_or_varargs(args, lambda xs: Csv.headerOf(JIterable@map(str, xs)))

making it possible to write the following code on the user side:

import jcsv

header1 = jcsv.header("column1", "column2", "column3") 
header2 = jcsv.header(3) # anonymous header with 3 columns
columns = (f"column{i}" for i in range(1, 4)) # generator expression
header3 = jcsv.header(columns) # same as header1, but passing an iterable

Function iterable_or_varargs aims at simulating multiple overloads:

# jcsv/python.py
from typing import Iterable

def iterable_or_varargs(args, f):
    assert isinstance(args, Iterable)
    if len(args) == 1:
        item = args[0]
        if isinstance(item, Iterable):
            return f(item)
        else:
            return f([item])
    else:
        return f(args)
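
One edge case is worth noting: a single string argument is itself iterable, so the function above would hand its characters to `f` one by one. A possible variant, sketched here under the assumption that strings should count as single values, guards against that. The names `iterable_or_varargs_strict` and `collect` are illustrative, and `list` stands in for the Java factory call.

```python
from collections.abc import Iterable


def iterable_or_varargs_strict(args, f):
    """Variant of iterable_or_varargs that treats str/bytes as single values."""
    if len(args) == 1:
        item = args[0]
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            return f(item)      # one iterable argument: pass it through
        return f([item])        # one scalar (or string) argument: wrap it
    return f(args)              # several positional arguments: use them as-is


def collect(*args):
    # list() stands in for the Java factory call used in the slides
    return iterable_or_varargs_strict(args, lambda xs: list(xs))


print(collect("a", "b", "c"))      # ['a', 'b', 'c']
print(collect(["a", "b", "c"]))    # ['a', 'b', 'c']
print(collect(c for c in "abc"))   # ['a', 'b', 'c']
print(collect("abc"))              # ['abc'], not ['a', 'b', 'c']
```

Whether a lone string should be wrapped or iterated is a design decision; the variant simply makes the choice explicit.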

Example: the `jcsv` package (pt. 4)

An ad-hoc factory method is provided for building Record instances:

# jcsv/__init__.py

def record(header, *args):
    return iterable_or_varargs(args, lambda xs: Csv.recordOf(header, JIterable@map(str, xs)))

An ad-hoc factory method is provided for building Table instances:

# jcsv/__init__.py

def __ensure_header(h):
    return h if isinstance(h, Header) else header(h)


def __ensure_record(r, h):
    return r if isinstance(r, Record) else record(h, r)

def table(header, *args):
    header = __ensure_header(header)
    args = [__ensure_record(row, header) for row in args]
    return iterable_or_varargs(args, lambda xs: Csv.tableOf(header, JIterable@xs))

Example: the `jcsv` package (pt. 5)

The Row class is customised to make it more Pythonic:

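# jcsv/__init__.py

_java = jpype.JPackage("java.lang")  # assumption: the slide uses _java without showing its definition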

@jpype.JImplementationFor("io.github.gciatto.csv.Row")
class _Row:
    def __len__(self):
        return self.getSize()

    def __getitem__(self, item):
        if isinstance(item, int) and item < 0:
            item = len(self) + item
        try:
            return self.get(item)
        except _java.IndexOutOfBoundsException as e:
            raise IndexError(f"index {item} out of range") from e

    @property
    def size(self):
        return len(self)

supporting the syntax len(row) instead of row.getSize()
supporting the syntax row[i] instead of row.get(i)
supporting the syntax row[-i] instead of row.get(row.getSize() - i)
letting IndexError be raised instead of IndexOutOfBoundsException
supporting the syntax row.size instead of row.getSize()

Example: the `jcsv` package (pt. 6)

The Header shall inherit all customisations for Row, plus the following ones:

@jpype.JImplementationFor("io.github.gciatto.csv.Header")
class _Header:
    @property
    def columns(self):
        return [str(c) for c in self.getColumns()]

    def __contains__(self, item):
        return self.contains(item)

    def index_of(self, column):
        return self.indexOf(column)

supporting the syntax header.columns instead of header.getColumns()
supporting the syntax column in header instead of header.contains(column)
supporting the syntax header.index_of(column) instead of header.indexOf(column)

Example: the `jcsv` package (pt. 7)

The Record shall inherit all customisations for Row, plus the following ones:

@jpype.JImplementationFor("io.github.gciatto.csv.Record")
class _Record:
    @property
    def header(self):
        return self.getHeader()

    @property
    def values(self):
        return [str(v) for v in self.getValues()]

    def __contains__(self, item):
        return self.contains(item)

supporting the syntax record.header instead of record.getHeader()
supporting the syntax record.values instead of record.getValues()
supporting the syntax value in record instead of record.contains(value)

Example: the `jcsv` package (pt. 8)

The Table class is customised too, to make it more Pythonic:

@jpype.JImplementationFor("io.github.gciatto.csv.Table")
class _Table:
    @property
    def header(self):
        return self.getHeader()

    def __len__(self):
        return self.getSize()

    def __getitem__(self, item):
        if isinstance(item, int) and item < 0:
            item = len(self) + item
        try:
            return self.get(item)
        except _java.IndexOutOfBoundsException as e:
            raise IndexError(f"index {item} out of range") from e

    @property
    def records(self):
        return self.getRecords()

    @property
    def size(self):
        return len(self)

supporting the syntax table.header instead of table.getHeader()
supporting the syntax len(table) instead of table.getSize()
supporting the syntax table[i] instead of table.get(i)
supporting the syntax table[-i] instead of table.get(table.getSize() - i)
supporting the syntax record in table instead of table.contains(record)
supporting the syntax table.records instead of table.getRecords()
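
Row and Table share the same indexing logic, and the `in` operator works even without a `__contains__` method, because Python falls back to calling `__getitem__` with 0, 1, 2, … until `IndexError` is raised. The sketch below shows both points in plain Python. `IndexedMixin` and `FakeTable` are hypothetical names, the range check is done in Python rather than delegated to Java, and whether such a mixin can be combined with `@JImplementationFor` is not verified here.

```python
class IndexedMixin:
    """Hypothetical shared helper: assumes the host class offers getSize() and get(i)."""

    def __len__(self):
        return self.getSize()

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("index must be an int")
        if index < 0:
            index += len(self)   # obj[-1] becomes obj.get(size - 1)
        if not 0 <= index < len(self):
            raise IndexError(f"index {index} out of range")
        return self.get(index)


class FakeTable(IndexedMixin):
    """Pure-Python stand-in for a Java object exposing getSize() and get(i)."""

    def __init__(self, *records):
        self._records = list(records)

    def getSize(self):
        return len(self._records)

    def get(self, i):
        return self._records[i]


table = FakeTable("r0", "r1", "r2")
print(len(table))      # 3
print(table[-1])       # r2
print("r1" in table)   # True, via the __getitem__ fallback
print(list(table))     # ['r0', 'r1', 'r2']
```

The fallback only terminates because out-of-range access raises `IndexError`, which is one more reason to translate `IndexOutOfBoundsException` as the slides do.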

Including `.jar`s in JPype projects (pt. 1)

csv-python/
├── build.gradle.kts            # this is where the generation of csv.jar is automated
├── jcsv
│   ├── __init__.py
│   ├── jvm
│   │   ├── __init__.py         # this is where JPype is loaded
│   │   └── csv.jar             # this is the Fat-JAR of the JVM-based library
│   └── python.py
├── requirements.txt
└── test
    ├── __init__.py
    ├── test_parsing.py
    └── test_python_api.py

We need to ensure that the JVM-based library is available on the system where jcsv is installed
- why not include it in the Python package?
The build.gradle.kts file automates the generation of the csv.jar file
- it is a Fat-JAR containing all the dependencies of the JVM-based library
- this JAR is placed in the jcsv/jvm directory
- it is part of Python sources, so that it can be distributed with the Python library
The jcsv/jvm/__init__.py file loads JPype and the csv.jar file

Including `.jar`s in JPype projects (pt. 2)

Snippet from the build.gradle.kts:

tasks.create<Copy>("createCoreJar") {
    group = "Python"
    val shadowJar by project(":csv-core").tasks.getting(Jar::class)
    dependsOn(shadowJar)
    from(shadowJar.archiveFile) {
        rename(".*?\\.jar", "csv.jar")
    }
    into(projectDir.resolve("jcsv/jvm"))
}

Content of the jcsv/jvm/__init__.py file:

import jpype
from pathlib import Path

# the directory where csv.jar is placed
CLASSPATH = Path(__file__).parent

# the list of all .jar files in CLASSPATH
JARS = [str(j.resolve()) for j in CLASSPATH.glob('*.jar')]

jpype.startJVM(classpath=JARS)

Important line in jcsv/__init__.py:
```
import jcsv.jvm
```
This forces the JVM to start with the correct classpath whenever someone uses the jcsv module
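
Since a JVM can be started only once per process, a slightly more defensive loader may be useful when other libraries in the same process also rely on JPype. The following is a minimal sketch, not the project's actual code: `find_jars` and `ensure_jvm` are hypothetical helper names, while `jpype.isJVMStarted()` and `jpype.startJVM(classpath=...)` are existing JPype calls.

```python
from pathlib import Path
import tempfile


def find_jars(directory):
    """Return the absolute paths of all .jar files directly inside `directory`."""
    return sorted(str(p.resolve()) for p in Path(directory).glob("*.jar"))


def ensure_jvm(classpath):
    """Start the JVM only if this process has not started one yet."""
    import jpype  # imported lazily so find_jars stays usable without JPype

    if not jpype.isJVMStarted():
        jpype.startJVM(classpath=classpath)


# Demonstration of find_jars on a throwaway directory (no JVM required).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "csv.jar").touch()
    (Path(tmp) / "notes.txt").touch()
    jars = find_jars(tmp)
    print([Path(j).name for j in jars])   # ['csv.jar']
```

Note that if a JVM was already started elsewhere, `ensure_jvm` leaves it untouched, so the JARs passed here would not be part of its startup classpath.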

Including JVM in JPype projects

We need to ensure that some JVM is available on the system where jcsv is installed
Notice that the JVM is available as a Python dependency too:
- https://pypi.org/project/jdk4py/
This means that the JVM can be automatically downloaded and installed via pip:
```
pip install jdk4py
```
… or added as a dependency to the requirements.txt file:
```
JPype1==1.4.1
jdk4py==17.0.7.0
```

so, one may simply need to configure JPype to use that JVM:

# jcsv/jvm/__init__.py
import jpype
import sys
from jdk4py import JAVA_HOME

def jvm_lib_file_names():
    if sys.platform == "win32":
        return {"jvm.dll"}
    elif sys.platform == "darwin":
        return {"libjli.dylib"}
    else:
        return {"libjvm.so"}


def jvmlib():
    for name in jvm_lib_file_names():
        for path in JAVA_HOME.glob(f"**/{name}"):
            if path.exists():
                return str(path)
    return None

jpype.startJVM(jvmpath=jvmlib())

About unit testing

Unit tests are essential to ensure the correctness of the Pythonic API
- they prevent corruption of the Pythonic API when the JAR is updated
Consider for instance tests in:
- test/test_parsing.py
- test/test_python_api.py
It is important to test all the customisations and factory methods
- because these are not covered by the unit tests of the JVM-based library

Multi-platform Programming for Research-Oriented Software

Giovanni Ciatto — `giovanni.ciatto@unibo.it`

Compiled on: 2024-02-20
