I ended up solving this problem by having my Python plugin launch an external process (via subprocess.Popen) that reads the unstructured connectivity in chunks of 100,000 cells and writes each chunk to its stdout as plain arrays of offsets, node indices, and cell types, which are convenient for constructing a vtkCellArray per chunk. The external process is written in C, so all the branching on the type of the next cell is no longer expensive. The plugin still reads all the other parts of the file (node coordinates, variables, etc.). The extra executable is easy to build (and easy to explain to my users how to build), since it has no dependencies.
With this technique, I can read 46 million cells in about 5 seconds. Doing it all in Python takes at least 100 times longer (I lost patience and killed the run at that point).
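A minimal sketch of the Python side of this approach, for anyone wanting to try something similar. The framing on the pipe is an assumption made for illustration (the post does not describe the actual byte layout): each chunk starts with a header of two native-endian int64 values, followed by the offsets, node indices, and cell types. `FAKE_HELPER`, `read_exact`, and `read_chunks` are hypothetical names, and the helper here is a small Python stand-in for the compiled C executable so that the example runs on its own.

```python
import struct
import subprocess
import sys
from array import array

# ASSUMED wire format (not taken from the original post):
#   header : two native-endian int64 values (n_cells, n_conn)
#   body   : n_cells + 1 int64 offsets, n_conn int64 node indices,
#            n_cells uint8 cell types
# A header with n_cells == 0 marks the end of the stream.
HEADER = struct.Struct("=qq")

# Stand-in for the C helper: emits one chunk containing a triangle
# (VTK cell type 5) and a quad (VTK cell type 9), then the end marker.
FAKE_HELPER = r"""
import struct, sys
out = sys.stdout.buffer
offsets = [0, 3, 7]
conn = [0, 1, 2, 2, 1, 3, 4]
types = [5, 9]
out.write(struct.pack("=qq", len(types), len(conn)))
out.write(struct.pack("=%dq" % len(offsets), *offsets))
out.write(struct.pack("=%dq" % len(conn), *conn))
out.write(bytes(types))
out.write(struct.pack("=qq", 0, 0))
"""


def read_exact(stream, nbytes):
    """Read exactly nbytes from stream or raise if the pipe closes early."""
    data = stream.read(nbytes)
    if len(data) != nbytes:
        raise EOFError("helper closed its stdout in the middle of a chunk")
    return data


def read_chunks(cmd):
    """Launch cmd and yield (offsets, connectivity, types) for each chunk."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        while True:
            n_cells, n_conn = HEADER.unpack(read_exact(proc.stdout, HEADER.size))
            if n_cells == 0:
                break
            offsets = array("q")
            offsets.frombytes(read_exact(proc.stdout, 8 * (n_cells + 1)))
            conn = array("q")
            conn.frombytes(read_exact(proc.stdout, 8 * n_conn))
            types = read_exact(proc.stdout, n_cells)  # one byte per cell
            yield offsets, conn, types
    finally:
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    # In real use, cmd would be the path to the compiled helper plus the
    # mesh file name instead of this Python stand-in.
    for offsets, conn, types in read_chunks([sys.executable, "-c", FAKE_HELPER]):
        print(len(types), "cells,", len(conn), "node indices")
```

Because the arrays arrive as contiguous buffers, the Python side does no per-cell work. In a plugin, the offsets and node indices of each chunk could presumably be wrapped as VTK arrays (for example through numpy and `vtk.util.numpy_support`) and passed to `vtkCellArray.SetData`, which is available in VTK 9 and later; that step is left out here so the sketch depends only on the standard library.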