archive:create-from and archive:create support for archives larger than 2 GB #2624
…tOfBoundsException: Maximum array size exceeded" when creating zip file larger than 2 GB
…tOfBoundsException: Maximum array size exceeded" when creating zip file larger than 2 GB, return lazy reference to temp file
@vincentml Thanks for the PR; the solution is complete, well-tested and creative. My subsequent revisions are all secondary (26ce16e). One observation: For the threshold computation to work properly, |
@ChristianGruen Thank you for merging this PR and your subsequent improvements! I've continued to have second thoughts about the threshold computation based on free memory, and am still unsure of what computation would work well for situations where multiple processes run in parallel using the |
This pull request makes it possible for `archive:create` and `archive:create-from` to create zip files that are larger than 2 GB.

Using BaseX version 12.2, when attempting to create zip files using `archive:create` or `archive:create-from` where the total size of the files is larger than about 2 GB, I've run into error messages such as "java.lang.ArrayIndexOutOfBoundsException: Maximum array size exceeded (2147483640 > 2147483639)."

For example, this error is produced if the total size of a folder being zipped is 3 GB, both when passing the result of `archive:create-from` directly to `file:write-binary` and when using a variable to pass the result of `archive:create-from` to `file:write-binary`.

After the changes in this pull request, both kinds of query produce the expected zip file and the error does not occur.
The current limitation of about 2 GB is due to the file contents being accumulated in memory and exceeding the maximum array size, which is bounded by Java's `Integer.MAX_VALUE`.
This pull request solves this problem by avoiding the use of a single array. Instead, data is accumulated in memory up to a threshold and is moved to a temporary file once it exceeds that threshold. The threshold is determined from available memory, capped at the maximum array size. The temporary file, if created, is deleted automatically. This approach attempts to optimize for the typical use cases of creating small or mid-size archives while making it possible to create very large archives.
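
To illustrate the approach, here is a minimal Java sketch of an output stream that buffers in memory and spills to a temporary file once a threshold is exceeded. It is not the code from this pull request: the class name `SpillingOutputStream`, its methods, the free-memory formula in `defaultThreshold` (including the divisor of 4) and the cleanup via `deleteOnExit` plus an explicit `delete` are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: buffers in memory, spills to a temp file past a threshold. */
final class SpillingOutputStream extends OutputStream {
  /** Conservative cap below Integer.MAX_VALUE (assumed safe array size). */
  private static final long MAX_ARRAY = Integer.MAX_VALUE - 8;

  private final long threshold;
  private ByteArrayOutputStream memory = new ByteArrayOutputStream();
  private OutputStream fileOut;
  private Path file;
  private long size;

  SpillingOutputStream(final long threshold) {
    // never let the in-memory part grow beyond what a byte array can hold
    this.threshold = Math.max(0, Math.min(threshold, MAX_ARRAY));
  }

  /** Assumed heuristic: a quarter of the currently unused heap, capped. */
  static long defaultThreshold() {
    final Runtime rt = Runtime.getRuntime();
    final long unused = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    return Math.max(0, Math.min(unused / 4, MAX_ARRAY));
  }

  @Override
  public void write(final int b) throws IOException {
    write(new byte[] { (byte) b }, 0, 1);
  }

  @Override
  public void write(final byte[] b, final int off, final int len) throws IOException {
    // switch to the temp file before the threshold would be exceeded
    if (fileOut == null && size + len > threshold) spill();
    if (fileOut != null) fileOut.write(b, off, len);
    else memory.write(b, off, len);
    size += len;
  }

  /** Moves everything buffered so far into a newly created temp file. */
  private void spill() throws IOException {
    file = Files.createTempFile("archive", ".tmp");
    file.toFile().deleteOnExit(); // fallback cleanup when the JVM exits
    fileOut = Files.newOutputStream(file);
    memory.writeTo(fileOut);
    memory = null; // release the in-memory buffer
  }

  @Override
  public void close() throws IOException {
    if (fileOut != null) fileOut.close();
  }

  boolean spilled() { return file != null; }

  long size() { return size; }

  /** Opens the result for reading; call after close(). */
  InputStream openResult() throws IOException {
    return file != null ? Files.newInputStream(file)
                        : new ByteArrayInputStream(memory.toByteArray());
  }

  /** Deletes the temp file, if one was created. */
  void delete() throws IOException {
    if (file != null) Files.deleteIfExists(file);
  }
}
```

A caller would wrap this stream in a `java.util.zip.ZipOutputStream`, write the entries, close it, and then stream the result from `openResult()` to its destination. Small archives never touch the disk, while large ones are bounded only by available disk space.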