MODE-1639, MODE-1640, MODE-1634 Replaced the Aperture-based MIME type detector with a Tika-based one

This required some dependency gymnastics, since Tika has many more transitive dependencies than the Aperture library (which we had successfully pared down several years ago). Tika references about 25 dependencies (including transitive dependencies), but in 'modeshape-jcr' we reduced this to about 8, enough for basic MIME type detection. Note that Tika normally includes two BouncyCastle libraries among its dependencies (used for encrypted PDFs, among other things), but ModeShape intentionally excludes them because we don't want to ship or depend on any security-related JARs.
Not only do we get Tika's substantial MIME type database, but we've also made it possible for users to edit the 'org/modeshape/custom-mimetypes.xml' file and provide the updated copy on the application classpath. The definitions in that file override all of the other sources (namely Tika's built-in file and its customization file, both of which are found on the classpath), so the easiest way to customize MIME types is to provide an updated version of this file at 'org/modeshape/custom-mimetypes.xml'. Be sure not to remove any of the (few) customizations that ModeShape includes; those are important.
As we upgrade Tika, we'll pick up updated versions of its media type data, which is far preferable to maintaining a ModeShape-specific version.
The MIME type-related interfaces in ModeShape's public API (e.g., 'modeshape-jcr-api') have been removed. These were added in one of the 3.0 releases, so removing them will not introduce compatibility issues for users.
Instead, we've decided to get out of the MIME type detection framework business and use Tika for all MIME type detection. You can still write your own MIME type detector, but you do so by implementing Tika's interface and referencing the implementation class(es) in the corresponding service loader file in your JAR. (See the Tika documentation for details.)
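As a rough sketch of what such a detector involves, the class below follows the contract of Tika's `org.apache.tika.detect.Detector` interface (`MediaType detect(InputStream input, Metadata metadata)`): the stream may be null, the detector marks the stream before reading, resets it before returning, and answers `application/octet-stream` when it cannot tell. To keep the sketch compilable without tika-core, it returns plain `String`s and omits `Metadata`; the format name, magic bytes, and class name are all hypothetical. A real detector would implement Tika's `Detector` and list its fully qualified class name in `META-INF/services/org.apache.tika.detect.Detector` inside your JAR.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical stand-in showing the shape of a custom detector. A real one would
 * implement org.apache.tika.detect.Detector and return org.apache.tika.mime.MediaType;
 * Strings are used here so the sketch compiles without tika-core.
 */
public class ExampleMagicDetector {

    // Hypothetical magic bytes for an invented "application/x-example" format.
    private static final byte[] MAGIC = "EXMPL".getBytes(StandardCharsets.US_ASCII);
    static final String EXAMPLE_TYPE = "application/x-example";
    static final String FALLBACK_TYPE = "application/octet-stream";

    public String detect(InputStream input) throws IOException {
        if (input == null) {
            // No content available; a real Tika detector might consult Metadata here.
            return FALLBACK_TYPE;
        }
        // Mark before reading and reset afterwards so later detectors see the whole stream.
        input.mark(MAGIC.length);
        try {
            byte[] header = input.readNBytes(MAGIC.length);
            return Arrays.equals(header, MAGIC) ? EXAMPLE_TYPE : FALLBACK_TYPE;
        } finally {
            input.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        ExampleMagicDetector detector = new ExampleMagicDetector();
        InputStream in = new ByteArrayInputStream("EXMPL payload".getBytes(StandardCharsets.US_ASCII));
        System.out.println(detector.detect(in)); // prints application/x-example
    }
}
```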
However, internally we still have an abstraction. This is because it is possible to remove Tika (and its transitive dependencies) from a ModeShape installation, as long as your applications do not expect any kind of automatic MIME type detection. This is a perfectly valid use case: for example, using a repository to store data but not files (and not using sequencers).
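One way such an internal abstraction could look is sketched below: a small detector interface, a no-op implementation that never detects anything, and a factory that uses a Tika-backed implementation only when Tika's classes are on the classpath. All names here (`MimeTypeDetectors`, `MimeTypeDetector`, `NULL_DETECTOR`, `choose`, `isOnClasspath`) are hypothetical and are not ModeShape's actual internal classes; the only real name is `org.apache.tika.Tika`, the facade class in tika-core, which serves as the presence check.

```java
import java.io.InputStream;
import java.util.function.Supplier;

/** Hypothetical internal abstraction; names are illustrative, not ModeShape's real classes. */
public final class MimeTypeDetectors {

    /** Minimal detector abstraction (hypothetical). */
    public interface MimeTypeDetector {
        /** Returns a MIME type, or null when nothing can be detected. */
        String mimeTypeOf(String name, InputStream content);
    }

    /** Used when Tika has been removed: never detects anything. */
    public static final MimeTypeDetector NULL_DETECTOR = (name, content) -> null;

    private MimeTypeDetectors() {
    }

    /** Checks for a class without initializing it. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, MimeTypeDetectors.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Uses the supplied Tika-backed detector only if tika-core is present. */
    public static MimeTypeDetector choose(Supplier<MimeTypeDetector> tikaBacked) {
        return isOnClasspath("org.apache.tika.Tika") ? tikaBacked.get() : NULL_DETECTOR;
    }

    public static void main(String[] args) {
        // The lambda is a placeholder for a detector that would delegate to Tika.
        MimeTypeDetector detector = choose(() -> (name, content) -> "text/plain");
        // Prints "text/plain" if Tika is on the classpath, otherwise "null".
        System.out.println(detector.mimeTypeOf("notes.txt", null));
    }
}
```

Passing the Tika-backed detector as a `Supplier` keeps the factory itself free of Tika types, so the decision about whether to construct anything Tika-related is made in one place, after the classpath check.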
The AS7 kits required a bit more modification. There is now a new AS7 module, 'org.apache.tika', that contains all of the Tika-related JARs; it is used by both the ModeShape module and the Tika text extractor module.
All unit and integration tests pass with these changes. Several new tests were added.