COR-354: Upgrade the versions of pdfbox, poi, tika

Fix description:
* Update the versions in the main pom.
* Remove InvalidPasswordException in PDDocument.decrypt(). This change has existed since pdfbox 1.8.6 (PDFBOX-1474).
* TIKA-1400 (tika 1.10) extracts the header and footer of Excel files (.xls). The information is then put into class "outside". The output of TestMSExcelOnTikaDocumentReader must therefore be updated.

COR-331: Implement MSPPTXStreamDocumentReader using SAXParser

Problem analysis:
* Apache POI provides only an in-memory model for MS PPTX files. In this model, SAXParser is invoked too many times (three times the number of slides), even just to get some metadata. It is therefore unsuitable for parsing very large files (in terms of number of slides).

Fix description:
* Implement a new document reader for PPTX files that reads the stream.
* Get metadata directly from the corresponding file (core.xml) if this file exists.
* Parse and index the text of only a limited number of the first slides.
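
The stream-based approach described for COR-331 could look roughly like the sketch below. It is not the actual MSPPTXStreamDocumentReader. The class name PptxStreamSketch, its fields, and the maxSlides parameter are hypothetical, and it uses only the JDK's ZipInputStream and SAXParser. It assumes the usual PPTX layout (docProps/core.xml for metadata, ppt/slides/slideN.xml for slides) and the conventional "a:" prefix on text elements. Note that it takes the first maxSlides slide entries in the order they appear in the archive, which is not guaranteed to match slide order.

```java
import java.io.FilterInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: reads PPTX metadata and the text of the first slides from a stream. */
public class PptxStreamSketch {
    public final Map<String, String> metadata = new LinkedHashMap<>();
    public final StringBuilder text = new StringBuilder();
    public int slidesRead = 0;

    public static PptxStreamSketch read(InputStream in, int maxSlides) throws Exception {
        PptxStreamSketch result = new PptxStreamSketch();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Shield the zip stream: a SAX parser may close its input when done.
                InputStream entryStream = new FilterInputStream(zip) {
                    @Override public void close() { /* keep the zip stream open */ }
                };
                if (name.equals("docProps/core.xml")) {
                    // Metadata comes straight from core.xml, one SAX pass.
                    factory.newSAXParser().parse(entryStream, result.new CoreHandler());
                } else if (name.matches("ppt/slides/slide\\d+\\.xml")
                        && result.slidesRead < maxSlides) {
                    // One SAX pass per slide, and only for the first maxSlides entries.
                    factory.newSAXParser().parse(entryStream, result.new SlideHandler());
                    result.slidesRead++;
                }
                // Every other entry is skipped without being parsed.
            }
        }
        return result;
    }

    /** Stores the text of each leaf element of core.xml under its qualified name. */
    private class CoreHandler extends DefaultHandler {
        private final StringBuilder buf = new StringBuilder();
        @Override public void startElement(String uri, String local, String qName, Attributes a) {
            buf.setLength(0);
        }
        @Override public void characters(char[] ch, int start, int length) {
            buf.append(ch, start, length);
        }
        @Override public void endElement(String uri, String local, String qName) {
            String value = buf.toString().trim();
            if (!value.isEmpty()) {
                metadata.put(qName, value);
            }
            buf.setLength(0);
        }
    }

    /** Collects text runs (a:t) and ends each paragraph (a:p) with a newline. */
    private class SlideHandler extends DefaultHandler {
        private boolean inText = false;
        @Override public void startElement(String uri, String local, String qName, Attributes a) {
            if ("a:t".equals(qName)) {
                inText = true;
            }
        }
        @Override public void characters(char[] ch, int start, int length) {
            if (inText) {
                text.append(ch, start, length);
            }
        }
        @Override public void endElement(String uri, String local, String qName) {
            if ("a:t".equals(qName)) {
                inText = false;
            } else if ("a:p".equals(qName)) {
                text.append('\n');
            }
        }
    }
}
```

Calling PptxStreamSketch.read(stream, 20) would parse core.xml plus at most 20 slide entries, so the number of SAX passes is bounded by the limit rather than growing with the size of the presentation.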