COR-331: Implement MSPPTXStreamDocumentReader using SAXParser

Problem analysis:
* Apache POI provides only an in-memory model for MS PPTX files. Building this model invokes SAXParser about three times per slide, even when only the metadata is needed. POI is therefore unsuitable for parsing very large files (i.e., files with many slides).

Fix description:
* Implement a new document reader for PPTX files that reads the file as a stream.
* Read the metadata directly from the corresponding part (core.xml), if it exists.
* Parse and index the text of only a limited number of leading slides.
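A minimal sketch of the streaming approach, for illustration only. It is not the actual MSPPTXStreamDocumentReader. It relies only on the JDK (`java.util.zip.ZipInputStream`, `javax.xml.parsers.SAXParser`) and treats the PPTX as the ZIP archive it is, making a single forward pass over the entries. The class name `PptxStreamSketch`, the `maxSlides` parameter, and the metadata map keyed by element local name are hypothetical. It also assumes the conventional part names `docProps/core.xml` and `ppt/slides/slideN.xml`. Strictly, the package relationships define where these parts live, and `ppt/presentation.xml` defines the slide order, so the file-name index is only an approximation of presentation order.

```java
import java.io.FilterInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical streaming PPTX reader sketch; not the real MSPPTXStreamDocumentReader. */
class PptxStreamSketch {
    // DrawingML namespace of the <a:t> text-run elements inside slide XML.
    static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    // Assumption: the slide index is taken from the part name (slide7.xml -> 7),
    // which approximates, but does not guarantee, presentation order.
    private static final Pattern SLIDE_PART = Pattern.compile("ppt/slides/slide(\\d+)\\.xml");

    final Map<String, String> metadata = new HashMap<>();
    final StringBuilder text = new StringBuilder();

    /** Reads the PPTX (a ZIP archive) in one forward pass; maxSlides is a hypothetical limit. */
    void read(InputStream in, int maxSlides) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                Matcher slide = SLIDE_PART.matcher(name);
                if (name.equals("docProps/core.xml")) {
                    parser.reset();
                    parser.parse(new NonClosing(zip), new CoreHandler());
                } else if (slide.matches() && Integer.parseInt(slide.group(1)) <= maxSlides) {
                    parser.reset();
                    parser.parse(new NonClosing(zip), new SlideTextHandler());
                }
                // Every other entry (images, layouts, slides past the limit) is skipped unparsed.
            }
        }
    }

    /** Stores the trimmed text of each leaf element of core.xml, keyed by local name. */
    private class CoreHandler extends DefaultHandler {
        private final StringBuilder buf = new StringBuilder();

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            buf.setLength(0);
        }

        @Override public void characters(char[] ch, int start, int length) {
            buf.append(ch, start, length);
        }

        @Override public void endElement(String uri, String local, String qName) {
            String value = buf.toString().trim();
            if (!value.isEmpty()) {
                metadata.put(local, value); // e.g. "title", "creator"
            }
            buf.setLength(0);
        }
    }

    /** Appends the content of every DrawingML <a:t> element, separated by spaces. */
    private class SlideTextHandler extends DefaultHandler {
        private boolean inText;

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            inText = DRAWINGML_NS.equals(uri) && local.equals("t");
        }

        @Override public void characters(char[] ch, int start, int length) {
            if (inText) {
                text.append(ch, start, length);
            }
        }

        @Override public void endElement(String uri, String local, String qName) {
            if (DRAWINGML_NS.equals(uri) && local.equals("t")) {
                text.append(' ');
                inText = false;
            }
        }
    }

    /** Keeps the shared ZipInputStream open when the SAX parser closes its input. */
    private static final class NonClosing extends FilterInputStream {
        NonClosing(InputStream in) { super(in); }

        @Override public void close() { /* intentionally a no-op */ }
    }
}
```

The `NonClosing` wrapper matters because a SAX parser may close its input stream when it reaches the end of a document. Without the wrapper, that would close the whole ZIP stream after the first parsed entry. Slide text is collected in archive order, and a missing `core.xml` simply leaves `metadata` empty.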