ISO-15444-6-AMD-1-2007.pdf

Reference number: ISO/IEC 15444-6:2003/Amd.1:2007(E)
© ISO/IEC 2007

INTERNATIONAL STANDARD ISO/IEC 15444-6
First edition 2003-10-15
AMENDMENT 1 2007-08-15

Information technology
JPEG 2000 image coding system
Part 6: Compound image file format
AMENDMENT 1: Hidden text metadata

Technologies de l'information
Système de codage d'image JPEG 2000
Partie 6: Format de fichier d'image de composant
AMENDEMENT 1: Métadonnées de texte caché

Copyright International Organization for Standardization. Provided by IHS under license with ISO. Licensee=IHS Employees/1111111001, User=Wing, Bernie. Not for Resale, 09/03/2007 00:59:43 MDT. No reproduction or networking permitted without license from IHS.

PDF disclaimer

This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area.

Adobe is a trademark of Adobe Systems Incorporated.

Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

COPYRIGHT PROTECTED DOCUMENT

© ISO/IEC 2007

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

Amendment 1 to ISO/IEC 15444-6:2003 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.

Information technology
JPEG 2000 image coding system
Part 6: Compound image file format
AMENDMENT 1: Hidden text metadata

Add the following normative references to 2.2:

IETF RFC 1950, ZLIB Compressed Data Format Specification version 3.3, May 1996
IETF RFC 1951, DEFLATE Compressed Data Format Specification version 1.3, May 1996
IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, August 1998
W3C, Cascading Style Sheets, level 1 (CSS1) Specification, http://www.w3.org/pub/WWW/TR/REC-CSS1
W3C, Cascading Style Sheets, level 2 (CSS2) Specification, http://www.w3.org/TR/REC-CSS2
W3C, HTML 4.01 Specification, http://www.w3.org/TR/html401
W3C, XHTML 1.0 Extensible HyperText Markup Language, Second Edition, http://www.w3.org/TR/xhtml1
W3C, XML Schema Part 0: Primer, Second Edition, http://www.w3.org/TR/xmlschema-0
W3C, XML Schema Part 1: Structures, Second Edition, http://www.w3.org/TR/xmlschema-1
W3C, XML Schema Part 2: Datatypes, Second Edition, http://www.w3.org/TR/xmlschema-2

Add the following terms and definitions to Clause 3:

3.23 hidden text
symbolic representation for the characters and words found in an image

3.24 annotation
particular region of a page in a JPM document that has an associated URL reference, note or highlight

3.25 hidden text XML
XML data which describe hidden text and annotations for a single page in a JPM file and which conform to the schema in Annex H

3.26 compressed hidden text XML
hidden text XML data compressed using the mechanisms defined in F.2

3.27 hidden text UUID box
UUID box containing compressed hidden text XML

3.28 hidden text XML Schema
XML Schema for hidden text XML, as defined in H.1

Add the following abbreviations to Clause 4:

HTX Hidden Text XML

Add the following subclause after 5.2.8:

5.3 Hidden Text Metadata

Hidden text metadata is data representing the text, text elements and text flow associated with an image. In the context of this standard, hidden text is associated with a particular region of a page in a JPM document. Common uses for hidden text include text searching and highlighting, cut-and-paste, and text-to-speech processing. Hidden text describes the flow of the text on a page as well as the text elements.

JPM allows a rich, multiple content-type representation of a document. Each region of a page may be encoded with a compression technique best suited to its characteristics. In regions containing text, high fidelity reproduction of the source image is retained by not replacing the text regions with a character-based rendition through OCR, but rather by using advanced coding methods such as JBIG2. Even OCR results with 99 percent accuracy contain substantial numbers of errors per page, which require expensive human labour to correct. The searchable nature of a character-based rendition can be obtained instead by associating hidden "dirty OCR" results with the corresponding text image. This standard defines a format for hidden text metadata.

A key issue with hidden text is capturing the ambiguities seen by the OCR engine in a way that allows properly constructed search engines to find whether and where a given word might be present in a text image. Properly captured, this information provides nearly as much searching precision as an approach using human-corrected "clean OCR" data, but at much lower cost.

Search results are most useful where there are fewer false positives to weed through. Intelligent search engines can take account of such data as confidence and alternate characters or alternate words to appropriately alter the ranking of search hits on less certain characters.

In many cases, true ambiguity exists in the image and it would confuse a human observer as well. In these cases, saving confidence values for characters and their alternatives, or describing several alternative parsings of a string of characters into words, can amount to saving the state of the OCR process to allow the problem to be revisited in a later stage, perhaps by a different engine or by access to first a general dictionary and then a set of more specialized dictionaries.

As a last step, when a person is presented with the search results, they can dismiss a given search hit by comparison to the actual image data for a character or word. For this purpose (and to allow later-stage OCR processes to resume analysis on the image), bounding box rectangles can be defined for all the elements of the hidden text such as characters, words, lines, paragraphs and regions. By indicating a container relationship among these items, intelligent navigation and text selection can occur at character, word, line and paragraph boundaries. A reading order through these rectangles can be defined for what was, in the image, just a random placement of unrelated glyphs.

While it is primarily designed for use by machines such as search engines, the hidden text can also serve as a crude (if "dirty") or adequate (if "clean") alternate representation for an image region to allow it to display on character-based devices (such as mobile phones) or small-area graphics devices (such as PDAs).

Annotations are added to the document, typically with a WYSIWYG editor, to indicate URL references and notes, and to highlight key sections of the document text. Each annotation is associated with a particular region of a page in a JPM document.

XML is used for hidden text and annotations because it is a format widely used to store structured information, and can be machine processed.

Renumber the original 5.3 as 5.4.

Add the following rows at the correct alphabetical location in Table A.1 of A.4:

Table A.1 Boxes defined or referenced within this International Standard

Box name | Type | Superbox | Comments (Informative)
Hidden Text Metadata | 'htxb' (0x68747862) | Yes | This optional box contains hidden text and annotations.
HTX Reference Box | 'phtx' (0x70687478) | No | This optional box can be used to point to Hidden Text Metadata box contents at top file level.

Add the following subclauses after B.6.4:

B.6.5 Hidden Text Metadata box (superbox)

Box type: 'htxb' (0x68747862)
Container: Page box or File
Mandatory: No
Quantity: At most one if the container is the Page box, any number if the container is the file
Location: Anywhere in the Page box after the Page Header box if the container is the Page box, or anywhere after the File Type box if the container is the file

The Hidden Text Metadata box ('htxb') serves as a container for hidden text data. It is a superbox that may contain an optional Label box and must contain one of two box types. It may either contain one XML box containing hidden text metadata, or it may contain one UUID box containing hidden text metadata as specified in F.2.

The type of a Hidden Text Metadata box shall be 'htxb' (0x68747862). The contents of a Hidden Text Metadata box shall be as in Figure B.25:

[Figure not reproduced: two alternative box layouts joined by "or"]

Figure B.25 Organization of the contents of a Hidden Text Metadata box
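The containment rule in B.6.5 can be sketched in Python as follows. This is a minimal illustration, not part of the standard: it assumes the usual JPEG 2000 family box header (a 4-byte big-endian LBox followed by a 4-byte TBox, with LBox = 1 announcing an 8-byte XLBox and LBox = 0 meaning "to the end of the data"), and it assumes the type codes 'lbl ' and 'xml ' for the Label and XML boxes. The function names are hypothetical.

```python
import struct

HTXB = b"htxb"  # Hidden Text Metadata box, 0x68747862
PHTX = b"phtx"  # HTX Reference box, 0x70687478
LABEL = b"lbl "  # assumed Label box type code
XML = b"xml "    # assumed XML box type code
UUID = b"uuid"


def fourcc_to_int(code: bytes) -> int:
    """Return the 32-bit big-endian value of a four-character box type."""
    return int.from_bytes(code, "big")


def iter_boxes(data: bytes):
    """Yield (type, payload) for each box stored back to back in data."""
    pos = 0
    while pos < len(data):
        if len(data) - pos < 8:
            raise ValueError("truncated box header")
        lbox, tbox = struct.unpack_from(">I4s", data, pos)
        header = 8
        if lbox == 1:  # extended length stored in XLBox
            if len(data) - pos < 16:
                raise ValueError("truncated extended box header")
            (length,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif lbox == 0:  # box runs to the end of the data
            length = len(data) - pos
        else:
            length = lbox
        if length < header or pos + length > len(data):
            raise ValueError("bad box length")
        yield tbox, data[pos + header:pos + length]
        pos += length


def check_htxb_payload(payload: bytes) -> bytes:
    """Check the B.6.5 rule: optional Label box plus exactly one XML or UUID box.

    Returns the type of the content box found.
    """
    types = [t for t, _ in iter_boxes(payload)]
    content = [t for t in types if t in (XML, UUID)]
    labels = [t for t in types if t == LABEL]
    if len(content) != 1 or len(labels) > 1:
        raise ValueError("need one XML or UUID box and at most one Label box")
    if len(types) != len(content) + len(labels):
        raise ValueError("unexpected box inside Hidden Text Metadata box")
    return content[0]


def make_box(tbox: bytes, payload: bytes) -> bytes:
    """Build a box with a plain 8-byte header (helper for experiments)."""
    return struct.pack(">I4s", 8 + len(payload), tbox) + payload
```

For example, `check_htxb_payload(make_box(LABEL, b"page 1") + make_box(XML, b"<a/>"))` returns `b"xml "`, while a payload holding both an XML box and a UUID box is rejected.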

B.6.6 HTX Reference box

Box type: 'phtx' (0x70687478)
Container: Page box
Mandatory: No
Quantity: At most one
Location: Anywhere in the Page box after the Page Header box

If the hidden text for a page is contained in a Hidden Text Metadata box within the corresponding Page box, this box must not appear. If the hidden text for a page is contained in a series of one or more Hidden Text Metadata boxes at the file level, one HTX Reference box has to be included in the corresponding Page box.

The type of an HTX Reference box shall be 'phtx' (0x70687478). The contents of an HTX Reference box shall be as in Figure B.26:

Figure B.26 Organization of the contents of a HTX Reference box

Rtyp: Referenced box type. This field specifies the actual type (as would be found in the TBox field in an actual box header) of the box referenced by this HTX Reference box. However, a reader shall not attempt to locate a physically stored box header for the box represented by this HTX Reference box, as it is legal to use an HTX Reference box to create a new box that is not contiguously contained in other locations within this or other files, and thus the box header will not exist.

flst: Fragment List box. This box specifies the actual locations of the fragments of the referenced HTX element. When those fragments are concatenated, in order, as specified by the Fragment List box definition, the resulting byte-stream shall be the contents of the referenced HTX element, which contains hidden text data, and shall not include the box header fields. The format of the Fragment List box is specified in B.5.1.1. If Rtyp is 'uuid' and the UUID signals deflate compression as defined in F.2, the number of fragments of the Fragment List box must be one.

label: Label box. This optional field may contain a Label box which specifies a label or name for the hidden text of the corresponding page. The structure of a Label box is specified in B.6.3.

Table B.31 HTX Reference box contents data structure values

Parameter | Size
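The fragment rule for the flst field described in B.6.6 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each fragment is modelled as a hypothetical (offset, length) pair that points into the same file, whereas the real Fragment List box format is the one specified in B.5.1.1, and the caller is assumed to already know whether the referenced UUID signals deflate compression, since the F.2 details are not reproduced here.

```python
from typing import Iterable, Tuple

# Hypothetical simplification: (offset, length) into the same file.
Fragment = Tuple[int, int]


def concat_fragments(file_bytes: bytes, fragments: Iterable[Fragment]) -> bytes:
    """Concatenate the fragments in list order, as the flst rule requires."""
    parts = []
    for offset, length in fragments:
        if offset < 0 or length < 0 or offset + length > len(file_bytes):
            raise ValueError("fragment lies outside the file")
        parts.append(file_bytes[offset:offset + length])
    return b"".join(parts)


def resolve_htx_reference(
    file_bytes: bytes,
    rtyp: bytes,
    fragments: Iterable[Fragment],
    deflated: bool = False,
) -> bytes:
    """Return the contents of the referenced HTX element, without a box header.

    `deflated` is an assumed caller-supplied flag meaning "the UUID signals
    deflate compression"; in that case exactly one fragment is allowed.
    """
    fragment_list = list(fragments)
    if rtyp == b"uuid" and deflated and len(fragment_list) != 1:
        raise ValueError("deflate-compressed UUID content must be one fragment")
    return concat_fragments(file_bytes, fragment_list)
```

The order of the fragment list matters: the same two fragments listed in reverse yield a different byte-stream, which is why the sketch never sorts them by offset.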
