Newspaper Digitization and JPEG 2000, Part 1

The theme of this series

kdu_compress -i YOURINPUT.pgm -o YOUROUTPUT.jp2 -rate 1,0.84,0.7,0.6,0.5,0.4,0.35,0.3,0.25,0.21,0.18,0.15,0.125,0.1,0.088,0.075,0.0625,0.05,0.04419,0.03716,0.03125,0.025,0.0221,0.01858,0.015625 Clevels=6 Stiles={1024,1024} Corder=RLCP -jp2_box YOURMETADATA.xml


This is not limited to newspaper digitization, but worldwide, large-scale digitization (Mass Digitization) projects are increasingly adopting JPEG 2000.*1


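Starting today, over several posts, I will look at the details of JPEG 2000 in NDNP (the National Digital Newspaper Program*2). It hardly needs a warning at this point, but we will be flying at a high level of geekiness, so hold on tight.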


Today I will only outline what is coming. The nitpicking starts with the next post.


For now, please take a look at the following.

  1. The JPEG 2000 file will conform with JP2 file format as specified in ISO/IEC 15444-1:2000 (i.e., JPEG 2000, Part 1).
  2. The JPEG 2000 file will be prepared after any image processing or clean-up is performed. The JPEG 2000 file will correspond with the image that is used for OCR.
  3. The JPEG 2000 file's brand will be “jp2 ”, version will be “0” and compatibility will be “jp2 ”. (Note the space after jp2.)
  4. The JPEG 2000 file's image X origin, image Y origin, tile X origin, and tile Y origin will be 0.
  5. The JPEG 2000 file will contain only one component.
  6. The bit depth of that component will be 8.
  7. The JPEG 2000 file's height and width will be the same as the TIFF master file.
  8. The JPEG 2000 file's tile header will not contain coding style default, coding style component, quantization default, and quantization component marker segments.
  9. The JPEG 2000's progression order will be RLCP (resolution, layer, component, position).
  10. The JPEG 2000 will have 6 decomposition levels.
  11. The JPEG 2000 will have 25 quality layers. The bits per pixel for each quality layer will be: 1,0.84,0.7,0.6,0.5,0.4,0.35,0.3,0.25,0.21,0.18,0.15,0.125,0.1,0.088,0.07,0.0625,0.05,0.04419,0.03716,0.03125,0.025,0.0221,0.01858,0.015625.
  12. The JPEG 2000's code-block size will be 64x64.
  13. The JPEG 2000 will use the 9-7 irreversible filter.
  14. The JPEG 2000 will be compressed so that it is about one-eighth of the TIFF or 1 bit per pixel.
  15. The JPEG 2000 will use 1024x1024 tiles.
  16. The JPEG 2000's color specification must be either the monochrome (greyscale) enumerated color space or the Monochrome Input restricted ICC profile.
  17. The JPEG 2000 file will not contain regions of interest or precincts.
  18. The JPEG 2000 file will not contain intellectual property rights information.
  19. It is recommended that information about the codec used to encode the JPEG 2000 file (e.g., name, version) be included. The preferred method to do this is an XML Box containing the relevant MIX elements.
  20. The JPEG 2000 file will contain an XML Box that conforms with the following:
    For newspaper pages:
    .... (continued)
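Some of these items can be checked mechanically. As an illustration only, here is a minimal Python sketch (the function names are hypothetical, not part of NDNP or of any tool) that walks the top-level boxes of a JP2 file and checks item 3 (brand "jp2 ", minor version 0, compatibility "jp2 ") and whether an XML box is present at all (item 20). It assumes the box layout of ISO/IEC 15444-1, with the 12-byte signature box followed immediately by the File Type box, and it reads item 3 strictly, as a compatibility list containing exactly one entry. It does not look inside the codestream, so items such as 9 or 10 are out of its reach.

```python
import struct

# 12-byte JP2 signature box: length 12, type "jP  ", content 0x0D0A870A
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"


def read_boxes(data: bytes):
    """Yield (type, payload) for each top-level box in a JP2 byte string."""
    pos = 0
    while pos < len(data):
        lbox, tbox = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if lbox == 1:  # XLBox: a 64-bit length follows the type field
            lbox = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif lbox == 0:  # the box extends to the end of the file
            lbox = len(data) - pos
        if lbox < header:
            raise ValueError(f"invalid box length {lbox} at offset {pos}")
        yield tbox.decode("latin-1"), data[pos + header:pos + lbox]
        pos += lbox


def check_file_type(data: bytes) -> list:
    """Return a list of problems found for item 3 and the XML box of item 20."""
    if not data.startswith(JP2_SIGNATURE):
        return ["missing JP2 signature box"]
    boxes = list(read_boxes(data))
    problems = []
    if len(boxes) < 2 or boxes[1][0] != "ftyp":
        problems.append("ftyp box must immediately follow the signature box")
    else:
        payload = boxes[1][1]
        if len(payload) < 8:
            problems.append("ftyp box is truncated")
        else:
            # File Type box: brand (4 bytes), minor version (uint32), then
            # a list of 4-byte compatibility entries
            brand, minor = struct.unpack(">4sI", payload[:8])
            compat = [payload[i:i + 4] for i in range(8, len(payload), 4)]
            if brand != b"jp2 ":
                problems.append(f"brand is {brand!r}, expected b'jp2 '")
            if minor != 0:
                problems.append(f"minor version is {minor}, expected 0")
            if compat != [b"jp2 "]:
                problems.append(f"compatibility list is {compat!r}, expected [b'jp2 ']")
    if all(box_type != "xml " for box_type, _ in boxes):
        problems.append("no XML box")
    return problems
```

Usage would be along the lines of `check_file_type(open("YOUROUTPUT.jp2", "rb").read())`, where an empty list means none of these checks found a problem.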


http://www.loc.gov/ndnp/pdf/JPEG2kSpecs09.pdf


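This is NDNP's specification for JPEG 2000 files. It has 20 items in all, and we will go through them one at a time. (Some items can obviously be skipped, so leaving those out, I plan to cover the rest in about five or six posts.)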
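Several of the items boil down to simple arithmetic. The Python sketch below is an illustration only (the function name `ndnp_numbers` is hypothetical, and the 6000 x 8000 page size used with it is made up, not taken from NDNP). It computes the uncompressed 8-bit size (items 5 and 6), the approximate byte budget of a quality layer at a given bits-per-pixel rate together with its compression ratio (items 11 and 14, assuming "one-eighth of the TIFF" refers to an uncompressed 8-bit TIFF), the number of 1024x1024 tiles (item 15, with the zero origins of item 4), and the image size at each of the 7 resolution levels that 6 decomposition levels produce (item 10).

```python
import math


def ndnp_numbers(width: int, height: int, layer_bpp=(1.0, 0.015625)) -> dict:
    """Back-of-the-envelope figures for an 8-bit, single-component page."""
    # Items 5 and 6: one 8-bit component, so one byte per pixel uncompressed
    raw_bytes = width * height
    # Items 11 and 14: a layer at `bpp` bits per pixel takes roughly
    # width * height * bpp / 8 bytes (headers ignored); 1 bpp is 1/8 of 8 bpp
    target_bytes = {bpp: math.ceil(width * height * bpp / 8) for bpp in layer_bpp}
    ratio = {bpp: 8 / bpp for bpp in layer_bpp}
    # Items 4 and 15: 1024x1024 tiles anchored at origin (0, 0)
    tiles = math.ceil(width / 1024) * math.ceil(height / 1024)
    # Item 10: 6 decomposition levels give 7 resolution levels, each
    # half the size (rounded up) of the one above
    resolutions = [(math.ceil(width / 2**r), math.ceil(height / 2**r)) for r in range(7)]
    return {"raw_bytes": raw_bytes, "target_bytes": target_bytes,
            "ratio": ratio, "tiles": tiles, "resolutions": resolutions}


if __name__ == "__main__":
    print(ndnp_numbers(6000, 8000))  # hypothetical page size
```

For the hypothetical 6000 x 8000 page, this gives 48,000,000 bytes uncompressed, a budget of 6,000,000 bytes for the 1 bpp layer (8:1), 93,750 bytes for the 0.015625 bpp layer (512:1), 48 tiles, and a smallest resolution level of 94 x 125 pixels.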


Finally, as a concrete example of a JPEG 2000 conversion, let's look at one done with a tool called Kakadu. The "theme of this series" shown at the top,

kdu_compress -i YOURINPUT.pgm -o YOUROUTPUT.jp2 -rate 1,0.84,0.7,0.6,0.5,0.4,0.35,0.3,0.25,0.21,0.18,0.15,0.125,0.1,0.088,0.075,0.0625,0.05,0.04419,0.03716,0.03125,0.025,0.0221,0.01858,0.015625 Clevels=6 Stiles={1024,1024} Corder=RLCP -jp2_box YOURMETADATA.xml

is the conversion command recommended by NDNP. Once you have skimmed the 20 items of the specification, you will probably be able to tell what this command is "instructing".*3
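To make the correspondence concrete, here is a Python sketch that assembles the same command as an argument list, with each argument annotated by the specification item it appears to address. It only builds and prints the command; it does not run Kakadu. The function name `ndnp_kdu_args` is hypothetical. The rate list follows the kdu_compress command, whose 16th value is 0.075 where the copy of item 11 quoted above reads 0.07. Items 12 (64x64 code-blocks) and 13 (9-7 irreversible filter) do not appear in the command; my assumption is that they are covered by Kakadu's defaults (64x64 code-blocks and the irreversible transform), which is the kind of "implied" setting footnote *3 refers to.

```python
import shlex

# Item 11: 25 quality layers in bits per pixel, as written in the
# kdu_compress command (16th value 0.075; the quoted item 11 reads 0.07)
LAYER_BPP = (
    "1", "0.84", "0.7", "0.6", "0.5", "0.4", "0.35", "0.3", "0.25",
    "0.21", "0.18", "0.15", "0.125", "0.1", "0.088", "0.075", "0.0625",
    "0.05", "0.04419", "0.03716", "0.03125", "0.025", "0.0221",
    "0.01858", "0.015625",
)


def ndnp_kdu_args(src: str, dst: str, xml_box: str) -> list:
    """Build the argument list of the NDNP-recommended kdu_compress command."""
    values = [float(v) for v in LAYER_BPP]
    # Sanity checks on the list as copied: 25 layers, strictly decreasing
    if len(values) != 25 or any(a <= b for a, b in zip(values, values[1:])):
        raise ValueError("layer list does not match item 11")
    return [
        "kdu_compress",
        "-i", src,                     # greyscale PGM: one 8-bit component (items 5, 6)
        "-o", dst,                     # .jp2 output file (item 1)
        "-rate", ",".join(LAYER_BPP),  # quality layers, top layer 1 bpp (items 11, 14)
        "Clevels=6",                   # decomposition levels (item 10)
        "Stiles={1024,1024}",          # tile size (item 15)
        "Corder=RLCP",                 # progression order (item 9)
        "-jp2_box", xml_box,           # XML box with metadata (items 19, 20)
    ]


if __name__ == "__main__":
    print(shlex.join(ndnp_kdu_args("YOURINPUT.pgm", "YOUROUTPUT.jp2", "YOURMETADATA.xml")))
```

`shlex.join` prints `Stiles={1024,1024}` inside single quotes. That quoting matters when typing the command into bash, where an unquoted `{1024,1024}` is subject to brace expansion; passing the list directly to `subprocess.run` sidesteps the shell altogether.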


That is the plan, so I hope you will follow along.

*1: "Increasingly adopting" is a rather loose expression. It reflects only my own impression, and some people may well disagree.

*2: See http://d.hatena.ne.jp/denshikA/20090827

*3: At the same time, by taking the default settings into account, you should also be able to tell what it "implies".