BS ISO 28500:2017
$167.15
Information and documentation. WARC file format
Published By | Publication Date | Number of Pages |
---|---|---|
BSI | 2017 | 36 |
This document specifies the WARC file format:
- to store both the payload content and control information from mainstream Internet application layer protocols, such as HTTP, DNS, and FTP;
- to store arbitrary metadata linked to other stored data (e.g. subject classifier, discovered language, encoding);
- to support data compression and maintain data record integrity;
- to store all control information from the harvesting protocol (e.g. request headers), not just response information;
- to store the results of data transformations linked to other stored data;
- to store a duplicate detection event linked to other stored data (to reduce storage in the presence of identical or substantially similar resources);
- to be extended without disruption to existing functionality;
- to support handling of overly long records by truncation or segmentation, where desired.
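As an illustration of the compression and integrity goals listed above, the following is a minimal Python sketch, not text from the standard. It assumes a SHA-1 digest written as a `sha1:` label followed by a Base32 value, and it assumes each record is compressed as its own gzip member. The function names `block_digest` and `compress_record` are hypothetical; consult the standard itself for the normative rules.

```python
import base64
import gzip
import hashlib


def block_digest(block: bytes) -> str:
    """Return a labelled digest such as 'sha1:<base32>'.

    The algorithm and encoding are assumptions made for this sketch.
    """
    raw = hashlib.sha1(block).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def compress_record(record: bytes) -> bytes:
    """Compress a single record as an independent gzip member."""
    return gzip.compress(record)


# Concatenating per-record gzip members yields one valid gzip stream,
# so a reader can decompress the whole file or seek to a single record.
records = [b"record one", b"record two"]
archive = b"".join(compress_record(r) for r in records)
restored = gzip.decompress(archive)
```

Compressing record by record trades a little compression ratio for the ability to read one record without decompressing the rest of the file.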
PDF Catalog
PDF Pages | PDF Title |
---|---|
2 | National foreword |
7 | Foreword |
8 | Introduction |
9 | 1 Scope 2 Normative references |
10 | 3 Terms, definitions and abbreviated terms |
11 | 4 File and record model |
13 | 5 Named fields 5.1 General 5.2 WARC-Record-ID (mandatory) 5.3 Content-Length (mandatory) |
14 | 5.4 WARC-Date (mandatory) 5.5 WARC-Type (mandatory) 5.6 Content-Type |
15 | 5.7 WARC-Concurrent-To 5.8 WARC-Block-Digest 5.9 WARC-Payload-Digest |
16 | 5.10 WARC-IP-Address 5.11 WARC-Refers-To 5.12 WARC-Refers-To-Target-URI 5.13 WARC-Refers-To-Date |
17 | 5.14 WARC-Target-URI 5.15 WARC-Truncated 5.16 WARC-Warcinfo-ID 5.17 WARC-Filename |
18 | 5.18 WARC-Profile 5.19 WARC-Identified-Payload-Type 5.20 WARC-Segment-Number 5.21 WARC-Segment-Origin-ID 5.22 WARC-Segment-Total-Length |
19 | 6 WARC record types 6.1 General 6.2 ‘warcinfo’ 6.3 ‘response’ 6.3.1 General |
20 | 6.3.2 ‘http’ and ‘https’ schemes 6.3.3 Other URI schemes 6.4 ‘resource’ 6.4.1 General 6.4.2 ‘http’ and ‘https’ schemes 6.4.3 ‘ftp’ scheme |
21 | 6.4.4 ‘dns’ scheme 6.4.5 Other URI schemes 6.5 ‘request’ 6.5.1 General 6.5.2 ‘http’ and ‘https’ schemes 6.5.3 Other URI schemes 6.6 ‘metadata’ |
22 | 6.7 ‘revisit’ 6.7.1 General 6.7.2 Profile: Identical Payload Digest |
23 | 6.7.3 Profile: Server Not Modified 6.7.4 Other profiles 6.8 ‘conversion’ |
24 | 6.9 ‘continuation’ 7 Record segmentation 8 WARC file name, size and compression |
26 | Annex A (informative) Use cases for writing WARC records |
29 | Annex B (informative) Examples of WARC records |
32 | Annex C (informative) WARC file size and name recommendations |
33 | Annex D (informative) Compression recommendations |
34 | Bibliography |
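The contents list marks four named fields as mandatory: WARC-Record-ID, Content-Length, WARC-Date and WARC-Type. A rough Python sketch of how a record carrying only those fields might be serialized is shown below. It is an illustration under stated assumptions and not an excerpt from the standard: the `WARC/1.1` version line, the CRLF line endings, the `<urn:uuid:...>` identifier form and the helper name `build_record` are assumptions of this sketch, and individual record types generally need further fields (for example a target URI or content type).

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_record(warc_type: str, block: bytes) -> bytes:
    """Serialize a record containing only the four mandatory named fields."""
    # UTC timestamp in ISO 8601 form, e.g. 2017-08-01T12:00:00Z
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    fields = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", now),
        # Length of the record block in octets, not characters
        ("Content-Length", str(len(block))),
    ]
    header = b"WARC/1.1" + CRLF  # assumed version line
    for name, value in fields:
        header += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line separates header from block; two CRLFs end the record
    return header + CRLF + block + CRLF + CRLF


record = build_record("metadata", b"hello")
```

Passing the block as bytes keeps Content-Length an octet count even when the content contains multi-byte characters.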