dcm2xml - Convert DICOM file and data set to XML
dcm2xml [options] dcmfile-in [xmlfile-out]
The dcm2xml utility converts the contents of a DICOM file (file format or
raw data set) to XML (Extensible Markup Language). There are two output
formats. The first one is specific to DCMTK with its DTD (Document Type
Definition) described in the file dcm2xml.dtd. The second one refers to
the 'Native DICOM Model' which is specified for the DICOM Application Hosting
service found in DICOM part 19.
If dcm2xml reads a raw data set (DICOM data without a file format
meta-header), it will attempt to guess the transfer syntax by examining the
first few bytes of the file. It is not always possible to correctly guess the
transfer syntax and it is better to convert a data set to a file format
whenever possible (using the dcmconv utility). It is also possible to
use the -f options to force dcm2xml to read a
data set with a particular transfer syntax.
dcmfile-in DICOM input filename to be converted
xmlfile-out XML output filename (default: stdout)
print this help text and exit
print version information and exit
print expanded command line arguments
quiet mode, print no warnings and errors
verbose mode, print processing details
debug mode, print debug information
-ll --log-level [l]evel: string constant
(fatal, error, warn, info, debug, trace)
use level l for the logger
-lc --log-config [f]ilename: string
use config file f for the logger
input file format:
read file format or data set (default)
read file format only
read data set without file meta information
input transfer syntax:
use TS recognition (default)
ignore TS specified in the file meta header
read with explicit VR little endian TS
read with explicit VR big endian TS
read with implicit VR little endian TS
long tag values:
load very long tag values (e.g. pixel data)
do not load very long values (default)
+R --max-read-length [k]bytes: integer (4..4194302, default: 4)
set threshold for long values to k kbytes
specific character set:
require declaration of extended charset (default)
+Ca --charset-assume [c]harset: string
assume charset c if no extended charset declared
check all data elements with string values
(default: only PN, LO, LT, SH, ST, UC and UT)
# this option is only used for the mapping to an appropriate
# XML character encoding, but not for the conversion to UTF-8
convert all element values that are affected
by Specific Character Set (0008,0005) to UTF-8
# requires support from an underlying character encoding library
# (see output of --version on which one is available)
general XML format:
output in DCMTK-specific format (default)
output in Native DICOM Model format (part 19)
add XML namespace declaration to root element
DCMTK-specific format (not with --native-format):
add reference to document type definition (DTD)
embed document type definition into XML document
+Xf --use-dtd-file [f]ilename: string
use specified DTD file (only with +Xe)
write name of the DICOM data elements (default)
do not write name of the DICOM data elements
write binary data of OB and OW elements
(default: off, be careful with --load-all)
encoding of binary data:
encode binary data as hex numbers
(default for DCMTK-specific format)
encode binary data as a UUID reference
(default for Native DICOM Model)
encode binary data as Base64 (RFC 2045, MIME)
The basic structure of the DCMTK-specific XML output created from a DICOM file
looks like the following:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "dcm2xml.dtd">
<meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
<element tag="0002,0000" vr="UL" vm="1" len="4"
<element tag="0002,0013" vr="SH" vm="1" len="16"
<data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
<element tag="0008,0005" vr="CS" vm="1" len="10"
<sequence tag="0028,3010" vr="SQ" card="2" name="VOILUTSequence">
<element tag="0028,3002" vr="xs" vm="3" len="6"
<element tag="7fe0,0010" vr="OW" vm="1" len="262144"
name="PixelData" loaded="no" binary="hidden">
The 'file-format' and 'meta-header' tags are absent for DICOM data sets.
Attributes with very large value fields (e.g. pixel data) are not loaded by
default. They can be identified by the additional attribute 'loaded' with a
value of 'no' (see example above). The command line option --load-all
forces all value fields to be loaded, including the very long ones.
Furthermore, binary information of OB and OW attributes is not written to the
XML output file by default. These elements can be identified by the additional
attribute 'binary' with a value of 'hidden' (default is 'no'). The command
line option --write-binary-data causes binary value fields to be
printed as well (attribute value is 'yes' or 'base64'). However, be careful
when using this option together with --load-all because of the large amounts
of pixel data that might be printed to the output. Please note that in this
context element values with a VR of OD or OF are not regarded as 'binary
information'.
Multiple values (i.e. where the DICOM value multiplicity is greater than 1) are
separated by a backslash '\' (except for Base64 encoded data). The 'len'
attribute indicates the number of bytes for the particular value field as
stored in the DICOM data set, i.e. it might deviate from the XML encoded value
length e.g. because of non-significant padding that has been removed. If this
attribute is missing in 'sequence' or 'item' start tags, the corresponding
DICOM element has been stored with undefined length.
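The conventions described above (the 'loaded' attribute and backslash-separated multiple values) can be illustrated with a short Python sketch. The XML sample below is hand-written for illustration and is not real dcm2xml output; it omits the DOCTYPE and any namespace declaration, so output created with the namespace option would probably need namespace-aware lookups. The helper names split_values and unloaded_tags are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the spirit of the DCMTK-specific format
# (an assumption for illustration, not captured from dcm2xml).
SAMPLE = """<file-format>
  <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
    <element tag="0028,0030" vr="DS" vm="2" len="8" name="PixelSpacing">0.5\\0.5</element>
    <element tag="7fe0,0010" vr="OW" vm="1" len="262144" name="PixelData" loaded="no" binary="hidden"/>
  </data-set>
</file-format>"""


def split_values(element):
    """Split the text of an element on the backslash value separator."""
    text = (element.text or "").strip()
    return text.split("\\") if text else []


def unloaded_tags(root):
    """Return the tags of all elements marked with loaded="no"."""
    return [e.get("tag") for e in root.iter("element") if e.get("loaded") == "no"]


root = ET.fromstring(SAMPLE)
spacing = root.find(".//element[@tag='0028,0030']")
print(split_values(spacing))
print(unloaded_tags(root))
```

Splitting on the backslash is only appropriate for values that are not Base64 encoded, as noted above.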
The description of the Native DICOM Model format can be found in the DICOM
standard, part 19 ('Application Hosting').
Binary data, i.e. DICOM element values with Value Representations (VR) of OB or
OW, as well as OD, OF and UN values are by default not written to the XML
output because of their size. Instead, for each element, a new Universally
Unique Identifier (UUID) is generated and written as an attribute of a
<BulkData> XML element. So far, it is not possible to write an
additional file holding the binary data for each of the binary data chunks.
This is not required by the standard; however, it might be useful for
implementing an Application Hosting interface, so this feature may become
available in future versions of dcm2xml.
In addition, Supplement 163 (Store Over the Web by Representational State
Transfer Services) introduces a new <InlineBinary> XML element that
allows for encoding binary data as Base64. Currently, the command line option
for Base64 encoding (listed above under 'encoding of binary data') enables
this encoding for the following VRs: OB, OD, OF, OW, and UN.
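As a rough illustration of the Base64 encoding mentioned above, the following Python sketch wraps a few bytes in an <InlineBinary> element and decodes them again. It only demonstrates the encoding step with the standard library; it is not how dcm2xml is implemented, the helper names are hypothetical, and the surrounding Native DICOM Model elements are left out. Note also that base64.b64encode does not insert the line breaks that RFC 2045 permits.

```python
import base64
import xml.etree.ElementTree as ET


def inline_binary(raw: bytes) -> ET.Element:
    """Build an <InlineBinary> element holding `raw` as Base64 text."""
    elem = ET.Element("InlineBinary")
    elem.text = base64.b64encode(raw).decode("ascii")
    return elem


def decode_inline_binary(elem: ET.Element) -> bytes:
    """Recover the original bytes from an <InlineBinary> element."""
    return base64.b64decode(elem.text or "")


payload = bytes([0x00, 0x01, 0xFE, 0xFF])  # arbitrary example bytes
node = inline_binary(payload)
print(ET.tostring(node, encoding="unicode"))
print(decode_inline_binary(node) == payload)
```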
In addition to what is written in the above section on 'Bulk Data', there are
further known issues with the current implementation of the Native DICOM Model
format. For example, large element values with a VR other than OB, OD, OF, OW
or UN are currently never written as bulk data, although it might be useful,
e.g. for very long text elements (especially UT) or very long numeric fields
(of various VRs).
The XML encoding is determined automatically from the DICOM attribute
(0008,0005) 'Specific Character Set' using the following mapping:
ASCII         "ISO_IR 6"    =>  "UTF-8"
UTF-8         "ISO_IR 192"  =>  "UTF-8"
ISO Latin 1   "ISO_IR 100"  =>  "ISO-8859-1"
ISO Latin 2   "ISO_IR 101"  =>  "ISO-8859-2"
ISO Latin 3   "ISO_IR 109"  =>  "ISO-8859-3"
ISO Latin 4   "ISO_IR 110"  =>  "ISO-8859-4"
ISO Latin 5   "ISO_IR 148"  =>  "ISO-8859-9"
Cyrillic      "ISO_IR 144"  =>  "ISO-8859-5"
Arabic        "ISO_IR 127"  =>  "ISO-8859-6"
Greek         "ISO_IR 126"  =>  "ISO-8859-7"
Hebrew        "ISO_IR 138"  =>  "ISO-8859-8"
If this DICOM attribute is missing in the input file, although needed, option
--charset-assume can be used to specify an appropriate character set
manually (using one of the DICOM defined terms). For reasons of backward
compatibility with previous versions of this tool, the following terms are
also supported and mapped automatically to the associated DICOM defined terms:
latin-1, latin-2, latin-3, latin-4, latin-5, cyrillic, arabic, greek, hebrew.
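The mapping above lends itself to a simple lookup table. The following Python sketch is an illustration only, not the code used by dcm2xml: the function name xml_encoding_for is hypothetical, and the pairing of the legacy terms with DICOM defined terms is inferred from the character set names in the table rather than stated explicitly.

```python
# DICOM defined term for (0008,0005) -> XML encoding, as listed in the table.
XML_ENCODING = {
    "ISO_IR 6": "UTF-8",
    "ISO_IR 192": "UTF-8",
    "ISO_IR 100": "ISO-8859-1",
    "ISO_IR 101": "ISO-8859-2",
    "ISO_IR 109": "ISO-8859-3",
    "ISO_IR 110": "ISO-8859-4",
    "ISO_IR 148": "ISO-8859-9",
    "ISO_IR 144": "ISO-8859-5",
    "ISO_IR 127": "ISO-8859-6",
    "ISO_IR 126": "ISO-8859-7",
    "ISO_IR 138": "ISO-8859-8",
}

# Legacy terms accepted for backward compatibility (pairing is an assumption
# based on the names in the table above).
LEGACY_TERMS = {
    "latin-1": "ISO_IR 100",
    "latin-2": "ISO_IR 101",
    "latin-3": "ISO_IR 109",
    "latin-4": "ISO_IR 110",
    "latin-5": "ISO_IR 148",
    "cyrillic": "ISO_IR 144",
    "arabic": "ISO_IR 127",
    "greek": "ISO_IR 126",
    "hebrew": "ISO_IR 138",
}


def xml_encoding_for(charset):
    """Return the XML encoding for a charset term, or None if unsupported."""
    term = LEGACY_TERMS.get(charset, charset)
    return XML_ENCODING.get(term)


print(xml_encoding_for("ISO_IR 148"))
print(xml_encoding_for("greek"))
```

Returning None for unknown terms is a choice made for this sketch; how the tool itself reacts to an unsupported character set is not described here.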
Multiple character sets using code extension techniques are not supported. If
needed, option --convert-to-utf8 can be used to convert the DICOM file
or data set to UTF-8 encoding prior to the conversion to XML format. This is
also useful for DICOMDIR files where each directory record can have a
different character set.
The level of logging output of the various command line tools and underlying
libraries can be specified by the user. By default, only errors and warnings
are written to the standard error stream. With option --verbose,
informational messages like processing details are also reported. Debug mode
(see the general options above) can be used to get more details on the
internal activity, e.g. for debugging purposes. Other logging levels can be
selected using option --log-level. In --quiet mode only fatal errors are
reported. In
such very severe error events, the application will usually terminate. For
more details on the different logging levels, see documentation of module
In case the logging output should be written to file (optionally with logfile
rotation), to syslog (Unix) or the event log (Windows), option
--log-config can be used. This configuration file also allows for
directing only certain messages to a particular output stream and for
filtering certain messages based on the module or application where they are
generated. An example configuration file is provided in
All command line tools use the following notation for parameters: square
brackets enclose optional values (0-1), three trailing dots indicate that
multiple values are allowed (1-n), a combination of both means 0 to n values.
Command line options are distinguished from parameters by a leading '+' or '-'
sign, respectively. Usually, order and position of command line options are
arbitrary (i.e. they can appear anywhere). However, if options are mutually
exclusive the rightmost appearance is used. This behavior conforms to the
standard evaluation rules of common Unix shells.
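The "rightmost appearance is used" rule for mutually exclusive options can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the actual command line parser; the option names --alpha and --beta and the function resolve_exclusive are hypothetical placeholders.

```python
def resolve_exclusive(args, group):
    """Return the option from `group` that appears rightmost in `args`.

    Returns None if no option of the group is present.
    """
    chosen = None
    for arg in args:
        if arg in group:
            chosen = arg  # a later occurrence overrides an earlier one
    return chosen


# Hypothetical group of mutually exclusive options.
EXAMPLE_GROUP = {"--alpha", "--beta"}

print(resolve_exclusive(["--alpha", "in.dcm", "--beta"], EXAMPLE_GROUP))
```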
In addition, one or more command files can be specified using an '@' sign as a
prefix to the filename (e.g. @command.txt). Such a command argument is
replaced by the content of the corresponding text file (multiple whitespace
characters are treated as a single separator unless they appear between two
quotation marks) prior to any further evaluation. Please note that a command
file cannot contain another command file. This simple but effective approach
allows one to summarize common combinations of options/parameters and avoids
longish and
confusing command lines (an example is provided in file
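The command file mechanism described above can be approximated as follows. This Python sketch is an assumption-laden illustration, not the tool's own parser: it uses shlex.split to treat runs of whitespace as one separator while keeping quoted text together, which may differ in detail from the real quoting rules, and the function name expand_command_files is hypothetical. Tokens read from a command file are not expanded again, mirroring the rule that a command file cannot contain another command file.

```python
import shlex
from pathlib import Path


def expand_command_files(args):
    """Replace each '@file' argument by the tokens found in that file."""
    expanded = []
    for arg in args:
        if arg.startswith("@") and len(arg) > 1:
            text = Path(arg[1:]).read_text()
            # Whitespace runs (including newlines) separate tokens;
            # quoted text stays together as a single token.
            expanded.extend(shlex.split(text))
        else:
            expanded.append(arg)
    return expanded
```

For example, a file containing `--load-all "my file.dcm"` would contribute the two tokens `--load-all` and `my file.dcm` under these assumptions.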
The dcm2xml utility will attempt to load DICOM data dictionaries
specified in the DCMDICTPATH environment variable. By default, i.e. if the
DCMDICTPATH environment variable is not set, the file
will be loaded unless the dictionary is built
into the application (default for Windows).
The default behavior should be preferred and the DCMDICTPATH environment
variable only used when alternative data dictionaries are required. The
DCMDICTPATH environment variable has the same format as the Unix shell PATH
variable in that a colon (':') separates entries. On Windows
systems, a semicolon (';') is used as a separator. The data dictionary code
will attempt to load each file specified in the DCMDICTPATH environment
variable. It is an error if no data dictionary can be loaded.
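The separator rule for DCMDICTPATH can be illustrated with a small Python sketch. It only shows how such a value could be split into entries; it does not load any dictionary and is not taken from the data dictionary code. The function names and the example paths are hypothetical, and skipping empty entries is an assumption of this sketch.

```python
import os


def dictionary_paths(value, windows=False):
    """Split a DCMDICTPATH-style value into its entries.

    A colon separates entries on Unix, a semicolon on Windows.
    Empty entries are skipped (an assumption of this sketch).
    """
    separator = ";" if windows else ":"
    return [entry for entry in value.split(separator) if entry]


def configured_dictionaries(environ=os.environ):
    """Return the entries of DCMDICTPATH, or an empty list if it is not set."""
    value = environ.get("DCMDICTPATH")
    if not value:
        return []
    return dictionary_paths(value, windows=(os.name == "nt"))


# Hypothetical paths, for illustration only.
print(dictionary_paths("/opt/dicts/dicom.dic:/opt/dicts/private.dic"))
```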
dcm2xml.dtd - Document Type Definition (DTD) file
Copyright (C) 2002-2016 by OFFIS e.V., Escherweg 2, 26121 Oldenburg,