Chemistry::Formula - Enumerate elements in a chemical formula

use Chemistry::Formula qw(parse_formula);
parse_formula('Pb (H (TiO3)2 )2 U [(H2O)3]2', \%count);

That is obviously not a real compound, but it demonstrates the capabilities of
the routine. The call fills %count with

%count = (
'O' => 18,
'H' => 14,
'Ti' => 4,
'U' => 1,
'Pb' => 1
);

This module provides a function which parses a string containing a chemical
formula and returns the number of each element in the string. It can handle
nested parentheses and square brackets and correctly computes stoichiometry
given numbers outside the (possibly nested) parentheses.
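To illustrate how a number outside a group scales everything inside it, here is a minimal sketch of one way to do the counting recursively. It is not the module's own implementation: "count_elements" is a hypothetical helper, it assumes whitespace has already been removed, it accepts only plain decimal multipliers, and it does not check that a '(' is closed by ')' rather than ']'.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Chemistry::Formula: add the elements
# found in $formula to the hash referenced by $count, multiplying each
# count by $scale. Returns 1 on success and 0 on a syntax problem.
sub count_elements {
    my ($formula, $scale, $count) = @_;
    my $number = qr/\d+\.?\d*|\.\d+/;
    while (length $formula) {
        if ($formula =~ s/^([A-Z][a-z]?)($number)?//) {
            # an element symbol followed by an optional count
            $count->{$1} += $scale * (defined $2 ? $2 : 1);
        }
        elsif ($formula =~ /^[(\[]/) {
            # find the matching close bracket by tracking nesting depth
            my ($depth, $end) = (0, -1);
            for my $i (0 .. length($formula) - 1) {
                my $c = substr($formula, $i, 1);
                $depth++ if $c eq '(' or $c eq '[';
                $depth-- if $c eq ')' or $c eq ']';
                if ($depth == 0) { $end = $i; last }
            }
            return 0 if $end < 0;    # the group was never closed
            my $inner = substr($formula, 1, $end - 1);
            $formula  = substr($formula, $end + 1);
            # the multiplier after the group applies to its whole content
            my $mult = $formula =~ s/^($number)// ? $1 : 1;
            count_elements($inner, $scale * $mult, $count) or return 0;
        }
        else {
            return 0;                # stray bracket or unexpected character
        }
    }
    return 1;
}

my %count;
(my $formula = 'Pb (H (TiO3)2 )2 U [(H2O)3]2') =~ s/\s+//g;
count_elements($formula, 1, \%count);
print "$_ => $count{$_}\n" for sort keys %count;
# prints H => 14, O => 18, Pb => 1, Ti => 4, U => 1 (one pair per line)
```

Each nested group multiplies the running scale, so the O in (TiO3)2 inside ( ... )2 is counted 3 * 2 * 2 = 12 times, and the O in [(H2O)3]2 adds another 1 * 3 * 2 = 6.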

No effort is made to evaluate the chemical plausibility of the formula. The
example above parses just fine using this module, even though it is clearly
not a viable compound. Charge balancing, bond valence, and so on are beyond the
scope of this module.

Only one function is exported, "parse_formula". This takes a string
and a hash reference as its arguments and returns 0 or 1.

$ok = parse_formula('PbTiO3', \%count);

If the formula was parsed without trouble, "parse_formula" returns 1.
If there was any problem, it returns 0 and $count{error} is filled with a
string describing the problem. Parsing stops at the **first** error
encountered, without testing the rest of the string.

If the formula was parsed correctly, the %count hash contains element symbols as
its keys and the number of each element as its values.

Here is an example of a program that reads a string from the command line and,
for the formula unit described in the string, writes the weight and absorption
in barns.

use strict;
use warnings;
use Data::Dumper;
use Xray::Absorption;
use Chemistry::Formula qw(parse_formula);

my %count;
# stop with the parser's message if the formula could not be parsed
parse_formula($ARGV[0], \%count)
    or die "Could not parse '$ARGV[0]': $count{error}\n";
print Data::Dumper->Dump([\%count], [qw(*count)]);
my ($weight, $barns) = (0, 0);
foreach my $k (keys %count) {
    $weight += Xray::Absorption->get_atomic_weight($k) * $count{$k};
    $barns  += Xray::Absorption->cross_section($k, 9000) * $count{$k};
}
printf "This weighs %.3f amu and absorbs %.3f barns at 9 keV.\n",
    $weight, $barns;

Pretty simple.

The parser is not brilliant. Here are the ground rules:

1. The first letter of each element symbol must be capitalized.

2. Whitespace is unimportant; it will be removed from the string. So will
   dollar signs, underscores, and curly braces (in an attempt to handle
   TeX). Also, a sequence like '/sub 3/' will be converted to '3' (in an
   attempt to handle INSPEC).

3. Numbers can be integers or floating point numbers. Things like 5, 0.5,
   12.87, and .5 are all acceptable, as is exponential notation like 1e-2.
   Note that exponential notation must use a leading number to avoid
   confusion with element symbols. That is, 1e-2 is ok, but e-2 is not.

4. Uncapitalized symbols or unrecognized symbols will flag an error.

5. An error will be flagged if the number of open parens is different from
   the number of close parens.

6. An error will be flagged if any unusual symbols are found in the string.
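The clean-up in rule 2, the number syntax in rule 3, and the paren count in rule 5 can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not the code the module uses: "normalize_formula", "parens_balanced", and the $number pattern are hypothetical names introduced here.

```perl
use strict;
use warnings;

# Hypothetical helper (rule 2): strip the characters the parser ignores.
sub normalize_formula {
    my ($string) = @_;
    $string =~ s{/sub\s*([^/]+)/}{$1}g;    # INSPEC: '/sub 3/' becomes '3'
    $string =~ s/[\s\$_\{\}]//g;           # whitespace and TeX characters
    return $string;
}

# Rule 3: a number must start with a digit or a decimal point, so a bare
# 'e-2' is not a number, while '1e-2' is.
my $number = qr/(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/;

# Hypothetical helper (rule 5): compare the counts of '(' and ')'.
sub parens_balanced {
    my ($string) = @_;
    my $open  = () = $string =~ /\(/g;
    my $close = () = $string =~ /\)/g;
    return $open == $close;
}

print normalize_formula('Pb Ti O/sub 3/'), "\n";    # prints PbTiO3
print normalize_formula('PbTiO$_{3}$'), "\n";       # prints PbTiO3
print '1e-2' =~ /^$number$/ ? "ok\n" : "not ok\n";  # prints ok
print 'e-2'  =~ /^$number$/ ? "ok\n" : "not ok\n";  # prints not ok
print parens_balanced('((H2O)3') ? "ok\n" : "not ok\n";    # prints not ok
```

Comparing counts, as rule 5 describes, catches a missing paren but not parens in the wrong order; a depth counter that must never go negative would be needed for that.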

This was written at the suggestion of Matt Newville, who tested early versions.

The routine "matchingbrace" was swiped from the C::Scan module, which
can be found on CPAN. C::Scan is maintained by Hugo van der Sanden.

Bruce Ravel <bravel AT bnl DOT gov>

http://cars9.uchicago.edu/~ravel/software/

SVN repository: http://cars9.uchicago.edu/svn/libperlxray/