Regular Expression for IPv6 addresses

Folks,

My company, Dartware, has derived a regex for testing whether an IPv6 address is correct. I've posted it on my blog:

  http://intermapper.ning.com/profiles/blogs/a-regular-expression-for-ipv6

This has links to the regular expression, a (Perl) program that tests various correct and malformed addresses, and a Ruby implementation of the same.

Hope it's useful.

Rich Brown richard.e.brown@dartware.com
Dartware, LLC http://www.dartware.com
66-7 Benning Street Telephone: 603-643-9600
West Lebanon, NH 03784-3407 Fax: 603-643-2289

Richard E. Brown wrote:
> My company, Dartware, has derived a regex for testing whether an IPv6
> address is correct.
> [..]

You know, link-local addresses (fe80::/10) are quite useless without
specifying the zone of that address. See section 11 of RFC 4007.
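For illustration, here is a minimal Python sketch of checking for the zone. It assumes the RFC 4007 textual form address%zone (fe80::1%eth0, with eth0 standing in for a real interface name). The helper name parse_scoped is hypothetical. Treating a zone-less fe80::/10 address as an error is a local policy assumed for the sketch, not a general validity rule.

```python
import ipaddress


def parse_scoped(text):
    """Split an RFC 4007 style 'address%zone' string into (address, zone).

    Sketch only: the zone is accepted as any non-empty string, and a
    link-local address without a zone is rejected as a local policy choice.
    """
    addr_part, sep, zone = text.partition("%")
    if sep and not zone:
        raise ValueError(f"empty zone index in {text!r}")
    addr = ipaddress.IPv6Address(addr_part)
    if addr.is_link_local and not zone:
        # fe80::/10 exists on every interface, so without a zone the
        # address does not say which link is meant.
        raise ValueError(f"{addr} is link-local; add a zone, e.g. {addr}%eth0")
    return addr, zone or None
```

With this helper, parse_scoped("fe80::1%eth0") returns the address together with "eth0". A global address such as 2001:db8::1 passes with no zone, and a bare fe80::1 raises ValueError.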

The only proper way of "testing" if an address is a valid IPv6 address
is to feed it to getaddrinfo() and then use it through that API.
Yes, you can make some assumptions, but history has shown that they
break: people who assumed that everything would stay under 2001::/16
also got it wrong at one point. So if you really need to validate an
address, just feed it to getaddrinfo().
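Following that advice from Python, a sketch might hand the string to the platform's getaddrinfo() (exposed as socket.getaddrinfo). It sets AI_NUMERICHOST so that nothing is ever looked up in DNS. The function name is_ipv6_literal is hypothetical. The answer is only as good as the local resolver library, and that library can differ between systems.

```python
import socket


def is_ipv6_literal(text):
    """Return True if the system's getaddrinfo() accepts `text` as a numeric
    IPv6 address. AI_NUMERICHOST forbids DNS lookups, so host names fail."""
    try:
        socket.getaddrinfo(text, None, socket.AF_INET6, 0, 0, socket.AI_NUMERICHOST)
    except (socket.gaierror, UnicodeError):
        return False
    return True
```

Zone suffixes such as %eth0 may or may not be accepted, depending on the platform and on whether that interface exists. Test on the systems you actually care about.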

Greets,
Jeroen

And now for the trick question. Is ::ffff:077.077.077.077 a legal
mapped address and if it is, does it match 077.077.077.077?

Mark

Mark Andrews wrote:
> [..]
> And now for the trick question. Is ::ffff:077.077.077.077 a legal
> mapped address and if it is, does it match 077.077.077.077?

::ffff:0:0/96 addresses should never ever be shown to a user, as they
are confusing (is it IPv6 or IPv4?) and do not make sense at all.
As such, whatever one thinks of them, they are "illegal" in that context.

Inside a program, though, a 128-bit block of memory is of course a
great way to store both IPv6 and IPv4 addresses in one structure, and
that is what the ::ffff:0:0/96 format is useful for and intended for.
Even so, an address stored that way would still be shown to the user
as 77.77.77.77 (an IPv4 address, not IPv6), even though internally it
is written as an IPv6 address.
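Here is a sketch of that split between internal storage and display, using Python's standard ipaddress module. The helper names to_mapped and for_display are hypothetical. The mapped value is 0xffff in bits 32 to 47 of the 128-bit integer, with the IPv4 address in the low 32 bits.

```python
import ipaddress


def to_mapped(ipv4_text):
    """Store an IPv4 address in the 128-bit ::ffff:0:0/96 mapped form."""
    v4 = ipaddress.IPv4Address(ipv4_text)
    return ipaddress.IPv6Address((0xFFFF << 32) | int(v4))


def for_display(addr):
    """Show a mapped address to the user as plain IPv4; anything else as-is."""
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)
    return str(addr)
```

The program keeps one 128-bit type internally, and only for_display decides which family the user sees.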

As that usage is internal, the format needs no validation: the input
will be either a plain IPv6 or a plain IPv4 address without any of the
compatibility forms, so there is nothing to handle anyway.

Of course, there should be only limited places where a user can enter or
see IP addresses in the first place. There is this great thing called
DNS which is what most people should be using.

Greets,
Jeroen

> Mark Andrews wrote:
> [..]
> > And now for the trick question. Is ::ffff:077.077.077.077 a legal
> > mapped address and if it is, does it match 077.077.077.077?
>
> [..] Of course still the representation to the user
> of addresses stored that way would be 77.77.77.77 (and thus an IPv4
> address and not IPv6) even though internally it is written as an IPv6
> address.

You missed the point: 077 is octal, so 077.077.077.077 is 63.63.63.63
as an IPv4 address, whereas it would be decimal dotted quad in a
mapped address *if* zero-padded decimal dotted quad is legal in the
IPv6 text form.
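The two readings Mark describes can be made concrete with a small Python sketch. The function name readings is hypothetical. In the octal column, a part with a leading 0 is read as base 8, the C convention, so 077 becomes 7*8 + 7 = 63. That is how inet_aton() treats such parts on many systems, though not necessarily all of them.

```python
def readings(quad):
    """Return the (decimal, octal) readings of a four-part dotted quad.

    decimal: every part is read as base 10.
    octal:   a part with a leading 0 is read as base 8 (the C convention).
    """
    parts = quad.split(".")
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a plain dotted quad: {quad!r}")

    def octal_part(p):
        # "077" -> 63; "0" and "10" are unaffected; "080" raises ValueError.
        return int(p, 8) if len(p) > 1 and p.startswith("0") else int(p, 10)

    decimal = ".".join(str(int(p, 10)) for p in parts)
    octal = ".".join(str(octal_part(p)) for p in parts)
    return decimal, octal
```

For 077.077.077.077 the sketch gives 77.77.77.77 under one reading and 63.63.63.63 under the other, which is exactly the ambiguity in the trick question.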

> > And now for the trick question. Is ::ffff:077.077.077.077 a legal
> > mapped address and if it is, does it match 077.077.077.077?
>
> ::ffff:0:0/96 should never ever be shown to a user, as it is
> confusing (is it IPv6 or IPv4?) and does not make sense at all.
> As such whatever one thinks of it, it is "illegal" in that context.

Define "user"? Both Cisco and Juniper use these addresses for IPv6
L3VPNs, and the addresses are definitely visible. Cisco and Juniper
examples:

B 2001:abcd:60:3::/64
      [200/0] via ::ffff:172.16.101.204 (nexthop in vrf default), 4d10h
B 2001:abcd:60:4::/64
      [200/0] via ::ffff:172.16.101.205 (nexthop in vrf default), 4d10h
B 2001:abcd:60:7::/64
      [200/0] via ::ffff:172.16.1.7 (nexthop in vrf default), 6d13h

::ffff:172.16.1.1/128
                   *[LDP/6] 4d 11:01:30, metric 1
                    > to 172.16.102.201 via ge-0/3/0.0, Push 313008
::ffff:172.16.1.2/128
                   *[LDP/6] 1w0d 20:27:12, metric 1
                    > to 172.16.102.201 via ge-0/3/0.0, Push 312240
::ffff:172.16.1.3/128
                   *[LDP/6] 4d 11:01:30, metric 1
                    > to 172.16.102.201 via ge-0/3/0.0, Push 313024

Steinar Haug, Nethelp consulting, sthaug@nethelp.no


Wasn't there an internet draft on that subject, recently?
http://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-04

077.077.077.077 is equivalent to 77.77.77.77, if valid at all.
RFC 4038 is very clear that the text representation of a mapped IPv4
address is Base 10. http://tools.ietf.org/html/rfc4038#section-5.1

This is a bit like asking if "::ffff:10.1.2" is a valid IP
address, though. And is it the same as the IP address "10.1.2"?

(Which of course expands to 10.1.0.2 on common implementations of
inet_pton, inet_aton, and getaddrinfo.) Or ::ffff:0xA010002?
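Those shorthands follow the classic BSD inet_aton() rules, roughly as the BSD inet(3) manual page describes them. With fewer than four parts, the last part fills all the remaining low-order bytes. Each part may be written in hex (0x...), octal (leading 0) or decimal.

The following Python sketch implements those rules. The name aton_style is hypothetical, and real implementations differ, as later messages in this thread point out. Unlike some real parsers, this sketch raises an error on 080 instead of silently accepting it.

```python
def _c_number(part):
    """One part in C notation: 0x... is hex, a leading 0 is octal, else decimal."""
    if not (part.isascii() and part.isalnum()):
        raise ValueError(f"bad part {part!r}")
    if part[:2].lower() == "0x":
        return int(part[2:], 16)
    if len(part) > 1 and part.startswith("0"):
        return int(part[1:], 8)  # "080" fails here rather than being guessed at
    return int(part, 10)


def aton_style(text):
    """Expand a, a.b, a.b.c or a.b.c.d the way classic inet_aton() does:
    the last part fills all remaining low-order bytes."""
    parts = [_c_number(p) for p in text.split(".")]
    if len(parts) > 4:
        raise ValueError(f"too many parts in {text!r}")
    last_bits = 8 * (5 - len(parts))  # 32, 24, 16 or 8 bits for the last part
    if any(p > 255 for p in parts[:-1]) or parts[-1] >= 1 << last_bits:
        raise ValueError(f"part out of range in {text!r}")
    value = parts[-1]
    for i, p in enumerate(parts[:-1]):
        value |= p << (24 - 8 * i)  # leading parts are single bytes
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

Under these rules "10.1.2", "0xA010002" and "10.1.0.2" all come out as 10.1.0.2, while "077.077.077.077" comes out as 63.63.63.63.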

I would say these are perfectly valid _shorthands_ and
abbreviations for entering an IP address, which may be provided by
some systems, but that they are non-canonical text representations
for displaying, publishing, or sharing IP addresses.

>> > And now for the trick question. Is ::ffff:077.077.077.077 a legal
>> > mapped address and if it is, does it match 077.077.077.077?

> Wasn't there an internet draft on that subject, recently?
> draft-ietf-6man-text-addr-representation-04
>
> 077.077.077.077 is equivalent to 77.77.77.77 if valid at all
> RFC 4038 is very clear that the text representation of a mapped IPv4
> address is Base 10. RFC 4038 - Application Aspects of IPv6 Transition

But 077.077.077.077 is octal dotted quad. Decimal dotted quad does
*not* have leading zeros. The point of allowing for dotted quad
is to allow for easy mapping between IPv4 representation and IPv6
with encoded IPv4 representations. Accepting an octal representation
as decimal is a bad thing and leads to non-obvious failures.

% ping 077.077.077.077
PING 077.077.077.077 (63.63.63.63): 56 data bytes
^C
--- 077.077.077.077 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
%

"ping ::ffff:077.077.077.077" would not reach the same box if my ping
accepted that as an address literal, which luckily it doesn't.

> This is a bit like asking if "::ffff:10.1.2" is a valid IP
> address though.

Except it clearly isn't, as there are not four components.

> And is it the same as the ip address "10.1.2" ?
>
> (Which of course expands to 10.1.0.2, on common implementations of
> inet_pton, inet_aton, and getaddrinfo) Or ::ffff:0xA010002

inet_pton() did not accept 10.1.2 when it was originally written.
This was a *deliberate* decision. Some vendors have changed it to
accept it, but they are wrong. I can say that because I was involved
in making that decision.
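For contrast, here is a sketch of the strict behaviour described here: exactly four decimal parts, with no shorthand. The leading-zero check is an extra assumption. It mirrors BIND-derived inet_pton() code such as glibc's, which rejects 077 rather than guessing between octal and decimal. The name pton4_strict is hypothetical.

```python
def pton4_strict(text):
    """Strict dotted-quad parser in the spirit of inet_pton(AF_INET, ...):
    exactly four decimal parts, each 0-255, no leading zeros, no shorthand."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"{text!r}: need exactly four parts")
    out = bytearray()
    for p in parts:
        if not (p.isascii() and p.isdigit()) or len(p) > 3:
            raise ValueError(f"{text!r}: bad part {p!r}")
        if len(p) > 1 and p[0] == "0":
            # Refuse to guess whether "077" means 63 or 77.
            raise ValueError(f"{text!r}: leading zero in {p!r}")
        if int(p) > 255:
            raise ValueError(f"{text!r}: {p} out of range")
        out.append(int(p))
    return bytes(out)
```

On a glibc system, socket.inet_pton(socket.AF_INET, "10.1.2") should likewise raise OSError. Check your own platform rather than relying on that.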

Forget IPv6. The first question is does 077.077.077.077 match 077.077.077.077 in IPv4?

The answer is a long one full of different answers depending on who's doing the parsing (gethostbyname(), inet_aton(), inet_net_pton(), etc..) and on what OS. And also on many bugs.

And don't count on the documentation being right either, or on parsers respecting standards (the Single UNIX Specification or the RFCs, or which one wins when they conflict). And don't expect an error code if you feed 080.080.080.080 into a parser, even one that *does* read it as octal.

Don't prefix IPv4 address octets with zeros, whether you expect them to be treated as octal or not. Just don't. World of hurt and all that.

E.g.:
http://kerneltrap.org/mailarchive/openbsd-bugs/2009/6/6/5882713/thread

We should all do like one vendor I've seen where you enter the IPv4 address in binary... and then pad with zeroes to whatever size the HTML form wanted. Yes, this decade.

> > And now for the trick question. Is ::ffff:077.077.077.077 a legal
> > mapped address and if it is, does it match 077.077.077.077?
>
> Forget IPv6. The first question is does 077.077.077.077 match
> 077.077.077.077 in IPv4?

I think you meant "does 077.077.077.077 match 77.77.77.77 in IPv4".

> The answer is a long one full of different answers depending on
> who's doing the parsing (gethostbyname(), inet_aton(),
> inet_net_pton(), etc..) and on what OS. And also on many bugs.

Indeed. It's a minefield out there for application developers that
want consistency. Even when you develop your own, some OS vendor will
go and stuff it up on you.

> > Forget IPv6. The first question is does 077.077.077.077 match
> > 077.077.077.077 in IPv4?
>
> I think you meant "does 077.077.077.077 match 77.77.77.77 in IPv4".

No, he had it right, because...

> > The answer is a long one full of different answers depending on
> > who's doing the parsing (gethostbyname(), inet_aton(),
> > inet_net_pton(), etc..) and on what OS. And also on many bugs.
>
> Indeed. It's a minefield out there for application developers that
> want consistency. Even when you develop your own, some OS vendor will
> go and stuff it up on you.

There's no guarantee that 2 different binaries on the same box will resolve
077.077.077.077 to the same 32-bit sequence, so it's in fact possible that
it's not even equal to itself, much less 77.77.77.77.

There's a full grammar in RFC 3986 (URI Generic Syntax) already, which
can be translated straight into a regexp. It handles the embedded IPv4
addresses too.

While your code is written in a more condensed manner, those who want to
be able to cross-check against the RFC might want to take a look at this
one, which emits a PCRE regexp:
  http://people.spodhuis.org/phil.pennock/software/emit_ipv6_regexp-0.304
  http://people.spodhuis.org/phil.pennock/software/emit_ipv6_regexp-0.304.asc

(Version numbers for repository, not for that one script :-) ).
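The RFC 3986 grammar mentioned above (section 3.2.2: IPv6address, h16, ls32, dec-octet) can be transcribed one alternative at a time. Below is a Python sketch of such a transcription. It is independent of Phil's emit_ipv6_regexp script, and its constant and helper names are hypothetical. Because dec-octet forbids leading zeros, the resulting pattern rejects ::ffff:077.077.077.077 outright. Zone suffixes such as %eth0 are not part of this grammar and are not accepted.

```python
import re

# Pieces of the RFC 3986 (section 3.2.2) ABNF, one constant per rule.
H16 = r"[0-9A-Fa-f]{1,4}"  # h16 = 1*4HEXDIG
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"  # no leading zeros
IPV4 = rf"{DEC_OCTET}\.{DEC_OCTET}\.{DEC_OCTET}\.{DEC_OCTET}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"  # ls32 = ( h16 ":" h16 ) / IPv4address


def head(n):
    """[ *n( h16 ":" ) h16 ]: an optional run of up to n+1 groups before '::'."""
    return rf"(?:(?:{H16}:){{0,{n}}}{H16})?"


# The nine IPv6address alternatives, in the order the RFC lists them.
ALTERNATIVES = [
    rf"(?:{H16}:){{6}}{LS32}",
    rf"::(?:{H16}:){{5}}{LS32}",
    rf"{head(0)}::(?:{H16}:){{4}}{LS32}",
    rf"{head(1)}::(?:{H16}:){{3}}{LS32}",
    rf"{head(2)}::(?:{H16}:){{2}}{LS32}",
    rf"{head(3)}::{H16}:{LS32}",
    rf"{head(4)}::{LS32}",
    rf"{head(5)}::{H16}",
    rf"{head(6)}::",
]

# \A and \Z anchor the whole string (unlike $, \Z ignores no trailing newline).
IPV6_RE = re.compile(r"\A(?:" + "|".join(ALTERNATIVES) + r")\Z")
```

Keeping one constant per ABNF rule is what makes the RFC cross-check possible. The output is longer than a hand-condensed regexp, but each piece can be compared line by line with the grammar.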

FWIW, the ability to grab a shell variable which contains an RE for IPv6
addresses, which can be used in:
  pcregrep "$ipv6_regex" log_file
has proven very useful, especially when debugging newly-added IPv6
support for an app. This is also the most coherent justification I've
come up with so far for using a regexp instead of a dedicated parser,
other than "because I could".
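Where the goal is grepping logs rather than strict validation, there is another option. It is a swapped-in technique, not the regexp approach above: split lines into loose candidate tokens and let a real parser, here Python's ipaddress module, decide. Below is a Python sketch. The tokenizer pattern and the function name are hypothetical.

```python
import ipaddress
import re

# Hypothetical tokenizer: runs of letters, digits, ':' and '.'. Spaces,
# brackets, commas, '/' and '%' all end a candidate token.
TOKEN = re.compile(r"[0-9A-Za-z:.]+")


def ipv6_addresses(line):
    """Return every token in `line` that Python's ipaddress accepts as IPv6."""
    found = []
    for token in TOKEN.findall(line):
        token = token.rstrip(".")  # tolerate an address ending a sentence
        if ":" not in token:
            continue  # cheap pre-filter: every IPv6 literal has a colon
        try:
            found.append(ipaddress.IPv6Address(token))
        except ValueError:
            pass  # e.g. "11:01:30" or a MAC address: not IPv6
    return found
```

The Cisco and Juniper output quoted earlier in the thread makes a handy test. The ::ffff: next hops are picked out, while uptime fields such as 11:01:30 are ignored.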

Regards,
-Phil