Let's say you had a file that was 1,000,000,000 characters consisting of
8,000,000,000 bits. What if instead of transferring that file through the
interwebs you transmitted a mathematical equation to tell a computer on the
other end how to *construct* that file? First you'd feed the file into a
cruncher of some type to reduce the pattern of 8,000,000,000 bits into an
equation somehow. Sure this would take time, I realize that. The equation
would then be transmitted to the other computer, where it would use its
mad-math-skillz to *figure out the answer*, which would theoretically be the
same pattern of bits. Thus the same file would emerge on the other end.

The real question here is how long would it take for a regular computer to
do this kind of math?

Just a weird idea I had. If it's a good idea then please consider this
intellectual property. LOL

That's basically what compression is. Except rarely (read: never) does your
Real Data (tm) fit just one equation, hence the various compression
algorithms that look for patterns etc etc.

Now if your file has patterns that make it compressible, you can make it
smaller, but not
all files can be compressed this way, at least not in a way that makes them
smaller.

To understand why, consider the case of a file of one byte, or 8 bits.
There are 256 possible files of this size: 00000000, 00000001, 00000010,
..., 11111101, 11111110, 11111111.

Each code we send must generate a unique file (or what's the point?), so we
need 256 different codes to represent the 256 possible files; but the
shortest general way to write 256 different codes is still 8 bits long.
Now, we can use coding schemes and say that the one-bit value 1 represents
11111111 because that file happens a lot. Then we could use 01 to represent
something else, but we can't use 1 at the beginning again because we
couldn't tell that from the file named by 1.

Bottom line, for some codes to be shorter than the file they represent,
others must be longer...
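The arithmetic behind that counting argument is easy to check yourself. A
quick sketch (my own illustration, not from anyone in the thread):

```python
# Counting argument: there are 256 possible one-byte files, but strictly
# fewer than 256 nonempty bit strings shorter than 8 bits, so no lossless
# scheme can assign every one-byte file a shorter code.

num_files = 2 ** 8                                # 256 one-byte files
shorter_codes = sum(2 ** n for n in range(1, 8))  # all lengths 1..7

print(num_files, "files, but only", shorter_codes, "shorter codes")
```

Only 254 candidate codes for 256 files, and that's before you even worry
about telling the codes apart in a stream.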

So if files have a lot of repetition, you can get a win, but for "random"
data, not so much.

Wildly off-topic for the NANOG mailing-list, as it has -zero- relevance to
'network operations'.

Date: Wed, 18 May 2011 13:07:32 -0700
Subject: Had an idea - looking for a math buff to tell me if it's possible
with today's technology.
From: Landon Stewart <lstewart@superb.net>
To: nanog <nanog@nanog.org>

Let's say you had a file that was 1,000,000,000 characters consisting of
8,000,000,000 bits. What if instead of transferring that file through the
interwebs you transmitted a mathematical equation to tell a computer on the
other end how to *construct* that file? First you'd feed the file into a
cruncher of some type to reduce the pattern of 8,000,000,000 bits into an
equation somehow. Sure this would take time, I realize that. The equation
would then be transmitted to the other computer, where it would use its
mad-math-skillz to *figure out the answer*, which would theoretically be the
same pattern of bits. Thus the same file would emerge on the other end.

The real question here is how long would it take for a regular computer to
do this kind of math?

I have, on my computer, an encoder/decoder that does _exactly_ that.

Both the encoder and decoder are _amazingly_ fast -- as fast as a file copy,
in fact.

The average size of the transmitted files, across all possible input
files, is exactly 100% of the size of the input files. (One *cannot* do
better than that, across all possible inputs -- see the 'counting' problem
in data-compression theory.)

Just a weird idea I had. If it's a good idea then please consider this
intellectual property. LOL

'Weird' is one word for it. You might want to read up on the subject
of 'data compression', to get an idea of how things work.

See also "polynominial curve-fitting", for the real-world limits of your
theory.

no no no.. it's simply, since the OP posited a math solution, md5.
ship the size of file + hash, compute file on the other side. All
files can be moved anywhere regardless of the size of the file in a
single packet.

The solution is left as an exercise for the reader.

Exactly: What you run up against is that you can reduce extraneous information, and compress redundant information, but if you actually have dense information, you're not gonna get any better.

So easy to compress a billion bytes of JSON or XML significantly; not so much a billion bytes of already tightly coded movie.
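You can see that contrast in a couple of lines. A hedged sketch (my own
example, not from the thread): repetitive, structured text compresses
dramatically under zlib, while random bytes of the same length barely
compress at all.

```python
import os
import zlib

# Repetitive JSON-ish text vs. random bytes of the same length.
json_like = b'{"name": "example", "value": 42}, ' * 1000
random_bytes = os.urandom(len(json_like))

print(len(json_like), "->", len(zlib.compress(json_like)))
print(len(random_bytes), "->", len(zlib.compress(random_bytes)))
```

The random input typically comes out slightly *larger* than it went in,
because zlib still has to pay its container overhead.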

The concept is called fractal compression: you compress the image, send the
values, and recreate the image from them. There was a body of work on the
subject, I would say in the mid to late eighties, when two Georgia Tech
professors started a company doing it.

From: Landon Stewart [mailto:lstewart@superb.net]
Sent: Wednesday, May 18, 2011 1:08 PM
To: nanog
Subject: Had an idea - looking for a math buff to tell me if it's
possible with today's technology.

Let's say you had a file that was 1,000,000,000 characters consisting of
8,000,000,000 bits. What if instead of transferring that file through the
interwebs you transmitted a mathematical equation to tell a computer on the
other end how to *construct* that file?

Congratulations. You have just invented compression.

In a message written on Wed, May 18, 2011 at 04:33:34PM -0400, Christopher Morrow wrote:

no no no.. it's simply, since the OP posited a math solution, md5.
ship the size of file + hash, compute file on the other side. All
files can be moved anywhere regardless of the size of the file in a
single packet.

The solution is left as an exercise for the reader.

Bah, you should include the solution, it's so trivial.

Generate all possible files and then do an index lookup on the MD5.
It's a little CPU heavy, but darn simple to code.

You can even stop when you get a match, which turns out to be a HUGE
optimization.
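For anyone who wants to see the joke land, a toy version of that "exercise"
(illustration only, not a real transfer scheme): rebuild a file from its
size and MD5 by trying every possible file of that size. Workable here only
because the "file" is 2 bytes; the search space grows as 256**size, so real
files are out of reach.

```python
import hashlib
from itertools import product

def recover(md5_hex, size):
    """Brute-force the file whose MD5 is md5_hex by enumerating
    every possible byte string of the given size."""
    for candidate in product(range(256), repeat=size):
        data = bytes(candidate)
        if hashlib.md5(data).hexdigest() == md5_hex:
            return data  # stop on first match -- the "HUGE optimization"
    return None

original = b"hi"
print(recover(hashlib.md5(original).hexdigest(), len(original)))  # b'hi'
```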

no no no.. it's simply, since the OP posited a math solution, md5.
ship the size of file + hash, compute file on the other side. All
files can be moved anywhere regardless of the size of the file in a
single packet.

MD5 compression is lossy in this context. Given big enough files
you're going to start seeing hash collisions.

You would need a lot of computing power to generate a file of any decent
size. If you want to be evil, then you could send just an MD5 hash and a
SHA-512 hash (or some other pair of hashes that wouldn't both collide at
the same time except on the correct file).
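The pigeonhole argument behind those collisions is easy to demonstrate. A
sketch of my own, using MD5 truncated to 2 bytes as a stand-in for a full
hash: with only 65,536 possible digests, a collision is guaranteed once you
hash 65,537 distinct inputs. The same argument applies to full 128-bit MD5
once files exceed 16 bytes; the collisions just take longer to surface.

```python
import hashlib

seen = {}
for i in range(70000):
    data = str(i).encode()
    digest = hashlib.md5(data).digest()[:2]  # truncate: 65,536 buckets
    if digest in seen:
        print("collision:", seen[digest], "and", data)
        break
    seen[digest] = data
```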