You can tell pmacct which properties of the received flow data it
should aggregate its output on. For example, one could configure
pmacct to store data using the following primitives:
($timeperiod, $entrypoint_router_id, $bgp_nexthop, $packet_count)
Here $timeperiod is something like a 5-minute range, and the
post-processing software calculates the distance between the entrypoint
router and the point where the flow would leave the network
($bgp_nexthop).
See 'aggregate' on http://wiki.pmacct.net/OfficialConfigKeys
In short: you configure pmacct to throw away everything you don't need
(maybe after some light pre-processing), and hope that what remains is
small enough to fit in your cluster and at the same time offers enough
insight to answer the question you set out to resolve.
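
To make the post-processing step concrete, here is a rough Python sketch
of what it might look like. It is not pmacct code and does not use any
pmacct API. The record layout, the router names, and the `distance`
table (hop counts between an entry router and a BGP next hop) are all
assumptions made up for illustration; in practice the distances would
have to come from your own topology data.

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # 5-minute ranges, as in the example above


def bucket_start(ts):
    """Round a Unix timestamp down to the start of its 5-minute bucket."""
    return ts - (ts % BUCKET_SECONDS)


def aggregate(flows):
    """Sum packet counts per (timeperiod, entry router, BGP next hop).

    Each flow is a hypothetical tuple:
    (timestamp, entrypoint_router_id, bgp_nexthop, packet_count).
    """
    totals = defaultdict(int)
    for ts, entry_router, bgp_nexthop, packets in flows:
        totals[(bucket_start(ts), entry_router, bgp_nexthop)] += packets
    return dict(totals)


def packet_weighted_distance(totals, distance):
    """Average internal distance per packet, weighted by packet count.

    `distance` maps (entry_router, bgp_nexthop) to a distance value;
    this table is an assumption and is not something pmacct produces.
    """
    total_packets = sum(totals.values())
    if total_packets == 0:
        return 0.0
    weighted = sum(
        distance[(entry, nexthop)] * packets
        for (_, entry, nexthop), packets in totals.items()
    )
    return weighted / total_packets


# Made-up sample data: four flows across two 5-minute buckets.
flows = [
    (1000, "r1", "r3", 10),
    (1100, "r1", "r3", 5),
    (1250, "r1", "r3", 20),
    (1300, "r2", "r3", 5),
]
distance = {("r1", "r3"): 2, ("r2", "r3"): 1}

totals = aggregate(flows)
average = packet_weighted_distance(totals, distance)
```

Everything not in the aggregation key (addresses, ports, and so on) is
dropped, which is what keeps the stored data small.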
it's late here, so i am a bit slower than usual. but could you explain
in detail how this tests the hypothesis?
even if all your traffic entered on a bgp hop and exited on a bgp hop,
and all bgp entries set next_hop (which i think you do), you would be
ignoring the 'distance' the packet traveled from the source to get to
your entry and from your exit to get to the final destination.