Discussion:
Can we change the xz block size in our cloud images?
Richard W.M. Jones
2018-11-23 12:42:06 UTC
It looks like the raw format xz-compressed cloud images that we ship
use a very large block size, possibly 192M. This is not ideal. It
would be better to use a smaller block size such as 16M so that they
can be consumed by nbdkit without having to be uncompressed first, or
even be used remotely without downloading them at all.

(Long story here: https://rwmj.wordpress.com/2018/11/23/nbdkit-xz-curl/ )
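
To illustrate why block size matters for random access, here is a
minimal Python sketch. It is an analogy, not how xz or nbdkit are
implemented: a real .xz file keeps its blocks and an index inside one
stream, whereas this sketch compresses each chunk as a separate stream
and keeps the index in memory. The function names are made up for
illustration. The point is that a read only has to decompress the
blocks it touches, so smaller blocks mean less work per read.

```python
import lzma

BLOCK_SIZE = 16 * 1024 * 1024  # 16 MiB, the block size suggested above


def compress_in_blocks(data, block_size=BLOCK_SIZE):
    """Compress data as independent chunks and return (blob, index).

    index holds one (compressed_offset, compressed_length) pair per block.
    """
    parts = []
    index = []
    offset = 0
    for start in range(0, len(data), block_size):
        chunk = lzma.compress(data[start:start + block_size])
        index.append((offset, len(chunk)))
        parts.append(chunk)
        offset += len(chunk)
    return b"".join(parts), index


def read_at(blob, index, position, length, block_size=BLOCK_SIZE):
    """Return data[position:position + length], decompressing only
    the blocks that overlap the requested range."""
    out = bytearray()
    first = position // block_size
    last = (position + length - 1) // block_size
    for i in range(first, last + 1):
        off, clen = index[i]
        out += lzma.decompress(blob[off:off + clen])
    skip = position - first * block_size
    return bytes(out[skip:skip + length])
```

With a single 192M block, any read inside that block would have to
decompress from the start of the block; with 16M blocks the worst case
is much smaller.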

I recompressed the Fedora 29 cloud image using a 16M block size and
there is about 4% overhead to doing this:

before 194278292
after 202874868
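
As a quick check of the figure above, assuming the two numbers are file
sizes in bytes (the units are not stated in the message):

```python
before = 194278292  # original image size, assumed to be in bytes
after = 202874868   # size after recompressing with a 16M block size

overhead_pct = (after - before) / before * 100
extra_mib = (after - before) / (1024 * 1024)
print(f"{overhead_pct:.2f}% larger, {extra_mib:.1f} MiB extra")
```

This works out to roughly 4.4%, consistent with the "about 4%" quoted.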

At the moment I'm not clear what actual component does the xz
compression step, so I don't even know where I could file a bug or who
I could discuss this with, nor what the current code looks like.
Apparently it's no longer done using appliance-tools.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
_______________________________________________
devel mailing list -- ***@lists.fedoraproject.org
To unsubscribe send an email to devel-***@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/***@lists
Peter Robinson
2018-11-23 12:48:30 UTC
The cloud images haven't used appliance-tools for years. They are
built using imagefactory and some bits in koji.

Looking at the output of the logs it just runs:
$ /usr/bin/xz -z9T2 /var/tmp/koji/tasks/2418/31062418/Fedora-Cloud-Base-Rawhide-20181123.n.0.aarch64.raw

Example task: https://koji.fedoraproject.org/koji/taskinfo?taskID=31062418

So at a guess it should be a straightforward patch to koji.
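
A rough sketch of what such a change might look like follows. This is
not the actual kojid code: the function name and its parameters are
hypothetical. The only piece taken from xz itself is the documented
--block-size option. As background, the xz man page says that in
threaded mode the default block size is three times the LZMA2
dictionary size, and preset -9 uses a 64 MiB dictionary, which is
consistent with the 192M guess earlier in the thread.

```python
def build_xz_command(image_path, block_size=16 * 1024 * 1024, threads=2):
    """Return an argv list for compressing image_path with a fixed block size.

    Hypothetical helper for illustration; not taken from kojid.
    """
    return [
        "/usr/bin/xz",
        "-z",                            # compress
        "-9",                            # same preset as the logged command
        "-T%d" % threads,                # same thread count as the logged command
        "--block-size=%d" % block_size,  # the proposed addition, in bytes
        image_path,
    ]
```

The resulting list could be passed to subprocess.run() or whatever
helper the builder already uses to launch xz.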

Peter
Richard W.M. Jones
2018-11-23 13:43:08 UTC
Thanks Peter, I found the source now:

https://pagure.io/koji/blob/master/f/builder/kojid#_3887

More generally, what are the goals for these cloud images? Example
questions for the mailing list:

Must they be absolutely as small as possible? (I assume small size is
a goal to some extent because of the cost of bandwidth and mirroring
space).

Is it important to let people download them and run them without
uncompressing them? For me, unxz is pretty slow as well as the
obvious problem with consuming disk space, so I can get started on a
cloud image faster if it can be uncompressed on the fly (faster still
if I didn't have to download it up front, but that has other issues
like causing load on the mirror sites).

$ time unxz Fedora-Cloud-Base-29-1.2.x86_64.raw.xz
real 0m23.760s
user 0m21.564s
sys 0m0.729s

What do people do with the cloud images? Do you download them and put
them in a local Glance store? Do you ignore the published cloud
images and instead get Fedora on clouds through some other method like
AMIs published by 3rd parties?

Rich.

Dennis Gilmore
2018-11-23 15:16:50 UTC
Primarily they are there for uploading to cloud providers. When we set
things up, our goal was solely to make them as small as possible. Our
only use case was uploading to EC2, and we delivered them as part of
the release solely to make sure they were available and that people
could compare what was in EC2 with what is available, for verification
purposes. The only other use case we considered was people downloading
them and putting them in their own cloud environment. It sounds like
you have some other use cases and should work with release engineering
to look at accommodating them. We probably should have some
documentation on each deliverable and the use cases it is supposed to
support.

Dennis
