Discussion:
Proposal: Faster composes by eliminating deltarpms and using zchunked rpms instead
Jonathan Dieter
2018-11-16 22:07:09 UTC
For reference, this is in reply to Paul's email about lifecycle
objectives, specifically focusing on problem statement #1[1].

<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time. This will require changes
to the rpm format and new features in the zchunk format.
</tl;dr>

*deltarpm background*
As part of the compose process, deltarpms are generated between each
new rpm and both the GA version of the rpm and the previous version.
This process is very CPU and memory intensive, especially for large
rpms.

This also means that deltarpms are only useful for an end user if they
are either updating from GA or have been diligent about keeping their
system up-to-date. If a user is updating a package from N-2 to N,
there will be no deltarpm and the full rpm will be downloaded.

*zchunk background*
As some are aware, I've been working on zchunk[2], a compression format
designed for highly efficient deltas, and on using it to minimize
metadata downloads[3].

The core idea behind zchunk is that a file is split into independently
compressed chunks and the checksum of each compressed chunk is stored
in the zchunk header. When downloading a new version of the file, you
download the zchunk header first, check which chunks you already have,
and then download the rest.
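The core idea above can be sketched as follows. This is illustrative Python only, not the real zchunk library or on-disk format; in particular, fixed-size chunking stands in here for zchunk's content-defined chunking:

```python
# Illustrative sketch of the zchunk idea: split data into chunks, record
# per-chunk checksums in a header, and, given only the new header, compute
# which chunks a client still needs to download.
import hashlib

CHUNK_SIZE = 4096  # fixed size for simplicity; zchunk uses content-defined chunking

def make_header(data: bytes) -> list[str]:
    """Return the list of chunk checksums that would live in the header."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [hashlib.sha256(c).hexdigest() for c in chunks]

def chunks_to_download(new_header: list[str], local_header: list[str]) -> list[int]:
    """After fetching only the new header, decide which chunks to fetch."""
    have = set(local_header)
    return [i for i, csum in enumerate(new_header) if csum not in have]

old = make_header(b"A" * 5000 + b"B" * 5000)
new = make_header(b"A" * 5000 + b"C" * 5000)   # first part unchanged
print(chunks_to_download(new, old))            # → [1, 2]
```

Only the changed chunks (here, the two covering the modified tail of the file) need to be fetched; the unchanged first chunk is reused from the local copy.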

*Proposal*
My proposal would be to make zchunk the rpm compression format for
Fedora. This would involve a few additions to the zchunk format[4]
(something the format has been designed to accommodate), and would
require some changes to the rpm file format.

*Benefit*
The benefit of zchunked rpms is that, when downloading an updated rpm,
you would only need to download the chunks that have changed from
what's on your system.

The uncompressed local chunks would be combined with the downloaded
compressed chunks to create a local rpm that will pass signature
verification without needing to recompress the uncompressed local
chunks, making this computationally much faster than rebuilding a
deltarpm, a win for users.

The savings wouldn't be as good as what deltarpm can achieve, but
deltarpms would be redundant and could be removed, completely
eliminating a large step from the compose process.

*Drawbacks*
1. Downloading a new release of a zchunked rpm would be larger than
downloading the equivalent deltarpm. This is offset by the fact
that the client is able to work out which chunks it needs no matter
what the original rpm is, rather than needing a specific original
rpm as deltarpm does.
2. The rebuilt rpm may not be byte-for-byte identical to the original,
but it can still be validated without decompression, as explained
in the next section.

*Changes*
The zchunk format would need to be extended to allow for a zchunked rpm
to contain both the uncompressed chunks that were already on the local
system and the newly downloaded compressed chunks while still passing
signature verification. This would also require moving signature
verification to zchunk.

The rpm file format has to be changed because the zchunk header needs
to be at the beginning of the file in order for the zchunk library to
figure out which chunks it needs to download. My suggestions for
changes to the rpm file format are as follows:

1. Signing should be moved to the zchunk format as described at the
beginning of this section
2. The rpm header should be stored in one stream inside the zchunk
file. This allows it to be easily extracted separately from the
data
3. The rpm cpio should be stored in a second stream inside the zchunk
file.
4. At minimum, an optional zchunk element should be set to identify
zchunk rpms as rpms rather than regular zchunk files. If desired,
optional elements could also be set containing %{name}, %{version},
%{release}, %{arch} and %{epoch}. This would allow this information
to be read easily without needing to extract the rpm header stream.
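To make the four suggestions concrete, the resulting file might be organized something like the following. Every name here is hypothetical, since none of this is specified yet:

```python
# Hypothetical sketch of the suggested zchunk-rpm layout; all field and
# element names are illustrative, not part of any agreed spec.
ZCHUNK_RPM_LAYOUT = {
    "zchunk_header": {
        # suggestion 1: the signature covers this header, which in turn
        # contains the checksums of every chunk
        "signature": "<signature over the zchunk header>",
        "chunk_checksums": ["<compressed and uncompressed digests per chunk>"],
        # suggestion 4: identify the file as an rpm, plus optional basic
        # fields readable without extracting the rpm header stream
        "optional_elements": {
            "type": "rpm",
            "name": "%{name}", "epoch": "%{epoch}", "version": "%{version}",
            "release": "%{release}", "arch": "%{arch}",
        },
    },
    "streams": [
        {"id": 0, "content": "rpm header"},                  # suggestion 2
        {"id": 1, "content": "rpm payload (cpio archive)"},  # suggestion 3
    ],
}

for stream in ZCHUNK_RPM_LAYOUT["streams"]:
    print(stream["id"], stream["content"])
```

Keeping the rpm header in its own stream, ahead of the payload, is what would let tools read it without touching the rest of the file.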

*Final notes*
I realize this is a massive proposal, zchunk is still very young, and
we're still working on getting the dnf zchunk pull requests reviewed.
I do think it's feasible and provides an opportunity to eliminate a
pain point from our compose process while still reducing the download
size for our users.

[1]:
https://fedoraproject.org/wiki/Objectives/Lifecycle/Problem_statements#Challenge_.231:_Faster.2C_more_scalable_composes
[2]: https://github.com/zchunk/zchunk
[3]: https://fedoraproject.org/wiki/Changes/Zchunk_Metadata
[4]: https://github.com/zchunk/zchunk/blob/master/zchunk_format.txt
_______________________________________________
devel mailing list -- ***@lists.fedoraproject.org
To unsubscribe send an email to devel-***@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/***@lists.fedoraproject.o
Adam Williamson
2018-11-16 23:00:47 UTC
Post by Jonathan Dieter
For reference, this is in reply to Paul's email about lifecycle
objectives, specifically focusing on problem statement #1[1].
<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time.
<snip>

you had me at "reducing compose time"
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
Jim Perrin
2018-11-16 23:06:34 UTC
Post by Adam Williamson
Post by Jonathan Dieter
For reference, this is in reply to Paul's email about lifecycle
objectives, specifically focusing on problem statement #1[1].
<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time.
<snip>
you had me at "reducing compose time"
+1. Reducing compose time is at the top of several of my lists.
Neal Gompa
2018-11-17 14:53:00 UTC
Post by Jonathan Dieter
*Changes*
The zchunk format would need to be extended to allow for a zchunked rpm
to contain both the uncompressed chunks that were already on the local
system and the newly downloaded compressed chunks while still passing
signature verification. This would also require moving signature
verification to zchunk.
The rpm file format has to be changed because the zchunk header needs
to be at the beginning of the file in order for the zchunk library
figure out which chunks it needs to download. My suggestions for
1. Signing should be moved to the zchunk format as described at the
beginning of this section
2. The rpm header should be stored in one stream inside the zchunk
file. This allows it to be easily extracted separately from the
data
3. The rpm cpio should be stored in a second stream inside the zchunk
file.
4. At minimum, an optional zchunk element should be set to identify
zchunk rpms as rpms rather than regular zchunk files. If desired,
optional elements could also be set containing %{name}, %[version},
%{release}, %{arch} and %{epoch}. This would allow this information
to be read easily without needing to extract the rpm header stream.
*Final notes*
I realize this is a massive proposal, zchunk is still very young, and
we're still working on getting the dnf zchunk pull requests reviewed.
I do think it's feasible and provides an opportunity to eliminate a
pain point from our compose process while still reducing the download
size for our users.
If we're really considering changing the RPM file format, then we need
a proper discussion on rpm-maint@ and rpm-ecosystem@ mailing lists on
rpm.org. Can you please start a targeted discussion there?

But addressing the specific concrete suggestion here, there are a few
concerns I have:

1. This is a huge format break, which means that for the first time in
a _very_ long time, it would not be possible to reuse RHEL for Fedora
infrastructure _at all_. That's going to be a difficult problem.
There's a large legacy of systems that won't be able to handle that
new format, and unfortunately, rpm is not parallel installable in the
same manner as something like GCC or Python currently. Making it
parallel installable *is* possible (I've done it, and there have been
other attempts before), but it's not a supported thing. This is
probably the thing that would trigger a major version bump for RPM,
since it's a new archive format.

2. This also means the _entire_ ecosystem of RPM archive parsers will
break. This is not particularly insurmountable, actually, as the RPM
file format was not particularly well documented, and a new format is
an opportunity to revisit some of those old issues and try to do a
better job this go around. But it's still a challenge to deal with.

3. When you refer to the rpm cpio, I assume you're referring to only
the archive payload, right? Typically the payload is what is
compressed, and the headers are not. It sounds like you're proposing
both aspects to be compressed, and compressed differently. If we made
the RPM header an uncompressed zchunk stream and the RPM payload a
zstd-compressed zchunk stream, would we be able to support fetching
header deltas for retrieving extra information on the fly? Say, for
example, attributes like arch color, filecap properties, and so on,
that aren't in the rpm-md data for things like transaction tests
without the whole RPM?

4. I'd actually rather make it easier for the header streams to be
fetched instead of trying to make specific attributes easier in the
header payload. History has shown that any attempt at foresight here
tends to fail miserably, and common attributes are already specified
in the rpm-md primary.xml anyway, so if you're fetching the header to
retrieve an attribute, you *need* to do something weird anyway.

5. I'm not exactly sure what you mean by zchunk signing...

6. I'm wondering why we can't do a perfect reconstruction of the
original RPM, given two RPM sources that are both zchunked? We can
pull it off with repodata, so what's different about RPM that makes
that not doable?


--
真実はいつも一つ!/ Always, there's only one truth!
Jonathan Dieter
2018-11-17 17:24:30 UTC
Neal, thanks so much for your thoughts on this. Responses inline:

On Sat, 2018-11-17 at 09:53 -0500, Neal Gompa wrote:
<snip>
Post by Neal Gompa
If we're really considering changing the RPM file format, then we need
rpm.org. Can you please start a targeted discussion there?
Sure.
Post by Neal Gompa
But addressing the specific concrete suggestion here, there's a few
1. This is a huge format break, which means that for the first time in
a _very_ long time, it would not be possible to reuse RHEL for Fedora
infrastructure _at all_. That's going to be a difficult problem.
There's a large legacy of systems that won't be able to handle that
new format, and unfortunately, rpm is not parallel installable in the
same manner as something like GCC or Python currently. Making it
parallel installable *is* possible (I've done it, and there have been
other attempts before), but it's not a supported thing. This is
probably the thing that would trigger a major version bump for RPM,
since it's a new archive format.
Agreed, that this would be a massive format change and should therefore
be a major version bump for RPM. New versions of RPM should still be
able to read and install old-format rpms, but, as you point out, old
versions of RPM won't be able to read or install new-format rpms.
Unfortunately, I don't see any way around this.
Post by Neal Gompa
2. This also means the _entire_ ecosystem of RPM archive parsers will
break. This is not particularly insurmountable, actually, as the RPM
file format was not particularly well documented, and a new format is
an opportunity to revisit some of those old issues and try to do a
better job this go around. But it's still a challenge to deal with.
Yes, this is going to be quite a bit of work.
Post by Neal Gompa
3. When you refer to the rpm cpio, I assume you're referring to only
the archive payload, right? Typically the payload is what is
compressed, and the headers are not. It sounds like you're proposing
both aspects to be compressed, and compressed differently. If we made
the RPM header an uncompressed zchunk stream and the RPM payload a
zstd-compressed zchunk stream, would we be able to support fetching
header deltas for retrieving extra information on the fly? Say, for
example, attributes like arch color, filecap properties, and so on,
that aren't in the rpm-md data for things like transaction tests
without the whole RPM?
Yes, I'm referring to the archive payload as the cpio. The zchunk
format supports the idea of separate data streams, and I was planning
to use that to put the headers in one stream and the archive payload in
another. If the header chunks are first in the zchunk file, then they
could be read without needing to read any of the rest of the file.
And, yes, we could make the header stream uncompressed if that made it
easier to parse.
Post by Neal Gompa
4. I'd actually rather make it easier for the header streams to be
fetched instead of trying to make specific attributes easier in the
header payload. History has shown that any attempt at foresight here
tends to fail miserably, and common attributes are already specified
in the rpm-md primary.xml anyway, so if you're fetching the header to
retrieve an attribute, you *need* to do something weird anyway.
The main purpose of putting separate attributes in the zchunk header is
so programs like 'file' can determine some basic information about an
rpm without needing to parse the full rpm header. This data would also
be in the rpm header, so programs that read the rpm header wouldn't
care about the attributes in the zchunk header.
Post by Neal Gompa
5. I'm not exactly sure what you mean by zchunk signing...
The zchunk format supports signing, but just for the zchunk header.
Because the header contains the checksums for each chunk, this
establishes a chain of trust for verifying the whole file. Which
brings me to...
Post by Neal Gompa
6. I'm wondering why we can't do a perfect reconstruction of the
original RPM, given two RPM sources that are both zchunked? We can
pull it off with repodata, so what's different about RPM that makes
that not doable?
The problem is that, unlike the repodata, once an rpm is installed, the
package file is deleted and the data is only available on the system in
its uncompressed installed form. If we're trying to use that data to
rebuild an rpm, we have two options.

1. Compress the data using the same method that was used to create the
original rpm. This is what applydeltarpm does, and is why it's so
heavy on the CPU.
2. Store the data uncompressed in the rebuilt rpm. This isn't feasible
with deltarpm, but, if we store both compressed hashes and
uncompressed hashes in the zchunk header, we can do this in zchunk.
When checking the signature, zchunk verifies the header against
the signature first, and then checks each chunk to see if it
passes *either* the compressed or uncompressed checksum check.
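Option 2 can be sketched as follows. This is illustrative Python, with hashlib and zlib standing in for zchunk's actual checksums and compression; signature verification of the header itself is omitted:

```python
# Sketch of the "either compressed or uncompressed" chunk check described
# above: the header stores both digests for each chunk, so a rebuilt rpm
# may carry any given chunk in either form and still validate.
import hashlib
import zlib

def digest(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

def header_entry(uncompressed: bytes) -> dict:
    """Both digests for one chunk, as they would appear in the signed header."""
    return {"comp": digest(zlib.compress(uncompressed)),
            "uncomp": digest(uncompressed)}

def chunk_ok(stored: bytes, entry: dict) -> bool:
    """Accept a chunk if it matches either the compressed or uncompressed digest."""
    return digest(stored) in (entry["comp"], entry["uncomp"])

data = b"payload chunk"
entry = header_entry(data)
assert chunk_ok(zlib.compress(data), entry)  # freshly downloaded, compressed
assert chunk_ok(data, entry)                 # reused local data, uncompressed
assert not chunk_ok(b"tampered", entry)
print("rebuilt rpm validates without recompressing local chunks")
```

This is what avoids applydeltarpm's CPU cost: locally reused data never has to be recompressed just to match a digest over compressed bytes.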

I hope this makes my thought process on this part clearer.

Jonathan
Neal Gompa
2018-11-17 19:36:49 UTC
Post by Adam Williamson
<snip>
Post by Neal Gompa
If we're really considering changing the RPM file format, then we need
rpm.org. Can you please start a targeted discussion there?
Sure.
Post by Neal Gompa
But addressing the specific concrete suggestion here, there's a few
1. This is a huge format break, which means that for the first time in
a _very_ long time, it would not be possible to reuse RHEL for Fedora
infrastructure _at all_. That's going to be a difficult problem.
There's a large legacy of systems that won't be able to handle that
new format, and unfortunately, rpm is not parallel installable in the
same manner as something like GCC or Python currently. Making it
parallel installable *is* possible (I've done it, and there have been
other attempts before), but it's not a supported thing. This is
probably the thing that would trigger a major version bump for RPM,
since it's a new archive format.
Agreed, that this would be a massive format change and should therefore
be a major version bump for RPM. New versions of RPM should still be
able to read and install old-format rpms, but, as you point out, old
versions of RPM won't be able to read or install new-format rpms.
Unfortunately, I don't see any way around this.
I don't think there's a way around it either. I just hope we do better
than the last time someone tried to do this...
Post by Adam Williamson
Post by Neal Gompa
2. This also means the _entire_ ecosystem of RPM archive parsers will
break. This is not particularly insurmountable, actually, as the RPM
file format was not particularly well documented, and a new format is
an opportunity to revisit some of those old issues and try to do a
better job this go around. But it's still a challenge to deal with.
Yes, this is going to be quite a bit of work.
Post by Neal Gompa
3. When you refer to the rpm cpio, I assume you're referring to only
the archive payload, right? Typically the payload is what is
compressed, and the headers are not. It sounds like you're proposing
both aspects to be compressed, and compressed differently. If we made
the RPM header an uncompressed zchunk stream and the RPM payload a
zstd-compressed zchunk stream, would we be able to support fetching
header deltas for retrieving extra information on the fly? Say, for
example, attributes like arch color, filecap properties, and so on,
that aren't in the rpm-md data for things like transaction tests
without the whole RPM?
Yes, I'm referring the the archive payload as the cpio. The zchunk
format supports the idea of separate data streams, and I was planning
to use that to put the headers in one stream and the archive payload in
another. If the header chunks are first in the zchunk file, then they
could be read without needing to read any of the rest of the file.
And, yes, we could make the header stream uncompressed if that made it
easier to parse.
Whether it's compressed or not isn't terribly important, but what is
important is being able to validate the correctness before beginning
any processing, including decompression.
Post by Adam Williamson
Post by Neal Gompa
4. I'd actually rather make it easier for the header streams to be
fetched instead of trying to make specific attributes easier in the
header payload. History has shown that any attempt at foresight here
tends to fail miserably, and common attributes are already specified
in the rpm-md primary.xml anyway, so if you're fetching the header to
retrieve an attribute, you *need* to do something weird anyway.
The main purpose of putting separate attributes in the zchunk header is
so programs like 'file' can determine some basic information about an
rpm without needing to parse the full rpm header. This data would also
be in the rpm header, so programs that read the rpm header wouldn't
care about the attributes in the zchunk header.
I see, so some simple hints for stuff like that? But that would still
require awareness of the format to some degree. I guess we'd have a
specific lead magic to let tools know to look for them...
Post by Adam Williamson
Post by Neal Gompa
5. I'm not exactly sure what you mean by zchunk signing...
The zchunk format supports signing, but just for the zchunk header.
Because the header contains the checksums for each chunk, this
establishes a chain of trust for verifying the whole file. Which
brings me to...
Post by Neal Gompa
6. I'm wondering why we can't do a perfect reconstruction of the
original RPM, given two RPM sources that are both zchunked? We can
pull it off with repodata, so what's different about RPM that makes
that not doable?
The problem is that, unlike the repodata, once an rpm is installed, the
package file is deleted and the data is only available on the system in
its uncompressed installed form. If we're trying to use that data to
rebuild an rpm, we have two options.
1. Compress the data using the same method that was used to create the
original rpm. This is what applydeltarpm does, and is why it's so
heavy on the CPU.
2. Store the data uncompressed in the rebuilt rpm. This isn't feasible
with deltarpm, but, if we store both compressed hashes and
uncompressed hashes in the zchunk header, we can do this in zchunk.
When running checking the signature, zchunk verifies the header
against the signature first, and then checks each chunk to see if it
passes *either* the compressed or uncompressed signature check.
I hope this makes my thought process on this part clearer.
Yeah, that makes sense...


--
真実はいつも一つ!/ Always, there's only one truth!
Jonathan Dieter
2018-11-17 21:49:06 UTC
Post by Neal Gompa
Post by Adam Williamson
<snip>
Post by Neal Gompa
If we're really considering changing the RPM file format, then we need
rpm.org. Can you please start a targeted discussion there?
Sure.
Post by Neal Gompa
But addressing the specific concrete suggestion here, there's a few
1. This is a huge format break, which means that for the first time in
a _very_ long time, it would not be possible to reuse RHEL for Fedora
infrastructure _at all_. That's going to be a difficult problem.
There's a large legacy of systems that won't be able to handle that
new format, and unfortunately, rpm is not parallel installable in the
same manner as something like GCC or Python currently. Making it
parallel installable *is* possible (I've done it, and there have been
other attempts before), but it's not a supported thing. This is
probably the thing that would trigger a major version bump for RPM,
since it's a new archive format.
Agreed, that this would be a massive format change and should therefore
be a major version bump for RPM. New versions of RPM should still be
able to read and install old-format rpms, but, as you point out, old
versions of RPM won't be able to read or install new-format rpms.
Unfortunately, I don't see any way around this.
I don't think there's a way around it either. I just hope we do better
than the last time someone tried to do this...
+1
Post by Neal Gompa
Post by Adam Williamson
Post by Neal Gompa
2. This also means the _entire_ ecosystem of RPM archive parsers will
break. This is not particularly insurmountable, actually, as the RPM
file format was not particularly well documented, and a new format is
an opportunity to revisit some of those old issues and try to do a
better job this go around. But it's still a challenge to deal with.
Yes, this is going to be quite a bit of work.
Post by Neal Gompa
3. When you refer to the rpm cpio, I assume you're referring to only
the archive payload, right? Typically the payload is what is
compressed, and the headers are not. It sounds like you're proposing
both aspects to be compressed, and compressed differently. If we made
the RPM header an uncompressed zchunk stream and the RPM payload a
zstd-compressed zchunk stream, would we be able to support fetching
header deltas for retrieving extra information on the fly? Say, for
example, attributes like arch color, filecap properties, and so on,
that aren't in the rpm-md data for things like transaction tests
without the whole RPM?
Yes, I'm referring the the archive payload as the cpio. The zchunk
format supports the idea of separate data streams, and I was planning
to use that to put the headers in one stream and the archive payload in
another. If the header chunks are first in the zchunk file, then they
could be read without needing to read any of the rest of the file.
And, yes, we could make the header stream uncompressed if that made it
easier to parse.
Whether it's compressed or not isn't terribly important, but what is
important is being able to validate the correctness before beginning
any processing, including decompression.
Absolutely! This includes both the rpm header and the rpm archive
data, and that's why we store both the compressed and uncompressed
checksums of the chunks.
Post by Neal Gompa
Post by Adam Williamson
Post by Neal Gompa
4. I'd actually rather make it easier for the header streams to be
fetched instead of trying to make specific attributes easier in the
header payload. History has shown that any attempt at foresight here
tends to fail miserably, and common attributes are already specified
in the rpm-md primary.xml anyway, so if you're fetching the header to
retrieve an attribute, you *need* to do something weird anyway.
The main purpose of putting separate attributes in the zchunk header is
so programs like 'file' can determine some basic information about an
rpm without needing to parse the full rpm header. This data would also
be in the rpm header, so programs that read the rpm header wouldn't
care about the attributes in the zchunk header.
I see, so some simple hints for stuff like that? But that would still
require awareness of the format to some degree. I guess we'd have a
specific lead magic to let tools know to look for them...
Yeah, the code would be maybe a hundred lines, max, that could be
copylib'd into file, etc.
Post by Neal Gompa
Post by Adam Williamson
Post by Neal Gompa
5. I'm not exactly sure what you mean by zchunk signing...
The zchunk format supports signing, but just for the zchunk header.
Because the header contains the checksums for each chunk, this
establishes a chain of trust for verifying the whole file. Which
brings me to...
Post by Neal Gompa
6. I'm wondering why we can't do a perfect reconstruction of the
original RPM, given two RPM sources that are both zchunked? We can
pull it off with repodata, so what's different about RPM that makes
that not doable?
The problem is that, unlike the repodata, once an rpm is installed, the
package file is deleted and the data is only available on the system in
its uncompressed installed form. If we're trying to use that data to
rebuild an rpm, we have two options.
1. Compress the data using the same method that was used to create the
original rpm. This is what applydeltarpm does, and is why it's so
heavy on the CPU.
2. Store the data uncompressed in the rebuilt rpm. This isn't feasible
with deltarpm, but, if we store both compressed hashes and
uncompressed hashes in the zchunk header, we can do this in zchunk.
When running checking the signature, zchunk verifies the header
against the signature first, and then checks each chunk to see if it
passes *either* the compressed or uncompressed signature check.
I hope this makes my thought process on this part clearer.
Yeah, that makes sense...
Great! Thanks again for looking at this.

Jonathan
Colin Walters
2018-11-21 16:31:06 UTC
Post by Jonathan Dieter
Agreed, that this would be a massive format change and should therefore
be a major version bump for RPM. New versions of RPM should still be
able to read and install old-format rpms, but, as you point out, old
versions of RPM won't be able to read or install new-format rpms.
Unfortunately, I don't see any way around this.
After having introduced a new format (OSTree) into the ecosystem here,
as well as working a lot on the Docker/OCI ecosystem, one thing I want
to emphasize is:

A lot of Red Hat's customers don't connect their systems to the Internet,
they want easy offline mirroring. OSTree supports that, and it's
also possible to do with OCI images of course.

But, a lot of organizations use e.g. https://jfrog.com/artifactory/
which today doesn't support OSTree (it does support RPM and Docker/OCI).
So any format break for RPM wouldn't be usable until Artifactory gains
support for it. And even after that happened you'd have in some
places a large lag time for it to be deployed.

In general, any data format break is going to impose far higher
costs than you might imagine.

(Also on this topic, I should note that the OSTree data format cleanly
fixes a lot of the issues being discussed here: it has deltas, it
doesn't make the mistake of checksumming compressed data, only changed
files are rewritten when performing updates, and there's a whole
transactional update system, etc.)
Jonathan Dieter
2018-11-21 20:36:56 UTC
Post by Colin Walters
After having introduced a new format (OSTree) into the ecosystem here,
as well as working a lot on the Docker/OCI ecosystem, one thing I want
A lot of Red Hat's customers don't connect their systems to the Internet,
they want easy offline mirroring. OSTree supports that, and it's
also possible to do with OCI images of course.
But, a lot organizations use e.g. https://jfrog.com/artifactory/
which today doesn't support OSTree (it does support RPM and Docker/OCI).
So any format break for RPM wouldn't be usable until Artifactory gains
support for it. And even after that happened you'd have in some
places a large lag time for it to be deployed.
In general, any data format break is going to impose a lot higher
costs than you might imagine.
Thanks for bringing up these points. You are undoubtedly correct that
there's an unknown cost associated with these changes, but hopefully
the cost will become a little clearer once we have a POC.
Post by Colin Walters
(Also on this topic, I should note that the OSTree data format cleanly
fixes a lot of the issues being discussed here; it has deltas, and also
doesn't make the mistake of checksumming compressed data,
when performing updates only changed files are rewritten, not to mention
a whole transactional update system, etc.)
Yep. I've experimented with OSTree and love the concepts behind it. I
don't think we're quite ready to ditch the classic rpm systems yet,
though.

Jonathan
Josh Boyer
2018-11-22 16:31:03 UTC
Permalink
Post by Jonathan Dieter
Post by Colin Walters
After having introduced a new format (OSTree) into the ecosystem here,
as well as working a lot on the Docker/OCI ecosystem, one thing I want
A lot of Red Hat's customers don't connect their systems to the Internet,
they want easy offline mirroring. OSTree supports that, and it's
also possible to do with OCI images of course.
But, a lot of organizations use e.g. https://jfrog.com/artifactory/
which today doesn't support OSTree (it does support RPM and Docker/OCI).
So any format break for RPM wouldn't be usable until Artifactory gains
support for it. And even after that happened you'd have in some
places a large lag time for it to be deployed.
In general, any data format break is going to impose a lot higher
costs than you might imagine.
Thanks for bringing up these points. You are undoubtedly correct that
there's an unknown cost associated with these changes, but hopefully
the cost will become a little clearer once we have a POC.
I'm concerned that this will effectively render EL RPM unable to
handle any Fedora RPMs at all. That's both a practical concern, as
many people develop Fedora using EL and vice versa, and also a broader
ecosystem concern. I would very much like for all of our
distributions to be able to more easily operate together, and this
effectively forks Fedora off into its own space yet again.

Have we really looked at the wider scope of what a format change like
this would do in the context of some of the larger picture things
we're working on with lifecycle and cross-distro collaboration
efforts? I agree this would be better than delta RPMs when looking at
that *specific* usecase, but improving that (even with compose time
benefits) by doing a format change seems to be inflicting a very high
cost for what is an important but relatively small usecase.

josh
Post by Jonathan Dieter
Post by Colin Walters
(Also on this topic, I should note that the OSTree data format cleanly
fixes a lot of the issues being discussed here; it has deltas, and also
doesn't make the mistake of checksumming compressed data,
when performing updates only changed files are rewritten, not to mention
a whole transactional update system, etc.)
Yep. I've experimented with OSTree and love the concepts behind it. I
don't think we're quite ready to ditch the classic rpm systems yet,
though.
Jonathan
Jonathan Dieter
2018-11-22 21:21:55 UTC
Permalink
Post by Josh Boyer
I'm concerned that this will effectively render EL RPM unable to
handle any Fedora RPMs at all. That's both a practical concern, as
many people develop Fedora using EL and vice versa, and also a broader
ecosystem concern. I would very much like for all of our
distributions to be able to more easily operate together, and this
effectively forks Fedora off into its own space yet again.
For what very little it's worth, a zchunked rpm is still a valid zchunk
file, and you would be able to easily extract the payload and the
header from a zchunked rpm on an EL system. In fact, because we're not
planning to change the rpm header format at all, there could easily be
a tool to convert a zchunked rpm into an xz or gzipped rpm (and vice
versa). Having said that, there's a huge difference between this and
having EL's RPM actually be able to read Fedora's RPMs.
Post by Josh Boyer
Have we really looked at the wider scope of what a format change like
this would do in the context of some of the larger picture things
we're working on with lifecycle and cross-distro collaboration
efforts? I agree this would be better than delta RPMs when looking at
that *specific* usecase, but improving that (even with compose time
benefits) by doing a format change seems to be inflicting a very high
cost for what is an important but relatively small usecase.
This is a really good point and I don't know the answer. As per
Neal's suggestion earlier in the thread, I posted this to the
rpm-ecosystem mailing list. Michael from SUSE brought up some very valid
concerns about how well zchunk would compare with deltarpm in delta
efficiency, but apart from that, it's been pretty quiet there.

If it's already obvious that the cost for this proposal isn't worth the
gain, I do completely understand. If we're not sure, I'll do a
proof-of-concept that converts standard rpms into zchunked rpms, so we can
compare sizes and deltas, and hopefully get some data points on speed.

Jonathan
Chris Adams
2018-11-17 20:43:23 UTC
Permalink
Post by Jonathan Dieter
The benefit of zchunked rpms is that, when downloading an updated rpm,
you would only need to download the chunks that have changed from
what's on your system.
How well do web servers and caches handle range requests? I haven't
really paid attention to range requests in a long time; at one point
IIRC mirrors would often disable them because of "download accelerators"
that would open multiple connections to download parts of the same ISO
in parallel (hogging server resources).
--
Chris Adams <***@cmadams.net>
Jonathan Dieter
2018-11-17 22:00:35 UTC
Permalink
Post by Chris Adams
Post by Jonathan Dieter
The benefit of zchunked rpms is that, when downloading an updated rpm,
you would only need to download the chunks that have changed from
what's on your system.
How well do web servers and caches handle range requests? I haven't
really paid attention to range requests in a long time; at one point
IIRC mirrors would often disable them because of "download accelerators"
that would open multiple connections to download parts of the same ISO
in parallel (hogging server resources).
When I did the original POC testing, out of Fedora's 150 mirrors, 3
didn't support range requests at all and 3 supported a limited number
of ranges in a single HTTP request.

Zchunk doesn't open extra connections to the server, but instead
combines as many ranges as the server supports into a single request.
Currently, if a server doesn't support ranges, zchunk will just
download the full file, but this could be changed to try a different
server.
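The coalescing itself is straightforward; a rough Python sketch of the idea (illustrative only — zchunk's actual implementation is in C, and the function name and the None-means-full-download fallback here are made up):

```python
def build_range_header(ranges, max_ranges):
    """Coalesce adjacent/overlapping byte ranges into as few HTTP Range
    spans as possible.  Returns the Range header value, or None if the
    server's per-request range limit would be exceeded (the caller then
    falls back to downloading the whole file)."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # this chunk is adjacent to (or overlaps) the previous span
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    if len(merged) > max_ranges:
        return None
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in merged)
```

For example, `build_range_header([(0, 99), (100, 199), (500, 599)], max_ranges=2)` collapses the two adjacent chunks and yields `"bytes=0-199,500-599"`.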

Jonathan
Gerd Hoffmann
2018-11-19 09:31:26 UTC
Permalink
Post by Chris Adams
Post by Jonathan Dieter
The benefit of zchunked rpms is that, when downloading an updated rpm,
you would only need to download the chunks that have changed from
what's on your system.
How well do web servers and caches handle range requests?
web servers: not much of a problem.

caches: doesn't look so good. squid can't store chunks of a file. So
the options you have are:

(a) configure squid to pass through range requests to the original
server. Which kills any caching.
(b) configure squid to fetch the whole file no matter what.
Subsequent requests (including range requests) will be served
from cache then, but zchunked rpms wouldn't save any bandwidth
then.
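Concretely, option (b) would be something like the following squid.conf fragment (directive names from memory — check the squid documentation before relying on this):

```
# always fetch the whole object, even when the client only asked for a range
range_offset_limit none
# keep fetching after the client disconnects, so the object lands in cache
quick_abort_min -1 KB
```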

cheers,
Gerd
Nicolas Mailhot
2018-11-19 10:08:47 UTC
Permalink
Post by Gerd Hoffmann
caches: doesn't look so good. squid can't store chunks of a file.
For range request caching you probably want Apache Traffic Server (ATS),
not squid, as the cache layer.

The video guys implemented lots of range request magic in the cache tech
they use.

--
Nicolas Mailhot
Kevin Kofler
2018-11-17 21:30:14 UTC
Permalink
Post by Jonathan Dieter
My proposal would be to make zchunk the rpm compression format for
Fedora.
Given that:
1. zchunk is based on zstd, which is typically less efficient in terms of
compression ratio than xz, depending on settings
(see, e.g., https://github.com/inikep/lzbench), and
2. zchunk can by design only compress chunks individually and not benefit
from the space savings of a solid archive with a global dictionary,
I fear that this is going to significantly increase the size of the RPMs,
which matters:
* for the initial downloads,
* for storage (e.g., keepcache=1, local mirrors, etc.), and
* for the people not using deltas for whatever reason.

I think zchunk makes a lot of sense for the metadata, but I am not convinced
that it is the right choice for the RPMs themselves.

Kevin Kofler
Jonathan Dieter
2018-11-18 16:02:58 UTC
Permalink
Post by Kevin Kofler
Post by Jonathan Dieter
My proposal would be to make zchunk the rpm compression format for
Fedora.
1. zchunk is based on zstd, which is typically less efficient in terms of
compression ratio than xz, depending on settings
(see, e.g., https://github.com/inikep/lzbench), and
2. zchunk can by design only compress chunks individually and not benefit
from the space savings of a solid archive with a global dictionary,
I fear that this is going to significantly increase the size of the RPMs,
* for the initial downloads,
* for storage (e.g., keepcache=1, local mirrors, etc.), and
* for the people not using deltas for whatever reason.
I think zchunk makes a lot of sense for the metadata, but I am not convinced
that it is the right choice for the RPMs themselves.
I suspect the first is true, but zchunk does actually allow for a
global (per-file) dictionary that can be used to compress each chunk.
The difficulty is that the dictionary has to stay the same between file
versions, or the chunk checksums won't match. There would have to be
some thought put into how to generate and store the dictionaries.
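A toy illustration of why the dictionary has to stay fixed, using zlib's preset-dictionary support as a stand-in for zstd's (the point is format-agnostic; the chunk data and dictionaries here are made up):

```python
import hashlib
import zlib

def compress_chunk(chunk: bytes, dictionary: bytes) -> bytes:
    # zlib, like zstd, can seed its compression window with a preset
    # dictionary; the output stream depends on which dictionary was used
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(chunk) + comp.flush()

chunk = b"Requires: glibc >= 2.28\n" * 8
old_dict = b"Requires: glibc >= "
new_dict = b"Requires: openssl >= "

same = hashlib.sha256(compress_chunk(chunk, old_dict)).hexdigest()
again = hashlib.sha256(compress_chunk(chunk, old_dict)).hexdigest()
changed = hashlib.sha256(compress_chunk(chunk, new_dict)).hexdigest()

assert same == again      # same dictionary: the chunk checksum is reusable
assert same != changed    # new dictionary: every chunk looks "changed"
```

So regenerating the dictionary for each build would defeat the whole delta mechanism, even when the underlying chunk data is identical.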

As for how much bigger a zchunked rpm will be compared to an xz rpm, at
the moment it's a bit hand-wavy. Based on zchunked repodata work I've
done, I think we might be looking at a size that's slightly smaller
than a gzipped rpm. I won't know for sure until I put together a
proof-of-concept, but I want to make sure that there aren't any gaping
holes in the proposal before I do that.

Jonathan
Neal Gompa
2018-11-18 16:57:51 UTC
Permalink
Post by Jonathan Dieter
Post by Kevin Kofler
Post by Jonathan Dieter
My proposal would be to make zchunk the rpm compression format for
Fedora.
1. zchunk is based on zstd, which is typically less efficient in terms of
compression ratio than xz, depending on settings
(see, e.g., https://github.com/inikep/lzbench), and
2. zchunk can by design only compress chunks individually and not benefit
from the space savings of a solid archive with a global dictionary,
I fear that this is going to significantly increase the size of the RPMs,
* for the initial downloads,
* for storage (e.g., keepcache=1, local mirrors, etc.), and
* for the people not using deltas for whatever reason.
I think zchunk makes a lot of sense for the metadata, but I am not convinced
that it is the right choice for the RPMs themselves.
I suspect the first is true, but zchunk does actually allow for a
global (per-file) dictionary that can be used to compress each chunk.
The difficulty is that the dictionary has to stay the same between file
versions, or the chunk checksums won't match. There would have to be
some thought put into how to generate and store the dictionaries.
As for how much bigger a zchunked rpm will be compared to an xz rpm, at
the moment it's a bit hand-wavy. Based on zchunked repodata work I've
done, I think we might be looking at a size that's slightly smaller
than a gzipped rpm. I won't know for sure until I put together a
proof-of-concept, but I want to make sure that there aren't any gaping
holes in the proposal before I do that.
I did some work several months ago to evaluate zstd compression for
RPMs for Fedora, because of the lower memory and CPU usage for
(de)compression. However, the average size increase from xz was pretty
large (~20% or more on average, and nothing ever was either the same
or smaller), even with heavier compression settings. That might have
changed a bit with newer zstd releases that offer some more tunables,
but I think it'll remain a tough sell on disk space.


--
真実はいつも一つ!/ Always, there's only one truth!
Stephen John Smoogen
2018-11-18 18:19:08 UTC
Permalink
Post by Neal Gompa
Post by Jonathan Dieter
Post by Kevin Kofler
Post by Jonathan Dieter
My proposal would be to make zchunk the rpm compression format for
Fedora.
1. zchunk is based on zstd, which is typically less efficient in terms of
compression ratio than xz, depending on settings
(see, e.g., https://github.com/inikep/lzbench), and
2. zchunk can by design only compress chunks individually and not benefit
from the space savings of a solid archive with a global dictionary,
I fear that this is going to significantly increase the size of the RPMs,
* for the initial downloads,
* for storage (e.g., keepcache=1, local mirrors, etc.), and
* for the people not using deltas for whatever reason.
I think zchunk makes a lot of sense for the metadata, but I am not convinced
that it is the right choice for the RPMs themselves.
I suspect the first is true, but zchunk does actually allow for a
global (per-file) dictionary that can be used to compress each chunk.
The difficulty is that the dictionary has to stay the same between file
versions, or the chunk checksums won't match. There would have to be
some thought put into how to generate and store the dictionaries.
As for how much bigger a zchunked rpm will be compared to an xz rpm, at
the moment it's a bit hand-wavy. Based on zchunked repodata work I've
done, I think we might be looking at a size that's slightly smaller
than a gzipped rpm. I won't know for sure until I put together a
proof-of-concept, but I want to make sure that there aren't any gaping
holes in the proposal before I do that.
I did some work several months ago to evaluate zstd compression for
RPMs for Fedora, because of the lower memory and CPU usage for
(de)compression. However, the average size increase from xz was pretty
large (~20% or more on average, and nothing ever was either the same
or smaller), even with heavier compression settings. That might have
changed a bit with newer zstd releases that offer some more tunables,
but I think it'll remain a tough sell on disk space.
So there are at least 4 legs here:
CPU usage (in both decompression at install time and deltarpm reconstruction)
Memory usage per transaction
Network amount
Disk amount

I expect that the best we are going to get in any 'improvement' is
going to be 3 out of the 4. The xz compression and delta-rpm have a
CPU/memory tradeoff for disk and network in comparison to gzip, but it
is mostly acceptable if you have fairly modern desktops. However, for
older hardware or lower-power systems that tradeoff may not be good.



--
Stephen J Smoogen.
Sheogorath
2018-11-19 10:41:59 UTC
Permalink
Post by Stephen John Smoogen
Post by Neal Gompa
Post by Jonathan Dieter
Post by Kevin Kofler
Post by Jonathan Dieter
My proposal would be to make zchunk the rpm compression format for
Fedora.
1. zchunk is based on zstd, which is typically less efficient in terms of
compression ratio than xz, depending on settings
(see, e.g., https://github.com/inikep/lzbench), and
2. zchunk can by design only compress chunks individually and not benefit
from the space savings of a solid archive with a global dictionary,
I fear that this is going to significantly increase the size of the RPMs,
* for the initial downloads,
* for storage (e.g., keepcache=1, local mirrors, etc.), and
* for the people not using deltas for whatever reason.
I think zchunk makes a lot of sense for the metadata, but I am not convinced
that it is the right choice for the RPMs themselves.
I suspect the first is true, but zchunk does actually allow for a
global (per-file) dictionary that can be used to compress each chunk.
The difficulty is that the dictionary has to stay the same between file
versions, or the chunk checksums won't match. There would have to be
some thought put into how to generate and store the dictionaries.
As for how much bigger a zchunked rpm will be compared to an xz rpm, at
the moment it's a bit hand-wavy. Based on zchunked repodata work I've
done, I think we might be looking at a size that's slightly smaller
than a gzipped rpm. I won't know for sure until I put together a
proof-of-concept, but I want to make sure that there aren't any gaping
holes in the proposal before I do that.
I did some work several months ago to evaluate zstd compression for
RPMs for Fedora, because of the lower memory and CPU usage for
(de)compression. However, the average size increase from xz was pretty
large (~20% or more on average, and nothing ever was either the same
or smaller), even with heavier compression settings. That might have
changed a bit with newer zstd releases that offer some more tunables,
but I think it'll remain a tough sell on disk space.
CPU usage (in both uncompression install and deltarpm)
Memory usage per transaction
Network amount
Disk amount
I expect that the best we are going to get in any 'improvement' is
going to be 3 out of the 4. The xz compression and delta-rpm has a
cpu/memory tradeoff for disk and network in comparison to gzip but it
is mostly acceptable if you have fairly modern desktops. However for
older hardware or lower power systems that tradeoff may not be good.
Good point. Given that we are starting to have Fedora IoT, we have to look
at those creatures. IoT devices hate heavy RAM usage, hate disk usage, and
are halfway okay with CPU usage (but keep in mind it may take an hour to
decompress), and depending on the upstream, either use mobile data for
networking or, when you're lucky, some WiFi/Bluetooth/… thing.

Means:
CPU usage: Getting worse here, doesn't hurt too much
Memory usage: Don't! Get! Worse!
Network amount: Well, people wouldn't be happy when it gets worse, but
mobile data gets cheaper every day.
Disk amount: People won't be happy with an increase here, but as long as
it stays somewhere within 10% it's fine, more than 20% would already
hurt a lot.

So when we want to revisit RPM, we should keep our new fellows in mind.
Maybe we get some OSTree magic going? There we already see deltas
between versions and we get chunks.
--
Signed
Sheogorath
Martin Kolman
2018-11-19 11:28:56 UTC
Permalink
Post by Sheogorath
Post by Stephen John Smoogen
Post by Neal Gompa
Post by Jonathan Dieter
Post by Kevin Kofler
Post by Jonathan Dieter
My proposal would be to make zchunk the rpm compression format for
Fedora.
1. zchunk is based on zstd, which is typically less efficient in terms of
compression ratio than xz, depending on settings
(see, e.g., https://github.com/inikep/lzbench), and
2. zchunk can by design only compress chunks individually and not benefit
from the space savings of a solid archive with a global dictionary,
I fear that this is going to significantly increase the size of the RPMs,
* for the initial downloads,
* for storage (e.g., keepcache=1, local mirrors, etc.), and
* for the people not using deltas for whatever reason.
I think zchunk makes a lot of sense for the metadata, but I am not convinced
that it is the right choice for the RPMs themselves.
I suspect the first is true, but zchunk does actually allow for a
global (per-file) dictionary that can be used to compress each chunk.
The difficulty is that the dictionary has to stay the same between file
versions, or the chunk checksums won't match. There would have to be
some thought put into how to generate and store the dictionaries.
As for how much bigger a zchunked rpm will be compared to an xz rpm, at
the moment it's a bit hand-wavy. Based on zchunked repodata work I've
done, I think we might be looking at a size that's slightly smaller
than a gzipped rpm. I won't know for sure until I put together a
proof-of-concept, but I want to make sure that there aren't any gaping
holes in the proposal before I do that.
I did some work several months ago to evaluate zstd compression for
RPMs for Fedora, because of the lower memory and CPU usage for
(de)compression. However, the average size increase from xz was pretty
large (~20% or more on average, and nothing ever was either the same
or smaller), even with heavier compression settings. That might have
changed a bit with newer zstd releases that offer some more tunables,
but I think it'll remain a tough sell on disk space.
CPU usage (in both uncompression install and deltarpm)
Memory usage per transaction
Network amount
Disk amount
I expect that the best we are going to get in any 'improvement' is
going to be 3 out of the 4. The xz compression and delta-rpm has a
cpu/memory tradeoff for disk and network in comparison to gzip but it
is mostly acceptable if you have fairly modern desktops. However for
older hardware or lower power systems that tradeoff may not be good.
Good point. Given that we start to have Fedora IoT we have to look at
those creatures. IoT devices hate heavy RAM usage, hate disk usage, and are
halfway okay with CPU usage (but keep in mind it may take an hour to
decompress) and depending on the upstream, either use mobile data for
networking or when you're lucky some WiFi/Bluetooth/… thing.
CPU usage: Getting worse here, doesn't hurt too much
Memory usage: Don't! Get! Worse!
Network amount: Well, people wouldn't be happy when it gets worse, but
mobile data gets cheaper every day.
Disk amount: People won't be happy with an increase here, but as long as
it stays somewhere within 10% it's fine, more than 20% would already
hurt a lot.
I'd like to add another perspective - network installations.

During a network installation of Fedora, the installer can only use the available RAM
to both run and store any data it needs before starting the installation.
Only after the installation is started and storage is partitioned can it
(if the system has a swap partition) relieve some of the memory pressure to
persistent storage.

Many people might think RAM would not be an issue in 2018, but in practice there are,
and likely always will be, memory-constrained installation targets, such as massive
deployments of "small" VMs or the IoT use cases mentioned above.

So even though a network installation process is unlikely to actually do
delta package reconstruction against an older version, it would be good
if the memory requirements for a simple non-delta download, verification,
and package installation remained sane.
Post by Sheogorath
So when we want to revisit RPM, we should keep our new fellows in mind.
Maybe we get some OSTree magic going? There we already see deltas
between versions and we get chunks.
Nicolas Mailhot
2018-11-19 12:13:50 UTC
Permalink
Post by Martin Kolman
Many people might think RAM would not be an issue in 2018, but in practice there are
and likely always will be memory constrained installation targets,
such as massive deployments
of "small" VMs or the IoT use cases mentioned above.
Sure, that’s the artificial small VM case.

The average old/limited hardware is limited in memory, CPU and storage.
Therefore, if you have one factor to sacrifice, it's CPU time, because you
can always let the CPU run a little longer, but a limited system won't
magically grow more memory or more storage.

Storage would not be such a problem if dnf were smart enough to
automatically partition big upgrades into lots of small partial upgrades,
before downloading gigs of data that do not fit on disk.

Regards,

--
Nicolas Mailhot
Jan Pokorný
2018-11-19 19:16:43 UTC
Permalink
Post by Martin Kolman
Many people might think RAM would not be an issue in 2018, but in practice there are
and likely always will be memory constrained installation targets,
such as massive deployments
of "small" VMs or the IoT use cases mentioned above.
Sure, that’s the artificial small vm case
The average old/limited hardware is limited in memory, cpu and storage.
Therefore if you have one factor to sacrifice it's cpu time because you can
always let the CPU run a little longer, but a limited system won't magically
grow more memory or more storage.
Storage would not be such a problem if dnf were smart enough to auto
partition big upgrades into lots of small partial upgrades, before downloading
gigs of data that do not fit on disk.
https://bugzilla.redhat.com/show_bug.cgi?id=1609824

Also, I'm not familiar with the zchunk way of doing things, but couldn't
rpm-integrity-verified installed files be mapped back to "chunks"
to further alleviate space concerns for the machine receiving
updates in some cases?
--
Jan (Poki)
Jonathan Dieter
2018-11-19 19:58:39 UTC
Permalink
Post by Jan Pokorný
Post by Martin Kolman
Many people might think RAM would not be an issue in 2018, but in
practice there are
and likely always will be memory constrained installation targets,
such as massive deployments
of "small" VMs or the IoT use cases mentioned above.
Sure, that’s the artificial small vm case
The average old/limited hardware is limited in memory, cpu and storage.
Therefore if you have one factor to sacrifice it's cpu time because you can
always let the CPU run a little longer, but a limited system won't magically
grow more memory or more storage.
Storage would not be such a problem if dnf were smart enough to auto
partition big upgrades into lots of small partial upgrades, before downloading
gigs of data that do not fit on disk.
https://bugzilla.redhat.com/show_bug.cgi?id=1609824
Also, not familiar with zchunk way of doing things, but couldn't
rpm-integrity-verified installed files be mapped back to "chunks"
to further alleviate space concerns for the machine receiving
updates in some cases?
That's an interesting thought. I was picturing using the zchunk
library in the dnf download stage to build a local rpm from the
verified locally installed files and the downloaded changed chunks,
but, if I understand your suggestions correctly, you're saying we could
just download the changed chunks and have RPM automatically get the
rpm-integrity verified chunks during the *install* stage.

The advantage of this method is that you don't need to store the local
data twice, but the danger is that the local files get changed
elsewhere during the install process.

It's an interesting thought, though, and I wonder if there's a way we
could work around that danger?
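Roughly, the download-stage reconstruction I was picturing looks like this (a hedged sketch only — the header layout, names, and callback are illustrative, not the actual zchunk API):

```python
import hashlib

def rebuild_rpm(chunk_digests, local_chunks, fetch_chunk):
    """Reassemble a file from a zchunk-style header.

    chunk_digests: ordered SHA-256 hex digests of each compressed chunk,
                   as read from the downloaded header.
    local_chunks:  dict of digest -> chunk bytes we already have locally
                   (e.g. recompressed from verified installed files).
    fetch_chunk:   callback that downloads a missing chunk by digest
                   (e.g. via an HTTP range request).
    """
    parts = []
    for digest in chunk_digests:
        chunk = local_chunks.get(digest)
        if chunk is None:
            chunk = fetch_chunk(digest)
        # verify every chunk, local or remote, before trusting it
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {digest[:12]} failed verification")
        parts.append(chunk)
    return b"".join(parts)
```

Doing the same thing at install time, as Jan suggests, would replace local_chunks with reads of the rpm-integrity-verified files on disk, which is exactly where the race with files changing underneath you comes in.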

Jonathan
Simo Sorce
2018-11-19 20:18:11 UTC
Permalink
Post by Jonathan Dieter
Post by Jan Pokorný
Post by Nicolas Mailhot
Post by Martin Kolman
Many people might think RAM would not be an issue in 2018, but in
practice there are
and likely always will be memory constrained installation targets,
such as massive deployments
of "small" VMs or the IoT use cases mentioned above.
Sure, that’s the artificial small vm case
The average old/limited hardware is limited in memory, cpu and storage.
Therefore if you have one factor to sacrifice it's cpu time because you can
always let the CPU run a little longer, but a limited system won't magically
grow more memory or more storage.
Storage would not be such a problem is dnf was smart enough to auto
partition big upgrades in lots of small partial upgrades, before downloading
gigs of data that do not fit on disk.
https://bugzilla.redhat.com/show_bug.cgi?id=1609824
Also, not familiar with zchunk way of doing things, but couldn't
rpm-integrity-verified installed files be mapped back to "chunks"
to further alleviate space concerns for the machine receiving
updates in some cases?
That's an interesting thought. I was picturing using the zchunk
library in the dnf download stage to build a local rpm from the
verified locally installed files and the downloaded changed chunks,
but, if I understand your suggestions correctly, you're saying we could
just download the changed chunks and have RPM automatically get the
rpm-integrity verified chunks during the *install* stage.
How do you know which chunks to download w/o having a stored (or
recomputed) list of existing chunks ?
Post by Jonathan Dieter
The advantage of this method is that you don't need to store the local
data twice, but the danger is that the local files get changed
elsewhere during the install process.
It's an interesting thought, though, and I wonder if there's a way we
could work around that danger?
I do not think you can just trust random metadata somewhere, one of the
points of a rpm reinstall is to fix damaged files for example. It does
no good if you skip those because some file somewhere says they are
"OK". (If I understood your comment about "just downloading changed
chunks).

A couple more questions.
I skimmed quickly at the format and I have two questions I did not
immediately see an answer for.
1) why are you still supporting SHA-1 in a new format ?
2) what are the chunks sizes ?

Sorry if this is already answered somewhere.

Finally, what signature scheme were you planning to use? And how do
you deal with the data you want to "exclude" from signing, omit it or
feed in blank "sectors" ?

Thanks for any answer.
Simo.

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

_______________________________________________
devel mailing list -- ***@lists.fedoraproject.org
To unsubscribe send an email to devel-***@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/***@lists.fedoraprojec
Jonathan Dieter
2018-11-19 21:02:40 UTC
Permalink
<snip>
Post by Simo Sorce
Post by Jonathan Dieter
That's an interesting thought. I was picturing using the zchunk
library in the dnf download stage to build a local rpm from the
verified locally installed files and the downloaded changed chunks,
but, if I understand your suggestions correctly, you're saying we could
just download the changed chunks and have RPM automatically get the
rpm-integrity verified chunks during the *install* stage.
How do you know which chunks to download w/o having a stored (or
recomputed) list of existing chunks ?
I thought we should store the chunk checksums of installed files in the
rpm database. Something like file, offset, length, checksum type,
checksum?
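For illustration, a per-chunk record along those lines might look like this sketch (the field layout and names are hypothetical, not an actual rpmdb schema):

```python
import hashlib
from dataclasses import dataclass

# Hypothetical record for one chunk of an installed file, as it might be
# stored in the rpm database (this layout is an assumption, not rpm's).
@dataclass(frozen=True)
class ChunkRecord:
    path: str           # installed file the chunk belongs to
    offset: int         # byte offset of the chunk within the file
    length: int         # uncompressed chunk length in bytes
    checksum_type: str  # e.g. "sha256"
    checksum: bytes     # digest of the uncompressed chunk data

def chunk_record(path: str, offset: int, data: bytes,
                 checksum_type: str = "sha256") -> ChunkRecord:
    """Build the record for one chunk of an installed file."""
    digest = hashlib.new(checksum_type, data).digest()
    return ChunkRecord(path, offset, len(data), checksum_type, digest)
```

With records like these, the downloader can map remote chunk checksums back to byte ranges of files already on disk.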
Post by Simo Sorce
Post by Jonathan Dieter
The advantage of this method is that you don't need to store the local
data twice, but the danger is that the local files get changed
elsewhere during the install process.
It's an interesting thought, though, and I wonder if there's a way we
could work around that danger?
I do not think you can just trust random metadata somewhere, one of the
points of a rpm reinstall is to fix damaged files for example. It does
no good if you skip those because some file somewhere says they are
"OK". (If I understood your comment about "just downloading changed
chunks).
Yes, this is the crux of the problem. As I see it, dnf should verify
the checksums on the local files before downloading the missing chunks,
but that doesn't guarantee that the data won't be changed between the
download step and the install step. RPM would also need to verify the
checksums before starting the install phase, and would need to bail out
if the checksums had changed.

My biggest concern, though, is what happens if package A needs a
specific chunk in /usr/bin/foo and package B changes /usr/bin/foo while
being installed. The chunk was there when the install phase started,
but disappeared before package A was actually installed.
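A minimal sketch of that re-verification step, assuming SHA-256 chunk checksums (the function names are hypothetical):

```python
import hashlib
import os
import tempfile

def verify_local_chunk(path, offset, length, expected_digest):
    """Re-check a locally sourced chunk immediately before it is used.

    dnf would run this once at download time to decide what to fetch;
    rpm would have to run it again at install time and bail out if the
    file changed in between. This narrows the race window, it does not
    fully close it."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return len(data) == length and hashlib.sha256(data).digest() == expected_digest

# Tiny self-check against a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"chunk-data-goes-here")
os.close(fd)
want = hashlib.sha256(b"chunk-data").digest()
ok_before = verify_local_chunk(demo_path, 0, 10, want)
with open(demo_path, "r+b") as f:  # simulate the file changing underneath us
    f.write(b"XXXXX")
ok_after = verify_local_chunk(demo_path, 0, 10, want)
os.remove(demo_path)
```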
Post by Simo Sorce
A couple more questions.
I skimmed quickly at the format and I have two questions I did not
immediately see an answer for.
1) why are you still supporting SHA-1 in a new format ?
Zchunk cares about two types of checksums, the chunk checksums, used to
determine if two chunks are the same, and the full data checksum (which
currently defaults to SHA-256), used to actually validate the data.

Originally, SHA-1 was supposed to be used *only* for the chunk
checksums, but, somewhere along the way, it was pointed out that using
the first 128 bits of a SHA-512 hash would be faster and more secure,
so the default for the chunk checksums is now SHA-512/128.

The only reason SHA-1 support is still in zchunk is because I don't
want to break backwards compatibility for the (probably five) zchunk
files created before this change.

Having said that, zchunked rpms won't be able to depend on the full
data checksum (because the local chunks will be uncompressed), so we'd
need to use SHA-256 at minimum for the chunk checksums.
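For the curious, "SHA-512/128" here just means keeping the first 128 bits of a plain SHA-512 digest, e.g.:

```python
import hashlib

def sha512_128(data: bytes) -> bytes:
    """First 128 bits (16 bytes) of a SHA-512 digest.

    On 64-bit CPUs SHA-512 is typically faster per byte than SHA-256,
    and truncating the output keeps the per-chunk index small. This is
    a plain truncation, not NIST's SHA-512/t variant with its own IV."""
    return hashlib.sha512(data).digest()[:16]
```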
Post by Simo Sorce
2) what are the chunks sizes ?
The chunk sizes vary because you don't want inserting or removing a few
bytes to completely change all the following chunks. The current
default average size is 32KB, but that can be adjusted.
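The variable-size split is typically done with content-defined chunking: cut wherever a rolling value over the last few bytes matches a mask, so an edit only moves nearby boundaries. A toy illustration (zchunk's real algorithm and parameters differ):

```python
import random

def cdc_chunks(data: bytes, mask: int = 0x0FFF,
               min_size: int = 256, max_size: int = 8192):
    """Toy content-defined chunker, for illustration only (zchunk's real
    rolling hash, window, and ~32KB default average size differ).

    The cut condition looks only at the low bits of a running sum, which
    depend only on the last few input bytes, so inserting or removing
    bytes perturbs nearby boundaries but later chunks line back up."""
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling = ((rolling << 1) + b) & 0xFFFFFFFF
        size = i - start + 1
        if size >= max_size or (size >= min_size and (rolling & mask) == mask):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

random.seed(7)
blob = bytes(random.randrange(256) for _ in range(50000))
chunks = cdc_chunks(blob)
```

Because the cut decision depends only on recent bytes, two versions of a file that differ by a small insertion will share most of their chunks, which is exactly what makes the delta download small.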
Post by Simo Sorce
Sorry if this is already answered somewhere.
Finally, what signature scheme were you planning to use? And how do
you deal with the data you want to "exclude" from signing, omit it or
feed in blank "sectors" ?
I was planning to use GPG signatures, and was planning to just omit the
data I want excluded. Having said that, while the format supports
signatures, the code hasn't been written and if either of those answers
are bad/dangerous, we can change that.
Post by Simo Sorce
Thanks for any answer.
Simo.
Thank you for looking at this!

Jonathan
Simo Sorce
2018-11-19 21:29:55 UTC
Permalink
Post by Jonathan Dieter
<snip>
Post by Simo Sorce
Post by Jonathan Dieter
That's an interesting thought. I was picturing using the zchunk
library in the dnf download stage to build a local rpm from the
verified locally installed files and the downloaded changed chunks,
but, if I understand your suggestions correctly, you're saying we could
just download the changed chunks and have RPM automatically get the
rpm-integrity verified chunks during the *install* stage.
How do you know which chunks to download w/o having a stored (or
recomputed) list of existing chunks ?
I thought we should store the chunk checksums of installed files in the
rpm database. Something like file, offset, length, checksum type,
checksum?
Post by Simo Sorce
Post by Jonathan Dieter
The advantage of this method is that you don't need to store the local
data twice, but the danger is that the local files get changed
elsewhere during the install process.
It's an interesting thought, though, and I wonder if there's a way we
could work around that danger?
I do not think you can just trust random metadata somewhere, one of the
points of a rpm reinstall is to fix damaged files for example. It does
no good if you skip those because some file somewhere says they are
"OK". (If I understood your comment about "just downloading changed
chunks).
Yes, this is the crux of the problem. As I see it, dnf should verify
the checksums on the local files before downloading the missing chunks,
but that doesn't guarantee that the data won't be changed between the
download step and the install step. RPM would also need to verify the
checksums before starting the install phase, and would need to bail out
if the checksums had changed.
My biggest concern, though, is what happens if package A needs a
specific chunk in /usr/bin/foo and package B changes /usr/bin/foo while
being installed. The chunk was there when the install phase started,
but disappeared before package A was actually installed.
Is this different in a normal install ?
What if package A installs /usr/bin/foo and then package B overwrites
it ?

Or are you concerned about the case where there may be an identical
chunk in different files ? Are chunks "global" to the host ?

This problem could be addressed by copying all uncompressed chunks into
a staging area before installing the rpm, failing in a clean way (i.e.
not halfway through a package install). The penalty is the need for
enough space to copy the uncompressed files, though. More clever things
could be done with proper filesystem support, snapshotting, and copy-
on-write, but I'm not sure it is worth optimizing for what is normally
a relatively small scratch area (if you do it one package at a time
only).
Post by Jonathan Dieter
Post by Simo Sorce
A couple more questions.
I skimmed quickly at the format and I have two questions I did not
immediately see an answer for.
1) why are you still supporting SHA-1 in a new format ?
Zchunk cares about two types of checksums, the chunk checksums, used to
determine if two chunks are the same, and the full data checksum (which
currently defaults to SHA-256), used to actually validate the data.
Originally, SHA-1 was supposed to be used *only* for the chunk
checksums, but, somewhere along the way, it was pointed out that using
the first 128 bits of a SHA-512 hash would be faster and more secure,
so the default for the chunk checksums is now SHA-512/128.
The only reason SHA-1 support is still in zchunk is because I don't
want to break backwards compatibility for the (probably five) zchunk
files created before this change.
Having said that, zchunked rpms won't be able to depend on the full
data checksum (because the local chunks will be uncompressed), so we'd
need to use SHA-256 at minimum for the chunk checksums.
Post by Simo Sorce
2) what are the chunks sizes ?
The chunk sizes vary because you don't want inserting or removing a few
bytes to completely change all the following chunks. The current
default average size is 32KB, but that can be adjusted.
Is this a compromise between compression performance and granularity ?
Anything else went into the decision to settle around 32k ?
Some filesystems seem to gravitate around 64k extents so I am
wondering.
Post by Jonathan Dieter
Post by Simo Sorce
Sorry if this is already answered somewhere.
Finally, what signature scheme were you planning to use? And how do
you deal with the data you want to "exclude" from signing, omit it or
feed in blank "sectors" ?
I was planning to use GPG signatures, and was planning to just omit the
data I want excluded. Having said that, while the format supports
signatures, the code hasn't been written and if either of those answers
are bad/dangerous, we can change that.
We use GPG signatures right now, can't be any more dangerous than that
:-)

The omission vs. blanking choice has no ill effect, but it was not
explicitly mentioned, and it should be, especially around places where
the missing data is in the middle of a "structure" in your diagrams.
Otherwise it may be ambiguous and lead to incompatible implementations
if someone ever builds another (and if zchunk is going to be adopted in
rpm, I bet there will be some other implementation to do some crazy
thing :-)
Post by Jonathan Dieter
Post by Simo Sorce
Thanks for any answer.
Simo.
Thank you for looking at this!
Jonathan
--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

Jonathan Dieter
2018-11-19 21:52:01 UTC
Permalink
<snip>
Post by Simo Sorce
Post by Jonathan Dieter
Post by Simo Sorce
I do not think you can just trust random metadata somewhere, one of the
points of a rpm reinstall is to fix damaged files for example. It does
no good if you skip those because some file somewhere says they are
"OK". (If I understood your comment about "just downloading changed
chunks).
Yes, this is the crux of the problem. As I see it, dnf should verify
the checksums on the local files before downloading the missing chunks,
but that doesn't guarantee that the data won't be changed between the
download step and the install step. RPM would also need to verify the
checksums before starting the install phase, and would need to bail out
if the checksums had changed.
My biggest concern, though, is what happens if package A needs a
specific chunk in /usr/bin/foo and package B changes /usr/bin/foo while
being installed. The chunk was there when the install phase started,
but disappeared before package A was actually installed.
Is this different in a normal install ?
What if package A installs /usr/bin/foo and then package B overwrites
it ?
Or are you concerned about the case where there may be an identical
chunk in different files ? Are chunks "global" to the host ?
This. If we stored the checksums in the rpm database, then, yes, they
would be global to the host.
Post by Simo Sorce
This problem could be addressed by copying all uncompressed chunks in a
staging area before installing the rpm, failing in a clean way (ie not
half way through a package install). The penalty is the need for enough
space to copy the uncompressed files though. more clever things could
be done with proper filesystem support and snapshotting and copy-on-
write, but not sure it is worth optimizing for what is normally a
relatively small scratch area (if you do it one package at a time
only).
What about just copying any uncompressed chunks required for the
current package or any packages still in the install queue? That might
reduce the scratch area even further.
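A rough sketch of that idea, with hypothetical (path, offset, length, digest) tuples standing in for whatever rpm would actually record:

```python
import hashlib
import os
import tempfile

def stage_needed_chunks(queue, staging_dir):
    """Copy into staging_dir only the local chunks still needed by the
    packages remaining in the install queue (a sketch; queue items are
    hypothetical lists of (path, offset, length, sha256-digest) tuples).

    Each chunk is verified as it is staged, so a file changed later by
    another package in the transaction can no longer invalidate it."""
    os.makedirs(staging_dir, exist_ok=True)
    staged = {}
    for needed in queue:                # one set of chunk tuples per package
        for path, offset, length, digest in needed:
            if digest in staged:        # chunks are global: stage each once
                continue
            with open(path, "rb") as f:
                f.seek(offset)
                data = f.read(length)
            if hashlib.sha256(data).digest() != digest:
                raise RuntimeError(f"chunk changed under us: {path}@{offset}")
            dest = os.path.join(staging_dir, digest.hex())
            with open(dest, "wb") as out:
                out.write(data)
            staged[digest] = dest
    return staged

# Tiny self-check against a throwaway file.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"hello chunk world")
src.close()
dig = hashlib.sha256(b"hello").digest()
staged = stage_needed_chunks([[(src.name, 0, 5, dig)]], tempfile.mkdtemp())
recovered = open(staged[dig], "rb").read()
os.remove(src.name)
```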
Post by Simo Sorce
Post by Jonathan Dieter
Post by Simo Sorce
2) what are the chunks sizes ?
The chunk sizes vary because you don't want inserting or removing a few
bytes to completely change all the following chunks. The current
default average size is 32KB, but that can be adjusted.
Is this a compromise between compression performance and granularity ?
Anything else went into the decision to settle around 32k ?
Some filesystems seem to gravitate around 64k extents so I am
wondering.
Yes, this is just a compromise. The larger the chunk size, the larger
the delta you need to download, but the better the compression. We
could experiment with this to see if 64k would give us significantly
better compression.

I would also chunk on file borders in the rpm payload, so we don't end
up having a chunk span multiple files. That would get messy fast when
trying to rebuild from local files.
Post by Simo Sorce
Post by Jonathan Dieter
Post by Simo Sorce
Finally, what signature scheme were you planning to use? And how do
you deal with the data you want to "exclude" from signing, omit it or
feed in blank "sectors" ?
I was planning to use GPG signatures, and was planning to just omit the
data I want excluded. Having said that, while the format supports
signatures, the code hasn't been written and if either of those answers
are bad/dangerous, we can change that.
We use GPG signatures right now, can't be any more dangerous than that
:-)
The omission vs blanking has no ill effect, but was not explicitly
mentioned, it should. Esp around places where the missing data is in
the middle of a "structure" in your diagrams, or it may be ambiguous
and lead to incompatible implementations if someone is ever going to
build another (and if zchunk is going to be adopted in rpm I bet there
will be some other implementation to do some crazy thing :-)
Yep. Let me clarify that in the format definition (and add the new
checksum types, I noticed they're missing).

Jonathan
Jonathan Dieter
2018-11-19 20:30:14 UTC
Permalink
Post by Stephen John Smoogen
Post by Neal Gompa
Post by Jonathan Dieter
Post by Kevin Kofler
Post by Jonathan Dieter
My proposal would be to make zchunk the rpm compression format for
Fedora.
1. zchunk is based on zstd, which is typically less efficient in terms of
compression ratio than xz, depending on settings
(see, e.g., https://github.com/inikep/lzbench), and
2. zchunk can by design only compress chunks individually and not benefit
from the space savings of a solid archive with a global dictionary,
I fear that this is going to significantly increase the size of the RPMs,
* for the initial downloads,
* for storage (e.g., keepcache=1, local mirrors, etc.), and
* for the people not using deltas for whatever reason.
I think zchunk makes a lot of sense for the metadata, but I am not convinced
that it is the right choice for the RPMs themselves.
I suspect the first is true, but zchunk does actually allow for a
global (per-file) dictionary that can be used to compress each chunk.
The difficulty is that the dictionary has to stay the same
between file
versions, or the chunk checksums won't match. There would have to be
some thought put into how to generate and store the dictionaries.
As for how much bigger a zchunked rpm will be compared to an xz rpm, at
the moment it's a bit hand-wavy. Based on zchunked repodata work I've
done, I think we might be looking at a size that's slightly smaller
than a gzipped rpm. I won't know for sure until I put together a
proof-of-concept, but I want to make sure that there aren't any gaping
holes in the proposal before I do that.
I did some work several months ago to evaluate zstd compression for
RPMs for Fedora, because of the lower memory and CPU usage for
(de)compression. However, the average size increase from xz was pretty
large (~20% or more on average, and nothing ever was either the same
or smaller), even with heavier compression settings. That might have
changed a bit with newer zstd releases that offer some more
tunables,
but I think it'll remain a tough sell on disk space.
CPU usage (in both decompression at install time and deltarpm reconstruction)
Memory usage per transaction
Network amount
Disk amount
I expect that the best we are going to get in any 'improvement' is
going to be 3 out of the 4. The xz compression and deltarpm combination
has a CPU/memory tradeoff for disk and network in comparison to gzip,
but it is mostly acceptable if you have fairly modern desktops. However,
for older hardware or lower-power systems that tradeoff may not be good.
Just to be clear on this, unlike deltarpm, zchunked rpms shouldn't
require extra CPU usage on the client side as they don't go through the
decompress-recompress cycle that deltarpms do. Re-assembling a zchunk
file requires no compression or decompression.

On the client:
The zchunk advantage over regular rpm is decreased network usage, while
its disadvantage is increased disk usage (since the local chunks will
be uncompressed).

The zchunk advantage over deltarpms is much less CPU usage, while its
disadvantages are slightly larger network usage and increased disk
usage.

Note that for most users the increased disk usage is temporary, since
rpms are deleted after install.
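The download-time decision can be sketched like this (a toy model of the zchunk header, not its actual on-disk layout):

```python
def plan_download(remote_header, local_digests):
    """Split a new rpm's chunk list into reuse vs. download.

    remote_header is a list of (digest, compressed_size) entries as they
    might appear in a downloaded zchunk header (a simplification of the
    real format); local_digests is the set of chunk digests that can be
    reconstructed from verified installed files. Only the missing
    chunks' bytes have to go over the network; no compression or
    decompression is needed to make this decision."""
    reuse, fetch = [], []
    for digest, csize in remote_header:
        (reuse if digest in local_digests else fetch).append((digest, csize))
    bytes_saved = sum(size for _, size in reuse)
    return fetch, bytes_saved

header = [(b"a", 100), (b"b", 200), (b"c", 300)]
fetch, saved = plan_download(header, {b"a", b"c"})
```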

In our infrastructure:
The zchunk advantage over deltarpms is that they are created in the
build stage and shouldn't take any longer than building a normal rpm,
while deltarpms take quite a while to build.

The disadvantage is that our current rpms use xz compression which is
more efficient at compressing a whole file than zchunk is, so zchunked
rpms will require more disk space.

Hope that helps clarify things.

Jonathan
Tom Hughes
2018-11-19 21:14:34 UTC
Permalink
Post by Jonathan Dieter
The zchunk advantage over regular rpm is decreased network usage, while
its disadvantage is increased disk usage (since the local chunks will
be uncompressed).
The zchunk advantage over deltarpms is much less CPU usage, while its
disadvantages are slightly larger network usage and increased disk
usage.
Note that for most users the increased disk usage is temporary, since
rpms are deleted after install.
If they're deleted after install then surely next time there is an
update there won't be any local chunks and everything will have to
be downloaded?

That's what has been confusing me about this whole thing - as I
understand it the idea is to only download new chunks and to reuse
chunks that are unchanged from earlier revisions, but it seems
that doing that would require keeping a local copy of every
installed rpm which would be a big change that nobody seems to
have mentioned.

Tom

--
Tom Hughes (***@compton.nu)
http://compton.nu/
Jonathan Dieter
2018-11-19 21:31:37 UTC
Permalink
Post by Tom Hughes
Post by Jonathan Dieter
The zchunk advantage over regular rpm is decreased network usage, while
its disadvantage is increased disk usage (since the local chunks will
be uncompressed).
The zchunk advantage over deltarpms is much less CPU usage, while its
disadvantages are slightly larger network usage and increased disk
usage.
Note that for most users the increased disk usage is temporary, since
rpms are deleted after install.
If they're deleted after install then surely next time there is an
update there won't be any local chunks and everything will have to
be downloaded?
That's what has been confusing me about this whole thing - as I
understand it the idea is to only download new chunks and to reuse
chunks that are unchanged from earlier revisions, but it seems
that doing that would require keeping a local copy of every
installed rpm which would be a big change that nobody seems to
have mentioned.
The idea is to use the locally installed files as the local chunks, the
same way that deltarpm does.

Jonathan
Nicolas Mailhot
2018-11-19 21:18:11 UTC
Permalink
Post by Jonathan Dieter
The zchunk advantage over deltarpms is much less CPU usage, while its
disadvantages are slightly larger network usage and increased disk
usage.
Unfortunately, that's a bad compromise for most limited clients. A
limited client can trade time for CPU or network performance, swap for
memory. What it can absolutely not make more of is storage, both install
and staging storage. Install storage requirements do not depend on rpm
tech level, but will generally go up over a system lifetime, adding
pressure on staging storage.

So you absolutely need to keep staging storage on par or less than
existing rpm/dnf if you do not want to obsolete classes of Fedora
hardware.

And Google released a huge quantity of cheap storage-deficient hardware
with its chromebooks. People do install Fedora on those.

--
Nicolas Mailhot
Jonathan Dieter
2018-11-19 21:35:32 UTC
Permalink
Post by Nicolas Mailhot
Post by Jonathan Dieter
The zchunk advantage over deltarpms is much less CPU usage, while its
disadvantages are slightly larger network usage and increased disk
usage.
Unfortunately, that's a bad compromise for most limited clients. A
limited client can trade time for CPU or network performance, swap for
memory. What it can absolutely not make more of is storage, both install
and staging storage. Install storage requirements do not depend on rpm
tech level, but will generally go up over a system lifetime, adding
pressure on staging storage.
So you absolutely need to keep staging storage on par or less than
existing rpm/dnf if you do not want to obsolete classes of Fedora
hardware.
And Google released a huge quantity of cheap storage-deficient hardware
with its chromebooks. People do install Fedora on those.
The only way we can keep staging storage down (and it would actually be
less than deltarpm/normal rpm) is to use the local chunks at install
time rather than download time. This comes with its own risks though,
see the other emails in this thread, specifically the ones following
Jan Pokorný's message.

Jonathan
Simo Sorce
2018-11-19 21:37:03 UTC
Permalink
Post by Nicolas Mailhot
Post by Jonathan Dieter
The zchunk advantage over deltarpms is much less CPU usage, while its
disadvantages are slightly larger network usage and increased disk
usage.
Unfortunately, that's a bad compromise for most limited clients. A
limited client can trade time for CPU or network performance, swap for
memory. What it can absolutely not make more of is storage, both install
and staging storage. Install storage requirements do not depend on rpm
tech level, but will generally go up over a system lifetime, adding
pressure on staging storage.
So you absolutely need to keep staging storage on par or less than
existing rpm/dnf if you do not want to obsolete classes of Fedora
hardware.
And Google released a huge quantity of cheap storage-deficient hardware
with its chromebooks. People do install Fedora on those.
To be honest, I run low on normal HW as well, and often have to "make
space" for version upgrades.

It would be much nicer if we could handle this problem by installing
smaller sets of rpms, and downloading the next set of rpms only when
the first got installed and deleted to make space.

But then there is the risk to end up with a frankensetup if you lose
network before the full upgrade is done...

On some systems I would even prefer to download as you go, as I have
more b/w than storage, but I do not want to experiment with putting
/var/cache on nfs ...

Simo.

--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

Michael Schroeder
2018-11-20 12:45:10 UTC
Permalink
Post by Jonathan Dieter
Just to be clear on this, unlike deltarpm, zchunked rpms shouldn't
require extra CPU usage on the client side as they don't go through the
decompress-recompress cycle that deltarpms do. Re-assembling a zchunk
file requires no compression or decompression.
Btw, we can easily do that for deltarpms as well. We only recompress
because we want a rpm that is bit-identical to the remote one.

Having a '-u' option that makes applydeltarpm write a rpm with an
uncompressed payload and no payload signatures is just a couple of
lines of code.

Cheers,
Michael.

--
Michael Schroeder ***@suse.de
SUSE LINUX GmbH, GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
Jonathan Dieter
2018-11-20 20:38:20 UTC
Permalink
Post by Michael Schroeder
Post by Jonathan Dieter
Just to be clear on this, unlike deltarpm, zchunked rpms shouldn't
require extra CPU usage on the client side as they don't go through the
decompress-recompress cycle that deltarpms do. Re-assembling a zchunk
file requires no compression or decompression.
Btw, we can easily do that for deltarpms as well. We only recompress
because we want a rpm that is bit-identical to the remote one.
Having a '-u' option that makes applydeltarpm write a rpm with an
uncompressed payload and no payload signatures is just a couple of
lines of code.
But the problem is that you would lose the signatures. To make this
work, we would need to create signatures of both the compressed and
uncompressed rpm (which wouldn't be a bad idea). Is there some way we
could (ab)use the current rpm format to make this work, or would it be
a backwards-incompatible change?

Jonathan

Florian Weimer
2018-11-23 13:16:04 UTC
Permalink
Post by Jonathan Dieter
Post by Michael Schroeder
Post by Jonathan Dieter
Just to be clear on this, unlike deltarpm, zchunked rpms shouldn't
require extra CPU usage on the client side as they don't go through the
decompress-recompress cycle that deltarpms do. Re-assembling a zchunk
file requires no compression or decompression.
Btw, we can easily do that for deltarpms as well. We only recompress
because we want a rpm that is bit-identical to the remote one.
Having a '-u' option that makes applydeltarpm write a rpm with an
uncompressed payload and no payload signatures is just a couple of
lines of code.
But the problem is that you would lose the signatures. To make this
work, we would need to create signatures of both the compressed and
uncompressed rpm (which wouldn't be a bad idea). Is there some way we
could (ab)use the current rpm format to make this work, or would it be
a backwards-incompatible change?
The problem is that the RPM header hash covers quite a few fields that
change if the payload compression changes. I'm not even sure if the
compressed payload itself is hashed.

IIRC, primary.xml only contains compressed payload hashes, not the
header hash, so if we cannot reproduce the compressed payload, then the
hashed chain from the centralized mirror manager to the individual RPM
packages is broken. This hash chain is very much needed for security
because RPM signing itself is quite broken.

Thanks,
Florian
Panu Matilainen
2018-11-23 13:35:30 UTC
Permalink
Post by Florian Weimer
Post by Jonathan Dieter
Post by Michael Schroeder
Post by Jonathan Dieter
Just to be clear on this, unlike deltarpm, zchunked rpms shouldn't
require extra CPU usage on the client side as they don't go through the
decompress-recompress cycle that deltarpms do. Re-assembling a zchunk
file requires no compression or decompression.
Btw, we can easily do that for deltarpms as well. We only recompress
because we want a rpm that is bit-identical to the remote one.
Having a '-u' option that makes applydeltarpm write a rpm with an
uncompressed payload and no payload signatures is just a couple of
lines of code.
But the problem is that you would lose the signatures. To make this
work, we would need to create signatures of both the compressed and
uncompressed rpm (which wouldn't be a bad idea). Is there some way we
could (ab)use the current rpm format to make this work, or would it be
a backwards-incompatible change?
The problem is that the RPM header hash covers quite a few fields that
change if the payload compression changes. I'm not even sure if the
compressed payload itself is hashed.
The old MD5 hash covers the header + compressed payload, and since >=
4.14 there's a separate SHA256 hash on the compressed payload alone.
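A toy sketch of what those two digests cover (deliberately simplified; this is not rpm's actual code, and the byte strings stand in for real header/payload sections):

```python
# Toy illustration of digest coverage: the legacy MD5 spans header +
# compressed payload together, while the rpm >= 4.14 payload digest is
# a SHA256 of the compressed payload alone. The payload digest thus
# survives header-only changes but breaks on any byte-level difference
# in a reassembled payload.
import hashlib

def package_digests(header: bytes, compressed_payload: bytes):
    md5_both = hashlib.md5(header + compressed_payload).hexdigest()
    sha256_payload = hashlib.sha256(compressed_payload).hexdigest()
    return md5_both, sha256_payload
```

A header change breaks the MD5 but leaves the payload digest intact; any payload change breaks both, which is why a non-bit-identical reassembled payload fails verification.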
Post by Florian Weimer
IIRC, primary.xml only contains compressed payload hashes, not the
header hash, so if we cannot reproduce the compressed payload, then the
hashed chain from the centralized mirror manager to the individual RPM
packages is broken. This hash chain is very much needed for security
because RPM signing itself is quite broken.
primary.xml contains hashes over the *entire* package, which is quite a
different thing - it only tells you the package you received is the same
as in the repo, but it says absolutely nothing about the validity of the
package as such.

rpm >= 4.14.2 enforces validation of the entire payload before starting
installing using the best hash available, and also has a true enforcing
signature mode available. There are various shortcomings still, of
course, but it's nowhere near as bad as it's traditionally been.

- Panu -
Andrew Lutomirski
2018-11-23 16:28:19 UTC
Permalink
Post by Florian Weimer
Post by Jonathan Dieter
Post by Michael Schroeder
Post by Jonathan Dieter
Just to be clear on this, unlike deltarpm, zchunked rpms shouldn't
require extra CPU usage on the client side as they don't go through the
decompress-recompress cycle that deltarpms do. Re-assembling a zchunk
file requires no compression or decompression.
Btw, we can easily do that for deltarpms as well. We only recompress
because we want a rpm that is bit-identical to the remote one.
Having a '-u' option that makes applydeltarpm write a rpm with an
uncompressed payload and no payload signatures is just a couple of
lines of code.
But the problem is that you would lose the signatures. To make this
work, we would need to create signatures of both the compressed and
uncompressed rpm (which wouldn't be a bad idea). Is there some way we
could (ab)use the current rpm format to make this work, or would it be
a backwards-incompatible change?
The problem is that the RPM header hash covers quite a few fields that
change if the payload compression changes. I'm not even sure if the
compressed payload itself is hashed.
IIRC, primary.xml only contains compressed payload hashes, not the
header hash, so if we cannot reproduce the compressed payload, then the
hashed chain from the centralized mirror manager to the individual RPM
packages is broken. This hash chain is very much needed for security
because RPM signing itself is quite broken.
This does suggest a solution: don’t even bother checking RPM
signatures for RPMs that come from Fedora, at least in deltarpm mode.
Instead, fix the repodata signing and just check that. primary.xml
could gain a signature of the uncompressed rpm if that would make key
management easier.

As another way of looking at this, the model where one expects:

# rpm -i package.rpm

to validate a signature and therefore only install safe packages is
inherently broken, since it cannot protect against forced downgrades.
Instead, the model that works, or at least can work, is for:

# dnf install package

to validate whatever needs to be validated. So it seems entirely
reasonable to me that dnf could gain a new, better way to validate
package freshness and authenticity, and then dnf could pass
--nosignature or the equivalent to rpm when installing a package.
This way Fedora would keep compatibility with foreign rpms, but Fedora
could also avoid expensive recompression when using delta rpms.

--Andy
Leigh Scott
2018-11-18 07:02:24 UTC
Permalink
+1 to anything to rid me of deltarpms, I currently have to disable this lame default.
Matthew Miller
2018-11-19 13:42:23 UTC
Permalink
Post by Leigh Scott
+1 to anything to rid me of deltarpms, I currently have to disable this lame default.
Leigh, it may be "lame" to you, but it's important to many people with
bandwidth limitations or slower connections. There's always room for
improvement but let's please not talk about other people's work in this way.
Thank you.

--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader
Kevin Kofler
2018-11-22 04:55:55 UTC
Permalink
Post by Matthew Miller
Leigh, it may be "lame" to you, but it's important to many people with
bandwidth limitations or slower connections. There's always room for
improvement but let's please not talk about other people's work in this
way. Thank you.
The thing is, with a fast network and a slow CPU, deltarpms actually slow
you down. No wonder users in such a situation hate them.

Kevin Kofler
Roberto Ragusa
2018-11-22 11:09:06 UTC
Permalink
Post by Kevin Kofler
The thing is, with a fast network and a slow CPU, deltarpms actually slow
you down. No wonder users in such a situation hate them.
Actually, with a fast network and even a FAST CPU, deltarpms still slow
you down.
There is no CPU where deltarpms are faster than the megabytes/s of a good network.

Regards.

--
Roberto Ragusa mail at robertoragusa.it
Jonathan Dieter
2018-11-19 19:49:18 UTC
Permalink
Post by Leigh Scott
+1 to anything to rid me of deltarpms, I currently have to disable this lame default.
The irony is that getting deltarpms into Fedora was largely my
fault. ;) Sorry, Leigh.

Jonathan
Miroslav Suchý
2018-11-19 13:57:14 UTC
Permalink
Post by Jonathan Dieter
1. Downloading a new release of a zchunked rpm would be larger than
downloading the equivalent deltarpm. This is offset by the fact
that the client is able to work out which chunks it needs no matter
what the original rpm is, rather than needing a specific original
rpm as deltarpm does.
Does this mean bigger storage requirements for copies of RPMs too? I mean for all our repo mirrors, for RH Satellites,
Spacewalk, Koji, Copr, Retrace...

Miroslav
Jonathan Dieter
2018-11-19 19:46:49 UTC
Permalink
Post by Miroslav Suchý
Post by Jonathan Dieter
1. Downloading a new release of a zchunked rpm would be larger than
downloading the equivalent deltarpm. This is offset by the fact
that the client is able to work out which chunks it needs no matter
what the original rpm is, rather than needing a specific original
rpm as deltarpm does.
Does this mean bigger storage requirements for copies of RPMs too? I mean
for all our repo mirrors, for RH Satellites, Spacewalk, Koji, Copr,
Retrace...
Yes, and that should actually be item 3 in drawbacks. Zchunked rpms
will be larger than the current xz-compressed rpms, but the actual size
increase is still unknown. I think we can shoot for roughly the same
size as a gzip-compressed rpm, but I'm not sure.

Mirror storage *might* actually go down, since we no longer need to
store deltarpms, but anything that only stores rpms will definitely
require more space.

Jonathan

Howard Johnson
2018-11-19 18:49:35 UTC
Permalink
Post by Jonathan Dieter
The uncompressed local chunks would be combined with the downloaded
compressed chunks to create a local rpm that will pass signature
verification without needing to recompress the uncompressed local
chunks, making this computationally much faster than rebuilding a
deltarpm, a win for users.
It would be really nice if, while changes are being made to RPM for
this, we could get rid of the local rpm build step and move support for
applying deltas into RPM itself. I know this is likely to be a lot more
involved than only changing the compression scheme, but I really think
it's worth it.

In fact, a quick poke through the rpm-maint list finds this open RPM RFE
https://github.com/rpm-software-management/rpm/issues/433 which shows
that the RPM dev team are open to the idea.

Currently, enabling deltarpms is a trade-off. If you have faster local
storage than download, deltas are a win. If your download is fast
enough, deltas add more overhead than they save. A system whereby
deltas can be applied with the same or less resources on the end host to
downloading the full RPM is a win for everyone. Combining that with a
system whereby the delta is actually just selective range downloads of
the payload of a "normal" binary RPM is the icing on the Fedora
Infrastructure cake ;)
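The "selective range download" idea can be sketched roughly like this (the URL and byte offsets below are invented for illustration; in the real scheme the offsets of missing chunks would come from the zchunk header):

```python
# Hypothetical sketch of selective range downloads: once the client
# knows which chunks it is missing and their byte offsets, it fetches
# only those spans of the .rpm via HTTP Range requests instead of the
# whole file. One request per span, for simplicity; a real client
# would coalesce adjacent spans or use multipart ranges.
import urllib.request

def range_header(start: int, end: int) -> str:
    """Build an HTTP Range header value for an inclusive byte span."""
    return f"bytes={start}-{end}"

def fetch_ranges(url: str, spans):
    """Fetch each (start, end) span and return the raw bodies in order."""
    parts = []
    for start, end in spans:
        req = urllib.request.Request(
            url, headers={"Range": range_header(start, end)})
        # A Range-capable mirror answers 206 Partial Content.
        with urllib.request.urlopen(req) as resp:
            parts.append(resp.read())
    return parts

# e.g. fetch_ranges("https://example.org/foo.rpm", [(0, 4095), (81920, 90111)])
```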

--
HJ
s***@gmx.com
2018-11-19 23:47:03 UTC
Permalink
Sent: Friday, November 16, 2018 at 10:07 PM
Subject: Proposal: Faster composes by eliminating deltarpms and using zchunked rpms instead
For reference, this is in reply to Paul's email about lifecycle
objectives, specifically focusing on problem statement #1[1].
<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time. This will require changes
to both the rpm format and new features in the zchunk format.
</tl;dr>
*deltarpm background*
As part of the compose process, deltarpms are generated between each
new rpm and both the GA version of the rpm and the previous version.
This process is very CPU and memory intensive, especially for large
rpms.
This also means that deltarpms are only useful for an end user if they
are either updating from GA or have been diligent about keeping their
system up-to-date. If a user is updating a package from N-2 to N,
there will be no deltarpm and the full rpm will be downloaded.
*zchunk background*
As some are aware, I've been working on zchunk[2], a compression format
that's designed for highly efficient deltas, and using it to minimize
metadata downloads[3].
The core idea behind zchunk is that a file is split into independently
compressed chunks and the checksum of each compressed chunk is stored
in the zchunk header. When downloading a new version of the file, you
download the zchunk header first, check which chunks you already have,
and then download the rest.
*Proposal*
My proposal would be to make zchunk the rpm compression format for
Fedora. This would involve a few additions to the zchunk format[4]
(something the format has been designed to accommodate), and would
require some changes to the rpm file format.
*Benefit*
The benefit of zchunked rpms is that, when downloading an updated rpm,
you would only need to download the chunks that have changed from
what's on your system.
The uncompressed local chunks would be combined with the downloaded
compressed chunks to create a local rpm that will pass signature
verification without needing to recompress the uncompressed local
chunks, making this computationally much faster than rebuilding a
deltarpm, a win for users.
The savings wouldn't be as good as what deltarpm can achieve, but
deltarpms would be redundant and could be removed, completely
eliminating a large step from the compose process.
*Drawbacks*
1. Downloading a new release of a zchunked rpm would be larger than
downloading the equivalent deltarpm. This is offset by the fact
that the client is able to work out which chunks it needs no matter
what the original rpm is, rather than needing a specific original
rpm as deltarpm does.
2. The rebuilt rpm may not be byte-for-byte identical to the original,
but will be able to be validated without decompression, as explained
in the next section.
*Changes*
The zchunk format would need to be extended to allow for a zchunked rpm
to contain both the uncompressed chunks that were already on the local
system and the newly downloaded compressed chunks while still passing
signature verification. This would also require moving signature
verification to zchunk.
The rpm file format has to be changed because the zchunk header needs
to be at the beginning of the file in order for the zchunk library to
figure out which chunks it needs to download. My suggestions for the
changes are:
1. Signing should be moved to the zchunk format as described at the
beginning of this section
2. The rpm header should be stored in one stream inside the zchunk
file. This allows it to be easily extracted separately from the
data
3. The rpm cpio should be stored in a second stream inside the zchunk
file.
4. At minimum, an optional zchunk element should be set to identify
zchunk rpms as rpms rather than regular zchunk files. If desired,
optional elements could also be set containing %{name}, %{version},
%{release}, %{arch} and %{epoch}. This would allow this information
to be read easily without needing to extract the rpm header stream.
*Final notes*
I realize this is a massive proposal, zchunk is still very young, and
we're still working on getting the dnf zchunk pull requests reviewed.
I do think it's feasible and provides an opportunity to eliminate a
pain point from our compose process while still reducing the download
size for our users.
[1]: https://fedoraproject.org/wiki/Objectives/Lifecycle/Problem_statements#Challenge_.231:_Faster.2C_more_scalable_composes
[2]: https://github.com/zchunk/zchunk
[3]: https://fedoraproject.org/wiki/Changes/Zchunk_Metadata
[4]: https://github.com/zchunk/zchunk/blob/master/zchunk_format.txt
Based on work I've done in this area, I'm somewhat sceptical that this
will work out well. I spent a decent amount of time comparing various
approaches to data saving for both compressed data and regular binaries.
This was done a few months ago, so a few of the algos may have changed a
little since then, but I doubt the changes will be huge.

The various things I tested were, in no particular order:
* xdelta3 (and a few similar VCDIFF based algos)
* zstd
* zchunk
* bsdiff
* courgette
* zucchini
And a few others which aren't interesting enough to mention here.

xdelta3 is old now, but was the standard binary delta compression tool
for a long time. Of the better performing algos it has reasonably good
generation time, and works reasonably well on compressed data (still a
good way behind the top of the pile), but falls far behind on binaries.

zstd was generally disappointing for both compressed and binary data. It's
really a compression format, not a delta generator, and while it performed
very well compared to traditional compression formats, it was unable to
compete with other tools here.

zchunk had basically the same problems as zstd, except that the chunking
overhead made files even larger, and small symbol changes could mean that
a very large number of chunks needed to be transmitted.

bsdiff is a large improvement over xdelta3 in terms of final size for
binary data. It is considerably more expensive in terms of memory and CPU
to compute, but is fast to apply[0]. It works much less well on compressed
data, and really needs to work on a binary to be at its best.

courgette is excellent at reducing binary size, and still very good at
compressed data. By data size alone it's the best of all of these, but is
somewhat complex and expensive, particularly in terms of memory. An
overview of how it works is at [1].

zucchini is an experimental delta generation tool for chromium. I can't
see much that's been published externally about it, but I found it to
compress almost as well as courgette but with greatly reduced memory use.
Code is very much a moving target, but is located at [2].




Overall in my tests zstd/zchunk gave massively worse compression compared
to modern delta algorithms (courgette, zucchini), and the total time to
download, apply, install was always slowest for zstd based approaches.

So, then, the only time this would work is if the changed-chunk detection
feature of zchunk actually works effectively for RPMs. Unfortunately,
when binaries change (and when RPMs as a whole change) it's not unusual
for this to manifest as adding/removing significant chunks of data from
near the beginning of the file, which means that the chunk matching fails
and you end up with a huge and slow download.

If you care only about making the 50th %ile case better, then zchunk is
probably at least not any worse than what we have now. However this
will, I believe, come at the cost of making the 95th %ile case pretty
unpleasant for end users.

I think it'd be far better to explore Howard's proposal, for per-file
delta generation (as opposed to per-RPM), and use modern delta
generation (probably courgette) to compute the delta.



0: http://www.daemonology.net/bsdiff/
1: http://dev.chromium.org/developers/design-documents/software-updates-courgette
2: https://github.com/chromium/chromium/tree/master/components/zucchini
Jonathan Dieter
2018-11-20 20:30:11 UTC
Permalink
Hey, thanks for the detailed analysis. Comments inline.
Post by s***@gmx.com
Based on work I've done in this area, I'm somewhat sceptical that this
will work out well. I spent a decent amount of time comparing various
approaches to data saving for both compressed data and regular binaries.
This was done a few months ago, so a few of the algos may have changed a
little since then, but I doubt the changes will be huge.
Do you have some examples of the data you tested?
Post by s***@gmx.com
* xdelta3 (and a few similar VCDIFF based algos)
* zstd
* zchunk
* bsdiff
* courgette
* zucchini
And a few others which aren't interesting enough to mention here.
xdelta3 is old now, but was the standard binary delta compression tool
for a long time. Of the better performing algos it has reasonably good
generation time, and works reasonably well on compressed data (still a
good way behind the top of the pile), but falls far behind on binaries.
zstd was generally disappointing for both compressed and binary data. It's
really a compression format, not a delta generator, and while it performed
very well compared to traditional compression formats, it was unable to
compete with other tools here.
zchunk had basically the same problems as zstd, except that the chunking
overhead made files even larger, and small symbol changes could mean that
a very large number of chunks needed to be transmitted.
bsdiff is a large improvement over xdelta3 in terms of final size for
binary data. It is considerably more expensive in terms of memory and CPU
to compute, but is fast to apply[0]. It works much less well on compressed
data, and really needs to work on a binary to be at its best.
courgette is excellent at reducing binary size, and still very good at
compressed data. By data size alone it's the best of all of these, but is
somewhat complex and expensive, particularly in terms of memory. An
overview of how it works is at [1].
zucchini is an experimental delta generation tool for chromium. I can't
see much that's been published externally about it, but I found it to
compress almost as well as courgette but with greatly reduced memory use.
Code is very much a moving target, but is located at [2].
These seem to be a bit apples-to-oranges. xdelta3, bsdiff, courgette
and zucchini are designed to generate deltas between two specific
files. zstd is just a compression format, while zchunk is designed to
download the smallest difference between two arbitrary files.

You are absolutely right about the cost of small symbol changes, one of
the reasons that courgette does some amount of disassembly and that
bsdiff uses what it calls an "add block". Because of zchunk's design,
there's no way it can benefit unless we implement some kind of
disassembly, and I'm leery of going down that road.
Post by s***@gmx.com
Overall in my tests zstd/zchunk gave massively worse compression compared
to modern delta algorithms (courgette, zucchini), and the total time to
download, apply, install was always slowest for zstd based approaches.
I'd love to see your test methodology. I'm not surprised about the
difference in delta size (which really isn't the same as compression),
but I am a bit surprised by the speed differences you saw, assuming
your data was compressed.

If your data wasn't, do remember that rpms are compressed, so if you're
creating a new delta method, it will have to decompress both the old
and new rpms before generating the delta (as deltarpm does).

zchunk is able to get around this extra step because it is the
compression format.
Post by s***@gmx.com
So, then, the only time this would work is if the changed-chunk detection
feature of zchunk actually works effectively for RPMs. Unfortunately,
when binaries change (and when RPMs as a whole change) it's not unusual
for this to manifest as adding/removing significant chunks of data from
near the beginning of the file, which means that the chunk matching fails
and you end up with a huge and slow download.
That's... not how the chunk matching works. The chunking function uses
the buzhash algorithm to try to break on consistent patterns, no matter
where data is added or removed. If you remove x bytes from the
beginning of the file and then add y bytes, at most one or two chunks
should be affected.
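The locality property described here can be demonstrated with a toy content-defined chunker. This is only a sketch: zchunk itself uses buzhash, whereas this stand-in hashes each sliding window with SHA256, and the window/mask values are made up. The boundary idea is the same, though: cut points depend only on nearby bytes, so an edit near the start of a file disturbs only the chunks around it.

```python
# Toy content-defined chunking: declare a boundary wherever a hash of
# the last WINDOW bytes matches a bit pattern. Because each boundary
# decision looks only at local content, removing or inserting bytes at
# the front of a file shifts the first chunk but leaves later chunk
# boundaries (and hence their checksums) aligned.
import hashlib

WINDOW = 48    # sliding-window size in bytes (illustrative value)
MASK = 0x3FF   # boundary when hash & MASK == 0 => ~1 KiB average chunks

def chunk(data: bytes):
    chunks, start = [], 0
    for i in range(WINDOW - 1, len(data)):
        window = data[i - WINDOW + 1:i + 1]
        h = int.from_bytes(hashlib.sha256(window).digest()[:4], "big")
        if h & MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Chunking a file and then the same file with its first 100 bytes dropped yields nearly identical chunk sets (only the chunk spanning the edit differs), which is what lets the client reuse local chunks regardless of which old rpm it started from.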
Post by s***@gmx.com
If you care only about making the 50th %ile case better, then zchunk is
probably at least not any worse than what we have now. However this
will, I believe, come at the cost of making the 95th %ile case pretty
unpleasant for end users.
I think it'd be far better to explore Howard's proposal, for per-file
delta generation (as opposed to per-RPM), and use modern delta
generation (probably courgette) to compute the delta.
Maybe I misunderstood Howard's proposal, but I understood him to be
suggesting that rebuilding a full rpm from delta data is wasteful when
we could just install the delta data like a regular rpm. It's an
interesting idea that I've commented on elsewhere in this thread.

The problem with generating per-file deltas is the same as per-rpm
deltas. Infrastructure has to be used to generate the deltas, and the
whole point of this proposal is to eliminate that step.

Having said that, it may be that we can generate deltas more
efficiently, which would make the cost of generating them easier to
bear. I would be interested in seeing whether courgette/zucchini based
deltarpms would be significantly faster to generate than the current
bsdiff based deltarpms.

Thanks again for your response,

Jonathan
Kamil Paral
2018-11-21 13:36:00 UTC
Permalink
Post by Jonathan Dieter
For reference, this is in reply to Paul's email about lifecycle
objectives, specifically focusing on problem statement #1[1].
<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time. This will require changes
to both the rpm format and new features in the zchunk format.
</tl;dr>
Hey Jonathan,

thanks for working on this. The proposed changes sound good to me. I'm a
bit worried that zchunk is not yet a proven format, so it might be a good
idea to use it for metadata first, see whether it works as expected, and
then push it for RPM files. But that's for more technical people to judge.

I have some concrete questions, though:
1. I have noticed that especially with large RPMs (firefox, chrome, atom,
game data like 0ad-data, etc), my PCs are mostly bottlenecked by CPU when
installing them. And that's with a modern 3.5+GHz CPU. That's because RPM
decompression runs in a single thread only, and xz is just unbelievably
slow. I wonder, would zchunk used as an RPM compression algorithm improve
this substantially? Can it decompress in multiple threads and/or does it
have much faster decompression speeds (and how much)? I don't care about
RPM size increase, but I'd really like to have them installed fast. (That's
of course just my personal preference, but this also affects the speed of
mock builds and such, so I think it's relevant.)
2. In our past QA efforts in Fedora, we had use cases for retrieving rpm
header data without retrieving the actual content (the payload). That was
for cases when we needed to check e.g. dependency issues, but the rpms were
not placed in a repository yet (i.e. no easy access to their metadata) and
it was slow and wasteful to download the whole rpm just to get the header.
Will the new zchunk compression still make it possible to retrieve just the
header without accessing all the payload data? (It would be great to make
this accessible from Python and not just C, but that's a plea I should
direct to rpm maintainers, I guess).

Thanks,
Kamil
Zdenek Kabelac
2018-11-21 14:01:42 UTC
Permalink
Post by Jonathan Dieter
For reference, this is in reply to Paul's email about lifecycle
objectives, specifically focusing on problem statement #1[1].
<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time.  This will require changes
to both the rpm format and new features in the zchunk format.
</tl;dr>
Hey Jonathan,
thanks for working on this. The proposed changes sound good to me. I'm a bit
worried that zchunk is not yet a proven format, so it might be a good idea to
use it for metadata first, see whether it works as expected, and then push it
for RPM files. But that's for more technical people to judge.
1. I have noticed that especially with large RPMs (firefox, chrome, atom, game
data like 0ad-data, etc), my PCs are mostly bottlenecked by CPU when
installing them. And that's with a modern 3.5+GHz CPU. That's because RPM
decompression runs in a single thread only, and xz is just unbelievably slow.
I wonder, would zchunk used as an RPM compression algorithm improve this
substantially? Can it decompress in multiple threads and/or does it have much
faster decompression speeds (and how much)? I don't care about RPM size
increase, but I'd really like to have them installed fast. (That's of course
just my personal preference, but this also affects the speed of mock builds
and such, so I think it's relevant.)
Well, I'm ATM way more concerned about the absurd size of rpm REPO metadata.

Often my upgrade first downloads 200MB of metadata to update 20MB of actual RPMs.

Please, anyone - fix this first before anyone starts to parallelise
decompression - this is a minimal problem compared with the amount of processed
metadata....

Next topic is - to replace/rewrite ONLY new files - files that have NOT changed
might not be written at all (writing files takes way more time than their
decompression) - in fact such a file might not even be decompressed (depends on
compression layout).

Thanks

Zdenek
_______________________________________________
devel mailing list -- ***@lists.fedoraproject.org
To unsubscribe send an email to devel-***@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/de
Jonathan Dieter
2018-11-21 20:46:54 UTC
Permalink
Post by Kamil Paral
Post by Jonathan Dieter
For reference, this is in reply to Paul's email about lifecycle
objectives, specifically focusing on problem statement #1[1].
<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time. This will require changes
to both the rpm format and new features in the zchunk format.
</tl;dr>
Hey Jonathan,
thanks for working on this. The proposed changes sound good to me.
I'm a bit worried that zchunk is not yet a proven format, so it might
be a good idea to use it for metadata first, see whether it works as
expected, and then push it for RPM files. But that's for more
technical people to judge.
1. I have noticed that especially with large RPMs (firefox, chrome,
atom, game data like 0ad-data, etc), my PCs are mostly bottlenecked
by CPU when installing them. And that's with a modern 3.5+GHz CPU.
That's because RPM decompression runs in a single thread only, and xz
is just unbelievably slow. I wonder, would zchunk used as an RPM
compression algorithm improve this substantially? Can it decompress
in multiple threads and/or does it have much faster decompression
speeds (and how much)? I don't care about RPM size increase, but I'd
really like to have them installed fast. (That's of course just my
personal preference, but this also affects the speed of mock builds
and such, so I think it's relevant.)
The zstd compression that zchunk uses internally is designed to be
faster than even gzip at decompression. Currently zchunk is single-
threaded, but, given that each chunk is independent, making it multi-
threaded should be pretty trivial, and is on the todo list.
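To illustrate why independent chunks make multi-threading trivial, here is a minimal sketch. This is not the zchunk API; zlib stands in for zstd so the example stays stdlib-only, and `compress_chunks`/`decompress_parallel` are hypothetical names.

```python
# Illustrative only: each chunk is compressed independently, so
# decompression of the chunks can run concurrently and the outputs
# are simply concatenated back in order.
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunks(data, chunk_size=4096):
    """Split data into independently compressed chunks."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def decompress_parallel(chunks, workers=4):
    """Decompress all chunks concurrently; map() preserves chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))

data = bytes(range(256)) * 1000
chunks = compress_chunks(data)
assert decompress_parallel(chunks) == data
```

Because no chunk depends on another's dictionary state, the worker count can scale with cores; the trade-off, as with any chunked format, is slightly worse compression than one solid stream.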
Post by Kamil Paral
2. In our past QA efforts in Fedora, we had use cases for retrieving
rpm header data without retrieving the actual content (the payload).
That was for cases when we needed to check e.g. dependency issues,
but the rpms were not placed in a repository yet (i.e. no easy access
to their metadata) and it was slow and wasteful to download the whole
rpm just to get the header. Will the new zchunk compression still
make it possible to retrieve just the header without accessing all
the payload data? (It would be great to make this accessible from
Python and not just C, but that's a plea I should direct to rpm
maintainers, I guess).
The zchunk format supports the concept of multiple independent streams
in a single file. A zchunk rpm would contain two streams, the rpm
header and the rpm payload. Since downloading a zchunk file is two
steps already (downloading the zchunk header, and then downloading the
required chunks), it should be easy enough to download only the chunks
needed for the rpm header stream.
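A rough sketch of the stream idea described above. The real zchunk header is a binary structure; here a simple list of (stream id, checksum) pairs stands in for it, and all names are hypothetical.

```python
# Illustrative only: if the zchunk header records which stream each
# chunk belongs to, a client that wants only the rpm header stream can
# compute exactly which chunks to fetch without touching the payload.
import hashlib

HEADER_STREAM, PAYLOAD_STREAM = 1, 2

def make_header(entries):
    """entries: list of (stream_id, chunk_bytes) -> (stream_id, digest) pairs."""
    return [(sid, hashlib.sha256(c).hexdigest()) for sid, c in entries]

def chunks_for_stream(header, stream_id):
    """Indices of the chunks a client must fetch for one stream."""
    return [i for i, (sid, _) in enumerate(header) if sid == stream_id]

entries = [(HEADER_STREAM, b"rpm-hdr-0"),
           (PAYLOAD_STREAM, b"cpio-0"),
           (PAYLOAD_STREAM, b"cpio-1")]
hdr = make_header(entries)
print(chunks_for_stream(hdr, HEADER_STREAM))  # -> [0]
```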

As for a python API, I would love for zchunk to have that too, but
haven't had the time yet.

I hope that helps.

Jonathan
Mohan Boddu
2018-11-21 17:06:51 UTC
Permalink
+1 to the reducing compose time, although there are many other things that
we have to do to make it even faster.

There are many good points that people are bringing up here, but I hope it
won't have any bottlenecks like those of xz compression.

Also, since it's a very young technology, we should spend a good amount of
time testing it.

Thanks for working on this.
Post by Jonathan Dieter
For reference, this is in reply to Paul's email about lifecycle
objectives, specifically focusing on problem statement #1[1].
<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time. This will require changes
to both the rpm format and new features in the zchunk format.
</tl;dr>
*deltarpm background*
As part of the compose process, deltarpms are generated between each
new rpm and both the GA version of the rpm and the previous version.
This process is very CPU and memory intensive, especially for large
rpms.
This also means that deltarpms are only useful for an end user if they
are either updating from GA or have been diligent about keeping their
system up-to-date. If a user is updating a package from N-2 to N,
there will be no deltarpm and the full rpm will be downloaded.
*zchunk background*
As some are aware, I've been working on zchunk[2], a compression format
that's designed for highly efficient deltas, and on using it to minimize
metadata downloads[3].
The core idea behind zchunk is that a file is split into independently
compressed chunks and the checksum of each compressed chunk is stored
in the zchunk header. When downloading a new version of the file, you
download the zchunk header first, check which chunks you already have,
and then download the rest.
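The core idea quoted above can be sketched in a few lines. This is a minimal illustration, not the real zchunk format: hashlib and zlib stand in for the actual header layout and zstd frames, and the function names are invented.

```python
# Illustrative only: the "header" is an ordered list of per-chunk
# checksums; the client hashes the chunks it already has and fetches
# only the chunks whose checksums it is missing.
import hashlib
import zlib

def chunk_digest(chunk):
    return hashlib.sha256(chunk).hexdigest()

def build_header(chunks):
    """Stand-in for the zchunk header: ordered per-chunk checksums."""
    return [chunk_digest(c) for c in chunks]

def chunks_to_download(new_header, local_chunks):
    """Indices of chunks in the new file not already present locally."""
    have = {chunk_digest(c) for c in local_chunks}
    return [i for i, d in enumerate(new_header) if d not in have]

old = [zlib.compress(b"alpha"), zlib.compress(b"beta")]
new = [zlib.compress(b"alpha"), zlib.compress(b"gamma")]
print(chunks_to_download(build_header(new), old))  # -> [1]
```

Note that the comparison only needs the new file's header plus the locally held chunks; unlike deltarpm, no specific "original" rpm is required.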
*Proposal*
My proposal would be to make zchunk the rpm compression format for
Fedora. This would involve a few additions to the zchunk format[4]
(something the format has been designed to accommodate), and would
require some changes to the rpm file format.
*Benefit*
The benefit of zchunked rpms is that, when downloading an updated rpm,
you would only need to download the chunks that have changed from
what's on your system.
The uncompressed local chunks would be combined with the downloaded
compressed chunks to create a local rpm that will pass signature
verification without needing to recompress the uncompressed local
chunks, making this computationally much faster than rebuilding a
deltarpm, a win for users.
The savings wouldn't be as good as what deltarpm can achieve, but
deltarpms would be redundant and could be removed, completely
eliminating a large step from the compose process.
*Drawbacks*
1. Downloading a new release of a zchunked rpm would be larger than
downloading the equivalent deltarpm. This is offset by the fact
that the client is able to work out which chunks it needs no matter
what the original rpm is, rather than needing a specific original
rpm as deltarpm does.
2. The rebuilt rpm may not be byte-for-byte identical to the original,
   but it can be validated without decompression, as explained in the
   next section
*Changes*
The zchunk format would need to be extended to allow for a zchunked rpm
to contain both the uncompressed chunks that were already on the local
system and the newly downloaded compressed chunks while still passing
signature verification. This would also require moving signature
verification to zchunk.
The rpm file format has to be changed because the zchunk header needs
to be at the beginning of the file in order for the zchunk library to
figure out which chunks it needs to download. My suggestions for the
changes are:
1. Signing should be moved to the zchunk format as described at the
beginning of this section
2. The rpm header should be stored in one stream inside the zchunk
file. This allows it to be easily extracted separately from the
data
3. The rpm cpio should be stored in a second stream inside the zchunk
file.
4. At minimum, an optional zchunk element should be set to identify
zchunk rpms as rpms rather than regular zchunk files. If desired,
optional elements could also be set containing %{name}, %{version},
%{release}, %{arch} and %{epoch}. This would allow this information
to be read easily without needing to extract the rpm header stream.
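Point 4 could look something like the following sketch. The layout here is purely hypothetical for illustration (the real zchunk optional elements are binary, not length-prefixed JSON); the point is only that NEVRA data placed ahead of the streams can be read without touching either stream.

```python
# Hypothetical layout, for illustration only: a length-prefixed JSON
# blob of NEVRA tags sits before the two streams, so identifying the
# rpm never requires reading the header or payload streams.
import io
import json

def write_zck_rpm(fileobj, nevra, header_stream, payload_stream):
    meta = json.dumps(nevra).encode()
    fileobj.write(len(meta).to_bytes(4, "big") + meta)
    fileobj.write(header_stream)
    fileobj.write(payload_stream)

def read_nevra(fileobj):
    """Read only the leading optional elements; streams stay untouched."""
    size = int.from_bytes(fileobj.read(4), "big")
    return json.loads(fileobj.read(size))

buf = io.BytesIO()
write_zck_rpm(buf, {"name": "foo", "version": "1.0"}, b"HDR", b"CPIO")
buf.seek(0)
print(read_nevra(buf))  # -> {'name': 'foo', 'version': '1.0'}
```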
*Final notes*
I realize this is a massive proposal, zchunk is still very young, and
we're still working on getting the dnf zchunk pull requests reviewed.
I do think it's feasible and provides an opportunity to eliminate a
pain point from our compose process while still reducing the download
size for our users.
[1]: https://fedoraproject.org/wiki/Objectives/Lifecycle/Problem_statements#Challenge_.231:_Faster.2C_more_scalable_composes
[2]: https://github.com/zchunk/zchunk
[3]: https://fedoraproject.org/wiki/Changes/Zchunk_Metadata
[4]: https://github.com/zchunk/zchunk/blob/master/zchunk_format.txt
John Reiser
2018-11-22 15:36:48 UTC
Permalink
For reference, this is in reply to Paul [Frields]'s email about lifecycle
objectives, specifically focusing on problem statement #1[1].
<tl;dr>
Have rpm use zchunk as its compression format, removing the need for
deltarpms, and thus reducing compose time. This will require changes
to both the rpm format and new features in the zchunk format.
</tl;dr>
https://fedoraproject.org/wiki/Objectives/Lifecycle/Problem_statements#Challenge_.231:_Faster.2C_more_scalable_composes
Currently a compose takes a minimum of around 8.5 hours ([1] and others);
the goal is 1 hour. The goal is particularly relevant during the last
phase of a Fedora release cycle (after code freeze) when each successive
compose contains only a few .rpms that have changed from the previous
compose, and the question-of-the-hour is whether some particular bug
actually was fixed. In this case deltarpms can be ignored.
The goal also is relevant to a future of CI (Continuous Integration)
that has automated gating of changes depending on successful tests
of the entire compose ("Does it boot and pass the test cases?")
Again, deltarpms can be ignored.

Please display some measurements which support the belief
that using zchunk will reduce compose time dramatically,
whether by eliminating deltarpms or by other effects.

Did you view "Flock 2018 - Improving Fedora Compose process" (Aug 8, 2018; 55 min)?
They do present measurements [coarse]. The overwhelming
conclusion is that 8.5 hours is a data flow problem, both
large-grain (moving .rpms and other files across the network)
and small-grain (extracting the desired information from
the header of an .rpm that uses data compression.)

The number one request that I heard in the recorded session
was for faster access to fields in the header of an .rpm
that uses data compression. This is slow today because the
header+tail are compressed together as if a single logical stream,
and the code retrieves and de-compresses the whole .rpm in order to access
just the header. However, both xz (liblzma) and gzip (zlib) accept
a parameter to stop decompressing after generating N bytes of output;
why not use this? N can be known, or over-estimated, or iteratively
(and incrementally) approximated until it covers the entire header.
To make de-compression of the rpm header even easier,
call xz_compress twice: once with the header, once with the tail.
The concatenation of the compressed outputs is transparent
by default but visible if you look for it, just like zlib.

In effect the "directory" feature of zchunk can be implemented
for the special case of header-vs-tail (using either xz (liblzma)
or gzip (zlib)) without disturbing other clients of .rpms.
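The two-stream suggestion above can be demonstrated with stdlib lzma. The decompressor object stops at the end of the first xz stream, so the header comes out without decompressing any payload bytes; the variable names here are illustrative.

```python
# Sketch of the suggestion: compress header and tail as two separate,
# concatenated xz streams. LZMADecompressor handles exactly one stream,
# so decompressing the blob yields only the header, and unused_data
# holds the still-compressed payload stream.
import lzma

header = b"rpm header " * 100
payload = b"cpio payload " * 100
blob = lzma.compress(header) + lzma.compress(payload)  # concatenated streams

dec = lzma.LZMADecompressor()
got_header = dec.decompress(blob)   # stops at the first stream's end
assert got_header == header
assert dec.eof                      # first stream fully consumed
# The payload is still compressed and untouched in dec.unused_data:
assert lzma.decompress(dec.unused_data) == payload
```

Tools unaware of the split still see a valid file, since whole-blob decompression of concatenated xz streams yields header plus payload.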
Jeff Johnson
2018-11-24 04:11:24 UTC
Permalink
Um, rpm headers are not compressed, so there is no decompression overhead in retrieving a tag value, and there is no reason why an entire rpm needs to be decompressed in order to retrieve a value.

Presumably you are obliquely referring to deltarpm creation, not rpm tag value access.

After watching the video, I am also unsure how you got the impression that the number one request was for faster access to fields in an rpm header.
Jan Pokorný
2018-12-03 21:16:27 UTC
Permalink
Post by Jonathan Dieter
The core idea behind zchunk is that a file is split into independently
compressed chunks and the checksum of each compressed chunk is stored
in the zchunk header. When downloading a new version of the file, you
download the zchunk header first, check which chunks you already have,
and then download the rest.
Just one more thought I realized in hindsight: there are ways to cut down
the installed files in the RPM ecosystem, currently with a request to omit
documentation (%doc tagged files, see --nodocs/--excludedocs).

Indeed, that's the sort of file you can usually omit without hesitation
in containers/VMs. Perhaps there are some more bits that are de facto
optional without losing anything from the functionality.

So with clever separation, such bits wouldn't even need to be downloaded
when they won't eventually make it to disk. That might make
things like customizing a base container image a tiny bit swifter,
e.g. in a CI/CD context without many connectivity guarantees (up to
mirrors anyway). But it might not be worth it if the trade-off
is already predictably suboptimal in other aspects.
--
Nazdar,
Jan (Poki)
Jonathan Dieter
2018-12-05 19:09:53 UTC
Permalink
Post by Jan Pokorný
Post by Jonathan Dieter
The core idea behind zchunk is that a file is split into independently
compressed chunks and the checksum of each compressed chunk is stored
in the zchunk header. When downloading a new version of the file, you
download the zchunk header first, check which chunks you already have,
and then download the rest.
Just one more thought I realized in hindsight: there are ways to cut down
the installed files in RPM ecosystem, currently with a request to omit
documentation (%doc tagged files, see --nodocs/--excludedocs).
Indeed, that's a sort of files you can usually omit without hesitation
in containers/VMs. Perhaps there are some more bits that are de facto
optional without losing anything from the functionality.
So with clever separation, such bits wouldn't even need to be downloaded
when they will not eventually make it to the disk. That might make
things like customizing a base container image tiny bit more swift,
e.g. in CI/CD context without many connectivity guarantees (up to
mirrors anyway). But might not be worth it if the trade-off
is already predictably suboptimal in other aspects.
I think this is very interesting and definitely feasible (assuming the
core concept of zchunked rpms is reasonable).

On a tangent, I realized something else about metadata generation (and
I think someone else had picked up on it, but it hadn't quite
registered with me). Currently generating metadata for RPMs involves
reading the full RPM to calculate the checksum. With zchunk, the
header checksum is stored at the beginning of the file and is all you
need to verify a zchunk file.

Jonathan
Jeff Johnson
2018-12-05 19:24:20 UTC
Permalink
An entire *.rpm file needs to be read only because rpm-metadata chose to include the whole-file digest in the metadata.

There is no way for a plaintext file to contain its own digest, which is true for zchunk as well as the current rpm format.