Discussion:
F23 System Wide Change: Default Local DNS Resolver
Jan Kurik
2015-06-01 12:03:27 UTC
= Proposed System Wide Change: Default Local DNS Resolver =
https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver


Change owner(s): P J P <pjp at fedoraproject.org>, Pavlix <pavlix at pavlix.net>, Tomas Hozza <thozza at redhat.com>, Petr Špaček <pspacek at redhat.com>


Install a local DNS resolver, trusted for DNSSEC validation, running on 127.0.0.1:53. This must be the only name server entry in /etc/resolv.conf.
The name server entries received automatically via DHCP/VPN/wireless configurations should be stored separately (e.g. in NetworkManager's internal state), as transitory name servers to be used as forwarders by the trusted local resolver. In all cases, DNSSEC validation will be done locally.
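
For illustration, the resulting /etc/resolv.conf would then contain nothing but the loopback entry (a minimal sketch; the comment wording is hypothetical):

    # Managed by the local DNSSEC-validating resolver; dynamic servers
    # from DHCP/VPN live in NetworkManager's internal state instead.
    nameserver 127.0.0.1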


== Detailed Description ==
There is a growing discussion and debate about the need for a trusted, DNSSEC-validating local resolver running on 127.0.0.1:53. There are multiple reasons for having such a resolver, most importantly security and usability. Security and protection of users' privacy become paramount against the backdrop of increasing surveillance by governments and service providers worldwide.

People use Fedora on portable/mobile devices that connect to diverse networks as needed. The automatic DNS configurations provided by these networks are never trustworthy for DNSSEC validation, as there is currently no way to establish such trust.

Apart from the trust issue, these name servers are often flaky and unreliable, which only adds to an overall bad, and at times frustrating, user experience. In such a situation, having a trusted local DNS resolver not only makes sense but is badly needed. (See: [1], [2], [3])

Going forward, as DNSSEC and IPv6 networks become more and more ubiquitous, having a trusted local DNS resolver will be not just imperative but unavoidable, because it will perform the most important operation of all: establishing trust between two parties.

The DNS literature strongly recommends it, and amongst all the discussions and debates about the issues involved in establishing such trust, it is widely agreed that a trusted local DNS resolver is the best available solution. It will simplify and facilitate a lot of other design decisions and application development in the future. (See: [1], [2], [3])

[1] https://www.ietf.org/mail-archive/web/dane/current/msg06469.html
[2] https://www.ietf.org/mail-archive/web/dane/current/msg06658.html
[3] https://lists.fedoraproject.org/pipermail/devel/2014-April/197755.html


== Scope ==
* Proposal owners: the proposal owners will have to
** define the syntax and semantics for the new configuration parameters/files
** properly document how to test and configure the new default setup
** persuade and coordinate with other package owners to incorporate the new changes/workflow in their applications
** discuss with the WGs in which products the change makes sense and what the WGs' expectations are for the different Fedora products
** resolve interoperability issues for Docker and other container use-cases

* Other developers: (especially NetworkManager and the like)
** No new features/workflow should be needed from other applications, since the use of the trusted local DNS resolver should be seamless.
** Ideally, other developers and users should test their software and applications in this setup and verify that they work as expected

* Release engineering:
** would have to ensure that the trusted local DNS resolver is available throughout the installation stage and is installed on all installations, including LiveCDs etc.
** add the services needed for the setup (dnssec-trigger and Unbound) to the default presets

* Policies and guidelines:
** the chosen trusted DNS resolver package (Unbound) would have to ensure that its DNS resolver starts at boot time and works out of the box without any user intervention.
** NetworkManager and others would have to be told not to tamper with the local nameserver entries in '/etc/resolv.conf' and to save the dynamic nameserver entries in a separate configuration file (see the sketch below).
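
For NetworkManager, one plausible mechanism (an illustrative sketch, not a decided implementation) is its dns= option, which stops it from rewriting /etc/resolv.conf; the dynamic servers then stay in its internal state for the local resolver to use as forwarders:

    # /etc/NetworkManager/NetworkManager.conf
    [main]
    dns=none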
--
Jan Kuřík

Matthew Miller
2015-06-01 13:32:47 UTC
Post by Jan Kurik
People use Fedora on portable/mobile devices which are connected to
diverse networks as and when required. The automatic DNS
configurations provided by these networks are never trustworthy for
DNSSEC validation. As currently there is no way to establish such
trust.
Is this proposal meant to apply to Cloud and Server as well? With
Cloud, it's at least conventional to assume that the network
infrastructure provided by the provider is trustworthy (see
cloud-init). And Server presumably will not be running on
portable/mobile devices connecting to arbitrary networks. For Server,
there may be other advantages, but do we also want these for Cloud?

I'm also concerned about going forward with this without having a solid
answer to the container problem.
--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader

Tomas Hozza
2015-06-01 14:37:11 UTC
Post by Matthew Miller
Post by Jan Kurik
People use Fedora on portable/mobile devices which are connected to
diverse networks as and when required. The automatic DNS
configurations provided by these networks are never trustworthy for
DNSSEC validation. As currently there is no way to establish such
trust.
Is this proposal meant to apply to Cloud and Server as well? With
Cloud, it's at least conventional to assume that the network
infrastructure provided by the provider is trustworthy (see
cloud-init). And Server presumably will not be running on
portable/mobile devices connecting to arbitrary networks. For Server,
there may be other advantages, but do we also want these for Cloud?
As you can read in the Change proposal, this is part of the scope:
"discuss with WGs in which products the change makes sense and
what are the expectations of WGs for different Fedora products"

Yes, we think the change makes sense for Server. It is still
beneficial from a security point of view to do the DNSSEC
validation on Server. Even though the configuration on Server
will be static, dnssec-trigger + unbound can be used for this.
Otherwise it would require manual configuration by the
administrator to enable DNSSEC validation.

As for the Cloud, we are not sure. Maybe it makes sense on
the Atomic Host, but we want to discuss this with people
involved in the Cloud product(s).
Post by Matthew Miller
I'm also concerned about going forward with this without having a solid
answer to the container problem.
This is also part of the scope:
"resolve interoperability issues for Docker and other containers use-cases"

PJP is looking at this.

This is work in progress. We will not enable the change in products
and environments in which it will turn out that it does not make sense.

Tomas

Paul Wouters
2015-06-01 14:54:50 UTC
Post by Tomas Hozza
Yes, we think the change makes sense for Server. It is still
beneficial from the security point of view to do the DNSSEC
validation on Server.
Agreed.
Post by Tomas Hozza
Even though the configuration on Server
will be static, dnssec-trigger + unbound can be used for this.
Otherwise it would require manual configuration from the
administrator, to enable DNSSEC validation.
That really depends on the network. If there is no split-view DNS
and there are no DNS firewall rules, just running unbound with
resolv.conf pointing to localhost as the default would work.

But it would be better to use the LAN's DNS server. It will be
faster due to the upstream cache, and it will avoid any DNS ports
being firewalled.

This is either something the admin configures (e.g. during install
in anaconda, or in ifcfg-* files that unbound could use for an
unbound-control forward_add) or something that comes in via DHCP.
If via DHCP, unbound could have a hook into that (or into NM if it
is used) without dnssec-trigger.
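
Such a hook could be as small as this sketch (a hypothetical dispatcher script; it assumes NetworkManager exposes the DHCP name servers in IP4_NAMESERVERS):

    #!/bin/sh
    # Hypothetical NM dispatcher hook: point the local unbound at the
    # DHCP-provided servers instead of doing full recursion itself.
    if [ -n "$IP4_NAMESERVERS" ]; then
        unbound-control forward $IP4_NAMESERVERS
    fi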

I see dnssec-trigger only as a tool for roaming devices such as
phones and laptops, which have a GUI and an end user. On servers,
the various states of dnssec-trigger make no sense - especially
since the hotspot problem does not exist there.
Post by Tomas Hozza
As for the Cloud, we are not sure. Maybe it makes sense on
the Atomic Host, but we want to discuss this with people
involved in the Cloud product(s).
For lean single-application containers or small cloud instances, the
focus is probably on not duplicating unbound 1000x on the same
hardware. So yes, those considerations are quite different, and still
a hot topic. Does one trust a cloud DNSSEC server, or do validation
within the instance/container? If inside, per app or per container?
Those would most likely not want to run an unbound daemon, but rely
on something like getdns or edns-query-chain and just the DNSSEC
root key to do their validation.

Paul

Matthew Miller
2015-06-01 14:56:32 UTC
Post by Tomas Hozza
This is work in progress. We will not enable the change in products
and environments in which it will turn out that it does not make sense.
Cool; just wanted to be sure that that was being thought about
seriously, and it sounds like it is.
--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader

Ryan S. Brown
2015-06-01 15:04:05 UTC
Post by Tomas Hozza
Post by Matthew Miller
Post by Jan Kurik
People use Fedora on portable/mobile devices which are connected to
diverse networks as and when required. The automatic DNS
configurations provided by these networks are never trustworthy for
DNSSEC validation. As currently there is no way to establish such
trust.
Is this proposal meant to apply to Cloud and Server as well? With
Cloud, it's at least conventional to assume that the network
infrastructure provided by the provider is trustworthy (see
cloud-init). And Server presumably will not be running on
portable/mobile devices connecting to arbitrary networks. For Server,
there may be other advantages, but do we also want these for Cloud?
"discuss with WGs in which products the change makes sense and
what are the expectations of WGs for different Fedora products"
Yes, we think the change makes sense for Server. It is still
beneficial from the security point of view to do the DNSSEC
validation on Server. Even though the configuration on Server
will be static, dnssec-trigger + unbound can be used for this.
Otherwise it would require manual configuration from the
administrator, to enable DNSSEC validation.
I disagree; for server & cloud deployments it doesn't make sense to
duplicate a DNS server on *every* host, and if you care about DNSSEC you
likely already run a trusted resolver.

The trust and management models for Server are fundamentally different
from those of Workstation, since servers don't usually get tossed in a
backpack and put on potentially-hostile coffee shop wi-fi. They also
generally try to run fewer services than a workstation. The datacenter
network is generally trusted, and a shared DNSSEC resolver makes way
more sense.

It may be "beneficial" from a security PoV to have DNSSEC resolution,
but it isn't beneficial to have to patch 1 million copies of unbound if
a vuln is discovered instead of just a few shared resolvers for the
whole DC.
Post by Tomas Hozza
...[snip]...
--
Ryan Brown / Software Engineer, Openstack / Red Hat, Inc.

Jason L Tibbitts III
2015-06-01 17:55:08 UTC
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.

I disagree generally in the case of server deployments.

Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.

Basically, if you have properly functioning DNS on multiple local
servers but don't have anything fancier like heartbeat-based IP handoff
or a load-balancing appliance, and the first resolver in
resolv.conf goes offline, your hosts are screwed. glibc's resolver code
is simply horrible. This is entirely separate from the DNSSEC issues.

Of course, most folks who have enough infrastructure to have their own
DNS servers and such can easily figure out how to configure a local
resolver if need be, so what's in the default setup really makes no
difference. And for the home user who might want to grab the server
spin/product/whatever-we're-calling-it-this-week, well, I'd think they'd
want the local resolver.

What really concerns me is what happens with split DNS. I assume I'll
just need to configure the local resolvers to talk only to my resolvers,
but this would really need to be documented.
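
The sort of thing that would need documenting is an unbound forward-zone entry; a sketch, with internal.example.com and 10.0.0.53 as placeholders:

    forward-zone:
        name: "internal.example.com"
        forward-addr: 10.0.0.53
    # an unsigned internal zone may also need, under server:,
    # domain-insecure: "internal.example.com"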

- J<

Reindl Harald
2015-06-01 18:02:44 UTC
Post by Jason L Tibbitts III
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.
I disagree generally in the case of server deployments.
Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.
no it is not in the case of a serious server setup - period
Post by Jason L Tibbitts III
Basically, if you have properly functioning DNS on multiple local
servers but not having anything fancier like heartbeat-based IP handoff
or a load balancing appliance or something, and the first resolver in
resolv.conf goes offline, your hosts are screwed. glibc's resolver code
is simply horrible. This is completely exclusive of DNSSEC issues.
if your *LAN* nameservers are going offline you need to solve that
problem and ask yourself why....
Post by Jason L Tibbitts III
What really concerns me is what happens with split DNS. I assume I'll
just need to configure the local resolvers to talk only to my resolvers,
but this would really need to be documented
well, with shared resolvers in the network, in case they are properly
configured, split DNS won't ever happen - with a local resolver not
*only* forwarding to the LAN resolvers (and then you have not gained
much with the local resolver) it becomes much more likely
Andrew Lutomirski
2015-06-01 18:30:53 UTC
Post by Reindl Harald
Post by Jason L Tibbitts III
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.
I disagree generally in the case of server deployments.
Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.
no it is not in case of a serious server setup - period
I'm with Jason here. Glibc's resolver is amazingly buggy, and things
break randomly and unreproducibly when this happens. A good setup
would have a local resolver and glibc would be configured to cache
nothing whatsoever. Then, if you need to perform maintenance on the
local DNS cache, you can do it by flushing your local resolver rather
than trying to figure out how you're going to persuade every running
program to tell glibc to flush its cache.
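
With unbound as the local resolver, that flush is a one-liner; hedged examples with placeholder names:

    unbound-control flush www.example.com    # drop one cached name
    unbound-control flush_zone example.com   # drop a whole subtree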
Post by Reindl Harald
Post by Jason L Tibbitts III
Basically, if you have properly functioning DNS on multiple local
servers but not having anything fancier like heartbeat-based IP handoff
or a load balancing appliance or something, and the first resolver in
resolv.conf goes offline, your hosts are screwed. glibc's resolver code
is simply horrible. This is completely exclusive of DNSSEC issues.
if your *LAN* nameservers are going offline you need to solve that problem
and ask you why....
I would think that avoiding a single point of failure (your LAN
nameserver) would be a *good* thing.

--Andy

Reindl Harald
2015-06-01 18:58:07 UTC
Post by Andrew Lutomirski
Post by Reindl Harald
Post by Jason L Tibbitts III
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.
I disagree generally in the case of server deployments.
Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.
no it is not in case of a serious server setup - period
I'm with Jason here. Glibc's resolver is amazingly buggy, and things
break randomly and unreproducibly when this happens. A good setup
would have a local resolver and glibc would be configured to cache
nothing whatsoever. Then, if you need to perform maintenance on the
local DNS cache, you can do it by flushing your local resolver rather
than trying to figure out how you're going to persuade every running
program to tell glibc to flush its cache
i never saw glibc caching any dns response, at least on Fedora; new
subdomains work from the moment they are provisioned even if they
were tried a few seconds before

on Apple clients you need to flush the local cache

so with setup a dns cache on each and every machine you fuckup your
network because you introduce the same negative TTL caching affecting
OSX clients for years now

no, thanks, not for statically configured servers and not even for
workstations running inside a reliable company network

and as for "avoiding a single point of failure (your LAN
nameserver)" - in a proper network it doesn't fail - never
Stephen Gallagher
2015-06-01 19:19:14 UTC
Post by Reindl Harald
so with setup a dns cache on each and every machine you fuckup your
network because you introduce the same negative TTL caching affecting
OSX clients for years now
Harald, please moderate your tone. It's unpleasant to the other
participants in this thread. Furthermore, please consider the
possibility that your own opinion on matters may not in fact be the
singular truth that the world refuses to see.

There are many environments besides yours and they have different
needs.
Reindl Harald
2015-06-01 19:25:01 UTC
Post by Stephen Gallagher
Post by Reindl Harald
so with setup a dns cache on each and every machine you fuckup your
network because you introduce the same negative TTL caching affecting
OSX clients for years now
Harald, please moderate your tone. It's unpleasant to the other
participants in this thread. Furthermore, please consider the
possibility that your own opinion on matters may not in fact be the
singular truth that the world refuses to see.
There are many environments besides yours and they have different
needs
surely there are many environments beside mine and *that* is why it's
not smart to consider a local dns cache on each and every server and
then take care of security updates on hundreds and thousands of nodes,
and *worst of all* maybe not be aware that you need to take care

for me: a no-brainer, i know what is running on my networks and don't
rely on any defaults at all
Nicolas Mailhot
2015-06-02 08:02:15 UTC
Post by Reindl Harald
surely there are many environments beside mine and *that* is why it's
not smart to consider a local dns cache on each and every server
There are many environments that benefit from a local DNS cache (for
example FAI with flaky DNS: people with a local cache have perfect service
while others wonder why they are disconnected all the time), it can
implement DNS features that third-party DNS servers miss (so apps don't
have to deal with buggy DNSes), and anyone dealing with DNS ops MUST
already factor in TTLs since many systems already use DNS caches.

The braindamaged thing is to deal with DNS warts locally in all apps
instead of using a central system-wide component that can get enough
attention to be solid.

Regards,
--
Nicolas Mailhot

Petr Spacek
2015-06-03 07:14:49 UTC
Post by Andrew Lutomirski
Post by Reindl Harald
Post by Jason L Tibbitts III
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.
I disagree generally in the case of server deployments.
Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.
no it is not in case of a serious server setup - period
I'm with Jason here. Glibc's resolver is amazingly buggy, and things
break randomly and unreproducibly when this happens. A good setup
would have a local resolver and glibc would be configured to cache
nothing whatsoever. Then, if you need to perform maintenance on the
local DNS cache, you can do it by flushing your local resolver rather
than trying to figure out how you're going to persuade every running
program to tell glibc to flush its cache
i never saw glibc caching any dns response, at least on Fedora, new subdomains
are working from the moment they are provisioned even if they are tried a few
seconds before
on Apple clients you need to flush the local cache
so with setup a dns cache on each and every machine you fuckup your network
because you introduce the same negative TTL caching affecting OSX clients for
years now
Please let me clarify a few things:

1) Negative caching is controlled by the zone owner. If you are not happy
that OSX/Windows clients cache negative answers for zones your company
uses - no problem, set the SOA minimum field to 1 second and be done with it.

Please see http://tools.ietf.org/html/rfc2308 for further details.
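
To make that concrete, the negative-caching TTL is the last field of the SOA record; a sketch with placeholder names:

    example.com. 3600 IN SOA ns1.example.com. hostmaster.example.com. (
            2015060301 ; serial
            3600       ; refresh
            900        ; retry
            604800     ; expire
            1 )        ; minimum = negative-caching TTL (RFC 2308)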


2) Even if you have a setup with site-wide caching resolvers, the responses
from internal zones are cached anyway, because the resolvers are not
authoritative for all the zones you care about (unless you are on a really
small network).

I.e. if caching is a problem, you already have the problem today.

Positive caching is controlled by the zone owner, too. If you are worried
about stale data on clients, go and lower the TTL to 1 second.


Lowering the TTL should work for all clients, no matter whether they have a
local cache or not, i.e. including Windows/OSX.


Hopefully this shows that the problem is not *technically* caused by caching
on clients but by inappropriate TTL settings in zones. As a network
administrator, you have the power to fix that centrally, without needing to
touch every single client.

I hope this helps.
--
Petr Spacek @ Red Hat

Reindl Harald
2015-06-03 08:58:25 UTC
Post by Petr Spacek
so with setup a dns cache on each and every machine you fuckup your network
because you introduce the same negative TTL caching affecting OSX clients for
years now
1) Negative caching is controlled by zone owner. If you are not happy that
OSX/Windows clients cache negative answers for zones your company use - no
problem, set SOA minimum field to 1 second and be done with that.
bad idea when you maintain public nameservers for some hundred domains,
just because of broken client software
Post by Petr Spacek
2) Even if you have setup with site-wide caching resolvers, the responses from
internal zones are cached anyway because all resolvers are not authoritative
for all zones you care about (unless you are on a really small network).
they are, and that doesn't depend on the network size
Post by Petr Spacek
I.e. if the caching is a problem you have the problem even nowadays.
The positive caching is controlled by zone owner, too. If you are worried
about stale data on clients, go and lower TTL to 1 second.
keep your cynicism for yourself

lowering a TTL to 1 second is pure stupidity, and without broken client
software it is not needed in a network with authoritative nameservers where
zone data is also shared with *public nameservers*
Post by Petr Spacek
Lowering TTL should work for all clients, no matter if they have local cache
or not, i.e. including Windows/OSX.
lowering TTLs to fix stupid client defaults is not a fix
Post by Petr Spacek
Hopefully this shows that problem is not *technically* caused by caching on
clients but by inappropriate TTL settings in zones. As a network
administrator, you have the power to fix that centrally, without a need to
touch every single client
sorry, but that is complete nonsense
Petr Spacek
2015-06-03 11:39:23 UTC
Post by Petr Spacek
so with setup a dns cache on each and every machine you fuckup your network
because you introduce the same negative TTL caching affecting OSX clients for
years now
1) Negative caching is controlled by zone owner. If you are not happy that
OSX/Windows clients cache negative answers for zones your company use - no
problem, set SOA minimum field to 1 second and be done with that.
bad idea when you maintain public nameservers for some hundret domains just
I agree that it is a very bad idea to ignore DNS caching. It was built-in on
purpose.
because broken clietn software
I'm sorry for disappointing you.

The behavior I describe has been the standard since 1987 (RFCs
1034/1035/2308). If you don't agree with the standard then you cannot use
DNS technology as standardized. I'm also not sure other Fedora users would
welcome non-standard behavior.

If you feel that the standard is broken then *please* continue the discussion
on IETF's dnsop mailing list:
https://www.ietf.org/mailman/listinfo/dnsop

Thank you for understanding.

Petr^2 Spacek
Post by Petr Spacek
2) Even if you have setup with site-wide caching resolvers, the responses from
internal zones are cached anyway because all resolvers are not authoritative
for all zones you care about (unless you are on a really small network).
they are and that don't depend on the network size
Post by Petr Spacek
I.e. if the caching is a problem you have the problem even nowadays.
The positive caching is controlled by zone owner, too. If you are worried
about stale data on clients, go and lower TTL to 1 second.
keep your cynicism for yourself
lower a TTL to 1 second is pure stupidity and without broken client software
not needed in a network with authoritative nameservers where zone data is also
shared with *public nameservers*
Post by Petr Spacek
Lowering TTL should work for all clients, no matter if they have local cache
or not, i.e. including Windows/OSX.
lowering TTLs to fix stupid client defaults is not a fix
Post by Petr Spacek
Hopefully this shows that problem is not *technically* caused by caching on
clients but by inappropriate TTL settings in zones. As a network
administrator, you have the power to fix that centrally, without a need to
touch every single client
sorry, but that is complete nonsense
--
Petr Spacek @ Red Hat

Reindl Harald
2015-06-03 11:45:25 UTC
Post by Petr Spacek
Post by Petr Spacek
so with setup a dns cache on each and every machine you fuckup your network
because you introduce the same negative TTL caching affecting OSX clients for
years now
1) Negative caching is controlled by zone owner. If you are not happy that
OSX/Windows clients cache negative answers for zones your company use - no
problem, set SOA minimum field to 1 second and be done with that.
bad idea when you maintain public nameservers for some hundret domains just
I agree that it is a very bad idea to ignore DNS caching. It was built-in on
purpose.
because broken clietn software
I'm sorry for disappointing you.
The behavior I describe is standard for last ~ 20 years 1987 (RFCs
1034/1035/2308). If you don't agree with standard then you cannot use DNS
technology as standardized. Here I'm not sure if other Fedora users would also
welcome non-standard behavior.
If you feel that the standard is broken then *please* continue with discussion
https://www.ietf.org/mailman/listinfo/dnsop
come on, stop trolling that way, because you know exactly what i am
talking about by "broken client software" - the point is that with
caching on each and every device you lose the opportunity to clear
central caches for whatever reason and make the changes visible on all
clients in realtime
Petr Spacek
2015-06-03 12:02:19 UTC
Post by Petr Spacek
Post by Petr Spacek
so with setup a dns cache on each and every machine you fuckup your network
because you introduce the same negative TTL caching affecting OSX clients for
years now
1) Negative caching is controlled by zone owner. If you are not happy that
OSX/Windows clients cache negative answers for zones your company use - no
problem, set SOA minimum field to 1 second and be done with that.
bad idea when you maintain public nameservers for some hundret domains just
I agree that it is a very bad idea to ignore DNS caching. It was built-in on
purpose.
because broken clietn software
I'm sorry for disappointing you.
The behavior I describe is standard for last ~ 20 years 1987 (RFCs
1034/1035/2308). If you don't agree with standard then you cannot use DNS
technology as standardized. Here I'm not sure if other Fedora users would also
welcome non-standard behavior.
If you feel that the standard is broken then *please* continue with discussion
https://www.ietf.org/mailman/listinfo/dnsop
come on stop trolling that way because you know exactly what i am talking
about by "broken client software" - the point is that with caching on each and
every device you lose the oppotinity clear central caches for whatever reason
and make the changes visible on all clients in realtime
You will lose that ability because *you configured the zone with an
inappropriately long TTL*.

As usual, it is a trade-off: (performance & resiliency) vs. flexibility.

It is up to you as an administrator to decide on which side you want to be.

Also, feel free to contribute a protocol proposal for DNS cache flushing.
The dnsop working group has already seen a few ideas like that and the
group is quite open; contributions are welcome!
--
Petr Spacek @ Red Hat

Reindl Harald
2015-06-03 12:07:36 UTC
Post by Petr Spacek
Post by Petr Spacek
I'm sorry for disappointing you.
The behavior I describe is standard for last ~ 20 years 1987 (RFCs
1034/1035/2308). If you don't agree with standard then you cannot use DNS
technology as standardized. Here I'm not sure if other Fedora users would also
welcome non-standard behavior.
If you feel that the standard is broken then *please* continue with discussion
https://www.ietf.org/mailman/listinfo/dnsop
come on stop trolling that way because you know exactly what i am talking
about by "broken client software" - the point is that with caching on each and
every device you lose the oppotinity clear central caches for whatever reason
and make the changes visible on all clients in realtime
You will lose the ability because *you configured the zone with
inappropriately long TTL*
no, you lose the ability only when each and every device maintains its
own cache, while the TTL is normally meant for resolvers, and you don't
need more than *one* trustable and redundant resolver for a whole LAN

with that, *one* flush on that resolver leads to the desired result
for the whole network, and you don't need hacks like dns views with a
very low TTL for your own LAN while you don't want that for the rest of
the world
Simo Sorce
2015-06-03 13:54:23 UTC
Post by Reindl Harald
Post by Petr Spacek
Post by Petr Spacek
I'm sorry for disappointing you.
The behavior I describe is standard for last ~ 20 years 1987 (RFCs
1034/1035/2308). If you don't agree with standard then you cannot use DNS
technology as standardized. Here I'm not sure if other Fedora users would also
welcome non-standard behavior.
If you feel that the standard is broken then *please* continue with discussion
https://www.ietf.org/mailman/listinfo/dnsop
come on stop trolling that way because you know exactly what i am talking
about by "broken client software" - the point is that with caching on each and
every device you lose the oppotinity clear central caches for whatever reason
and make the changes visible on all clients in realtime
You will lose the ability because *you configured the zone with
inappropriately long TTL*
no, you lose the ability only when each and every device maintains it's
own cache while TTL is normally meant for resolvers and you don't need
more than *one* trustable and redundant resolver for a whole LAN
with that *one* flush on that resolver would lead in the desired result
for the whole network and you don't need hacks like dns views for the
own LAN with a very low TTL while you don't want that for the rest of
the world
Reindl, can you stop, please?
You want to use a standard protocol in a way it was not designed
for. Caching ALL THE WAY DOWN TO CLIENTS is part of the *design* of the
protocol. You want to bend it to do things that are convenient for you,
and you have KNOBS to do that: the TTL values.
It's really up to you.

What is not up to you is calling someone a troll when they explain to
you what a standard says. Go read the fine RFCs now and put up (with
proposals in the IETF) or shut up, please.

Simo.
--
Simo Sorce * Red Hat, Inc * New York

Paul Wouters
2015-06-03 13:03:02 UTC
Post by Petr Spacek
Post by Petr Spacek
If you feel that the standard is broken then *please* continue with discussion
https://www.ietf.org/mailman/listinfo/dnsop
come on stop trolling that way because you know exactly what i am talking
about by "broken client software" - the point is that with caching on each and
every device you lose the oppotinity clear central caches for whatever reason
and make the changes visible on all clients in realtime
You will lose the ability because *you configured the zone with
inappropriately long TTL*.
I have to agree with Petr here. The DNS is specifically designed so that
the producer of records can say how long things are allowed to be
cached. Chaining caches via forwarders is not against the method of the
DNS - it is the core design.

Moving the resolving and validation to the end nodes increases
security, and DNS security is something we badly need.

Relying on aggregating DNS servers as access control for out-of-band
DNS clearing goes against the "API contract" of a DNS transaction,
which comes with a TTL condition. Plus, that assumption has always
been broken for browsers already, which keep their own cache.

Paul

Chris Adams
2015-06-01 19:29:45 UTC
Post by Andrew Lutomirski
I'm with Jason here. Glibc's resolver is amazingly buggy, and things
break randomly and unreproducibly when this happens. A good setup
would have a local resolver and glibc would be configured to cache
nothing whatsoever. Then, if you need to perform maintenance on the
local DNS cache, you can do it by flushing your local resolver rather
than trying to figure out how you're going to persuade every running
program to tell glibc to flush its cache.
glibc doesn't have a cache, except each process caching the settings in
/etc/resolv.conf. That's part of the problem, because there's no way to
cache "first server in resolv.conf is not responding", so each lookup
has to figure that out for itself (many timeouts).
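
For what it's worth, resolv.conf options can soften (though not fix) the dead-first-server case; a sketch with placeholder addresses:

    options timeout:1 attempts:2 rotate
    nameserver 192.0.2.1
    nameserver 192.0.2.2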

Running a local caching resolver helps in a number of ways. I prefer to
run unbound, forwarding to the local network's preferred resolvers, with
a low cache-max-ttl (like 1-5 minutes, depending on the server). That
smooths out the traffic (keeps from requesting the same thing a bunch in
a short time), but still generally keeps you from having to clear a
bunch of caches in an unplanned change situation.
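
That setup is only a few lines of unbound.conf; a sketch, with 192.0.2.53 standing in for the network's preferred resolver:

    server:
        cache-max-ttl: 300        # cap positive caching at 5 minutes
    forward-zone:
        name: "."
        forward-addr: 192.0.2.53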

This really helps with some types of servers, such as mail servers
running spam filtering. They tend to look up the same thing a bunch in
a short period, so caching it locally helps (speeds up local DNS
resolution and keeps from causing load spikes on the network resolvers).

This is in addition to the DNSSEC benefits.
--
Chris Adams <***@cmadams.net>

Andrew Lutomirski
2015-06-01 19:38:16 UTC
Post by Chris Adams
Post by Andrew Lutomirski
I'm with Jason here. Glibc's resolver is amazingly buggy, and things
break randomly and unreproducibly when this happens. A good setup
would have a local resolver and glibc would be configured to cache
nothing whatsoever. Then, if you need to perform maintenance on the
local DNS cache, you can do it by flushing your local resolver rather
than trying to figure out how you're going to persuade every running
program to tell glibc to flush its cache.
glibc doesn't have a cache, except each process caching the settings in
/etc/resolv.conf. That's part of the problem, because there's no way to
cache "first server in resolv.conf is not responding", so each lookup
has to figure that out for itself (many timeouts).
Glibc caches *something* that enabled the bug I hit. I don't know
exactly what it's trying to cache, but it's certainly stateful.

--Andy

Reindl Harald
2015-06-01 19:44:53 UTC
Post by Andrew Lutomirski
Post by Chris Adams
Post by Andrew Lutomirski
I'm with Jason here. Glibc's resolver is amazingly buggy, and things
break randomly and unreproducibly when this happens. A good setup
would have a local resolver and glibc would be configured to cache
nothing whatsoever. Then, if you need to perform maintenance on the
local DNS cache, you can do it by flushing your local resolver rather
than trying to figure out how you're going to persuade every running
program to tell glibc to flush its cache.
glibc doesn't have a cache, except each process caching the settings in
/etc/resolv.conf. That's part of the problem, because there's no way to
cache "first server in resolv.conf is not responding", so each lookup
has to figure that out for itself (many timeouts).
Glibc caches *something* that enabled the bug I hit. I don't know
exactly what it's trying to cache, but it's certainly stateful
it doesn't cache dns responses - *client applications* may cache responses

try it out in your local network:

* enter a non-existing subdomain in firefox
* add the hostname to your LAN nameserver
* try again: firefox still refuses
* restart just firefox
* it resolves without any delay

a) that proves there is no system-wide cache
b) it proves that by introducing a local system-wide cache
you introduce a problem that did not exist before
Simo Sorce
2015-06-02 17:45:03 UTC
Post by Reindl Harald
Post by Andrew Lutomirski
Post by Chris Adams
Post by Andrew Lutomirski
I'm with Jason here. Glibc's resolver is amazingly buggy, and things
break randomly and unreproducibly when this happens. A good setup
would have a local resolver and glibc would be configured to cache
nothing whatsoever. Then, if you need to perform maintenance on the
local DNS cache, you can do it by flushing your local resolver rather
than trying to figure out how you're going to persuade every running
program to tell glibc to flush its cache.
glibc doesn't have a cache, except each process caching the settings in
/etc/resolv.conf. That's part of the problem, because there's no way to
cache "first server in resolv.conf is not responding", so each lookup
has to figure that out for itself (many timeouts).
Glibc caches *something* that enabled the bug I hit. I don't know
exactly what it's trying to cache, but it's certainly stateful
it don't cache dns respones - try it out in your local network
*client applications* may cache respones
try it out in your local network
* enter a non existing subdomain in firefox
* add the hostname to your LAN nameserver
* try again: firefox refuses
* restart just firefox
* it resolves without any delay
a) that proves no systemwide cachae
b) it proves with introduce a local systemdwide cache
you introduce a problem not existing before
If you have nscd running, glibc caches, so it is a matter of
configuration.

The *only* reason why Firefox caches names is because we do not have a
local dns caching resolver, so Firefox had to implement its own.

If you had a local caching resolver, Firefox could be changed to stop
caching on its own instead. Which would be a plus; I often have way too
many tabs open to consider restarting firefox unless the website with
issues is really important. If I had a local resolver it would be easy
to just flush that one and have FF back in business immediately.

As you see, it is a matter of perspective.

Simo.
--
Simo Sorce * Red Hat, Inc * New York

Reindl Harald
2015-06-02 17:51:09 UTC
Post by Simo Sorce
Post by Reindl Harald
it don't cache dns respones - try it out in your local network
*client applications* may cache respones
try it out in your local network
* enter a non existing subdomain in firefox
* add the hostname to your LAN nameserver
* try again: firefox refuses
* restart just firefox
* it resolves without any delay
a) that proves no systemwide cachae
b) it proves with introduce a local systemdwide cache
you introduce a problem not existing before
If you have nscd running glibc caches, so it is a matter of
configuration.
completely different topic

if i install a local resolver and start it, it caches - so what - the
same for nscd, which is not default, so you can't blame glibc for the
caching of an additional package
Post by Simo Sorce
The *only* reason why Firefox caches Names is because we do not have a
local dns caching resolver, so Firefox had to implement its own.
If you had a local caching resolver Firefox could be changed to stop
caching on its own instead
tell me one reason why *any* application has to cache DNS results on
its own - it doesn't matter at all if the machine has a local
resolver/cache or not, it's not the business of any user application

and just because you have a local resolver firefox won't stop its behavior
Simo Sorce
2015-06-02 18:01:36 UTC
Post by Reindl Harald
Post by Simo Sorce
Post by Reindl Harald
it don't cache dns respones - try it out in your local network
*client applications* may cache respones
try it out in your local network
* enter a non existing subdomain in firefox
* add the hostname to your LAN nameserver
* try again: firefox refuses
* restart just firefox
* it resolves without any delay
a) that proves no systemwide cachae
b) it proves with introduce a local systemdwide cache
you introduce a problem not existing before
If you have nscd running glibc caches, so it is a matter of
configuration.
completly different topic
if i install a local resolver and start it it caches - so what - the
same for nscd which is not default, so you can't blame glibc because
caching of an additional package
If you knew what you were talking about, you would know glibc's
documentation says the recommended way to deal with changing
resolv.conf files is to install and use nscd. So, yes, I can totally
blame glibc, as nscd is part of glibc and they recommend you run it.

It is therefore the same topic.
Post by Reindl Harald
Post by Simo Sorce
The *only* reason why Firefox caches Names is because we do not have a
local dns caching resolver, so Firefox had to implement its own.
If you had a local caching resolver Firefox could be changed to stop
caching on its own instead
tell me one reason why *any* application has to cache DNS results at
it's own - it don't matter at all if the machine has a local
resolver/cache or not, it's not the business of any user application
Because user applications need to be quick, and can't give the user
a bad experience simply because the local DNS has gone out for lunch
(which happens pretty consistently with home routers and end-user ISP
networks), apps end up doing what they can to avoid getting blamed by
the user, and that turns out to be: caching DNS replies.
It doesn't matter whether you like it or not; this is the reality, and
we have to cope with reality, not with our desire of what reality
"should" be.

By adding a local caching resolver by default, apps *by default*
won't see DNS as a problem anymore and will stop implementing half-assed
caching. That ultimately leads to the result you want in your case: apps
will stop caching on their own, and when you remove the local resolver
in your setup you'll be happy to observe the flood of DNS requests
without any application caching.
You should be happy about this change, I guess :)
Post by Reindl Harald
and just because you have a local resolver firefox won't stop it's behavior
It can. Without a local resolver FF developers will definitely keep
caching on their own; with a decent local resolver they can allow
themselves to disable their own cache and go back to relying on the
system one, perhaps.

Simo.
--
Simo Sorce * Red Hat, Inc * New York

Paul Wouters
2015-06-02 18:36:37 UTC
Post by Simo Sorce
Post by Reindl Harald
and just because you have a local resolver firefox won't stop it's behavior
It can, w/o a local resolver FF developers will definitely keep caching
on their own, with a decent local resolver they can allow themselves to
disable their own and go back to rely on the system one, perhaps.
I don't think so. Firefox does that to avoid DNS rebinding attacks.

Paul

Florian Weimer
2015-06-03 10:04:45 UTC
Post by Paul Wouters
Post by Simo Sorce
Post by Reindl Harald
and just because you have a local resolver firefox won't stop it's behavior
It can, w/o a local resolver FF developers will definitely keep caching
on their own, with a decent local resolver they can allow themselves to
disable their own and go back to rely on the system one, perhaps.
I don't think so. Firefox does that to avoid DNS rebinding attacks.
It is somewhat questionable whether DNS rebinding vulnerabilities are,
in fact, a problem which should be solved at the client side. But
Firefox certainly has some caching mechanisms intended to help against
that (but I'm not sure how reliable they are in preventing the issue,
e.g. if you use a web proxy).
--
Florian Weimer / Red Hat Product Security

Petr Spacek
2015-06-03 11:54:35 UTC
Post by Florian Weimer
Post by Paul Wouters
Post by Simo Sorce
Post by Reindl Harald
and just because you have a local resolver firefox won't stop it's behavior
It can, w/o a local resolver FF developers will definitely keep caching
on their own, with a decent local resolver they can allow themselves to
disable their own and go back to rely on the system one, perhaps.
I don't think so. Firefox does that to avoid DNS rebinding attacks.
It is somewhat questionable whether DNS rebinding vulnerabilities are,
in fact, a problem which should be solved at the client side. But
Oh yes. DNS pinning in the browser is just a band-aid and not a proper
solution. I would argue that the DNS rebinding attack is caused by a
generic lack of ingress filtering on multiple levels.

We learned to filter IP packets on firewalls to make sure that packets
with internal source addresses really come from interfaces connected to
internal networks, and the very same principle should apply everywhere...
Post by Florian Weimer
Firefox certainly has some caching mechanisms intended to help against
that (but I'm not sure how reliable they are in preventing the issue,
e.g. if you use a web proxy).
--
Petr Spacek @ Red Hat

Paul Wouters
2015-06-03 13:13:45 UTC
Post by Petr Spacek
Post by Florian Weimer
It is somewhat questionable whether DNS rebinding vulnerabilities are,
in fact, a problem which should be solved at the client side. But
Oh yes. DNS pinning in browser is just a band-aid and not proper solution. I
would argue that DNS rebinding attack is caused by generic lack of ingress
filtering on multiple levels.
I'm not sure we are talking about the same thing here? A DNS rebinding
attack works by tricking the browser into resolving something on which it
makes a security decision, e.g. nohats.ca is 193.110.157.102. The TTL for
that would be 0. The browser checks this is a valid src/dst pair. Then
it allows this transaction. Another piece of the browser is allowed to
run, usually something like flash with its own DNS code, and it has to
redo the query since the original TTL was 0. Now the remote DNS server
answers with another IP address that happens to be in your browser's
local network. And now the flash plugin is scanning the local network
against the actual browser's security policy.

Forcing a minimal TTL prevents this, but I think browsers have gone
overboard on ignoring "bad DNS TTLs" for the sake of speed.

Note that if you are running unbound as the DNS server, you can
configure it to not allow RFC1918 space for all but whitelisted
domains, but this is hard to administer and prone to failure when
using split DNS. Although ISP caches could surely enable the feature
so that no RFC1918 address may ever appear in public DNS answers.
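
In unbound.conf that is the private-address/private-domain pair; a sketch, with internal.example.com as a placeholder for a whitelisted split-DNS zone:

    server:
        private-address: 10.0.0.0/8
        private-address: 172.16.0.0/12
        private-address: 192.168.0.0/16
        private-domain: "internal.example.com"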
Post by Petr Spacek
We learned to filter IP packets on firewalls to make sure that packets with
internal source addresses come really from interfaces connected to internal
networks and the very same principle should apply everywhere...
It's hard to do without knowing what kind of network you are on. Can it
be trusted or not? If you are at a coffeeshop and starbucks.com resolves
to 192.168.1.1, is that trusted or malicious? If you are on a wifi
network, should your laptop allow DNS answers for www.redhat.com to be
192.168.1.1? It's a hard problem if you try to solve it without
whitelists and blacklists, which are in themselves not a very good solution.

This problem is similar to the network join problem itself. Is this wifi
network trusted? Since coffeeshops use WPA with passwords scribbled on
the whiteboard, we have no other way than asking the user.

Paul

Reindl Harald
2015-06-01 19:31:14 UTC
Post by Andrew Lutomirski
I would think that avoiding a single point of failure (your LAN
nameserver) would be a *good* thing
and your holy one and only resolver on localhost is not a single point
of failure? in fact it would take much longer to recognize a failing
exclusive local resolver on 2 out of 1000 servers, while it becomes
visible from the first second if your central nameservers have problems

and BTW glibc has no problem with the first nameserver in
/etc/resolv.conf failing as long as the slave responds; it may take a
little time, but that doesn't matter as long as we are not talking about
an incoming mail exchanger
Chuck Anderson
2015-06-02 14:55:09 UTC
Post by Reindl Harald
Post by Andrew Lutomirski
I would think that avoiding a single point of failure (your LAN
nameserver) would be a *good* thing
and your holy one and only resolver on localhost is not a single
point of failure? in fact it would take much longer to recognize a
failing and exclusive local resolver on 2 out of 1000 servers why it
gets visible from the first second if your central nameservers have
problems
and BTW glibc has no problem with the first nameserver in
/etc/resolv.conf failing as long as the slave responds, it may take
a little time but that don't matter as long as we are not talking
about a incoming mail exchanger
I'm sorry, but saying that "it may take a little time" is a
non-starter. For anyone who says this, I challenge you to set your
system's resolv.conf so that the first listed nameserver is a
completely offline IP address, and the second/third listed ones are
your normal nameservers. Note that the first one must be completely
offline, not an IP that is "up" but just doesn't have a listening
nameserver, but an IP that is non-existent on your local network.
E.g. if your local network is 192.168.1.0/24, set it to
192.168.1.<unassigned-to-any-host>. Make sure you can't ping the
unassigned IP.
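
Something like this (a sketch; .250 is assumed unassigned on the example network):

    nameserver 192.168.1.250   # deliberately dead first entry
    nameserver 192.168.1.1     # your real resolver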

There are many services that will choke in this sort of configuration.
Not just mail servers, but RADIUS servers, LDAP servers, Samba
servers, web servers depending on the configuration, SSH servers and
clients, etc. Sure, if you test everything in this exact failure
scenario, you may be able to work around this problem (e.g. turn off
reverse DNS lookups in Apache and sshd, etc.) but if you run a LAN or
data center with many different groups maintaining different systems,
you can't guarantee that everyone has done this sort of rigorous
testing and configurations to avoid problems, if it is even possible
for some services which it may not be.

Of course a localhost resolver is also a single point of failure. But
the important property is that it is very much FATE SHARED with the
rest of the system. So when you reboot the system to apply a security
update, it doesn't matter that the localhost resolver is offline,
because the services on that box are offline too.

Simo Sorce
2015-06-02 17:38:53 UTC
Post by Reindl Harald
Post by Andrew Lutomirski
I would think that avoiding a single point of failure (your LAN
nameserver) would be a *good* thing
and your holy one and only resolver on localhost is not a single point
of failure?
No more than glibc, or any other component you have.
Post by Reindl Harald
in fact it would take much longer to recognize a failing and
exclusive local resolver on 2 out of 1000 servers why it gets visible
from the first second if your central nameservers have problems
This is orthogonal to the problem being solved.
Post by Reindl Harald
and BTW glibc has no problem with the first nameserver in
/etc/resolv.conf failing as long as the slave responds, it may take a
little time but that don't matter as long as we are not talking about a
incoming mail exchanger
Yes, there are situations where it doesn't matter ... and there are
situations where it does. A local resolver has many advantages and very
few disadvantages for the *general* case.

Take it easy, it is not the end of the world.

Simo.
--
Simo Sorce * Red Hat, Inc * New York

Ryan S. Brown
2015-06-01 19:25:18 UTC
Post by Jason L Tibbitts III
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.
I disagree generally in the case of server deployments.
Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.
Basically, if you have properly functioning DNS on multiple local
servers but don't have anything fancier like heartbeat-based IP handoff
or a load balancing appliance or something, and the first resolver in
resolv.conf goes offline, your hosts are screwed. glibc's resolver code
is simply horrible. This is completely exclusive of DNSSEC issues.
I don't think it's essential for either the server or the cloud product.
Servers run in a much more reliable network than your average SOHO or
coffee shop setup, and their behavior with regard to DNS doesn't need a
local caching resolver. LAN DNS (probably with split horizon for
DC-internal services) is plenty fast and reliable, there isn't a need to
run a zillion instances of Unbound.

Also, I've run redundant LAN DNS servers in fairly large deployments,
and ns1 going down certainly hasn't "screwed" my hosts.
Post by Jason L Tibbitts III
Of course, most folks who have enough infrastructure to have their own
DNS servers and such can easily figure out how to configure a local
resolver if need be, so what's in the default setup really makes no
difference. And for the home user who might want to grab the server
spin/product/whatever-we're-calling-it-this-week, well, I'd think they'd
want the local resolver.
I don't think so -- when I pull a fresh server image I expect there to
be very little running on it.

A local DNS resolver would certainly be a surprise to me. Again, this
comes back to the expectation that a server isn't hopping networks or
running somewhere un-trusted where there's a high risk of bad actors.
Post by Jason L Tibbitts III
What really concerns me is what happens with split DNS. I assume I'll
just need to configure the local resolvers to talk only to my resolvers,
but this would really need to be documented.
--
Ryan Brown / Software Engineer, Openstack / Red Hat, Inc.
Andrew Lutomirski
2015-06-01 19:28:25 UTC
Permalink
Post by Ryan S. Brown
Post by Jason L Tibbitts III
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.
I disagree generally in the case of server deployments.
Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.
Basically, if you have properly functioning DNS on multiple local
servers but don't have anything fancier like heartbeat-based IP handoff
or a load balancing appliance or something, and the first resolver in
resolv.conf goes offline, your hosts are screwed. glibc's resolver code
is simply horrible. This is completely exclusive of DNSSEC issues.
I don't think it's essential for either the server or the cloud product.
Servers run in a much more reliable network than your average SOHO or
coffee shop setup, and their behavior with regard to DNS doesn't need a
local caching resolver. LAN DNS (probably with split horizon for
DC-internal services) is plenty fast and reliable, there isn't a need to
run a zillion instances of Unbound.
I agree it's not essential for a server, but it can be quite helpful
to work around glibc bugs. For example, I've hit
https://sourceware.org/bugzilla/show_bug.cgi?id=17802 several times in
production. Yes, that's a glibc bug, and glibc should fix it.
Nonetheless, bugs like that wouldn't matter as much if there were a
local resolver.
Post by Ryan S. Brown
I don't think so -- when I pull a fresh server image I expect there to
be very little running on it.
A local DNS resolver would certainly be a surprise to me. Again, this
comes back to the expectation that a server isn't hopping networks or
running somewhere un-trusted where there's a high risk of bad actors.
It's not just bad actors. Sometimes things break or you need to
reconfigure your upstream resolvers. With a local caching resolver,
this Just Works (tm). With the status quo, it requires restarting
everything.

--Andy
Reindl Harald
2015-06-01 19:33:59 UTC
Permalink
Post by Andrew Lutomirski
Post by Ryan S. Brown
A local DNS resolver would certainly be a surprise to me. Again, this
comes back to the expectation that a server isn't hopping networks or
running somewhere un-trusted where there's a high risk of bad actors.
It's not just bad actors. Sometimes things break or you need to
reconfigure your upstream resolvers. With a local caching resolver,
this Just Works (tm). With the status quo, it requires restarting
everything
WHAT - the opposite is true, glibc doesn't cache nameserver responses and
*now* if you change something on your central resolvers it gets visible
on any machine in your network

with having a local cache on 1000 nodes *then* it requires restarting
everything - so exactly the opposite of what you are saying
Florian Weimer
2015-06-02 09:39:06 UTC
Permalink
Post by Reindl Harald
Post by Andrew Lutomirski
Post by Ryan S. Brown
A local DNS resolver would certainly be a surprise to me. Again, this
comes back to the expectation that a server isn't hopping networks or
running somewhere un-trusted where there's a high risk of bad actors.
It's not just bad actors. Sometimes things break or you need to
reconfigure your upstream resolvers. With a local caching resolver,
this Just Works (tm). With the status quo, it requires restarting
everything
WHAT - the opposite is true,
Andrew is right, glibc caches the name server *settings*
(/etc/resolv.conf contents), but not the responses received.

The recommended workaround is to use nscd, but this has issues of its own.
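
For reference, a sketch of that workaround; the lifetimes below are
illustrative, and one of the issues alluded to is that nscd applies
these fixed lifetimes instead of honoring the DNS TTLs:

    # /etc/nscd.conf -- host caching only (sketch)
    enable-cache            hosts   yes
    positive-time-to-live   hosts   600   # seconds a successful lookup is kept
    negative-time-to-live   hosts   20    # seconds a failed lookup is kept

    # then: systemctl enable --now nscd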
--
Florian Weimer / Red Hat Product Security
Simo Sorce
2015-06-02 17:41:39 UTC
Permalink
Post by Reindl Harald
Post by Andrew Lutomirski
Post by Ryan S. Brown
A local DNS resolver would certainly be a surprise to me. Again, this
comes back to the expectation that a server isn't hopping networks or
running somewhere un-trusted where there's a high risk of bad actors.
It's not just bad actors. Sometimes things break or you need to
reconfigure your upstream resolvers. With a local caching resolver,
this Just Works (tm). With the status quo, it requires restarting
everything
WHAT - the opposite is true, glibc doesn't cache nameserver responses, and
*now* if you change something on your central resolvers it gets visible
on any machine in your network
with having a local cache on 1000 nodes *then* it requires restarting
everything - so exactly the opposite of what you are saying
You are assuming a specific configuration where the local resolver
caches for the full TTL period and also caches negative hits. That's not
necessarily true.

With a caching period that does not exceed the TTL (and is usually much
shorter) for positive results, and very short caching for negative
results, you would experience very little "latency" and generally not
see any impact.
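
For illustration, such a policy could be expressed in unbound.conf,
assuming unbound is the local resolver; the numbers are invented:

    server:
        cache-max-ttl: 300            # cap positive answers at 5 minutes
        cache-max-negative-ttl: 30    # expire negative answers quickly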

Stop assuming how it works, and ask first, please.

Simo.
--
Simo Sorce * Red Hat, Inc * New York
drago01
2015-06-01 20:42:29 UTC
Permalink
Post by Andrew Lutomirski
Post by Ryan S. Brown
Post by Jason L Tibbitts III
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.
I disagree generally in the case of server deployments.
Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.
Basically, if you have properly functioning DNS on multiple local
servers but don't have anything fancier like heartbeat-based IP handoff
or a load balancing appliance or something, and the first resolver in
resolv.conf goes offline, your hosts are screwed. glibc's resolver code
is simply horrible. This is completely exclusive of DNSSEC issues.
I don't think it's essential for either the server or the cloud product.
Servers run in a much more reliable network than your average SOHO or
coffee shop setup, and their behavior with regard to DNS doesn't need a
local caching resolver. LAN DNS (probably with split horizon for
DC-internal services) is plenty fast and reliable, there isn't a need to
run a zillion instances of Unbound.
I agree it's not essential for a server, but it can be quite helpful
to work around glibc bugs. For example, I've hit
https://sourceware.org/bugzilla/show_bug.cgi?id=17802 several times in
production. Yes, that's a glibc bug, and glibc should fix it.
Nonetheless, bugs like that wouldn't matter as much if there were a
local resolver.
That's not how bugs should be dealt with ... if there is a bug it
should be fixed where it is, not duct-taped this way.
Paul Wouters
2015-06-01 20:56:48 UTC
Permalink
Post by drago01
Post by Andrew Lutomirski
production. Yes, that's a glibc bug, and glibc should fix it.
Nonetheless, bugs like that wouldn't matter as much if there were a
local resolver.
That's not how bugs should be dealt with ... if there is a bug it
should be fixed where it is, not duct-taped this way.
I look forward to your proposal to POSIX and its acceptance :)

Paul
Andrew Lutomirski
2015-06-01 20:57:19 UTC
Permalink
Post by drago01
Post by Andrew Lutomirski
Post by Ryan S. Brown
Post by Jason L Tibbitts III
RSB> I disagree; for server & cloud deployments it doesn't make sense to
RSB> duplicate a DNS server on *every* host, and if you care about
RSB> DNSSEC you likely already run a trusted resolver.
I disagree generally in the case of server deployments.
Having a local caching resolver is pretty much essential, even though we
all know it's just a workaround for glibc.
Basically, if you have properly functioning DNS on multiple local
servers but don't have anything fancier like heartbeat-based IP handoff
or a load balancing appliance or something, and the first resolver in
resolv.conf goes offline, your hosts are screwed. glibc's resolver code
is simply horrible. This is completely exclusive of DNSSEC issues.
I don't think it's essential for either the server or the cloud product.
Servers run in a much more reliable network than your average SOHO or
coffee shop setup, and their behavior with regard to DNS doesn't need a
local caching resolver. LAN DNS (probably with split horizon for
DC-internal services) is plenty fast and reliable, there isn't a need to
run a zillion instances of Unbound.
I agree it's not essential for a server, but it can be quite helpful
to work around glibc bugs. For example, I've hit
https://sourceware.org/bugzilla/show_bug.cgi?id=17802 several times in
production. Yes, that's a glibc bug, and glibc should fix it.
Nonetheless, bugs like that wouldn't matter as much if there were a
local resolver.
That's not how bugs should be dealt with ... if there is a bug it
should be fixed where it is, not duct-taped this way.
This is glibc we're talking about, though. Have you tried to get a
glibc bug fixed? It's not a pleasant experience.

For example, the bug I reported has a candidate patch. That patch
isn't applied, and the patch looks like the bug might be a security
issue. It's been in that state for months. This is not unusual for
glibc.

Anyway, even on a LAN, the overhead of a network round trip per
cacheable DNS query may be non-negligible for some use cases. A local
caching resolver fixes that, too.
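
(A rough way to see this effect is to time a repeated lookup; the
second call is only faster if something between glibc and the wire
caches the answer, since glibc itself does not. A small illustration
in Python:)

    import socket
    import time

    def timed_lookup(name):
        """Resolve name through the system stub resolver, return seconds."""
        start = time.monotonic()
        socket.getaddrinfo(name, 80)
        return time.monotonic() - start

    # Without a local cache, both calls pay a network round trip; with a
    # local caching resolver the repeat is answered from 127.0.0.1.
    print("first:  %.4f s" % timed_lookup("fedoraproject.org"))
    print("repeat: %.4f s" % timed_lookup("fedoraproject.org"))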

--Andy
Reindl Harald
2015-06-01 21:15:18 UTC
Permalink
Post by Andrew Lutomirski
Post by drago01
Post by Andrew Lutomirski
I agree it's not essential for a server, but it can be quite helpful
to work around glibc bugs. For example, I've hit
https://sourceware.org/bugzilla/show_bug.cgi?id=17802 several times in
production. Yes, that's a glibc bug, and glibc should fix it.
Nonetheless, bugs like that wouldn't matter as much if there were a
local resolver.
That's not how bugs should be dealt with ... if there is a bug it
should be fixed where it is not duct taped this way.
This is glibc we're talking about, though. Have you tried to get a
glibc bug fixed? It's not a pleasant experience.
and hence you prefer to put your head in the sand and bury it under
another layer, increasing complexity?
Post by Andrew Lutomirski
For example, the bug I reported has a candidate patch. That patch
isn't applied, and the patch looks like the bug might be a security
issue. It's been in that state for months. This is not unusual for
glibc.
that doesn't justify a local resolver on hundreds of servers as a
default to hide a bug somewhere else, and frankly, if that problem
affected anybody I would have faced it in the past 10 years, but I did not
Post by Andrew Lutomirski
Anyway, even on a LAN, the overhead of a network round trip per
cacheable DNS query may be non-negligible for some use cases. A local
caching resolver fixes that, too
and here you go: that a change is fine for *some use cases* is not a
justification for introducing an *additional* layer in a default setup

the strategy of wrapping layers over layers to mask something going
wrong in the 5 layers below, and when there is a problem somewhere two
weeks later wrapping another 2 layers on top to mask the current
outbreak, is "modern" system development that is nothing else than
stupidity, resulting each year in more complex systems where *nobody*
knows what they are doing and which component is really responsible
when things go wrong

congratulations - maybe if we follow those paradigms often and fast
enough, the whole ecosystem will be ruined in a non-reversible way so
that it needs to be rebuilt from scratch, instead of wrapping another
10 layers with 9 minor problems around it

a sane system should be as simple as possible so that *one* human is
able to determine what is happening without hiring 10 specialists for each
layer

in short: leave me in peace with defaults raising complexity more and
more; I have enough with dbus, a now-essential service which can't be
restarted after updates of underlying libraries, while for many years it
was no problem to type "chkconfig messagebus off" on servers and have
not a single process running except the services you installed and configured
Simo Sorce
2015-06-02 17:49:46 UTC
Permalink
Post by Reindl Harald
a sane system should be as simple as possible so that *one* human is
able to determine what is happening without hiring 10 specialists for each
layer
No human is able to understand a complex system like modern
computers and OSes; it is just an illusion. But we can improve users'
lives by providing defaults that make the system work better in
the general case, and leave it to specialists with special needs to tweak
the system or remove unwanted layers.
Post by Reindl Harald
in short: leave me in peace with defaults raising complexity more and
more; I have enough with dbus, a now-essential service which can't be
restarted after updates of underlying libraries, while for many years it
was no problem to type "chkconfig messagebus off" on servers and have
not a single process running except the services you installed and configured
You are free to keep using your kickstart files; nobody is going to mess
with those. You already have many other "special" needs apparently, so
can you stop getting mad whenever there is *any* change?

The world is not static, it keeps changing and we can adapt or die.
We, as a community, are adapting, but you as an individual are free to
diverge with your personal configuration.

Simo.
--
Simo Sorce * Red Hat, Inc * New York
Reindl Harald
2015-06-02 17:56:21 UTC
Permalink
Post by Simo Sorce
Post by Reindl Harald
a sane system should be as simple as possible so that *one* human is
able to determine what is happening without hiring 10 specialists for each
layer
No human is able to understand a complex system like modern
computers and OSes; it is just an illusion
*because* more and more complexity is added with each release and then
another layer of complexity is added in the next release to mask the impact

on a stripped-down Fedora 9 system the output of htop fit on one
screen without scrolling, even on a notebook

the whole purpose of Linux systems was to have open systems; open also
means basically understandable, and not just "you can grab the source"
Solomon Peachy
2015-06-02 18:04:20 UTC
Permalink
the whole purpose of Linux systems was to have open systems; open also means
basically understandable, and not just "you can grab the source"
Linux is not, and has never been, UNIX.

- Solomon
--
Solomon Peachy pizza at shaftnet dot org
Delray Beach, FL ^^ (email/xmpp) ^^
Quidquid latine dictum sit, altum viditur.
Reindl Harald
2015-06-02 18:12:23 UTC
Permalink
Post by Solomon Peachy
the whole purpose of Linux systems was to have open systems; open also means
basically understandable, and not just "you can grab the source"
Linux is not, and has never been, UNIX
your phrase has nothing to do with the paragraph you quoted

the main idea of GNU (you know what GNU means?) is a fully open and
controllable system, which is defeated by adding more and more complexity
in default installs

again: if somebody wants the behavior of OSX he can go out and buy an
Apple machine - guess why I did not switch to Apple from Windows - just
because I don't like the Apple philosophy
Solomon Peachy
2015-06-02 18:37:51 UTC
Permalink
Post by Reindl Harald
Post by Solomon Peachy
the whole purpose of Linux systems was to have open systems; open also means
basically understandable, and not just "you can grab the source"
Linux is not, and has never been, UNIX
your phrase has nothing to do with the paragraph you quoted
UNIX's primary goal was ease (and understandability) of implementation,
even at the expense of performance or capability.
Post by Reindl Harald
the main idea of GNU (you know what GNU means?) is a fully open and
controllable system, which is defeated by adding more and more complexity in
default installs
GNU's Not Unix -- And, I might add, Linux isn't GNU.

"Complexity" has nothing to do with openness or controllability. Those
exist on different axes, so please stop conflating them.
Post by Reindl Harald
again: if somebody wants the behavior of OSX he can go out and buy an Apple
machine - guess why I did not switch to Apple from Windows - just because I
don't like the Apple philosophy
Who said anything about OSX or Windows? Please, stick to the subject at
hand.

- Solomon
--
Solomon Peachy pizza at shaftnet dot org
Delray Beach, FL ^^ (email/xmpp) ^^
Quidquid latine dictum sit, altum viditur.
Simo Sorce
2015-06-02 18:11:34 UTC
Permalink
Post by Reindl Harald
Post by Simo Sorce
Post by Reindl Harald
a sane system should be as simple as possible so that *one* human is
able to determine what is happening without hiring 10 specialists for each
layer
No human is able to understand a complex system like modern
computers and OSes; it is just an illusion
*because* more and more complexity is added with each release and then
another layer of complexity is added in the next release to mask the impact
on a stripped-down Fedora 9 system the output of htop fit on one
screen without scrolling, even on a notebook
the whole purpose of Linux systems was to have open systems; open also
means basically understandable, and not just "you can grab the source"
It would be nice if we were always able to address complex problems with
simple solutions, but we are humans, and we are not generally capable of
doing that at the complexity level of a modern computer.

The solution proposed in this thread addresses real problems and real
pain.
For the workstation product it is really a no-brainer, especially when
installed on laptops.
For server it also has notable advantages, as others have eloquently
illustrated.

Then there are a few corner cases where things can go south.

All considered, the people that care for the Workstation and Server
products generally think the advantages *greatly* outweigh the
disadvantages in most situations, and so a local resolver is seen as a
good idea to have enabled by default.

Not everyone can be pleased, and your points have actually already been
discussed and pondered multiple times before; I've seen nothing new in
your last messages.

Can we please get productive and bring up new data, if any, or stop
assaulting the list with "I do not like it!" kinds of messages? We got
that you do not like it, but we do not need to turn everything you do
not like into a tragedy of the commons. Take a step back and put this
into perspective please, and let the people that specialize in
DNS-related matters do their job; trust their judgment once they have
explained to you the reasons why a change has been proposed.

Simo.
--
Simo Sorce * Red Hat, Inc * New York
Florian Weimer
2015-06-02 09:44:03 UTC
Permalink
Post by Andrew Lutomirski
This is glibc we're talking about, though. Have you tried to get a
glibc bug fixed? It's not a pleasant experience.
It is possible, but it requires effort. Admittedly, sometimes that
effort appears disproportionate to what is being fixed.

In this particular case, only *very* few people are familiar with
resolv/, and test coverage for that part is extremely poor.
Post by Andrew Lutomirski
For example, the bug I reported has a candidate patch. That patch
isn't applied, and the patch looks like the bug might be a security
issue. It's been in that state for months. This is not unusual for
glibc.
Can you explain why you think it is a security issue?

In any case, the impact from accidentally triggering this bug seems more
severe.
Post by Andrew Lutomirski
Anyway, even on a LAN, the overhead of a network round trip per
cacheable DNS query may be non-negligible for some use cases. A local
caching resolver fixes that, too.
Right, and it isolates resolvers from the impact of buggy applications
which enter an infinite loop if a service becomes unavailable (i.e.,
they do a new DNS lookup for each refused TCP connection).
--
Florian Weimer / Red Hat Product Security
Andrew Lutomirski
2015-06-02 16:24:20 UTC
Permalink
Post by Florian Weimer
Post by Andrew Lutomirski
This is glibc we're talking about, though. Have you tried to get a
glibc bug fixed? It's not a pleasant experience.
It is possible, but it requires effort. Admittedly, sometimes that
effort appears disproportionate to what is being fixed.
In this particular case, only *very* few people are familiar with
resolv/, and test coverage for that part is extremely poor.
Post by Andrew Lutomirski
For example, the bug I reported has a candidate patch. That patch
isn't applied, and the patch looks like the bug might be a security
issue. It's been in that state for months. This is not unusual for
glibc.
Can you explain why you think it is a security issue?
I don't have any very specific reason, but it's a load from an array
with the entirely wrong index, and the code is inscrutable. I don't
know whether n is attacker-controlled.

As a mitigating factor, it's a load, so it's probably not so terrible.

Regardless, this seems like a bug wrangling failure. The fix was
committed AFAICT, but no one updated the bug.

--Andy
Björn Persson
2015-06-02 08:50:42 UTC
Permalink
Post by Ryan S. Brown
I disagree; for server & cloud deployments it doesn't make sense to
duplicate a DNS server on *every* host, and if you care about DNSSEC
you likely already run a trusted resolver.
The trust and management models for Server are fundamentally different
from those of Workstation, since servers don't usually get tossed in a
backpack and put on potentially-hostile coffee shop wi-fi. They also
generally try to run fewer services than a workstation. The datacenter
network is generally trusted, and a shared DNSSEC resolver makes way
more sense.
It may be "beneficial" from a security PoV to have DNSSEC resolution,
but it isn't beneficial to have to patch 1 million copies of unbound
if a vuln is discovered instead of just a few shared resolvers for the
whole DC.
Servers don't only exist in big datacenters where everything is managed
by the same team of sysadmins. There are countless servers in homes and
small offices around the world, connected to all sorts of more or less
trustworthy networks. Some of my current customers have a single server
in a collocation facility somewhere. Everything outside of the Ethernet
port is managed by other people and shouldn't be trusted any more than
necessary. In one of my previous jobs we had servers at multiple
geographically separate collocation sites. At each site we'd rent a
quarter-height rack with locked doors and install some five or so
servers. The network inside the rack was trusted. Beyond the doors was
the Internet. Installing redundant dedicated DNS resolvers at each site
would have been overkill. The DNS servers we had were authoritative
servers for our own domain. If we'd had DNSSEC back then it would have
made a lot of sense to validate locally on each server.

For small offices and home users every little thing that needs to be
configured is an additional burden, and chances are that they won't get
around to learning how to configure a local validating resolver if it's
not there by default. Big data centers, on the other hand, will have
automated routines for installing new servers without configuring each
one individually. If they choose to delegate the validation to a set of
trusted DNS servers, then they can easily configure that in whatever
central configuration tool they use, and be done with it.
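
For example, delegating all resolution to a pair of trusted datacenter
resolvers is a one-stanza unbound configuration that such a tool could
push out (the addresses are placeholders):

    forward-zone:
        name: "."                   # forward everything
        forward-addr: 10.0.0.53     # trusted DC resolver 1
        forward-addr: 10.0.1.53     # trusted DC resolver 2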

I'll refrain from saying anything about clouds and containers, but for
the Server product, like for Workstation, common sense suggests that the
default installation should assume as little as possible about its
surroundings. It should definitely not assume that there won't ever be
any adversaries in the local network when it doesn't know anything about
the local network. There should therefore be a local validating DNS
resolver by default, and good documentation on how to replace it with
trusted external resolvers for those who want to do that.

Björn Persson
David Howells
2015-06-02 14:58:24 UTC
Permalink
Post by Jan Kurik
Install a local DNS resolver trusted for the DNSSEC validation running on
127.0.0.1:53. This must be the only name server entry in /etc/resolv.conf.
The automatic name server entries received via dhcp/vpn/wireless
configurations should be stored separately (e.g. this is stored in the
NetworkManager internal state), as transitory name servers to be used by the
trusted local resolver. In all cases, DNSSEC validation will be done
locally.
How does this interact with dnsmasq which also wants to be the only name
server entry in resolv.conf?

David
Paul Wouters
2015-06-02 16:44:00 UTC
Permalink
Post by David Howells
Post by Jan Kurik
Install a local DNS resolver trusted for the DNSSEC validation running on
127.0.0.1:53. This must be the only name server entry in /etc/resolv.conf.
The automatic name server entries received via dhcp/vpn/wireless
configurations should be stored separately (e.g. this is stored in the
NetworkManager internal state), as transitory name servers to be used by the
trusted local resolver. In all cases, DNSSEC validation will be done
locally.
How does this interact with dnsmasq which also wants to be the only name
server entry in resolv.conf?
Not well? The problem is dnsmasq is not as feature-complete as unbound
(and its DNSSEC implementation is very new).

I think most people end up running dnsmasq because of KVM/libvirtd? I
think those dnsmasqs should be run in "DHCP only" mode and point to
the host's unbound.
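
(For what it's worth, a dnsmasq instance can be put into that mode with
a couple of directives; this is a sketch, not what libvirt actually
ships, and 192.168.122.1 is just the usual libvirt bridge address:)

    # dnsmasq.conf: keep DHCP, disable dnsmasq's own DNS entirely
    port=0
    # hand DHCP clients the host's resolver instead
    dhcp-option=option:dns-server,192.168.122.1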

Paul
Tomas Hozza
2015-06-02 17:57:39 UTC
Permalink
Post by Paul Wouters
Post by David Howells
Post by Jan Kurik
Install a local DNS resolver trusted for the DNSSEC validation running on
127.0.0.1:53. This must be the only name server entry in
/etc/resolv.conf.
The automatic name server entries received via dhcp/vpn/wireless
configurations should be stored separately (e.g. this is stored in the
NetworkManager internal state), as transitory name servers to be used by the
trusted local resolver. In all cases, DNSSEC validation will be done
locally.
How does this interact with dnsmasq which also wants to be the only name
server entry in resolv.conf?
dnsmasq is not the default entry in /etc/resolv.conf. It can be used
with NM, but unbound can be, too. dnsmasq was integrated with NM sooner,
since it didn't have DNSSEC support, which made a lot of corner cases
and issues basically non-existent.

Unbound is a relatively simple, single-purpose DNS resolver that was
designed with DNSSEC in mind from the beginning... in comparison to
dnsmasq. dnsmasq is a Swiss Army knife that is good for simple solutions
hacked together with a single component (since it supports DHCPv4/6, TFTP
and also DNS+DNSSEC).
Post by Paul Wouters
Not well? The problem is dnsmasq is not as feature complete as unbound
(and its dnssec implementation is very new).
I agree, and as a previous maintainer of dnsmasq, I think unbound is the
better option. Although dnsmasq has a simple DBus API, it is mostly for
DHCP. Also, unbound has a modular design and an easy interface
(unbound-control) that enables reconfiguring it dynamically.
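
A few examples of that interface (these are standard unbound-control
subcommands; the zone name is arbitrary):

    unbound-control status                   # daemon state and loaded modules
    unbound-control flush_zone example.com   # drop cached data under a zone
    unbound-control reload                   # re-read unbound.conf, no restart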
Post by Paul Wouters
I think most people end up running dnsmasq because of KVM/libvirtd ? I
think those dnsmasq's should be run in "dhcp only" mode and point to
the hosts's unbound.
Right. dnsmasq run by libvirtd uses the default configuration WRT
resolv.conf. So it uses the servers from resolv.conf for resolution ->
which will be unbound. There are no conflicts between unbound running
as the local resolver and dnsmasq instances run by libvirtd.

Tomas
David Howells
2015-06-02 20:47:26 UTC
Permalink
Post by Paul Wouters
I think most people end up running dnsmasq because of KVM/libvirtd ? I
think those dnsmasq's should be run in "dhcp only" mode and point to
the hosts's unbound.
I'm using dnsmasq to look up *.redhat.com addresses over VPN whilst looking up
other addresses from my ISP.

David
Paul Wouters
2015-06-02 21:09:59 UTC
Permalink
Post by David Howells
I'm using dnsmasq to look up *.redhat.com addresses over VPN whilst looking up
other addresses from my ISP.
That is automatically handled for you if you use libreswan for your
VPN and unbound is running. It will add a forward for the domain
("redhat.com") received over the VPN to the received IP addresses of
the nameservers. I've been running like that for years now.

It even flushes the cache and request queue for related entries when
you bring the VPN up and down, so things like bugzilla.redhat.com will
work on the external IP or internal IP without you needing to do a
thing.
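
Under the hood this boils down to unbound-control calls along these
lines (a sketch; 10.11.12.53 stands in for the DNS server learned over
the VPN):

    # when the tunnel comes up:
    unbound-control forward_add redhat.com 10.11.12.53
    unbound-control flush_zone redhat.com    # drop answers cached from outside
    unbound-control flush_requestlist        # drop in-flight queries

    # when the tunnel goes down:
    unbound-control forward_remove redhat.com
    unbound-control flush_zone redhat.com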

Paul
Vít Ondruch
2015-06-09 06:52:43 UTC
Permalink
Post by Jan Kurik
= Proposed System Wide Change: Default Local DNS Resolver =
https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver
The "How To Test" section now contains a lot of steps such as "configure
NM", "enable/disable service", but when there will be something which I
can get working just by something like "dnf install localsdnsresolver".
I hope that I won't need to do this steps manually after F23
installation, otherwise it could be hardly called "default". So when
there will be available final version which does not need any additional
configuration available for testing?


Vít
P J P
2015-06-09 11:18:07 UTC
Permalink
Hello Vit,
Post by Vít Ondruch
I hope that I won't need to do these steps manually after F23
installation; otherwise it could hardly be called "default". So when
will a final version which does not need any additional configuration
be available for testing?
As per the F23 schedule, it's post 28 Jul 2015
-> https://fedoraproject.org/wiki/Releases/23/Schedule

---
Regards
-P J P
http://feedmug.com
Vít Ondruch
2015-06-09 11:23:22 UTC
Permalink
Post by P J P
Hello Vit,
Post by Vít Ondruch
I hope that I won't need to do these steps manually after F23
installation; otherwise it could hardly be called "default". So when
will a final version which does not need any additional configuration
be available for testing?
As per the F23 schedule, it's post 28 Jul 2015
-> https://fedoraproject.org/wiki/Releases/23/Schedule
That is the latest possible date by which it should definitely be
available. I can't see any reason why it should not be possible
immediately, be it a Copr build if you have some reason not to push it
into Rawhide.


Vít
Matthew Miller
2015-06-09 12:51:33 UTC
Permalink
Post by Vít Ondruch
Post by P J P
As per F23 schedule, it's post 28 Jul 2015
-> https://fedoraproject.org/wiki/Releases/23/Schedule
That is the latst possible date when it should be definitely available.
I can't see any reason why it should not be possible immediately, be it
Copr build if you have some reasons not to push it into Rawhide.
It shouldn't be pushed into rawhide before FESCo approval, but that's
on the docket for tomorrow: https://fedorahosted.org/fesco/ticket/1447

Once approved, and assuming it can just happen seamlessly, yeah, it
should happen in rawhide.

Unless I'm missing something, rather than adding `dns=unbound` to
/etc/NetworkManager/NetworkManager.conf, that line could be added to an
/etc/NetworkManager/conf.d/30-dnssec-trigger-unbound.conf file owned by
the dnssec-trigger package, right? Additionally, dnssec-trigger would
be enabled by default. And that all seems fairly seamless to me —
except in cases where it conflicts with an existing configuration.
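
Presumably the drop-in would be nothing more than this (the file name is
the one suggested above; [main] is the section NetworkManager reads the
dns= key from):

    # /etc/NetworkManager/conf.d/30-dnssec-trigger-unbound.conf
    [main]
    dns=unbound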

One (new!) thing I'm concerned with, now that I've enabled it on my
system, is the persistent tray notification. This is... confusing and
ugly. Can we (for F23 if possible, and F24 if not) get better GNOME
Shell integration here?

I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader
Paul Wouters
2015-06-09 15:34:39 UTC
Permalink
Post by Matthew Miller
One (new!) thing I'm concerned with, now that I've enabled it on my
system, is the persistent tray notification. This is... confusing and
ugly. Can we (for F23 if possible, and F24 if not) get better GNOME
Shell integration here?
That's been on the TODO list for years, but it seems the hotspot
detection mechanisms are not converging. The DNS interception and the
HTTP interception really have to be handled together and an informed
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.

Ideally, the dnssec-trigger DNS checks would be merged into the native
hotspot testing.
Post by Matthew Miller
I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
I have never seen those work except for when the backend was down and
I got a stream of false positives. But possibly that is because I've used
dnssec-trigger for years now and it might win the captive portal
detection race. There are some bugs once in a while but overall it works
pretty reliably.

Paul
Matthew Miller
2015-06-09 16:30:15 UTC
Permalink
Post by Paul Wouters
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.
We should try to make that happen.
Post by Paul Wouters
Post by Matthew Miller
I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
I have never seen those work except for when the backend was down and
I got a stream of false positives. But possibly that is because I've used
dnssec-trigger for years now and it might win the captive portal
detection race. There are some bugs once in a while but overall it works
pretty reliably.
I think that's probably it — the race. The hotspot signon thing works
for me at coffeeshops. Or it did before I enabled this feature. We'll
see now!
--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader
Dan Williams
2015-06-11 20:48:54 UTC
Permalink
Post by Matthew Miller
Post by Paul Wouters
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.
We should try to make that happen.
Unfortunately the Proposal doesn't say anything about how this will
actually work, which is something NetworkManager needs to know. It also
fails to address the failure cases where your local DNS doesn't support
DNSSEC or is otherwise broken through no fault of your own.
Post by Matthew Miller
Post by Paul Wouters
Post by Matthew Miller
I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
I have never seen those work except for when the backend was down and
I got a stream of false positives. But possibly that is because I've used
dnssec-trigger for years now and it might win the captive portal
detection race. There are some bugs once in a while but overall it works
pretty reliably.
I think that's probably it — the race. The hotspot signon thing works
for me at coffeeshops. Or it did before I enabled this feature. We'll
see now!
So, if you're behind a portal then unbound could potentially fail all
DNS lookups. That means that NetworkManager's connectivity detection,
which relies on retrieving a URL from a known website, will fail because
the DNS lookup for it was blocked by unbound. Thus GNOME Shell portal
detection will also fail. That kinda sucks.

While I'm sure the dnssec-trigger panel applet works great for some
people, I think the GNOME team would rather have the portal
functionality in the existing GNOME Shell indicator. There is nothing
wrong with having DNSSEC enabled and part of the portal detection
scheme, but the UI handling portals is clearly a desktop-specific
decision. So whatever we need to do in NM to enable the workflow that
desktops need is what we'll end up doing... Ideally the process goes
like this when unbound/dnssec-trigger are installed:

1. NM connects to a new network
2. NM updates DNS information
3. NM waits for some signal from unbound/dnssec-trigger about the
trustability of the DNS server
3a. if the DNS server is trusted, NM continues with its connectivity
check
3b. if the DNS server is not trusted or DNSSEC is broken, then ??? How
do we distinguish between "portal" and simply that your local DNS
doesn't support DNSSEC or is otherwise broken, if we cannot resolve the
address of the connectivity server?
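
To make the shape of that flow concrete, a small runnable sketch; every
function here is a hypothetical stand-in, not a real NetworkManager or
dnssec-trigger API:

    def update_dns_information(network):
        # step 2: hand the network's servers to the local resolver
        print("pushing DNS servers from %s to unbound" % network)

    def wait_for_dns_trust_signal():
        # step 3: in reality this would come from unbound/dnssec-trigger
        return "untrusted"

    def on_network_connected(network):
        update_dns_information(network)
        if wait_for_dns_trust_signal() == "trusted":
            print("step 3a: running connectivity check")
        else:
            # step 3b, the open question: with untrusted or broken DNS we
            # cannot tell "captive portal" apart from "DNSSEC-incapable
            # upstream", because even the connectivity-check hostname
            # fails to resolve.
            print("step 3b: portal or broken DNS? undecidable here")

    on_network_connected("coffee-shop-wifi")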

Dan
Andrew Lutomirski
2015-06-11 21:41:33 UTC
Permalink
Post by Dan Williams
Post by Matthew Miller
Post by Paul Wouters
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.
We should try to make that happen.
Unfortunately the Proposal doesn't say anything about how this will
actually work, which is something NetworkManager needs to know. It also
fails to address the failure cases where your local DNS doesn't support
DNSSEC or is otherwise broken through no fault of your own.
Post by Matthew Miller
Post by Paul Wouters
Post by Matthew Miller
I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
I have never seen those work except for when the backend was down and
I got a stream of false positives. But possibly that is because I've used
dnssec-trigger for years now and it might win the captive portal
detection race. There are some bugs once in a while but overall it works
pretty reliably.
I think that's probably it — the race. The hotspot signon thing works
for me at coffeeshops. Or it did before I enabled this feature. We'll
see now!
So, if you're behind a portal then unbound could potentially fail all
DNS lookups. That means that NetworkManager's connectivity detection,
which relies on retrieving a URL from a known website, will fail because
the DNS lookup for it was blocked by unbound. Thus GNOME Shell portal
detection will also fail. That kinda sucks.
I think that part of the problem is that there are too many
implementations of captive portal detection and too many
half-thought-out implementations of what to do if a captive portal is
detected.

I think that, on a well-functioning system, if I connect to a wireless
network, something should detect if I'm behind a captive portal. If
so, I should get a stateless browser that clearly indicates that it's
a captive portal browser, probably lives in a sandbox, and sees the
raw view of the network (no local DNSSEC validation). We have network
namespaces -- the browser part is doable even in a scenario where we
wouldn't want to expose the incorrect view of DNS or some other aspect
of the network to normal applications. (Heck, on a configuration
where we want to use a VPN over untrusted wireless, we could avoid
exposing the untrusted wireless network to applications other than
captive portal login at all.)

Please note that the current GNOME captive portal mechanism is
blatantly insecure, and I've already filed a bug report with no
resolution. I'm not disclosing a subtle 0-day here -- the insecurity
is fairly obvious. I'll probably post to oss-security soon, but
that's a somewhat separate topic.

Once we determine that there's no captive portal or that we've logged
in to it, we should validate DNSSEC and otherwise behave sensibly. If
the network is screwed up enough that normal DNSSEC can't get through
the DHCP-provided resolver (which happens -- I've seen ISPs that
tamper with DNS results for www.google.com), then we should tunnel
around it. IIRC dnssec-triggerd already supports this.
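
(dnssec-trigger's fallback transports live in dnssec-trigger.conf; a
sketch, with 192.0.2.53 standing in for a resolver you control that is
reachable on ports 80 and 443:)

    # /etc/dnssec-trigger/dnssec-trigger.conf (excerpt, illustrative)
    tcp80: 192.0.2.53     # plain DNS over TCP port 80, past DNS-only filters
    ssl443: 192.0.2.53    # DNS wrapped in SSL on port 443, for hostile networks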
Post by Dan Williams
While I'm sure the dnssec-trigger panel applet works great for some
people, I think the GNOME team would rather have the portal
functionality in the existing GNOME Shell indicator. There is nothing
wrong with having DNSSEC enabled and part of the portal detection
scheme, but the UI handling portals is clearly a desktop-specific
decision.
This hasn't worked so well in the past. Back when NM provided its own
UI, that UI tended to work. These days I frequently notice the
gnome-shell UI for networking, bluetooth, etc missing features that
are supported by the backend or just straight-up not working.
Post by Dan Williams
So whatever we need to do in NM to enable the workflow that
desktops need is what we'll end up doing... Ideally the process goes
1. NM connects to a new network
2. NM updates DNS information
Updates what information to what? resolv.conf should be more or less invariant.
Post by Dan Williams
3. NM waits for some signal from unbound/dnssec-trigger about the
trustability of the DNS server
3a. if the DNS server is trusted, NM continues with its connectivity
check
3b. if the DNS server is not trusted or DNSSEC is broken, then ??? How
do we distinguish between "portal" and simply that your local DNS
doesn't support DNSSEC or is otherwise broken, if we cannot resolve the
address of the connectivity server?
I smell a turf war, sadly.

I don't think that "local unbound using dnssec-triggerd isn't NM"
should be a show-stopper. I'd like to see the result work correctly,
and, honestly, if gnome-shell isn't part of the solution, then that's
probably a good thing. Clearly it should be integrated enough with NM
to be aware of connectivity changes, and it should probably integrate
more deeply than that, but if the end solution bypasses NM's captive
portal detection entirely, that doesn't seem like an a priori problem
to me.

Merging all of dnssec-triggerd into NM might be a decent solution.

Keep in mind that a good DNSSEC solution is fundamentally tied to
captive portal login. There are probably many captive portals behind
which it's impossible to get DNSSEC before login but where, once
logged in, tunneled or even normal DNSSEC is possible.

--Andy
Dan Williams
2015-06-12 17:33:54 UTC
Permalink
Post by Andrew Lutomirski
Post by Dan Williams
Post by Matthew Miller
Post by Paul Wouters
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.
We should try to make that happen.
Unfortunately the Proposal doesn't say anything about how this will
actually work, which is something NetworkManager needs to know. It also
fails to address the failure cases where your local DNS doesn't support
DNSSEC or is otherwise broken through no fault of your own.
Post by Matthew Miller
Post by Paul Wouters
Post by Matthew Miller
I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
I have never seen those work except for when the backend was down and
I got a stream of false positives. But possibly that is because I've used
dnssec-trigger for years now and it might win the captive portal
detection race. There are some bugs once in a while but overall it works
pretty reliably.
I think that's probably it — the race. The hotspot signon thing works
for me at coffeeshops. Or it did before I enabled this feature. We'll
see now!
So, if you're behind a portal then unbound could potentially fail all
DNS lookups. That means that NetworkManager's connectivity detection,
which relies on retrieving a URL from a known website, will fail because
the DNS lookup for it was blocked by unbound. Thus GNOME Shell portal
detection will also fail. That kinda sucks.
I think that part of the problem is that there are too many
implementations of captive portal detection and too many
half-thought-out implementations of what to do if a captive portal is
detected.
I think that, on a well-functioning system, if I connect to a wireless
network, something should detect if I'm behind a captive portal. If
so, I should get a stateless browser that clearly indicates that it's
a captive portal browser, probably lives in a sandbox, and sees the
raw view of the network (no local DNSSEC validation). We have network
namespaces -- the browser part is doable even in a scenario where we
wouldn't want to expose the incorrect view of DNS or some other aspect
of the network to normal applications. (Heck, on a configuration
where we want to use a VPN over untrusted wireless, we could avoid
exposing the untrusted wireless network to applications other than
captive portal login at all.)
Please note that the current GNOME captive portal mechanism is
blatantly insecure, and I've already filed a bug report with no
resolution. I'm not disclosing a subtle 0-day here -- the insecurity
is fairly obvious. I'll probably post to oss-security soon, but
that's a somewhat separate topic.
Once we determine that there's no captive portal or that we've logged
in to it, we should validate DNSSEC and otherwise behave sensibly. If
the network is screwed up enough that normal DNSSEC can't get through
the DHCP-provided resolver (which happens -- I've seen ISPs that
tamper with DNS results for www.google.com), then we should tunnel
around it. IIRC dnssec-triggerd already supports this.
So it sounds like there are two levels here:

1) connectivity detection and hotspot login using the network-provided
DNS servers, which are quite possibly insecure and/or broken

2) once that is all done, handling DNSSEC issues if the network-provided
DNS servers are insecure/broken.

Which is fine; I'm mostly concerned with #1 at this point because I
don't think NetworkManager has much to do with #2 since it already has
mechanisms to push the network's DNS servers to whatever wants it
(unbound, etc).
Post by Andrew Lutomirski
Post by Dan Williams
While I'm sure the dnssec-trigger panel applet works great for some
people, I think the GNOME team would rather have the portal
functionality in the existing GNOME Shell indicator. There is nothing
wrong with having DNSSEC enabled and part of the portal detection
scheme, but the UI handling portals is clearly a desktop-specific
decision.
This hasn't worked so well in the past. Back when NM provided its own
UI, that UI tended to work. These days I frequently notice the
gnome-shell UI for networking, bluetooth, etc missing features that
are supported by the backend or just straight-up not working.
Post by Dan Williams
So whatever we need to do in NM to enable the workflow that
desktops need is what we'll end up doing... Ideally the process goes
1. NM connects to a new network
2. NM updates DNS information
Updates what information to what? resolv.conf should be more or less invariant.
I was unclear. Here I mean "do whatever the config files say to do"
which is either write resolv.conf, not touch resolv.conf at all
(dns=none), or send DNS to something else (dns=unbound or dns=dnsmasq).
Post by Andrew Lutomirski
Post by Dan Williams
3. NM waits for some signal from unbound/dnssec-trigger about the
trustability of the DNS server
3a. if the DNS server is trusted, NM continues with its connectivity
check
3b. if the DNS server is not trusted or DNSSEC is broken, then ??? How
do we distinguish between "portal" and simply that your local DNS
doesn't support DNSSEC or is otherwise broken, if we cannot resolve the
address of the connectivity server?
I smell a turf war, sadly.
I don't think that "local unbound using dnssec-triggerd isn't NM"
should be a show-stopper. I'd like to see the result work correctly,
and, honestly, if gnome-shell isn't part of the solution, then that's
probably a good thing. Clearly it should be integrated enough with NM
I think the *UI* for connectivity indication and portal login certainly
is a desktop specific task, because it's part of the network connection
user experience. Whether that's GNOME or KDE or LXDE or whatever
doesn't matter, but all those environments have preferred ways of
interacting with network connections and portal login will need to be
part of that.

The backend that actually does the connectivity checking (to detect
portals or whatever) doesn't need to be part of the UI workflow, and
could certainly be provided by a small, simple service that does this on
request of things like NM or dnssec-trigger or GNOME Shell or KDE or
whatever. If that's the path forward, I have thought about contributing
there as well.
Post by Andrew Lutomirski
to be aware of connectivity changes, and it should probably integrate
more deeply than that, but if the end solution bypasses NM's captive
portal detection entirely, that doesn't seem like an a priori problem
to me.
I honestly don't care much where the checking happens, but NM has
connectivity indication as part of its API and we have to keep that for
a while, whether or not the implementation changes. But there are other
considerations about what does the checking as it relates to NM, because
there are many consumers of it and not all of them will install unbound
or dnssec-trigger.
Post by Andrew Lutomirski
Merging all of dnssec-triggerd into NM might be a decent solution.
Honestly I'd rather see a very small, single-purpose daemon (say,
'connectivityd' or whatever) that has one simple job: to do the checking
upon request of NM/UI/whatever. DNSSEC failures would get handled
separately; for example, could more of the failure-type logic be pushed
to unbound instead, since it's the resolver here?
Post by Andrew Lutomirski
Keep in mind that a good DNSSEC solution is fundamentally tied to
captive portal login. There are probably many captive portals behind
which it's impossible to get DNSSEC before login but where, once
logged in, tunneled or even normal DNSSEC is possible.
Right; to me this says that we simply don't tell the rest of the system
that we're connected until we've gone through portal login. But
whatever does connectivity checking and whatever UI does portal login
would have access to the network-provided DNS servers (untrusted of
course) to complete their jobs, and only once it's indicated that
we're logged in do apps start using the network DNS servers. Or if
those are broken, tunneled servers or something.

Dan
Andrew Lutomirski
2015-06-12 18:19:06 UTC
Permalink
Post by Dan Williams
Post by Andrew Lutomirski
Post by Dan Williams
Post by Matthew Miller
Post by Paul Wouters
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.
We should try to make that happen.
Unfortunately the Proposal doesn't say anything about how this will
actually work, which is something NetworkManager needs to know. It also
fails to address the failure cases where your local DNS doesn't support
DNSSEC or is otherwise broken through no fault of your own.
Post by Matthew Miller
Post by Paul Wouters
Post by Matthew Miller
I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
I have never seen those work except for when the backend was down and
I got a stream of false positives. But possibly that is because I've used
dnssec-trigger for years now and it might win the captive portal
detection race. There are some bugs once in a while but overall it works
pretty reliably.
I think that's probably it — the race. The hotspot signon thing works
for me at coffeeshops. Or it did before I enabled this feature. We'll
see now!
So, if you're behind a portal then unbound could potentially fail all
DNS lookups. That means that NetworkManager's connectivity detection,
which relies on retrieving a URL from a known website, will fail because
the DNS lookup for it was blocked by unbound. Thus GNOME Shell portal
detection will also fail. That kinda sucks.
I think that part of the problem is that there are too many
implementations of captive portal detection and too many
half-thought-out implementations of what to do if a captive portal is
detected.
I think that, on a well-functioning system, if I connect to a wireless
network, something should detect if I'm behind a captive portal. If
so, I should get a stateless browser that clearly indicates that it's
a captive portal browser, probably lives in a sandbox, and sees the
raw view of the network (no local DNSSEC validation). We have network
namespaces -- the browser part is doable even in a scenario where we
wouldn't want to expose the incorrect view of DNS or some other aspect
of the network to normal applications. (Heck, on a configuration
where we want to use a VPN over untrusted wireless, we could avoid
exposing the untrusted wireless network to applications other than
captive portal login at all.)
Please note that the current GNOME captive portal mechanism is
blatantly insecure, and I've already filed a bug report with no
resolution. I'm not disclosing a subtle 0-day here -- the insecurity
is fairly obvious. I'll probably post to oss-security soon, but
that's a somewhat separate topic.
Once we determine that there's no captive portal or that we've logged
in to it, we should validate DNSSEC and otherwise behave sensibly. If
the network is screwed up enough that normal DNSSEC can't get through
the DHCP-provided resolver (which happens -- I've seen ISPs that
tamper with DNS results for www.google.com), then we should tunnel
around it. IIRC dnssec-triggerd already supports this.
1) connectivity detection and hotspot login using the network-provided
DNS servers, which are quite possibly insecure and/or broken
2) once that is all done, handling DNSSEC issues if the network-provided
DNS servers are insecure/broken.
Which is fine; I'm mostly concerned with #1 at this point because I
don't think NetworkManager has much to do with #2 since it already has
mechanisms to push the network's DNS servers to whatever wants it
(unbound, etc).
Fair enough.

To me, it seems like the awkward interaction is that we sort of have a
layering violation. If we think that NM and hotspot login's job is to
detect a captive portal and possibly enable a UI to log in and the DNS
resolver's job is to never give insecure results, then we need some
way to let the portal login UI function without interacting with the
DNSSEC-validating DNS server. This might need either special glibc
support or some kind of container that can override /etc/resolv.conf
just for the purpose of captive portal login.

If the ultimate solution ends up involving namespaces, I'd be more
than happy to help.
Post by Dan Williams
I think the *UI* for connectivity indication and portal login certainly
is a desktop specific task, because it's part of the network connection
user experience. Whether that's GNOME or KDE or LXDE or whatever
doesn't matter, but all those environments have preferred ways of
interacting with network connections and portal login will need to be
part of that.
The backend that actually does the connectivity checking (to detect
portals or whatever) doesn't need to be part of the UI workflow, and
could certainly be provided by a small, simple service that does this on
request of things like NM or dnssec-trigger or GNOME Shell or KDE or
whatever. If that's the path forward, I've thought about contributing
there as well.
All of this seems reasonable.

I'm currently very unhappy with gnome-shell's UI for this, but that's
not a legitimate reason for me to say that some other project should
take over the UI part.
Post by Dan Williams
Post by Andrew Lutomirski
to be aware of connectivity changes, and it should probably integrate
more deeply than that, but if the end solution bypasses NM's captive
portal detection entirely, that doesn't seem like an a priori problem
to me.
I honestly don't care much where the checking happens, but NM has
connectivity indication as part of its API and we have to keep that for
a while, whether or not the implementation changes. But there are other
considerations about what does the checking as it relates to NM, because
there are many consumers of it and not all of them will install unbound
or dnssec-trigger.
Post by Andrew Lutomirski
Merging all of dnssec-triggerd into NM might be a decent solution.
Honestly I'd rather see a very small, single-purpose daemon (say,
'connectivityd' or whatever) that has one simple job: to do the checking
upon request of NM/UI/whatever. DNSSEC failures would get handled
separately; for example, could more of the failure-type logic be pushed
to unbound instead, since it's the resolver here?
Post by Andrew Lutomirski
Keep in mind that a good DNSSEC solution is fundamentally tied to
captive portal login. There are probably many captive portals behind
which it's impossible to get DNSSEC before login but where, once
logged in, tunneled or even normal DNSSEC is possible.
Right; to me this says that we simply don't tell the rest of the system
that we're connected until we've gone through portal login. But
whatever does connectivity checking and whatever UI does portal login
would have access to the network-provided DNS servers (untrusted of
course) to complete their jobs, and only once it's indicated that
we're logged in do apps start using the network DNS servers. Or if
those are broken, tunneled servers or something.
All that makes sense. Thanks.

FWIW, I think that a little C program to spin up a namespace that's
good enough to point a stateless Firefox instance at a captive portal
login with overridden DNS nameserver settings would only be a couple
of hundred lines of code. It could even accept a netns to use as part
of its input. The only hard part would be convincing Firefox to show
an appropriate UI.
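
Roughly, in Python for brevity (a sketch only: the namespace name,
resolver address, and browser invocation are illustrative assumptions,
and it relies on ip-netns(8) bind-mounting /etc/netns/<name>/resolv.conf
over /etc/resolv.conf inside the namespace):

import os
import subprocess
import tempfile

NETNS = "portal"            # throwaway namespace name (illustrative)
PORTAL_DNS = "192.0.2.53"   # the DHCP-provided, untrusted resolver

def run(*cmd):
    subprocess.check_call(cmd)

# 1. Create the namespace and connect it to the uplink with a veth pair
#    (addressing/routing for the pair is elided; it depends on the link).
run("ip", "netns", "add", NETNS)
run("ip", "link", "add", "veth0", "type", "veth", "peer", "name", "veth1")
run("ip", "link", "set", "veth1", "netns", NETNS)

# 2. Give the namespace its own resolv.conf pointing at the portal's DNS;
#    "ip netns exec" bind-mounts /etc/netns/<name>/resolv.conf over /etc.
os.makedirs("/etc/netns/" + NETNS, exist_ok=True)
with open("/etc/netns/" + NETNS + "/resolv.conf", "w") as f:
    f.write("nameserver " + PORTAL_DNS + "\n")

# 3. Run a stateless browser confined to the namespace, so the rest of
#    the system never sees the portal's forged DNS.
run("ip", "netns", "exec", NETNS, "firefox", "--no-remote",
    "--profile", tempfile.mkdtemp(),
    "http://hotspot-nocache.fedoraproject.org/")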

It wouldn't really have to be Firefox, but getting the browser chrome
right to avoid trivial phishing attacks is critical, and all real
browsers already do that fairly well, whereas the simple embedded web
views (e.g. gnome-shell-portal-helper) get it nearly 100% wrong.

--Andy
Paul Wouters
2015-06-12 18:39:56 UTC
Permalink
Post by Andrew Lutomirski
All that makes sense. Thanks.
FWIW, I think that a little C program to spin up a namespace that's
good enough to point a stateless Firefox instance at a captive portal
login with overridden DNS nameserver settings would only be a couple
of hundred lines of code. It could even accept a netns to use as part
of its input. The only hard part would be convincing Firefox to show
an appropriate UI.
It wouldn't really have to be Firefox, but getting the browser chrome
right to avoid trivial phishing attacks is critical, and all real
browsers already do that fairly well, whereas the simple embedded web
views (e.g. gnome-shell-portal-helper) get it nearly 100% wrong.
dnssec-triggerd can be configured with which application to give the URL to
for hotspot login. Currently:

login-command: "xdg-open"

If you write that little C program, I will test it as a replacement for
xdg-open (which at times does fail to appear for me, but usually I have
firefox open already so I create a new tab and hit 1.2.3.4).

We could ship it as part of dnssec-trigger or another package.

Paul
Michael Catanzaro
2015-06-12 22:32:25 UTC
Permalink
Post by Andrew Lutomirski
It wouldn't really have to be Firefox, but getting the browser chrome
right to avoid trivial phishing attacks is critical, and all real
browsers already do that fairly well, whereas the simple embedded web
views (e.g. gnome-shell-portal-helper) get it nearly 100% wrong.
Hi, it sounds like we have a problem to fix in gnome-shell-portal-helper.
What specifically are your requirements for the browser
chrome? I figure as long as the window title is something along the
lines of "Connect to wireless network" and the hotspot can't change
that, then we should be good? We could also put a short explanation of
what is going on in a GtkInfoBar to make it really stand out. I guess
the goal is to make the chrome distinctive enough that a user stops to
think "something is not right, don't enter password" when the captive
portal helper appears and displays google.com.

FWIW the tech used for GNOME apps that need a web view is WebKitGTK+.
Andrew Lutomirski
2015-06-12 22:49:23 UTC
Permalink
Post by Michael Catanzaro
Post by Andrew Lutomirski
It wouldn't really have to be Firefox, but getting the browser chrome
right to avoid trivial phishing attacks is critical, and all real
browsers already do that fairly well, whereas the simple embedded web
views (e.g. gnome-shell-portal-helper) get it nearly 100% wrong.
Hi, it sounds like we have a problem to fix in gnome-shell-portal-helper.
What specifically are your requirements for the browser
chrome? I figure as long as the window title is something along the
lines of "Connect to wireless network" and the hotspot can't change
that, then we should be good?
Barely. GNOME seems to do its best to hide window titles, so
something like a URL bar is probably a better bet. Also, users are
already (hopefully) trained to look for an indication in the URL bar
that something is secure or insecure.
Post by Michael Catanzaro
We could also put a short explanation of
what is going on in a GtkInfoBar to make it really stand out. I guess
the goal is to make the chrome distinctive enough that a user stops to
think "something is not right, don't enter password" when the captive
portal helper appears and displays google.com.
But that's not even right. Suppose you have a captive portal that
wants you to log in via your Google account. It can send you to
https://accounts.google.com, and your browser can verify the
certificate and show you an indication that the connection is secure.
Then you really can safely enter your password.

With the current gnome-shell-portal-helper, there is no chrome at all,
which means that the captive portal gets to show its own chrome, and
it could, for example, make the login window look exactly like
Firefox. I bet that even the most sophisticated users lose in that
case.

I think the UI should look like a real browser except that it should
clearly indicate that it's a "Log in to wireless network" browser in
addition to showing a standard URL bar.

https://bugzilla.gnome.org/show_bug.cgi?id=749197
Post by Michael Catanzaro
FWIW the tech used for GNOME apps that need a web view is WebKitGTK+.
Can that provide real chrome?

--Andy
Michael Catanzaro
2015-06-13 11:28:26 UTC
Permalink
Post by Andrew Lutomirski
But that's not even right. Suppose you have a captive portal that
wants you to log in via your Google account. It can send you to
https://accounts.google.com, and your browser can verify the
certificate and show you an indication that the connection is secure.
Then you really can safely enter your password.
Hmmm, I didn't realize legitimate portals might take you to the public
Internet. It'd be nice to not show
http://www.gnome.org (the test URL we load, expecting to be hijacked)
if the portal decides not to redirect you to a new URI (not sure how
common that is), but I think we will have to, or we can't fix this....
Post by Andrew Lutomirski
I think the UI should look like a real browser except that it should
clearly indicate that it's a "Log in to wireless network" browser in
addition to showing a standard URL bar.
https://bugzilla.gnome.org/show_bug.cgi?id=749197
Can you please CC me on that bug? I didn't know GNOME Bugzilla even had
private bugs. :D
Post by Andrew Lutomirski
Post by Michael Catanzaro
FWIW the tech used for GNOME apps that need a web view is
WebKitGTK+.
Can that provide real chrome?
The web view is a GtkWidget: you pack it like any other GtkWidget into
your hierarchy, and put your own chrome around it. In this case, a URL
bar would not make any sense since we don't want the user changing the
URL; we'll probably want to display an unmodifiable URL alongside a
security indicator.
Andrew Lutomirski
2015-06-13 16:45:59 UTC
Permalink
Post by Michael Catanzaro
Post by Andrew Lutomirski
But that's not even right. Suppose you have a captive portal that
wants you to log in via your Google account. It can send you to
https://accounts.google.com, and your browser can verify the
certificate and show you an indication that the connection is secure.
Then you really can safely enter your password.
Hmmm, I didn't realize legitimate portals might take you to the public
Internet.
I think I've seen this in airports and in some hotel chains.
Post by Michael Catanzaro
It'd be nice to not show
http://www.gnome.org (the test URL we load, expecting to be hijacked)
if the portal decides not to redirect you to a new URI (not sure how
common that is), but I think we will have to or we can't fix this....
It could be http://generic-network-login.org or something like that.
Post by Michael Catanzaro
Post by Andrew Lutomirski
I think the UI should look like a real browser except that it should
clearly indicate that it's a "Log in to wireless network" browser in
addition to showing a standard URL bar.
https://bugzilla.gnome.org/show_bug.cgi?id=749197
Can you please CC me on that bug? I didn't know GNOME Bugzilla even had
private bugs. :D
Done. I don't think I'm the one who made it private.
Post by Michael Catanzaro
Post by Andrew Lutomirski
Post by Michael Catanzaro
FWIW the tech used for GNOME apps that need a web view is
WebKitGTK+.
Can that provide real chrome?
The web view is a GtkWidget: you pack it like any other GtkWidget into
your hierarchy, and put your own chrome around it. In this case, a URL
bar would not make any sense since we don't want the user changing the
URL; we'll probably want to display an unmodifiable URL alongside a
security indicator.
I guess the reason to keep it read-only is to prevent people from using it
like a real browser.

--Andy
Paul Wouters
2015-06-13 18:36:24 UTC
Permalink
Post by Andrew Lutomirski
Post by Michael Catanzaro
It'd be nice to not show
http://www.gnome.org (the test URL we load, expecting to be hijacked)
if the portal decides not to redirect you to a new URI (not sure how
common that is), but I think we will have to, or we can't fix this....
It could be http://generic-network-login.org or something like that.
using www.gnome.org is wrong. For one, you cannot guarantee they won't
end up using some redirect and then the captive portal would fail.
Second, the TTL for that DNS entry is not 0, so it will get cached and
cause wrong probe results later on.

There is a good reason we started hotspot-nocache.fedoraproject.org.

Paul
Michael Catanzaro
2015-06-13 19:01:07 UTC
Permalink
using www.gnome.org is wrong. For one, you cannot guarantee they
won't
end up using some redirect and then the captive portal would fail.
I don't get it: what is wrong, what would fail? We expect them to
replace the contents of www.gnome.org with either their own content,
or else a redirect someplace else.
Second, the TTL for that DNS entry is not 0, so it will get cached and
cause wrong probe results later on.
There is a good reason we started hotspot-nocache.fedoraproject.org.
Hm... the captive portal helper loads www.gnome.org but it only runs
after NetworkManager has decided there is a captive portal. We can make
this URL configurable at build time if there's really a problem, but
I'm not sure there is, since it's not used for NetworkManager's
connectivity check (which is what triggers us to start the captive
portal helper, and what decides that we have full Internet access and
closes it). For the connectivity check, NetworkManager uses
https://fedoraproject.org/static/hotspot.txt defined in
/etc/NetworkManager/conf.d/20-connectivity-fedora.conf. So... I guess
that is not good, and we should switch that to use
hotspot-nocache.fedoraproject.org instead?
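
(For reference, that drop-in is a small keyfile; a sketch of its likely
shape, with illustrative values:)

# /etc/NetworkManager/conf.d/20-connectivity-fedora.conf
[connectivity]
uri=https://fedoraproject.org/static/hotspot.txt
response=OK
interval=300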
Reindl Harald
2015-06-13 19:04:16 UTC
Permalink
Post by Michael Catanzaro
Post by Paul Wouters
There is a good reason we started hotspot-nocache.fedoraproject.org.
Hm... the captive portal helper loads www.gnome.org but it only runs
after NetworkManager has decided there is a captive portal. We can make
this URL configurable at build time if there's really a problem, but
I'm not sure there is, since it's not used for NetworkManager's
connectivity check (which is what triggers us to start the captive
portal helper, and what decides that we have full Internet access and
closes it). For the connectivity check, NetworkManager uses
https://fedoraproject.org/static/hotspot.txt defined in
/etc/NetworkManager/conf.d/20-connectivity-fedora.conf. So... I guess
that is not good, and we should switch that to use
hotspot-nocache.fedoraproject.org instead?
Surely.

You must not use cached results for on-demand connectivity checks.
Paul Wouters
2015-06-13 19:54:50 UTC
Permalink
Post by Michael Catanzaro
Hm... the captive portal helper loads www.gnome.org but it only runs
after NetworkManager has decided there is a captive portal. We can make
this URL configurable at build time if there's really a problem, but
I'm not sure there is, since it's not used for NetworkManager's
connectivity check (which is what triggers us to start the captive
portal helper, and what decides that we have full Internet access and
closes it). For the connectivity check, NetworkManager uses
https://fedoraproject.org/static/hotspot.txt defined in
/etc/NetworkManager/conf.d/20-connectivity-fedora.conf. So... I guess
that is not good, and we should switch that to use
hotspot-nocache.fedoraproject.org instead?
If the captive portal uses the system's DNS, and the system has cached
www.gnome.org from when you were on a previous network, your captive
portal check might use a cached DNS result and try to open an HTTP
connection to a blocked IP address, because the forged DNS answer
pointing to the local hotspot IP never got triggered. So if you use
www.gnome.org, you have to make sure the portal software is not using
the system DNS cache for DNS lookups. So it is better for captive
portal login to use hotspot-nocache.fedoraproject.org, which will
always have a TTL of 0, so it will not be cached.

For detecting whether or not you are hotspotted, the decision to say
it is a hotspot is based on "DNS interception or HTTP interception", so
using https://fedoraproject.org/static/hotspot.txt is fine, as it is
guaranteed to never use any kind of redirects and will always just
return a page stating "OK". Anything else means hotspot (or attack :)
In this case, DNS caching won't matter because this part is only used
for the HTTP interception test. The DNS interception test (at least
with dnssec-trigger) issues queries for the root zone and a handful of
TLDs, and does not use DNS queries for fedoraproject.org.
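
(A minimal sketch of the HTTP half of that probe, using only the Python
standard library; the timeout and error handling are illustrative:)

import urllib.request

def http_intercepted():
    # Fetch the known-content probe page; urllib follows any redirect a
    # hotspot injects, so a rewritten or redirected answer shows up as a
    # body that is not exactly "OK".
    try:
        body = urllib.request.urlopen(
            "http://fedoraproject.org/static/hotspot.txt", timeout=5).read()
    except OSError:
        return True              # no answer at all: captive or broken
    return body.strip() != b"OK"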

Paul
Michael Catanzaro
2015-06-13 23:10:59 UTC
Permalink
Post by Paul Wouters
If the captive portal uses the system's DNS, and the system has cached
www.gnome.org from when you were on a previous network, your captive
portal check might use a cached DNS resolve and try to use an HTTP
connection to a blocked IP address, because the local forged DNS answer
to the local hotspot IP never got triggered.
Thanks. I am still trying to understand this fully. I assumed the
portal would hijack TCP connections, but if the portal uses DNS
hijacking only and does not hijack TCP connections to the real
www.gnome.org, and we attempt to open a TCP session to the real
www.gnome.org, and the portal is only expecting us to visit a
different host due to its DNS hijacking, then I understand that we're
out of luck and the portal's login page will never show. OK, I've
followed that far.

There is one thing I don't understand. Surely the above is exactly what
will happen if you were to get stuck behind a captive portal with
Firefox or any normal browser? But portals still work reliably for
users. So either the browsers are doing a connectivity test similar to
what you described (to a host with a DNS TTL of 0) and we have to do it
too, or the portals are prepared to hijack TCP connections and not just
DNS and we have no problem, or the portals just don't work reliably for
browsers and portal-helper is an opportunity to fix that. Right...?

Anyway, once I understand this properly, I will file a bug upstream (or
if you have a GNOME Bugzilla account, it would be better if you do so,
to be CCed on responses). Thanks for catching this issue.

Michael
Paul Wouters
2015-06-12 04:48:33 UTC
Permalink
Post by Dan Williams
Unfortunately the Proposal doesn't say anything about how this will
actually work, which is something NetworkManager needs to know. It also
fails to address the failure cases where your local DNS doesn't support
DNSSEC or is otherwise broken, through no fault of your own.
dnssec-trigger prompts the user with a choice of "allow insecure DNS" or
"cache only mode". The latter means "no new DNS and use what's already
in the cache only".
Post by Dan Williams
So, if you're behind a portal then unbound could potentially fail all
DNS lookups. That means that NetworkManager's connectivity detection,
which relies on retrieving a URL from a known website, will fail because
the DNS lookup for it was blocked by unbound. Thus GNOME Shell portal
detection will also fail. That kinda sucks.
That is why HTTP redirection and DNS failure have to be detected by
whatever is the "hot spot detector". Both items weigh in on triggering
a hotspot logon window.
Post by Dan Williams
While I'm sure the dnssec-trigger panel applet works great for some
people, I think the GNOME team would rather have the portal
functionality in the existing GNOME Shell indicator.
Everyone is in agreement here, I believe. No one particularly likes the
dnssec-trigger UI. It was written as a desktop-agnostic tool - for
instance it works on Windows and OSX. I'd love to see this better
integrated into GNOME.
Post by Dan Williams
desktops need is what we'll end up doing... Ideally the process goes
1. NM connects to a new network
2. NM updates DNS information
I don't know what 2) means. If it means rewriting /etc/resolv.conf or
the unbound forwarder configuration, we have already lost if the DNS
was malicious (and/or a hotspot DNS)
Post by Dan Williams
3. NM waits for some signal from unbound/dnssec-trigger about the
trustability of the DNS server
3a. if the DNS server is trusted, NM continues with its connectivity
check
3b. if the DNS server is not trusted or DNSSEC is broken, then ??? How
do we distinguish between "portal" and simply that your local DNS
doesn't support DNSSEC or is otherwise broken, if we cannot resolve the
address of the connectivity server?
dnssec-trigger currently detects the difference by also checking for an
http hotspot redirect using http://fedoraproject.org/static/hotspot.txt.
If no http redirect, then DNS is broken and it tries to work around it
by becoming a full iterative resolver or doing DNS over TCP or DNS over
TLS. And if it all fails, it presents the "insecure or cache only" dialog.

But, if I could have my "ideal scenario", things would be a little
different:

1) NM detects a new network, but doesn't tell the applications that there
is network connectivity yet. So firefox won't throw HTTPS warnings
and pidgin/IM won't throw https warnings. Because as far as they know
the network is still down.

2) NM/dnssec-trigger does the HTTP and DNS probing and prompting using
a dedicated container and any DNS requests in that container are
thrown away with the container once hotspot has been authenticated.
This would allow us to never have resolv.conf on the host be
different from 127.0.0.1. (currently, it needs to put in the hotspot
DNS servers for the hotspot logon, exposing other applications to
fake DNS)

3) dnssec-trigger updates the unbound DNS configuration and tells NM to
proceed. NM tells the applications there is new network connectivity.

Paul
Dan Williams
2015-06-12 17:17:40 UTC
Permalink
Post by Paul Wouters
Post by Dan Williams
Unfortunately the Proposal doesn't say anything about how this will
actually work, which is something NetworkManager needs to know. It also
fails to address the failure cases where your local DNS doesn't support
DNSSEC or is otherwise broken, through no fault of your own.
dnssec-trigger prompts the user with a choice of "allow insecure DNS" or
"cache only mode". The latter means "no new DNS and use what's already
in the cache only".
Yeah, and the interaction story here has been controversial for a long
time. The GNOME team certainly has ideas about how it should work,
which are partly shown by the current hotspot/portal implementation in
GNOME Shell. I'll let them discuss these ideas since NM is not involved
in the higher-level UI story here, just the mechanics of providing
"might this be a portal" to any NM client, GNOME Shell included.
Post by Paul Wouters
Post by Dan Williams
So, if you're behind a portal then unbound could potentially fail all
DNS lookups. That means that NetworkManager's connectivity detection,
which relies on retrieving a URL from a known website, will fail because
the DNS lookup for it was blocked by unbound. Thus GNOME Shell portal
detection will also fail. That kinda sucks.
That is why HTTP redirection and DNS failure have to be detected by
whatever is the "hot spot detector". Both items weigh in on triggering
a hotspot logon window.
Agreed. But how does the DNS failure actually get relayed to the thing
doing the HTTP request, when unbound + DNSSEC is involved? That's one
point I'm very unclear on.
Post by Paul Wouters
Post by Dan Williams
While I'm sure the dnssec-trigger panel applet works great for some
people, I think the GNOME team would rather have the portal
functionality in the existing GNOME Shell indicator.
Everyone is in agreement here, I believe. No one particularly likes the
dnssec-trigger UI. It was written as a desktop-agnostic tool - for
instance it works on Windows and OSX. I'd love to see this better
integrated into GNOME.
Post by Dan Williams
desktops need is what we'll end up doing... Ideally the process goes
1. NM connects to a new network
2. NM updates DNS information
I don't know what 2) means. If it means rewriting /etc/resolv.conf or
the unbound forwarder configuration, we have already lost if the DNS
was malicious (and/or a hotspot DNS)
It means whatever "dns" action was set in NM, either writing
resolv.conf, not touching anything (dns=none), sending split DNS to
unbound (dns=unbound), or to dnsmasq (dns=dnsmasq), etc. In this case
I'll presume dns=unbound.
Post by Paul Wouters
Post by Dan Williams
3. NM waits for some signal from unbound/dnssec-trigger about the
trustability of the DNS server
3a. if the DNS server is trusted, NM continues with its connectivity
check
3b. if the DNS server is not trusted or DNSSEC is broken, then ??? How
do we distinguish between "portal" and simply that your local DNS
doesn't support DNSSEC or is otherwise broken, if we cannot resolve the
address of the connectivity server?
dnssec-trigger currently detects the difference by also checking for an
http hotspot redirect using http://fedoraproject.org/static/hotspot.txt.
If no http redirect, then DNS is broken and it tries to work around it
by becoming a full iterative resolver or doing DNS over TCP or DNS over
TLS. And if it all fails, it presents the "insecure or cache only" dialog.
NM also checks for redirection.

Though, what do you mean by "if no HTTP redirect, then DNS is broken"?
Do you mean to prefix that with "If the correct response is not
received..."?
Post by Paul Wouters
But, if I could have my "ideal scenario", things would be a little
1) NM detects a new network, but doesn't tell the applications that there
is network connectivity yet. So firefox won't throw HTTPS warnings
and pidgin/IM won't throw https warnings. Because as far as they know
the network is still down.
Agreed. Right now we have "connectivity" states, but they are all
determined after the interface is signaled as "connected". We can do
some work here to indicate connectivity status on this interface before
indicating to applications that the interface is fully connected.
Post by Paul Wouters
2) NM/dnssec-trigger does the HTTP and DNS probing and prompting using
a dedicated container and any DNS requests in that container are
thrown away with the container once hotspot has been authenticated.
This would allow us to never have resolv.conf on the host be
different from 127.0.0.1. (currently, it needs to put in the hotspot
DNS servers for the hotspot logon, exposing other applications to
fake DNS)
I'm not sure a container really needs to be involved as long as the DNS
resolution can be done without hitting resolv.conf. That's not hugely
hard to do I think as long as we can manually resolve the connectivity
URI address without telling applications about the new DNS servers.

Once we've determined that indeed we are on a hotspot, then we need to
indicate that to the UI such that it can show the user a logon window
with the hotspot's login page, or a username/password dialog (if WISPr
or Hotspot 2.0 is involved) that uses private DNS servers too. At this
point we're still untrusted.

Then once the hotspot login is completed, we must re-do the connectivity
check to ensure that we do indeed have access to the full internet. If
we do, then we can finally signal "connected". If it fails again, then
we either show the hotspot login window again, or somehow indicate that
hotspot login failed.

Note that none of this mentions DNS to the user at all yet... so what
happens if the hotspot login succeeds, we get connectivity to the
internet, but the hotspot DNS doesn't support DNSSEC correctly?

Dan
Post by Paul Wouters
3) dnssec-trigger updates the unbound DNS configuration and tells NM to
proceed. NM tells the applications there is new network connectivity.
Paul
Michael Catanzaro
2015-06-12 17:57:30 UTC
Permalink
Post by Dan Williams
Post by Paul Wouters
dnssec-trigger prompts the user with a choice of "allow insecure DNS" or
"cache only mode". The latter means "no new DNS and use what's already
in the cache only".
Yeah, and the interaction story here has been controversial for a long
time. The GNOME team certainly has ideas about how it should work,
which are partly shown by the current hotspot/portal implementation in
GNOME Shell. I'll let them discuss these ideas since NM is not involved
in the higher-level UI story here, just the mechanics of providing
"might this be a portal" to any NM client, GNOME Shell included.
Hi. In general, prompts along the lines of "do insecure thing [yes]
[no]" are a big no-no. You should either always do the insecure thing
(if it really must be allowed) or never do the insecure thing
(preferably), but prompting the user to make a confusing security
decision is not OK.

In this case I assume always failing the connection is the right thing
to do, as to do otherwise would defeat the purpose of this feature. If
we could automatically display some very basic troubleshooting steps
("call your ISP and tell them xyz"), that would be good too. But I
presume it's unlikely that every workaround will fail and the user is
stuck without DNS? Hopefully that would be rare. If it's not and the
user really must be given a choice to allow insecure DNS, then maybe
the world just isn't ready for DNSSEC yet....
Andrew Lutomirski
2015-06-12 18:09:38 UTC
Permalink
Post by Dan Williams
Post by Paul Wouters
2) NM/dnssec-trigger does the HTTP and DNS probing and prompting using
a dedicated container and any DNS requests in that container are
thrown away with the container once hotspot has been authenticated.
This would allow us to never have resolv.conf on the host be
different from 127.0.0.1. (currently, it needs to put in the hotspot
DNS servers for the hotspot logon, exposing other applications to
fake DNS)
I'm not sure a container really needs to be involved as long as the DNS
resolution can be done without hitting resolv.conf. That's not hugely
hard to do I think as long as we can manually resolve the connectivity
URI address without telling applications about the new DNS servers.
If you have automatic VPN connection enabled, then I don't really see
how a captive portal login can be done fully safely without a
container -- the captive portal login should see a route or even
interface that should never be visible to anything else.

--Andy
Paul Wouters
2015-06-12 18:32:54 UTC
Permalink
Post by Dan Williams
Post by Paul Wouters
That is why HTTP redirection and DNS failure have to be detected by
whatever is the "hot spot detector". Both items weigh in on triggering
a hotspot logon window.
Agreed. But how does the DNS failure actually get relayed to the thing
doing the HTTP request, when unbound + DNSSEC is involved? That's one
point I'm very unclear on.
In hotspot mode (dnssec-trigger's version of hotspot mode)
/etc/resolv.conf contains the DHCP-supplied DNS servers. Those are used
to determine the "DNS cleanliness" state, and are also used to fetch
the fedoraproject hot spot detection page. The unbound DNS server, while
running, is not used at all for anything, as resolv.conf does not point
to it. Unfortunately, because this is not isolated to dnssec-triggerd,
all applications doing DNS during this time get crap/dangerous DNS
results, leading to the bad certificate warning popups. That is why I
was hoping to isolate this with either a network namespace, or some
other solution that keeps us from having to affect the whole system
by changing resolv.conf.

If selecting "cache only", then resolv.conf points to 127.0.0.1 and
unbound is configured with a "DNS forwarder" for everything set to
127.0.0.127, so no DNS lookups ever leave the host.
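
(In unbound-control terms, "cache only" amounts to something like the
following; a sketch, not necessarily dnssec-trigger's exact commands:)

unbound-control forward 127.0.0.127   # blackhole forwarder: serve cache only
unbound-control flush_requestlist     # drop queries waiting on upstream
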
Post by Dan Williams
Post by Paul Wouters
Post by Dan Williams
1. NM connects to a new network
2. NM updates DNS information
I don't know what 2) means. If it means rewriting /etc/resolv.conf or
the unbound forwarder configuration, we have already lost if the DNS
was malicious (and/or a hotspot DNS)
It means whatever "dns" action was set in NM, either writing
resolv.conf, not touching anything (dns=none), sending split DNS to
unbound (dns=unbound), or to dnsmasq (dns=dnsmasq), etc. In this case
I'll presume dns=unbound.
Ahh thanks.
Post by Dan Williams
Post by Paul Wouters
dnssec-trigger currently detects the difference by also checking for an
http hotspot redirect using http://fedoraproject.org/static/hotspot.txt.
If no http redirect, then DNS is broken and it tries to work around it
by becoming a full iterative resolver or doing DNS over TCP or DNS over
TLS. And if it all fails, it presents the "insecure or cache only" dialog.
NM also checks for redirection.
Though, what do you mean by "if no HTTP redirect, then DNS is broken"?
Sorry I meant "If no http redirect, and DNS is broken, then it tries to
work around by ...". That is, when there is an http redirect, there is
no point doing anything about DNS because after authenticating to the
hotspot, DNS might turn out to be either fine or broken for other
reasons.
Post by Dan Williams
Post by Paul Wouters
1) NM detects a new network, but doesn't tell the applications that there
is network connectivity yet. So firefox won't throw HTTPS warnings
and pidgin/IM won't throw https warnings. Because as far as they know
the network is still down.
Agreed. Right now we have "connectivity" states, but they are all
determined after the interface is signaled as "connected". We can do
some work here to indicate connectivity status on this interface before
indicating to applications that the interface is fully connected.
That would be awesome!
Post by Dan Williams
Post by Paul Wouters
2) NM/dnssec-trigger does the HTTP and DNS probing and prompting using
a dedicated container and any DNS requests in that container are
thrown away with the container once hotspot has been authenticated.
This would allow us to never have resolv.conf on the host be
different from 127.0.0.1. (currently, it needs to put in the hotspot
DNS servers for the hotspot logon, exposing other applications to
fake DNS)
I'm not sure a container really needs to be involved as long as the DNS
resolution can be done without hitting resolv.conf. That's not hugely
hard to do I think
True. In fact with unbound it is pretty trivial to do. The equivalent
unbound python code for that would be:

import unbound

ctx = unbound.ub_ctx()
ctx.resolvconf("/this/networks/representation/of/resolv.conf")

any resolve calls made will use the non-system resolv.conf's nameserver
addresses.

So the hotspot check could be:

ctx = unbound.ub_ctx()
ctx.add_ta_file(rootanchor)  # DNSSEC root key
ctx.resolvconf("/this/networks/representation/of/resolv.conf")
status, result = ctx.resolve("fedoraproject.org", unbound.RR_TYPE_A)
if not result.havedata or not result.secure:
    # we're captive because fedoraproject.org is DNSSEC signed and
    # we got an error (forged) response.
    # Redo the query with a non-DNSSEC cache to get the forged A record
    # needed to authenticate to the hotspot.
    insecurectx = unbound.ub_ctx()
    insecurectx.resolvconf("/this/networks/representation/of/resolv.conf")
    status, result = insecurectx.resolve("fedoraproject.org", unbound.RR_TYPE_A)
    if result.havedata:
        addr = result.data.address_list[0]
        # give addr to the captive portal logon HTTP engine
    insecurectx.ub_close()
else:
    if result.havedata:
        # check for HTTP interception - we might still be captive
        addr = result.data.address_list[0]
        # give addr to the captive portal logon HTTP engine
ctx.ub_close()

Things are a little trickier because the hotspot likely stupidly uses
even more DNS calls to build up the logon page, so whatever the http
rendering agent is (e.g. xdg-open or firefox or whatever) needs to keep
using this unbound cache and not fall back to the system default one.
Post by Dan Williams
Then once the hotspot login is completed, we must re-do the connectivity
check to ensure that we do indeed have access to the full internet. If
we do, then we can finally signal "connected". If it fails again, then
we either show the hotspot login window again, or somehow indicate that
hotspot login failed.
Note that none of this mentions DNS to the user at all yet... so what
happens if the hotspot login succeeds, we get connectivity to the
internet, but the hotspot DNS doesn't support DNSSEC correctly?
If HTTP is no longer redirected (dnssec-trigger keeps probing while you
pull your credit card out), it assumes you have successfully authenticated
to the hotspot. It re-tests the supplied DNS servers. If these are still
determined to be too broken for using DNSSEC (e.g. too old bind or
dnsmasq) it tries to (silently) become a full iterative nameserver,
i.e. it will not use any forwarders and will do all the DNS work itself.
If this also fails, for example because the network blocks port 53 to all
but its own DNS servers, dnssec-trigger tries the other modes of DNS over
TCP/SSL. If any of this works the user isn't even consulted. Only when
all of this fails do we need to contact the user and ask them to go
"insecure" or "cache only".

Paul
Tomas Hozza
2015-06-12 08:58:14 UTC
Permalink
Post by Dan Williams
Post by Matthew Miller
Post by Paul Wouters
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.
We should try to make that happen.
Unfortunately the Proposal doesn't say anything about how this will
actually work, which is something NetworkManager needs to know. It also
fails to address the failure cases where your local DNS doesn't support
DNSSEC or is otherwise broken, through no fault of your own.
NetworkManager is a pure network configuration manager in this scenario.
We neither expect nor want NM to handle /etc/resolv.conf. We will only get
the current network configuration from it and act upon it. The NM
configuration will contain "dns=unbound".

The case when the DNS resolver local to the network you are connected to
does not support DNSSEC is handled by the logic in dnssec-trigger and
the dnssec-trigger script. Unbound is always configured in a way that it
is able to do DNS resolution and DNSSEC validation. If this cannot be
done, the user is informed.
Post by Dan Williams
Post by Matthew Miller
Post by Paul Wouters
Post by Matthew Miller
I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
I have never seen those work except for when the backend was down and
I got a stream of false positives. But possibly that is because I've used
dnssec-trigger for years now and it might win the captive portal
detection race. There are some bugs once in a while but overall it works
pretty reliably.
I think that's probably it — the race. The hotspot signon thing works
for me at coffeeshops. Or it did before I enabled this feature. We'll
see now!
So, if you're behind a portal then unbound could potentially fail all
DNS lookups. That means that NetworkManager's connectivity detection,
which relies on retrieving a URL from a known website, will fail because
the DNS lookup for it was blocked by unbound. Thus GNOME Shell portal
detection will also fail. That kinda sucks.
If there is such a situation, where Unbound fails all DNS lookups, then
it is a bug. This is pure theory until you have some real situation. The
logic is designed in a way to prevent such situations from ever happening.
Hotspot detection is done by dnssec-trigger. The hot-spot-signon is done
by putting the DHCP-provided resolvers into resolv.conf. So in this
situation Unbound is not used at all.
Post by Dan Williams
While I'm sure the dnssec-trigger panel applet works great for some
people, I think the GNOME team would rather have the portal
functionality in the existing GNOME Shell indicator. There is nothing
wrong with having DNSSEC enabled and part of the portal detection
scheme, but the UI handling portals is clearly a desktop-specific
decision. So whatever we need to do in NM to enable the workflow that
desktops need is what we'll end up doing... Ideally the process goes
1. NM connects to a new network
1.1. Dispatch the dispatcher scripts with the network configuration change event.
Post by Dan Williams
2. NM updates DNS information
NM is not expected to touch resolv.conf in the intended default
configuration.
Post by Dan Williams
3. NM waits for some signal from unbound/dnssec-trigger about the
trustability of the DNS server
If you think NM needs to do some action (as I don't), we don't have a
problem with notifying NM (if you provide some API).
Post by Dan Williams
3a. if the DNS server is trusted, NM continues with its connectivity
check
3b. if the DNS server is not trusted or DNSSEC is broken, then ??? How
do we distinguish between "portal" and simply that your local DNS
doesn't support DNSSEC or is otherwise broken, if we cannot resolve the
address of the connectivity server?
The only trusted DNS resolver is the local Unbound. The DNS resolver
from the network you are connected to is never trusted. It is just used
in case it can provide all the necessary information to do the DNSSEC
validation. Since, using such data, we are able to build the chain of
trust and verify that the answer is correct, there is no point in
distinguishing whether the network-provided resolver is trusted or
not... it is not. This is the reason we do the validation locally.
Post by Dan Williams
Dan
I would like to add that this already works without any other
interaction with NM. I agree that the notifications from dnssec-trigger
are not ideal. I'm going to contact some GNOME guys and ask them for help.

Thank you for your comments!

Regards,
--
Tomas Hozza
Software Engineer - EMEA ENG Developer Experience

PGP: 1D9F3C2D
Red Hat Inc. http://cz.redhat.com
Matthias Clasen
2015-06-12 13:51:37 UTC
Permalink
Post by Tomas Hozza
Post by Dan Williams
3. NM waits for some signal from unbound/dnssec-trigger about the
trustability of the DNS server
If you think NM needs to do some action (as I don't), we don't have a
problem with notifying NM (if you provide some API).
This is your feature, so you are responsible for making sure that it
does not break the rest of the OS, not the other way around...

I've just installed dnssec-trigger on rawhide to try this out, and
found that it breaks networking on my Workstation. I used to get a
network connection on login; now I get a question mark in the top bar, and
a status icon with obscure menu options appears. This is quite a
contrast from what the Change page says: "Users shouldn't notice this
change at all".

The OS integration of this feature is clearly not done.
Paul Wouters
2015-06-12 13:57:38 UTC
Permalink
Post by Matthias Clasen
I've just installed dnssec-trigger on rawhide to try this out, and
found that it breaks networking on my Workstation. I used to get a
network connection on login; now I get a question mark in the top bar, and
a status icon with obscure menu options appears.
Did your networking actually break, or just the notification icon status?
Is the unbound service running?
Is the dnssec-triggerd service running?

I have noticed that the network status icon in the top right has never
worked for me in at least a year. It sometimes says "?" when I have
proper network connectivity and sometimes shows the wifi waves when
I do not have network connectivity. I did not realise this might have
been due to dnssec-triggerd/unbound.

Paul
Matthias Clasen
2015-06-12 14:02:06 UTC
Permalink
Post by Paul Wouters
Post by Matthias Clasen
I've just installed dnssec-trigger on rawhide to try this out, and
found that it breaks networking on my Workstation. I used to get a
network connection on login; now I get a question mark in the top bar, and
a status icon with obscure menu options appears.
Did your networking actually break, or just the notification icon status?
Is the unbound service running?
Is the dnssec-triggerd service running?
I have noticed that the network status icon in the top right has never
worked for me in at least a year. It sometimes says "?" when I have
proper network connectivity and sometimes shows the wifi waves when
I do not have network connectivity. I did not realise this might have
been due to dnssec-triggerd/unbound.
Maybe that is because you play too much with DNSSEC? :-) It works
pretty reliably for me and we don't have a huge influx of 'network
status is broken' bugs which we would have if it was as broken as you
say...
Matthew Miller
2015-06-12 14:17:08 UTC
Permalink
Post by Paul Wouters
Did your networking actually break, or just the notification icon status?
It will definitely break on F22 without the updated SELinux policy or
SELinux in permissive mode.
--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader
Matthew Miller
2015-06-12 14:20:33 UTC
Permalink
Post by Tomas Hozza
NetworkManager is a pure network configuration manager in this scenario.
We neither expect nor want NM to handle /etc/resolv.conf. We will only get
the current network configuration from it and act upon it. The NM
configuration will contain "dns=unbound".
Another integration concern: the network config GUI (and ifcfg files,
for that matter) let me list specific DNS servers. With this
feature, are those used (and if so, how)? If not, is my configuration
just silently ignored?
--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader
Dan Williams
2015-06-12 14:55:57 UTC
Permalink
Post by Matthew Miller
Post by Tomas Hozza
NetworkManager is a pure network configuration manager in this scenario.
We neither expect nor want NM to handle /etc/resolv.conf. We will only get
the current network configuration from it and act upon it. The NM
configuration will contain "dns=unbound".
Another integration concern: the network config GUI (and ifcfg files,
for that matter) let me list specific DNS servers. With this
feature, are those used (and if so, how)? If not, is my configuration
just silently ignored?
NM will use those DNS servers as it always has, and with dns=unbound
will simply forward them to unbound, which will use your servers as the
upstream servers. Basically, any information that NM used to write to
resolv.conf will now instead get forwarded to unbound.

What unbound wants to do with them is another story, of course, one I'm
not an expert on but Thomas/Paul/etc. are.

Dan
Paul Wouters
2015-06-12 14:58:27 UTC
Permalink
Post by Matthew Miller
Another integration concern: the network config GUI (and ifcfg files,
for that matter) let me list specific DNS servers. With this
feature, are those used (and if so, how)? If not, is my configuration
just silently ignored?
I do not know if it is supported currently, but support for that is
very trivial. If unbound is found running, issue:

unbound-control forward_add . 1.2.3.4 5.6.7.8

I'm not sure whose job that would be.

Paul
Dan Williams
2015-06-12 16:58:18 UTC
Permalink
Post by Tomas Hozza
Post by Dan Williams
Post by Matthew Miller
Post by Paul Wouters
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.
We should try to make that happen.
Unfortunately the Proposal doesn't say anything about how this will
actually work, which is something NetworkManager needs to know. It also
fails to address the failure cases where your local DNS doesn't support
DNSSEC or is otherwise broken, through no fault of your own.
NetworkManager is a pure network configuration manager in this scenario.
We neither expect nor want NM to handle /etc/resolv.conf. We will only get
the current network configuration from it and act upon it. The NM
configuration will contain "dns=unbound".
Correct, and I personally have no problem with this. NM is quite happy
to hand off DNS information wherever it has been told to do so.

But this is separate from the connectivity detection/hotspot issue which
I think we'll discuss more below.
Post by Tomas Hozza
The case when the DNS resolver local to the network you are connected to
does not support DNSSEC is handled by the logic in dnssec-trigger and
the dnssec-trigger script. Unbound is always configured in a way that it
is able to do DNS resolution and DNSSEC validation. If this cannot be
done, the user is informed.
Right, and that's where most of this discussion lies, I think.
Post by Tomas Hozza
Post by Dan Williams
Post by Matthew Miller
Post by Paul Wouters
Post by Matthew Miller
I see that there's a "hotspot sign on" option if you right click on the
icon. How does this work with Network Manager and GNOME's captive
portal detection?
I have never seen those work except for when the backend was down and
I got a stream of false positives. But possibly that is because I've used
dnssec-trigger for years now and it might win the captive portal
detection race. There are some bugs once in a while but overall it works
pretty reliably.
I think that's probably it — the race. The hotspot signon thing works
for me at coffeeshops. Or it did before I enabled this feature. We'll
see now!
So, if you're behind a portal then unbound could potentially fail all
DNS lookups. That means that NetworkManager's connectivity detection,
which relies on retrieving a URL from a known website, will fail because
the DNS lookup for it was blocked by unbound. Thus GNOME Shell portal
detection will also fail. That kinda sucks.
If there is such a situation, where Unbound fails all DNS lookups, then
it is a bug. This is pure theory until you have some real situation. The
logic is designed in a way to prevent such situations from ever happening.
Hotspot detection is done by dnssec-trigger. The hot-spot-signon is done
by putting the DHCP-provided resolvers into resolv.conf. So in this
situation Unbound is not used at all.
Post by Dan Williams
While I'm sure the dnssec-trigger panel applet works great for some
people, I think the GNOME team would rather have the portal
functionality in the existing GNOME Shell indicator. There is nothing
wrong with having DNSSEC enabled and part of the portal detection
scheme, but the UI handling portals is clearly a desktop-specific
decision. So whatever we need to do in NM to enable the workflow that
desktops need is what we'll end up doing... Ideally the process goes
1. NM connects to a new network
1.1. Dispatch the dispatcher scripts with the network configuration change event.
Post by Dan Williams
2. NM updates DNS information
NM is not expected to touch resolv.conf in the intended default
configuration.
My #2 was intended to be the same as your #1.1. I was assuming
"dns=unbound" here.
Post by Tomas Hozza
Post by Dan Williams
3. NM waits for some signal from unbound/dnssec-trigger about the
trustability of the DNS server
If you think NM needs to do some action (as I don't), we don't have a
problem with notifying NM (if you provide some API).
NM may need to do some action for connectivity checking.
Post by Tomas Hozza
Post by Dan Williams
3a. if the DNS server is trusted, NM continues with its connectivity
check
3b. if the DNS server is not trusted or DNSSEC is broken, then ??? How
do we distinguish between "portal" and simply that your local DNS
doesn't support DNSSEC or is otherwise broken, if we cannot resolve the
address of the connectivity server?
The only trusted DNS resolver is the local Unbound. The DNS resolver
from the network you are connected to is never trusted. It is just used
in case it can provide all the necessary information to do the DNSSEC
validation. Since, using such data, we are able to build the chain of
trust and verify that the answer is correct, there is no point in
distinguishing whether the network-provided resolver is trusted or
not... it is not. This is the reason we do the validation locally.
Ok, I should rephrase my question to be clearer. NM's connectivity
checking (which, yes, overlaps in functionality with dnssec-trigger's)
will resolve a hostname and attempt to contact it. In this "untrusted"
state, will that hostname resolve to some address (either valid or
spoofed from a portal), or will NM get an error response from
gethostbyname/getaddrinfo-type calls?

Dan
Paul Wouters
2015-06-12 15:26:50 UTC
Permalink
HERE we need to coordinate with other parties who might want to write into /etc/resolv.conf:
NetworkManager
initscripts
dhclient
libreswan ?
resolved
connman
The option is either to implement all the checks and workarounds in all the
projects over and over, or to implement all the logic in one place -
dnssec-trigger might be such a place.
Anyone who is going to write to resolv.conf needs to check for captive
portals, find a DNSSEC-enabled DNS server, and deal with VPN-provided DNS
servers and domains.
*Questions:*
Guys, what are your plans for handling the situations mentioned above?
libreswan will not write /etc/resolv.conf if unbound is running. Instead, via
its _updown script it currently adds the forwarders to unbound and performs
the cache flush / requestlist flush. Then it signals NM (which currently then
does the same thing - the libreswan-specific code can be removed for fedora/rhel
when we know the NM version is handling it).
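(Roughly, that _updown logic amounts to the following sketch; the domain
and server address are hypothetical examples, and the real code lives in
libreswan:)

import subprocess

VPN_DOMAIN = "corp.example.com"  # hypothetical example values
VPN_DNS = "10.11.12.13"

# Forward queries for the VPN domain to the VPN-provided resolver,
# then drop stale cached answers and in-flight queries for it.
subprocess.run(["unbound-control", "forward_add", VPN_DOMAIN, VPN_DNS],
               check=True)
subprocess.run(["unbound-control", "flush_zone", VPN_DOMAIN], check=True)
subprocess.run(["unbound-control", "flush_requestlist"], check=True)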
Can we integrate in one place (e.g. by calling into dnssec-trigger) instead
of overwriting /etc/resolv.conf independently?
If we could have a "hotspot logon network container", which would use its own
/etc/resolv.conf and its own disposable DNS, I think the host /etc/resolv.conf
could be an immutable 127.0.0.1-only entry.
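(A minimal sketch of that end state; this is hypothetical enforcement for
illustration, not necessarily what dnssec-trigger itself does:)

import subprocess
from pathlib import Path

# Pin the host resolv.conf to the single trusted loopback entry...
Path("/etc/resolv.conf").write_text("nameserver 127.0.0.1\n")

# ...and mark it immutable so other tools cannot silently rewrite it.
subprocess.run(["chattr", "+i", "/etc/resolv.conf"], check=True)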
Second problem: API for applications
====================================
(this second step is not part of the F23 feature but it is worth discussing)
Applications and crypto libraries need "an" interface to get DNS data which
are either 100 % correct or declared as not trusted. False positive (trusted)
answers are simply unacceptable because that would allow serious attacks.
Imagine that an OpenSSH client is verifying the server's fingerprint against the
value obtained from DNS *instead of asking the user*. If the client accepted a
fake response with a forged server fingerprint, then everything is doomed.
That is partially solved with a "network logon container". Such a container also
resolves the problem of every application immediately connecting to the network
when a new network is detected, all of them hitting the hotspot IP redirect
and getting bad certificates. Firefox is somewhat smart about not reloading its
tabs these days, but for instance pidgin is terrible and all my XMPP/jabber
servers will throw a TLS warning on the screen.

The real fix is for these applications to never experience the "hotspot login"
state.
The proposal https://sourceware.org/ml/libc-alpha/2014-11/msg00426.html on the
glibc mailing list is to extend the getaddr* API with a flag which says "secure
answers only". This will return an answer only if DNSSEC validation for the
given answer was successful and the answer was properly signed.
The assumption here is that something like dnssec-trigger properly configures
the local resolver (using the information from DHCP plus applying all the
necessary workarounds) to do DNSSEC validation locally, so we are 100 % sure
that a fake answer can be detected.
The open question is how to pass the information about security status to all
the parties. The mechanism needs to be simple so that other resolver libraries,
e.g. python-dns, can follow the same rules and use the same logic as glibc.
This still seems to be a largely unsolved and unagreed-upon problem that we
have had a lot of discussion about.
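Concretely, for a resolver library the per-answer check could look roughly
like this sketch using dnspython (assuming the trusted validating Unbound on
127.0.0.1; the AD bit is only meaningful because the path to that resolver
is trusted):

import dns.flags
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["127.0.0.1"]       # the trusted local validator
resolver.use_edns(0, dns.flags.DO, 1232)   # request DNSSEC records

answer = resolver.resolve("www.example.com", "A")
if answer.response.flags & dns.flags.AD:
    print("validated:", [rr.address for rr in answer])
else:
    # No AD bit: the data may be correct, but it MUST NOT be marked secure.
    print("unvalidated answer; do not treat as trusted")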
a) We are in hot-spot sign-on mode, or the validating resolver is unavailable
for some reason (early boot, resource constraints, Docker container
[finally!], ...).
In this case *nothing* can be trusted. The resolver might return faked answers
and we have no means to check whether the declared trustworthiness is correct.
Again, we need to be 100 % sure from the cryptographic point of view.
=> The application MUST NOT receive any answer marked as "secure"/"trusted" if
we are in this mode.
The application shouldn't even be exposed to the world while we are in this
mode; the world isn't ready yet for it. Additionally, one could extend this
concept and say there is a "DMZ" and an "internal" network on the host, and
only a select few applications get onto the DMZ. The hotspot logon is one such
app; the VPN apps are another. For VPN apps, one could leave only the VPN
daemons exposed to the external network, and force all applications to see the
world only through the VPN. Without a VPN, the host could decide that once
hotspot sign-on has happened, it just bridges or moves the DMZ into the
internal zone.
b) The validating resolver is up, running, properly configured, and the path to
the resolver is trusted - it might be running on localhost, or we are in a
Docker container and we trust the host, and so on.
In this case we trust the result of validation indicated by the AD bit. The
application will receive the answer marked as trusted if the resolver tells us
to do so via the AD bit in the DNS reply.
Additionally, these applications could link against a better DNS API, and use
something like getdns or edns-query-chain. But I think trusting the AD bit is
a good enough solution for most cases, especially with the added mechanism of
marking certain DNS servers trusted.

Paul
Dan Williams
2015-06-12 16:53:32 UTC
Permalink
Post by Matthew Miller
Post by Paul Wouters
decision needs to then be made by the system. I believe that's been
mostly due to lack of time for the various parties to sit down and
plan and then program this further.
We should try to make that happen.
Okay, let's start once again from scratch.
All of this was already discussed and we even had a huge meeting around
DevConf and FLOCK 2014 about this, so the following text will be just a short
summary.
Yeah, we did. From my recollection, most of that focused on the unbound
parts and how NM could add the dns=unbound stuff (which Pavel
contributed) but less on the NM connectivity checking, because Fedora
hadn't turned that on by default yet. I'm all fine with dns=unbound,
that's not the issue. The issue is more around what happens with NM's
connectivity checking, since that's used by quite a few clients,
including GNOME Shell.
The ultimate goal
=================
Make various man-in-the-middle attacks *automatically* detectable - without
any user interaction. In particular, we want to get rid of dialogs like "Site
www.gmail.com is using a certificate issued for xxx.porn and the certificate's
validity ended 10 years ago. Do you want to continue? [YES] [YES] [YES]".
Tools
=====
To achieve this goal we need to do DNSSEC validation on every client machine
(ignoring Docker for a moment, see below) and allow applications to use DNS as
a trusted source of sensitive data (certificate fingerprints, SSH fingerprints,
etc.).
DNSSEC allows all parties to publish their fingerprints in DNS and gives us a
secure way to get the data and to detect that someone prevents us from getting
the data.
Longer description
==================
http://developerblog.redhat.com/2015/04/14/writing-an-application-that-supports-dnssec-in-rhel-and-fedora/
First step: DNSSEC validation
=============================
Contemporary networks are full of broken DNS proxies so we need to jump
through various hoops to get non-faked DNSSEC data for DNSSEC validation.
The goal of this step is to get *cryptographic* proof that the data we
received are the same as the DNS zone owner published.
Captive portal detection needs to allow the user to disable all the security so
they can log in, but this needs to be done in a secure and reliable way so that
an attacker cannot misuse it.
Some networks are so broken that even without a captive portal they are not
able to deliver DNSSEC data to the clients.
In that case we will try to tunnel to other DNS servers on the Internet (Fedora
Infra or the public DNS root) and use them. Naturally, local/internal domains
need to be available.
While I don't actually care, this might well be a sticking point for
many people since their DNS information is going to an untrusted (to
them) DNS server. Yeah, I tend to trust Fedora, but not everyone will.
Can the tunnel be turned off, or the broken servers whitelisted, or is
the answer here to just "dnf remove dnssec-trigger"?
All these sub-problems (including VPN handling and so on) are solved by
dnssec-trigger with tweaks by Tomas Hozza and Pavel Simerda.
HERE we need to coordinate with other parties who might want to write into /etc/resolv.conf:
NetworkManager
initscripts
dhclient
libreswan ?
resolved
connman
pppd, vpnc, openvpn, etc. should get added to the list since they all
have scripts that can potentially write to /etc/resolv.conf.
The option is either to implement all the checks and workarounds in all the
projects over and over, or to implement all the logic in one place -
dnssec-trigger might be such a place.
Anyone who is going to write to resolv.conf needs to check for captive
portals, find a DNSSEC-enabled DNS server, and deal with VPN-provided DNS
servers and domains.
*Questions:*
Guys, what are your plans for handling the situations mentioned above?
Can we integrate in one place (e.g. by calling into dnssec-trigger) instead
of overwriting /etc/resolv.conf independently?
This is the real issue. It sounds like what you're proposing is to make
dnssec-trigger into the "DNS broker". The previous solutions
(resolvconf, NetworkManager, etc.) have all failed for various reasons.
Touching/changing something so fundamental to the system, as you've
probably discovered, can be hard...

systemd-resolved might have a chance here, since it's small and pretty
simple, but they don't have an external API and don't seem interested in
creating one any time soon, which severely limits its usefulness.

If this is indeed what you're proposing, then let's have a discussion
about dnssec-trigger+unbound in that context; I do have some thoughts to
contribute here.

----

The third part of the problem, unrelated to your "API for Applications",
is the actual hotspot sign-on and connectivity detection issue. I think
that's getting discussed in other replies though.

Dan
Second problem: API for applications
====================================
(this second step is not part of the F23 feature but it is worth discussing)
Applications and crypto libraries need "an" interface to get DNS data which
are either 100 % correct or declared as not trusted. False positive (trusted)
answers are simply unacceptable because that would allow serious attacks.
Imagine that an OpenSSH client is verifying the server's fingerprint against the
value obtained from DNS *instead of asking the user*. If the client accepted a
fake response with a forged server fingerprint, then everything is doomed.
The proposal https://sourceware.org/ml/libc-alpha/2014-11/msg00426.html on the
glibc mailing list is to extend the getaddr* API with a flag which says "secure
answers only". This will return an answer only if DNSSEC validation for the
given answer was successful and the answer was properly signed.
The assumption here is that something like dnssec-trigger properly configures
the local resolver (using the information from DHCP plus applying all the
necessary workarounds) to do DNSSEC validation locally, so we are 100 % sure
that a fake answer can be detected.
The open question is how to pass the information about security status to all
the parties. The mechanism needs to be simple so that other resolver libraries,
e.g. python-dns, can follow the same rules and use the same logic as glibc.
a) We are in hot-spot sign-on mode, or the validating resolver is unavailable
for some reason (early boot, resource constraints, Docker container
[finally!], ...).
In this case *nothing* can be trusted. The resolver might return faked answers
and we have no means to check whether the declared trustworthiness is correct.
Again, we need to be 100 % sure from the cryptographic point of view.
=> The application MUST NOT receive any answer marked as "secure"/"trusted" if
we are in this mode.
b) The validating resolver is up, running, properly configured, and the path to
the resolver is trusted - it might be running on localhost, or we are in a
Docker container and we trust the host, and so on.
In this case we trust the result of validation indicated by the AD bit. The
application will receive the answer marked as trusted if the resolver tells us
to do so via the AD bit in the DNS reply.
Please read the post on the glibc mailing list for more details.
Any suggestions on how to do that are more than welcome!
Matthew Miller
2015-06-12 17:00:31 UTC
Permalink
Post by Dan Williams
Yeah, we did. From my recollection, most of that focused on the unbound
parts and how NM could add the dns=unbound stuff (which Pavel
contributed) but less on the NM connectivity checking, becuase Fedora
hadn't turned that on by default yet. I'm all fine with dns=unbound,
that's not the issue. The issue is more around what happens with NM's
connectivity checking, since that's used by quite a few clients,
including GNOME Shell.
I personally find the anchor icon very confusing. As a non-expert in
this area, it doesn't represent anything which seems relevant to me,
and all of the right-click menu options, once I figured out to
right-click, are obscure to me.

I understand "Hotspot sign-on" and can go from there, but I can't see
it being anything but completely perplexing to, e.g., my dad.

I don't know what "Reprobe" does (and especially not because there's no
context other than the anchor), and "Probe Results" gives some
indication that it has to do with DNSSEC — but I think that if our
users have to learn what that means and understand all that in order to
be secure (or just to browse the web at _any_ level), we're not
succeeding.

I hope we can get a design for this which integrates better with GNOME
Shell and the existing network icon there.
--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader
Michael Catanzaro
2015-06-12 17:44:05 UTC
Permalink
Post by Matthew Miller
I hope we can get a design for this which integrates better with GNOME
Shell and the existing network icon there.
Well, we're just not going to ship this in Workstation if it breaks
NetworkManager's connectivity checking, nor will we ship anything that
displays a system tray icon (that looks like a debugging tool, not
something users should ever see). But there's plenty of time left
before Fedora 23, and it sounds like several people are trying to fix
these things, so hopefully that will all be taken care of and we'll be
able to spotlight the local DNS resolver as a major new security
feature come release time.

Michael
Paul Wouters
2015-06-12 18:00:05 UTC
Permalink
Post by Matthew Miller
I personally find the anchor icon very confusing. As a non-expert in
this area, it doesn't represent anything which seems relevant to me,
and all of the right-click menu options, once I figured out to
right-click, are obscure to me.
Agreed.
Post by Matthew Miller
I don't know what "Reprobe" does (and especially not because there's no
context other than the anchor), and "Probe Results" gives some
Those are really "developer-only" things and I agree users shouldn't
even need to see them. dnssec-trigger continuously keeps probing while
in hotspot mode to check when the "jailing" has been removed and real
internet access is available.
Post by Matthew Miller
I hope we can get a design for this which integrates better with GNOME
Shell and the existing network icon there.
That would be nice indeed.

Paul
Paul Wouters
2015-06-12 17:17:31 UTC
Permalink
Post by Dan Williams
Some networks are so broken that even without a captive portal they are not
able to deliver DNSSEC data to the clients.
In that case we will try to tunnel to other DNS servers on the Internet (Fedora
Infra or the public DNS root) and use them. Naturally, local/internal domains
need to be available.
While I don't actually care, this might well be a sticking point for
many people since their DNS information is going to an untrusted (to
them) DNS server. Yeah, I tend to trust Fedora, but not everyone will.
Can the tunnel be turned off, or the broken servers whitelisted, or is
the answer here to just "dnf remove dnssec-trigger"?
The fallbacks are configured in /etc/dnssec-trigger/dnssec-triggerd.conf

# Provided by fedoraproject.org, #fedora-admin
# It is provided on a best effort basis, with no service guarantee.
ssl443: 140.211.169.201 A8:3E:DA:F0:12:82:55:7E:60:B5:B5:56:F1:66:BB:13:A8:BD:FC:B4:51:41:C0:F2:E7:8E:7B:64:AA:87:E6:F2
tcp80: 140.211.169.201
ssl443: 66.35.62.163 A8:3E:DA:F0:12:82:55:7E:60:B5:B5:56:F1:66:BB:13:A8:BD:FC:B4:51:41:C0:F2:E7:8E:7B:64:AA:87:E6:F2
tcp80: 66.35.62.163
ssl443: 152.19.134.150 A8:3E:DA:F0:12:82:55:7E:60:B5:B5:56:F1:66:BB:13:A8:BD:FC:B4:51:41:C0:F2:E7:8E:7B:64:AA:87:E6:F2
tcp80: 152.19.134.150
ssl443: 2610:28:3090:3001:dead:beef:cafe:fed9 A8:3E:DA:F0:12:82:55:7E:60:B5:B5:56:F1:66:BB:13:A8:BD:FC:B4:51:41:C0:F2:E7:8E:7B:64:AA:87:E6:F2
tcp80: 2610:28:3090:3001:dead:beef:cafe:fed9
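(As I read this format, each ssl443 entry is a fallback server speaking DNS
wrapped in SSL on port 443, pinned to the certificate hash that follows the
address, and each tcp80 entry is plain DNS over TCP on port 80 for networks
that only pass web traffic. Removing these entries should effectively disable
the tunnelling.)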
Post by Dan Williams
Can we integrate in one place (e.g. by calling into dnssec-trigger) instead
of overwriting /etc/resolv.conf independently?
This is the real issue. It sounds like what you're proposing is to make
dnssec-trigger into the "DNS broker". The previous solutions
(resolvconf, NetworkManager, etc.) have all failed for various reasons.
Touching/changing something so fundamental to the system, as you've
probably discovered, can be hard...
But it must be done for security reasons.
Post by Dan Williams
systemd-resolved might have a chance here, since it's small and pretty
simple, but they don't have an external API and don't seem interested in
creating one any time soon, which severely limits its usefulness.
And last I looked it did not support DNSSEC. I'm also wary about systemd-resolved basically marshalling DNS via D-Bus.
Post by Dan Williams
If this is indeed what you're proposing, then let's have a discussion
about dnssec-trigger+unbound in that context; I do have some thoughts to
contribute here.
I believe we selected dnssec-trigger because it was the UI/daemon that worked. A better native integration into either
NM or GNOME would be preferred.

Paul
Miloslav Trmač
2015-06-10 15:22:53 UTC
Permalink
Hello,
Post by Jan Kurik
= Proposed System Wide Change: Default Local DNS Resolver =
https://fedoraproject.org/wiki/Changes/Default_Local_DNS_Resolver
Install a local DNS resolver trusted for the DNSSEC validation running on
127.0.0.1:53. This must be the only name server entry in /etc/resolv.conf.
We’ve had earlier conversations about whether the resolver being used (local, remote, container host) is trusted to perform DNSSEC validation. How is this resolved? The Change page AFAICS doesn’t say.

Do you e.g. plan to have a configuration file which tells libc and other applications dealing with resolv.conf directly whether the resolver can be trusted for DNSSEC? Or is perhaps the design that any resolver in /etc/resolv.conf is always trusted for DNSSEC, and sysadmins need to ensure that this is true if they use a remote one?
Mirek
P J P
2015-06-11 05:39:08 UTC
Permalink
Hello Miloslav,
Post by Miloslav Trmač
We’ve had earlier conversations about whether the resolver being used (local,
remote, container host) is trusted to perform DNSSEC validation. How is this
resolved? The Change page AFAICS doesn’t say.
Do you e.g. plan to have a configuration file which tells libc and other
applications dealing with resolv.conf directly whether the resolver can
be trusted for DNSSEC? Or is perhaps the design that any resolver in
/etc/resolv.conf is always trusted for DNSSEC, and sysadmins need to ensure that
this is true if they use a remote one?
Ummn... not any resolver in resolv.conf, but 127.0.0.1 is considered to be trusted. The proposed change is also to ensure that resolv.conf always has only the 127.0.0.1 entry in it, and nothing else.


A configuration change to indicate the "trusted" character of a resolver was proposed to upstream glibc, but that has yet to be resolved properly.

-> https://www.sourceware.org/ml/libc-alpha/2014-11/msg00426.html


---
Regards
-P J P
http://feedmug.com
Petr Spacek
2015-06-11 07:15:04 UTC
Permalink
Post by P J P
Hello Miloslav,
Post by Miloslav Trmač
We’ve had earlier conversations about whether the resolver being used (local,
remote, container host) is trusted to perform DNSSEC validation. How is this
resolved? The Change page AFAICS doesn’t say.
Do you e.g. plan to have a configuration file which tells libc and other
applications dealing with resolv.conf directly whether the resolver can
be trusted for DNSSEC? Or is perhaps the design that any resolver in
/etc/resolv.conf is always trusted for DNSSEC, and sysadmins need to ensure that
this is true if they use a remote one?
Ummn... not any resolver in resolv.conf, but 127.0.0.1 is considered to be trusted. The proposed change is also to ensure that resolv.conf always has only the 127.0.0.1 entry in it, and nothing else.
A configuration change to indicate the "trusted" character of a resolver was proposed to upstream glibc, but that has yet to be resolved properly.
-> https://www.sourceware.org/ml/libc-alpha/2014-11/msg00426.html
Let me add that this concept of a "trusted" resolver will be added later, when
glibc gets an extended API which can actually convey the information.

Realistically, in Fedora 23 we will not have the API available because Glibc
upstream is quite unresponsive about this. As a result, we are not going to
declare anything to be 'trusted' in Fedora 23.

For now, apps should not make any assumptions about resolver trustworthiness
(as they have done for decades).
--
Petr Spacek @ Red Hat