Commit Graph

511 Commits

Author SHA1 Message Date
Gleb Smirnoff
0f9d0a73a4 Merge from projects/sendfile:
o Introduce a notion of "not ready" mbufs in socket buffers.  These
mbufs are now being populated by some I/O in background and are
referenced outside.  This forces following implications:
- An mbuf which is "not ready" can't be taken out of the buffer.
- An mbuf that is behind a "not ready" in the queue neither.
- If sockbet buffer is flushed, then "not ready" mbufs shouln't be
  freed.

o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc.
  The sb_ccc stands for ""claimed character count", or "committed
  character count".  And the sb_acc is "available character count".
  Consumers of socket buffer API shouldn't already access them directly,
  but use sbused() and sbavail() respectively.
o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones
  with M_BLOCKED.
o New field sb_fnrdy points to the first not ready mbuf, to avoid linear
  search.
o New function sbready() is provided to activate certain amount of mbufs
  in a socket buffer.

A special note on SCTP:
  SCTP has its own sockbufs.  Unfortunately, FreeBSD stack doesn't yet
allow protocol specific sockbufs.  Thus, SCTP does some hacks to make
itself compatible with FreeBSD: it manages sockbufs on its own, but keeps
sb_cc updated to inform the stack of amount of data in them.  The new
notion of "not ready" data isn't supported by SCTP.  Instead, only a
mechanical substitute is done: s/sb_cc/sb_ccc/.
  A proper solution would be to take away struct sockbuf from struct
socket and allow protocols to implement their own socket buffers, like
SCTP already does.  This was discussed with rrs@.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 12:52:33 +00:00
Baptiste Daroussin
3e11bd9e2a Convert to usr.bin/ to LIBADD
Reduce overlinking
2014-11-25 14:29:10 +00:00
Andrey V. Elsukov
5dbfa43f65 Add the ability to set `prefer_source' flag to an IPv6 address.
It affects the IPv6 source address selection algorithm (RFC 6724)
and allows override the last rule ("longest matching prefix") for
choosing among equivalent addresses. The address with `prefer_source'
will be preferred source address.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2014-09-09 10:52:50 +00:00
Andrey V. Elsukov
ccc53de916 Add the reverse part to rule #9. Also change its description in the
netstat(8) output.

MFC after:	1 week
2014-09-01 09:30:34 +00:00
Mark Johnston
d77e67e495 Suppress warnings when retrieving protocol stats from interfaces that
don't support IPv6 (e.g. pflog(4)).

Reviewed by:	hrs
MFC after:	2 weeks
2014-08-22 19:23:38 +00:00
Joel Dahl
275b78396e Minor mdoc nit. 2014-06-06 08:42:03 +00:00
Allan Jude
997a303f17 Sadly, we do not actually live in the future.
Approved by:	wblock (mentor)
2014-06-04 16:55:38 +00:00
Allan Jude
fd2c6bc9e1 Further updates to the netstat(1) man page and usage message
- Reformat the entire man page
- Create a proper synopsis section
- Use itemized-lists to describe each flag, rather than paragraphs
- Cross-reference common flags to a 'general flags' sub-section with short
inline description of the flag
- Label 'general flags' sub-section
- Apply additional fixes suggested by wblock, brueffer, and bdrewery
- Update .Dd that got undone previously
- Change the order of the .Op Fl to be alphabetical
- Add the -i | -I interface flags to the description of 'interface
display mode'
- Fix missing parameters in man page
- Fix missing parameters in usage()
- Sync man page and usage()

MFC Note: stable/9 and stable/10 do not have -R, will need to be removed
when merged

CR:		D58
Reviewed by:	brueffer, bcr
Approved by:	wblock (mentor)
MFC after:	7 days
Sponsored by:	ScaleEngine Inc.
2014-06-04 04:18:33 +00:00
Allan Jude
ada0a6dd14 Add path markup on sys/mbuf.h to previous netstat(1) man page update
Submitted by:	brueffer
Reviewed by:	eadler (mentor)
2014-05-25 08:09:55 +00:00
Allan Jude
e9fa95e9e9 Document the new -R flag of netstat(1) introduced in r266448 that tracks the
flowid for each socket.

Reviewed by:	adrian
Approved by:	eadler (mentor)
2014-05-25 07:41:12 +00:00
Hiroki Sato
c4f55e08be - Fix a bug which can make sysctl() fail when -F is specified.
- Increase WID_IF_DEFAULT() from 6 to 8 (the default for AF_INET6) because
  we have interfaces with longer names than 6 chars like epairN{a,b}.
- Style fixes.
2014-05-21 10:04:51 +00:00
Adrian Chadd
85b0f0f325 Add -R to netstat to dump RSS/flow information.
This is intended to help in diagnostics and debugging of NIC and stack
flowid support.

Eventually this will grow another column (RSS CPU ID) but
that currently isn't cached in the inpcb.

There's also no clean flowtype -> flowtype identifier string.  This is
the mbuf M_HASHTYPE_* values for RSS.

Here's some example output:

adrian@adrian-hackbox:~/work/freebsd/head/src % netstat -Rn | more
Active Internet connections
Proto Recv-Q Send-Q Local Address          Foreign Address           flowid ftype
tcp4       0      0 10.11.1.65.22          10.11.1.64.12409        29041942     2
udp4       0      0 127.0.0.1.123          *.*                     00000000     0
udp6       0      0 fe80::1%lo0.123        *.*                     00000000     0
udp6       0      0 ::1.123                *.*                     00000000     0
udp4       0      0 10.11.1.65.123         *.*                     00000000     0

Tested:

* amd64 system w/ igb NIC; local driver changes to expose RSS flowid in if_igb.
2014-05-19 17:11:43 +00:00
Hiroki Sato
0e798e1faa - Do not override sin6_scope_id in LLA when it is already set to non-zero.
This fixes destination list in output of netstat -r.
- Plug a memory leak.
- Add RTM_VERSION check.
- Minor style fixes.
2014-05-15 19:26:20 +00:00
Warner Losh
c6063d0da8 Use src.opts.mk in preference to bsd.own.mk except where we need stuff
from the latter.
2014-05-06 04:22:01 +00:00
Gleb Smirnoff
c669105d17 - Remove net.inet.tcp.reass.overflows sysctl. It counts exactly
same events that tcpstat's tcps_rcvmemdrop counter counts.
- Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its
  description in netstat(1) output.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-05-06 00:00:07 +00:00
Alexander V. Chernikov
68bbdd0e71 Fix "netstat -gW" behavior broken in r259638.
netstat has two options for printing multicast tables:
sysctl (the default one for live systems) and kvm-based one (for cores).
It looks like kvm-based one hasn't been working since it's been introduced
in r190012 due to absence of mfctablesize kernel symbol.
Check for all ipv4-multicast symbols being correctly resolved was introduced
in r259638 regardless of 'live' value leading to "No IPv4 MROUTING" error
message.

Reported by:	Olivier Cochard-Labbé
MFC after:	1 week
2014-04-29 16:51:28 +00:00
Gleb Smirnoff
55fb7d688b Now, after r263102 we have ifi_oqdrops in if_data, restore printing of
output queue drops in netstat(1).

No driver, neither kernel fills this field in if_data, yet.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-19 03:33:32 +00:00
Gleb Smirnoff
66dcee729c Garbage collect long time obsoleted (or never used) stuff from routing API. 2014-03-15 06:49:32 +00:00
Gleb Smirnoff
45c203fce2 Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 06:29:43 +00:00
Gleb Smirnoff
2c284d9395 Remove IPX support.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.

Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 02:58:48 +00:00
Gleb Smirnoff
b245f96c44 Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed machine independent size. The
  notion of data (packet counters, etc) are by no means MD. And it is a
  bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately the
  8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
Gleb Smirnoff
46425317db Fix compilation for 32-bit machines. 2014-03-06 02:00:01 +00:00
Gleb Smirnoff
5274e55eb3 Hide struct rtentry from userland. 2014-03-05 01:47:08 +00:00
Gleb Smirnoff
e3a7aa6f56 - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
Gleb Smirnoff
f0e49f6631 Whenever flowtable lookup fails, we do route lookup and then try to
insert flow entry. During the route lookup the critical section is
exited. It may happen, that after route lookup we will be executed
on an other CPU that already has such flowentry. Before this change
we simply freed the flowentry and returned to ip_output() with
failure.

Actually there is nothing wrong with using previously allocated
flow entry, updating it properly. Thus, make flowentry_insert()
return the new either old fle, and make use of it.

Count reuses as "collisions" and real inserts as "inserts".

Reviewed by:	adrian
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-02-14 10:56:26 +00:00
Adrian Chadd
6d9223ac7e Reword.
Suggestion:	glebius
2014-02-14 07:43:39 +00:00
Adrian Chadd
0e778c88c9 Don't insert a flowtable entry if the lle isn't yet valid.
Some of the collisions that are occuring are due to flowtable lookups
that succeed but have an invalid lle - typically because the L2 adjacency
lookup hasn't completed.  This would lead to a follow-up insert which
would then fail (ie, collision) and the code would fall through to doing
a slow-path L2/L3 lookup in the netinet/netinet6 code.

This patch simply aborts storing a new flowtable entry if the lle isn't
yet valid.

Whilst I'm here, add a new pcpu counter for the item so the number of
failures can be tracked separately from generic "collisions."

Reviewed by:	glebius
MFC after:	10 days
Sponsored by:	Netflix, Inc.
2014-02-14 00:05:09 +00:00
Gleb Smirnoff
5d6d7e756b o Revamp API between flowtable and netinet, netinet6.
- ip_output() and ip_output6() simply call flowtable_lookup(),
    passing mbuf and address family. That's the only code under
    #ifdef FLOWTABLE in the protocols code now.
o Revamp statistics gathering and export.
  - Remove hand made pcpu stats, and utilize counter(9).
  - Snapshot of statistics is available via 'netstat -rs'.
  - All sysctls are moved into net.flowtable namespace, since
    spreading them over net.inet isn't correct.
o Properly separate at compile time INET and INET6 parts.
o General cleanup.
  - Remove chain of multiple flowtables. We simply have one for
    IPv4 and one for IPv6.
  - Flowtables are allocated in flowtable.c, symbols are static.
  - With proper argument to SYSINIT() we no longer need flowtable_ready.
  - Hash salt doesn't need to be per-VNET.
  - Removed rudimentary debugging, which use quite useless in dtrace era.

The runtime behavior of flowtable shouldn't be changed by this commit.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-02-07 15:18:23 +00:00
Bjoern A. Zeeb
a4993b252e Print the MD5 signature information introduced in r221023 in the
TCP statistics output.

MFC after:	3 weeks
2014-02-05 20:43:03 +00:00
Alexander V. Chernikov
120dc21b86 Bump dates in nestat(1) and route(8) man pages.
Fix several small errors introduced by r260524.

Suggested by:	glebius
MFC after:	2 weeks
2014-01-11 09:44:00 +00:00
Alexander V. Chernikov
17ed2e8ea8 Add -4/-6 shorthand for -finet/-finet6 in route(8) and netstat(8).
MFC after:	2 weeks
2014-01-10 23:08:18 +00:00
Alexander V. Chernikov
dbfdd46b70 Explicitly free rt_tables to please Coverity.
Reported by:	Coverity
Coverity CID:	1147174
MFC after:	2 weeks
2013-12-31 12:11:48 +00:00
Gleb Smirnoff
526b18aa8b Claim copyright since I've almost rewritten this file in r256512. 2013-12-29 19:31:49 +00:00
Alexander V. Chernikov
8e1dc13857 Further split kvm(3) and sysctl interfaces for route table printing.
MFC after:	4 weeks
Sponsored by:	Yandex LLC
2013-12-20 12:08:36 +00:00
Alexander V. Chernikov
fc47e028bb Use more fine-grained kvm(3) symbol lookup: routing code retrieves only
necessary symbols needed per subsystem. Main kvm(3) init is now delayed
as much as possbile. This finally fixes performance issues reported in
kern/167204.
Some non-working code (ng_socket.ko symbol addresses calculation) removed.
Some global variables eliminated.

PR:		kern/167204
MFC after:	4 weeks
2013-12-20 00:17:26 +00:00
Alexander V. Chernikov
11188df260 Restore corefiles handling via kvm(3).
Found by:	John-Mark Gurney <jmg at funkthat.com>
MFC after:	4 weeks
2013-12-18 20:04:04 +00:00
Alexander V. Chernikov
c49b4b8055 Switch netstat -rn to use standard API for retrieving list of routes
instead of peeking inside in-kernel radix via kget.
This permits us to change kernel structures without breaking userland.
Additionally, this change provide more reliable and faster output.

`Refs` and `Use` fields available in IPv4 by default (and via -W
for other families) were removed. `Refs` is radix-specific thing
which is not informative for users. `Use` field value is handy sometimes,
but a) current API does not support it and b) I'm not sure we will
support per-rte pcpu counters in near future.

Old method of retrieving data is still supported (either by defining
NewTree=0 or running netstat with -A). However, Refs/Use fields are
hidden.

Sponsored by:	Yandex LLC
MFC after:	4 weeks
PR:		kern/167204
2013-12-18 18:25:27 +00:00
Gleb Smirnoff
82b90729fb 'netstat -i' no longer supports working on a vmcore. 2013-10-30 08:13:42 +00:00
Gleb Smirnoff
3e4d5cd37b Make userland tools honor WITHOUT_PF build option.
Tested by:	dt71@gmx.com
2013-10-29 17:38:13 +00:00
Gleb Smirnoff
84c1edcbad Rewrite netstat/if.c to use getifaddrs(3) and getifmaddrs(3) instead of
libkvm digging in kernel memory. This is possible since r231506 made
getifaddrs(3) to supply if_data for each ifaddr.

  The pros of this change is that now netstat(1) doesn't know about kernel
struct ifnet and struct ifaddr. And these structs are about to change
significantly in head soon. New netstat binary will work well with 10.0
and any future kernel.

  The cons is that now it isn't possible to obtain interface statistics
from a vmcore.

  Functions intpr() and sidewaysintpr() were rewritten from scratch.

  The output of netstat(1) has underwent the following changes:

1) The MTU is not printed for protocol addresses, since it has no notion.
   Dash is printed instead. If there would be a strong desire to return
   previous output, it is doable.
2) Output interface queue drops are not printed. Currently this data isn't
   available to userland via any API. We plan to drop 'struct ifqueue' from
   'struct ifnet' very soon, so old kvm(3) access to queue drops is soon
   to be broken, too. The plan is that drivers would handle their queues
   theirselves and a new field in if_data would be updated in case of drops.
3) In-kernel reference count for multicast addresses isn't printed. I doubt
   that anyone used it. Anyway, netstat(1) is sysadmin tool, not kernel
   debugger.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 09:55:07 +00:00
Gleb Smirnoff
d35acb5099 Remove obtained, but never used data.
Found by:	gcc
2013-10-15 09:21:05 +00:00
Hiroki Sato
84dde578a9 - Use getnameinfo(3) instead of gethostbyaddr(3) or inet_ntop(3).
- Fill sin6_scope_id from in6p.sin6_addr.s6_addr[2].  struct inpcb has
  struct in6_addr for the endpoint addresses, so sin6_scope_id must be filled.
2013-08-17 17:23:42 +00:00
Andrey V. Elsukov
6794f46021 Remove the large part of struct ipsecstat. Only few fields of this
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.

This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.

Reported by:	Taku YAMAMOTO, Maciej Milewski
2013-07-23 14:14:24 +00:00
Gleb Smirnoff
c780f37850 Sweep unused nlist entries.
Sponsored by:	Nginx, Inc.
2013-07-16 12:22:36 +00:00
Andrey V. Elsukov
05d1f5bce0 Introduce new structure sfstat for collecting sendfile's statistics
and remove corresponding fields from struct mbstat. Use PCPU counters
and SFSTAT_INC() macro for update these statistics.

Discussed with:	glebius
2013-07-15 06:16:57 +00:00
Hiroki Sato
3fddef95af Add -F fibnum option to specify an FIB number for -r flag. 2013-07-12 17:11:30 +00:00
Andrey V. Elsukov
db8c087944 Migrate structs ahstat, espstat, ipcompstat, ipipstat, pfkeystat,
ipsec4stat, ipsec6stat to PCPU counters.
2013-07-09 10:08:13 +00:00
Andrey V. Elsukov
69edf037d7 Migrate struct carpstats to PCPU counters. 2013-07-09 10:02:51 +00:00
Andrey V. Elsukov
a786f67981 Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters. 2013-07-09 09:54:54 +00:00
Andrey V. Elsukov
5b7cb97c2b Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00