Lines Matching +full:can +full:- +full:fd
1 .. SPDX-License-Identifier: GPL-2.0
17 Howto can be found at:
22 - Ulisses Alonso Camaró
23 - Johann Baudy
34 configurable circular buffer mapped in user space that can be used to either
37 transmission, multiple packets can be sent through one system call to get the
48 card can also be an advantage.
67 [setup] socket() -------> creation of the capture socket
68 setsockopt() ---> allocation of the circular buffer (ring)
70 mmap() ---------> mapping of the allocated buffer to the
73 [capture] poll() ---------> to wait for incoming packets
75 [shutdown] close() --------> destruction of the capture socket and
83 int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL));
86 information can be captured or SOCK_DGRAM for the cooked
88 supported and a link level pseudo-header is provided
92 is done by a simple call to close(fd).
95 for capture and transmission. This can be done by mapping the
107 [setup] socket() -------> creation of the transmission socket
108 setsockopt() ---> allocation of the circular buffer (ring)
110 bind() ---------> bind transmission socket with a network interface
111 mmap() ---------> mapping of the allocated buffer to the
114 [transmission] poll() ---------> wait for free packets (optional)
115 send() ---------> send all packets that are set as ready in
117 The flag MSG_DONTWAIT can be used to return
120 [shutdown] close() --------> destruction of the transmission socket and
126 int fd = socket(PF_PACKET, mode, 0);
128 The protocol can optionally be 0 in case we only want to transmit
138 --------------------
141 |--------------------|
145 --------------------
159 ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
167 bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
174 frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)
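The default payload position given above can be sketched as a small helper. This is only an illustration for the TPACKET_V1 header layout; the names `tx_data_offset` and `tx_frame_data` are hypothetical and are not part of any kernel or libc API.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* Byte offset from the frame base to the default TX payload position,
 * i.e. the expression TPACKET_HDRLEN - sizeof(struct sockaddr_ll). */
static size_t tx_data_offset(void)
{
    return TPACKET_HDRLEN - sizeof(struct sockaddr_ll);
}

/* Hypothetical helper: where the user would start writing packet bytes
 * inside one TX ring frame (TPACKET_V1 layout assumed). */
static void *tx_frame_data(void *frame_base)
{
    assert(frame_base != NULL);
    return (unsigned char *)frame_base + tx_data_offset();
}
```

Because TPACKET_HDRLEN is defined as the aligned size of struct tpacket_hdr plus sizeof(struct sockaddr_ll), the offset works out to the aligned header size, so the payload starts on a TPACKET_ALIGNMENT boundary.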
183 can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). In order
192 - Capture process::
194 setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))
196 - Transmission process::
198 setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))
214 related meta-information like timestamps without requiring a system call.
236 +---------+---------+ +---------+---------+
238 +---------+---------+ +---------+---------+
241 +---------+---------+ +---------+---------+
243 +---------+---------+ +---------+---------+
245 A frame can be of any size, the only condition being that it fits in a block. A block
246 can only hold an integer number of frames, or in other words, a frame cannot
259 ----------------
268 More precisely the limit can be calculated as::
276 So get_free_pages can allocate as much as 4MB or 8MB in a 2.4/2.6 kernel
279 User space programs can include /usr/include/sys/user.h and
282 The pagesize can also be determined dynamically with the getpagesize (2)
286 ------------------
292 called pg_vec, its size limits the number of blocks that can be allocated::
294 +---+---+---+---+
296 +---+---+---+---+
305 a pool of pre-determined sizes. This pool of memory is maintained by the slab
307 hence imposes the maximum memory that kmalloc can allocate.
310 predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
324 <size-max> is the maximum size allocable with kmalloc
326 <pointer size> depends on the architecture -- ``sizeof(void *)``
327 <page size> depends on the architecture -- PAGE_SIZE or getpagesize (2)
328 <max-order> is the value defined with MAX_PAGE_ORDER
334 <block number> = <size-max>/<pointer size>
335 <block size> = <pagesize> << <max-order>
348 <size-max> = 131072 bytes
351 <max-order> = 11
358 and hence the buffer will have a 262144 MiB size. So it can hold
367 the allocation can wait and swap other processes' memory in order to allocate
368 the necessary memory, so normally limits can be reached.
371 -----------------
382 - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
383 - struct tpacket_hdr
384 - pad to TPACKET_ALIGNMENT=16
385 - struct sockaddr_ll
386 - Gap, chosen so that packet data (Start+tp_net) aligns to
388 - Start+tp_mac: [ Optional MAC header ]
389 - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
390 - Pad to align to TPACKET_ALIGNMENT=16
395 - tp_block_size must be a multiple of PAGE_SIZE (1)
396 - tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
397 - tp_frame_size must be a multiple of TPACKET_ALIGNMENT
398 - tp_frame_nr must be exactly frames_per_block*tp_block_nr
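The four constraints listed above can be checked in user space before calling setsockopt(). The sketch below is an illustration only: `tpacket_req_is_valid` is a hypothetical name, the page size is passed in by the caller (for example from getpagesize()), and the header length used is the TPACKET_V1 value TPACKET_HDRLEN. The kernel performs its own, more thorough validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* Returns true if req satisfies the ring constraints for the given page size. */
static bool tpacket_req_is_valid(const struct tpacket_req *req,
                                 unsigned int page_size)
{
    unsigned int frames_per_block;

    assert(req != NULL);

    /* tp_block_size must be a (non-zero) multiple of PAGE_SIZE */
    if (page_size == 0 || req->tp_block_size == 0 ||
        req->tp_block_size % page_size != 0)
        return false;

    /* tp_frame_size must be greater than TPACKET_HDRLEN */
    if (req->tp_frame_size <= TPACKET_HDRLEN)
        return false;

    /* tp_frame_size must be a multiple of TPACKET_ALIGNMENT */
    if (req->tp_frame_size % TPACKET_ALIGNMENT != 0)
        return false;

    /* A block holds a whole number of frames; at least one must fit. */
    frames_per_block = req->tp_block_size / req->tp_frame_size;
    if (frames_per_block == 0)
        return false;

    /* tp_frame_nr must be exactly frames_per_block * tp_block_nr */
    return req->tp_frame_nr == frames_per_block * req->tp_block_nr;
}
```

For example, four 4096-byte blocks with 2048-byte frames give two frames per block, so tp_frame_nr has to be 8.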
404 ---------------------------------------------
411 mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
423 setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &foo, sizeof(foo));
424 setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &bar, sizeof(bar));
426 rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
434 to be used for the kernel. If not, there is a frame the user can read
450 larger than tp_frame_size. This packet can be
457 The number of frames that can be buffered to
484 at least the TP_STATUS_USER flag. Then the user can read the packet,
486 can use that frame buffer again.
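The ownership handshake described above can be sketched as two small helpers operating on a TPACKET_V1 frame header. The function names are hypothetical, and memory ordering between user space and the kernel is deliberately left out of this sketch; a real consumer may need a barrier around the status accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* True when the kernel has handed the frame to user space, i.e. the
 * status field has at least the TP_STATUS_USER flag set. */
static bool rx_frame_ready(const volatile struct tpacket_hdr *hdr)
{
    assert(hdr != NULL);
    return (hdr->tp_status & TP_STATUS_USER) != 0;
}

/* Hand the frame back: zeroing the status field (TP_STATUS_KERNEL is 0)
 * tells the kernel it can reuse this frame buffer.
 * No memory barrier is issued here. */
static void rx_frame_release(volatile struct tpacket_hdr *hdr)
{
    assert(hdr != NULL);
    hdr->tp_status = TP_STATUS_KERNEL;
}
```

A reader loop would call rx_frame_ready() on the current frame, process the packet if it returns true, call rx_frame_release(), and then advance to the next frame in the ring.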
488 The user can use poll (any other variant should apply too) to check if new
493 pfd.fd = fd;
516 This can be done on multiple frames. Once the user is ready to transmit, it
525 header->tp_len = in_i_size;
526 header->tp_status = TP_STATUS_SEND_REQUEST;
527 retval = send(this->socket, NULL, 0, 0);
529 The user can also use poll() to check if a buffer is available:
536 pfd.fd = fd;
547 setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
548 getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
550 where 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
553 - Default if not otherwise specified by setsockopt(2)
554 - RX_RING, TX_RING available
556 TPACKET_V1 --> TPACKET_V2:
557 - Made 64 bit clean due to unsigned long usage in TPACKET_V1
560 - Timestamp resolution in nanoseconds instead of microseconds
561 - RX_RING, TX_RING available
562 - VLAN metadata information available for packets
566 - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates
568 - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field
571 - How to switch to TPACKET_V2:
580 TPACKET_V2 --> TPACKET_V3:
581 - Flexible buffer implementation for RX_RING:
582 1. Blocks can be configured with non-static frame-size
583 2. Read/poll is at a block-level (as opposed to packet-level)
584 3. Added poll timeout to avoid indefinite user-space wait
586 4. Added user-configurable knobs:
591 - RX Hash data available in user space
592 - TX_RING semantics are conceptually similar to TPACKET_V2;
597 Packets with non-zero values of tp_next_offset will be dropped.
602 In the AF_PACKET fanout mode, packet reception can be load balanced among
607 - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
608 - PACKET_FANOUT_LB: schedule to socket by round-robin
609 - PACKET_FANOUT_CPU: schedule to socket by the CPU the packet arrives on
610 - PACKET_FANOUT_RND: schedule to socket by random selection
611 - PACKET_FANOUT_ROLLOVER: if one socket is full, roll over to another
612 - PACKET_FANOUT_QM: schedule to socket by skb's recorded queue_mapping
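One of the modes listed above is selected through the integer argument passed to the PACKET_FANOUT socket option. As far as this sketch assumes, the low 16 bits carry the fanout group id and the upper 16 bits carry the mode; `fanout_arg` is a hypothetical helper name, not an existing API.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Build the value for setsockopt(fd, SOL_PACKET, PACKET_FANOUT, ...):
 * group id in the low 16 bits, fanout mode in the upper 16 bits
 * (assumed encoding, see the lead-in). */
static int fanout_arg(int group_id, int mode)
{
    assert(group_id >= 0 && group_id <= 0xffff);
    assert(mode >= 0 && mode <= 0xffff);
    return (int)((uint32_t)group_id | ((uint32_t)mode << 16));
}
```

All sockets that should share the load would pass the same group id and mode, for example fanout_arg(getpid() & 0xffff, PACKET_FANOUT_HASH).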
646 int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
651 if (fd < 0) {
658 err = ioctl(fd, SIOCGIFINDEX, &ifr);
667 err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
674 err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
681 return fd;
686 int fd = setup_socket();
689 if (fd < 0)
690 exit(fd);
692 while (limit-- > 0) {
696 err = read(fd, buf, sizeof(buf));
707 close(fd);
713 int fd, err;
740 case -1:
758 AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
764 * ~15% - 20% reduction in CPU-usage
773 it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.)::
775 /* Written from scratch, but kernel-to-user space API usage
828 int err, i, fd, v = TPACKET_V3;
833 fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
834 if (fd < 0) {
839 err = setsockopt(fd, SOL_PACKET, PACKET_VERSION, &v, sizeof(v));
845 memset(&ring->req, 0, sizeof(ring->req));
846 ring->req.tp_block_size = blocksiz;
847 ring->req.tp_frame_size = framesiz;
848 ring->req.tp_block_nr = blocknum;
849 ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
850 ring->req.tp_retire_blk_tov = 60;
851 ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
853 err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
854 sizeof(ring->req));
860 ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
861 PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
862 if (ring->map == MAP_FAILED) {
867 ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
868 assert(ring->rd);
869 for (i = 0; i < ring->req.tp_block_nr; ++i) {
870 ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
871 ring->rd[i].iov_len = ring->req.tp_block_size;
882 err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
888 return fd;
893 struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
896 if (eth->h_proto == htons(ETH_P_IP)) {
902 ss.sin_addr.s_addr = ip->saddr;
908 sd.sin_addr.s_addr = ip->daddr;
912 printf("%s -> %s, ", sbuff, dbuff);
915 printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
920 int num_pkts = pbd->h1.num_pkts, i;
925 pbd->h1.offset_to_first_pkt);
927 bytes += ppd->tp_snaplen;
931 ppd->tp_next_offset);
940 pbd->h1.block_status = TP_STATUS_KERNEL;
943 static void teardown_socket(struct ring *ring, int fd)
945 munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
946 free(ring->rd);
947 close(fd);
952 int fd, err;
968 fd = setup_socket(&ring, argp[argc - 1]);
969 assert(fd > 0);
972 pfd.fd = fd;
979 if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
980 poll(&pfd, 1, -1);
990 err = getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len);
1001 teardown_socket(&ring, fd);
1013 setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));
1015 This has the side effect that packets sent through PF_PACKET will bypass the
1017 packets are not buffered, tc disciplines are ignored, increased loss can occur
1019 you have been warned; generally, this can be useful for stress testing various
1030 NIC is capable of timestamping packets in hardware, you can request those
1038 setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req))
1064 one can extract the type of timestamp in a second step from tp_status)!
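The two-step check on a TX frame can be sketched as follows. This is an illustration under assumptions: the frame uses the TPACKET_V2 header, the first step (does the frame belong to the application again?) is taken to be a test of the TP_STATUS_SEND_REQUEST and TP_STATUS_SENDING bits, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>

/* Hypothetical classification of a TX frame's timestamp. */
enum ts_kind {
    TS_PENDING,      /* frame still owned by the kernel */
    TS_NONE,         /* frame is back, no timestamp bit set */
    TS_SOFTWARE,     /* TP_STATUS_TS_SOFTWARE */
    TS_RAW_HARDWARE  /* TP_STATUS_TS_RAW_HARDWARE */
};

static enum ts_kind tx_timestamp_kind(const struct tpacket2_hdr *hdr)
{
    assert(hdr != NULL);

    /* Step 1: the frame must have been handed back to the application. */
    if (hdr->tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING))
        return TS_PENDING;

    /* Step 2: extract the timestamp type from tp_status. */
    if (hdr->tp_status & TP_STATUS_TS_RAW_HARDWARE)
        return TS_RAW_HARDWARE;
    if (hdr->tp_status & TP_STATUS_TS_SOFTWARE)
        return TS_SOFTWARE;
    return TS_NONE;
}
```

When the result is TS_SOFTWARE or TS_RAW_HARDWARE, the timestamp itself would be read from the tp_sec and tp_nsec fields of the same header.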
1078 - Packet sockets work well together with Linux socket filters, thus you also