…ance
- Separate cloud-agnostic topology guidance from AWS-specific examples
- Add "Data Movement in Kafka" section explaining producer write path, consumer read path, and partition leader/follower architecture
- Generalize cloud topology section to use vCPU counts instead of AWS-specific instance types
- Consolidate AWS deployment details into dedicated example section
- Update single-node topology with complete system specifications:
  * 2 sockets, 192 cores/socket (96 physical + HT)
  * 6 NUMA nodes (3 per socket, 32 cores/node, SNC enabled)
  * 3 brokers pinned to dedicated NUMA nodes with 16 logical CPUs each
- Add multi-cloud examples (AWS m8i, GCP C4) for Intel Xeon 6 guidance
- Remove confusing NUMA notation and provide clear CPU pinning examples
- Add disclaimer note about 4.2 RC testing
- Update all version references to specify "release candidate (RC)"
- Remove TODO placeholders and update docs URLs to Kafka 4.1
- Clarify vm.dirty_background_bytes applies system-wide
Collaborator
Need to add a link in the main README under Software: https://github.com/intel/optimization-zone/blob/main/README.md
## Single-node BIOS Configuration Recommendations
If the user has access to the BIOS for a system, here are some parameters that can be changed to improve Kafka performance.
- **Sub-NUMA CLustering (SNC)**: enabls multiple NUMA nodes so each broker can run on its own NUMA node
Collaborator
Also, fix the capitalization: "Sub-NUMA Clustering".
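To illustrate the SNC bullet quoted above, here is a minimal sketch of how the CPUs belonging to one NUMA node could be selected for a single broker. It assumes the Linux sysfs `cpulist` format (for example the contents of `/sys/devices/system/node/node0/cpulist`); the helper names `parse_cpulist` and `pick_broker_cpus` are hypothetical and do not come from the guide under review.

```python
def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pick_broker_cpus(cpulist: str, count: int) -> list[int]:
    """Take the first `count` CPUs of one NUMA node for one broker.

    Assumption: the caller passes the cpulist of a single NUMA node, so every
    returned CPU is local to that node.
    """
    cpus = parse_cpulist(cpulist)
    if count > len(cpus):
        raise ValueError(f"node has only {len(cpus)} CPUs, {count} requested")
    return cpus[:count]
```

On a real system the input string would be read from sysfs, and the resulting CPU IDs would be handed to whatever pinning mechanism the guide recommends.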
net.ipv4.tcp_wmem='4096 65536 16777216'
################################################################
#setting the system to performance mode for best possible perf #
Collaborator
Comment spacing: add a space between `#` and "setting".
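For readers checking the `net.ipv4.tcp_wmem` line quoted in this thread: the three fields are the minimum, default, and maximum TCP send buffer sizes in bytes. A small sketch that parses and sanity-checks such a value follows; `parse_tcp_wmem` is a hypothetical helper, not part of the guide.

```python
def parse_tcp_wmem(value: str) -> dict:
    """Split a tcp_wmem value into its min/default/max byte sizes."""
    fields = value.strip().strip("'\"").split()
    if len(fields) != 3:
        raise ValueError("tcp_wmem expects exactly three fields")
    minimum, default, maximum = (int(field) for field in fields)
    if not minimum <= default <= maximum:
        raise ValueError("expected min <= default <= max")
    return {"min": minimum, "default": default, "max": maximum}


# The value from the quoted diff: 4 KiB minimum, 64 KiB default, 16 MiB maximum.
settings = parse_tcp_wmem("'4096 65536 16777216'")
```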
- Example: Cloud instance with 16 vCPUs
- `num.network.threads=6`: should be less than or equal to half the CPU cores assigned to a broker
- `num.io.threads=8`: should be less than or equal to the count of CPU cores assigned to a broker
- `num.replica.fetchers=2`: increased beyond the default of 2 to improve replication latency
Collaborator
It says increased beyond the default of 2, but it is set to 2.
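One way to keep these values consistent is to check them against the rules stated in the quoted bullets. The sketch below encodes only those two rules (network threads at most half the cores, I/O threads at most the core count); `check_broker_threads` is a hypothetical helper. On the inconsistency flagged above: as far as the upstream broker configuration reference documents it, the default for `num.replica.fetchers` is 1, which would make "increased beyond the default" correct if the stated default were fixed, but that should be verified against the Kafka version the guide targets.

```python
def check_broker_threads(vcpus: int, network_threads: int, io_threads: int) -> list[str]:
    """Return the sizing rules from the quoted guidance that a config violates."""
    problems = []
    # Rule 1: num.network.threads <= half the CPU cores assigned to the broker.
    if network_threads > vcpus // 2:
        problems.append(
            f"num.network.threads={network_threads} exceeds vcpus/2={vcpus // 2}"
        )
    # Rule 2: num.io.threads <= the CPU cores assigned to the broker.
    if io_threads > vcpus:
        problems.append(f"num.io.threads={io_threads} exceeds vcpus={vcpus}")
    return problems


# The 16 vCPU example from the quoted diff satisfies both rules.
example_problems = check_broker_threads(vcpus=16, network_threads=6, io_threads=8)
```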
Another potential resource bottleneck in a cloud deployment can be the storage bandwidth of volumes in their default configuration. It's usually possible to increase the I/O operations per second (IOPS) and bandwidth for a volume at creation time. It's recommended that these volumes be configured with high IOPS and throughput where possible. If storage performance of a single volume that's been configured for maximum throughput is still insufficient to meet an SLA, additional volumes may be attached to brokers or the brokers may be moved to instances with direct-attached NVMes.
As with other system resources, storage telemetry should be monitored to ensure individual devices are not operating beyond their allotted steady-state performance.
Scaling storage when hitting instance resource limits is somewhat more flexible than scaling the network because, in addition to the possibility of growing the cluster capacity with scale-out of additional brokers, additional storage volumes can usually be added to brokers to increase their storage capacity.
An alternative to adding volumes would be to scale up the brokers to systems with direct-attached NVMe's that enable high-performance storage.
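The "add volumes when one is not enough" reasoning in the quoted paragraphs comes down to simple arithmetic, sketched below. The throughput figures are purely illustrative assumptions, not measurements or provider limits, and `volumes_needed` is a hypothetical helper.

```python
import math


def volumes_needed(required_mb_per_s: float, per_volume_mb_per_s: float) -> int:
    """Number of volumes a broker needs to sustain a required write throughput.

    Assumption: throughput scales linearly across volumes and the instance's
    own storage bandwidth limit is not the bottleneck.
    """
    if required_mb_per_s <= 0 or per_volume_mb_per_s <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(required_mb_per_s / per_volume_mb_per_s)


# Illustrative numbers only: a broker that must sustain 1200 MB/s on volumes
# provisioned for 500 MB/s each would need three volumes.
example_volumes = volumes_needed(1200, 500)
```

If the result exceeds what an instance can attach or drive, that is the point where the quoted text suggests moving to direct-attached NVMe storage or scaling out brokers.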
I'd like to submit this Kafka optimization guide to the optimization zone.