gh-144156: Fix email header folding concatenating encoded words #144692

robsdedude · 2026-02-10T22:00:26Z

Issue: Missing 'linear-white-space' between encoded-word in RFC2047 header #144156
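
For context, here is a minimal sketch of the RFC 2047 rule this PR is about: whitespace that separates two adjacent encoded words is a pure separator and is dropped on decoding, so any space meant to be displayed has to travel inside one of the encoded words. The helper name `decode` is only illustrative; the sketch uses the legacy `email.header` API rather than the folding code touched by this PR.

```python
from email.header import decode_header, make_header


def decode(raw):
    """Decode an RFC 2047 header value into a plain string."""
    return str(make_header(decode_header(raw)))


# The space between two encoded words is only a separator and is discarded.
print(repr(decode("=?utf-8?q?foo?= =?utf-8?q?bar?=")))

# To keep a visible space, encode it inside the second word ("_" is a
# Q-encoded space).
print(repr(decode("=?utf-8?q?foo?= =?utf-8?q?_bar?=")))
```
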

bedevere-app · 2026-02-10T22:00:32Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Lib/email/_header_value_parser.py

Lib/test/test_email/test_generator.py

bitdancer · 2026-02-11T22:43:46Z

FYI I've been looking at this and will have a review soonish. You are correct that the gh-92281 fix was buggy.

bitdancer

Thanks for fixing this!

I edited this review a bunch, so mistakes could have crept into my code suggestions, though I did test them. The randomizer test should catch any transcription errors if nothing else does.

bitdancer · 2026-02-12T18:15:23Z

Lib/email/_header_value_parser.py

+            break
+    if lines_popped:
+        lines.append(' ' if lines else '')
+    return ''.join(wsp_lines)


I don't think this is necessary, see my suggestion below.

bitdancer · 2026-02-12T18:15:27Z

Lib/email/_header_value_parser.py

-                             # words, we may need to encode the whitespace
-                             # at the beginning of the second word.
+    last_word_is_ew = False
    last_ew = None  # Points to the last encoded character if there's an ew on


This comment that was added by 90281 is incorrect. It could say "if there is an encoded word in the last line of lines, points to the encoded word's first character". Which means it is a badly named variable (that's my fault). We could rename it 'last_ew_start', but I don't know if we want to do that in this PR. The comment should be fixed, though.

bitdancer · 2026-02-12T18:15:45Z

Lib/email/_header_value_parser.py

-                # This whitespace has been added to the lines in _fold_as_ew()
-                # so clear it now.
-                leading_whitespace = ''
+                                      part.ew_combine_allowed, charset, last_word_is_ew)


Let's just reformat this:

                last_ew = _fold_as_ew(
                    tstr, lines, maxlen, last_ew,
                    part.ew_combine_allowed, charset, last_word_is_ew,
                )

bitdancer · 2026-02-12T18:17:32Z

Lib/email/_header_value_parser.py

        if len(tstr) <= maxlen - len(lines[-1]):
            lines[-1] += tstr
+            if any(char not in WSP for char in tstr):
+                last_word_is_ew = False


Suggested change

-                last_word_is_ew = False
+            last_word_is_ew = not tstr.strip(_WSP) and last_word_is_ew

You'll need to add

_WSP = ''.join(WSP)

up at the top of the file where WSP is defined. (I'm actually adding that constant myself in a different PR I'm working on).

I haven't run timeit on it, but this should be more efficient as well as more compact. Hmm. I suppose it would be faster to have last_word_is_ew first, but then we should maybe wrap the tstr reference in bool for cleanliness. Up to you.

Much faster indeed.

    import timeit

    _wSP = ' \t'
    WSP = set(_wSP)

    # Note: one() returns the inverse truth value of two() and three();
    # this does not change what is being timed.
    def one(s):
        return bool(s.strip(_wSP))

    def two(s):
        return all(c in WSP for c in s)

    def three(s):
        return not any(c not in WSP for c in s)

    s1 = ' ' * 80
    s2 = ' ' * 40 + 'a' + ' ' * 40
    s3 = ' ' + 'a' * 80 + ' '

    print('s1')
    print(timeit.timeit(lambda: one(s1), number=10_000))
    print(timeit.timeit(lambda: two(s1), number=10_000))
    print(timeit.timeit(lambda: three(s1), number=10_000))
    print('s2')
    print(timeit.timeit(lambda: one(s2), number=10_000))
    print(timeit.timeit(lambda: two(s2), number=10_000))
    print(timeit.timeit(lambda: three(s2), number=10_000))
    print('s3')
    print(timeit.timeit(lambda: one(s3), number=10_000))
    print(timeit.timeit(lambda: two(s3), number=10_000))
    print(timeit.timeit(lambda: three(s3), number=10_000))

    s1
    0.014692971999920701
    0.22478941200006375
    0.22499246799998218
    s2
    0.01503182199985531
    0.12194120699996347
    0.1325552719999905
    s3
    0.011907245000202238
    0.025579678999747557
    0.025823422000030405
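
Besides speed, the one-line form should behave the same as the original branch. The following is a small sketch that checks this on a few sample strings. The function names `update_branch` and `update_strip` are made up for illustration, and `WSP` is assumed to be the set of space and tab, as in the benchmark above.

```python
WSP = frozenset(' \t')   # assumption: same members as WSP in the parser module
_WSP = ''.join(WSP)


def update_branch(tstr, last_word_is_ew):
    """Original form: clear the flag once a non-whitespace char is seen."""
    if any(char not in WSP for char in tstr):
        last_word_is_ew = False
    return last_word_is_ew


def update_strip(tstr, last_word_is_ew):
    """Suggested form: one str.strip() call instead of a Python-level loop."""
    return not tstr.strip(_WSP) and last_word_is_ew


samples = ['', ' ', '\t \t', 'a', ' a ', 'word', ' \tx']
for tstr in samples:
    for flag in (True, False):
        assert update_branch(tstr, flag) == update_strip(tstr, flag)
print('both forms agree on', len(samples) * 2, 'cases')
```
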

bitdancer · 2026-02-12T18:17:36Z

Lib/email/_header_value_parser.py

-                    whitespace_accumulator.append(char)
-                leading_whitespace = ''.join(whitespace_accumulator)
+                if any(char not in WSP for char in lines[-1]):
+                    last_word_is_ew = False


Suggested change

-                    last_word_is_ew = False
+                last_word_is_ew = not tstr.strip(_WSP) and last_word_is_ew

bitdancer · 2026-02-12T18:19:15Z

Lib/email/_header_value_parser.py

+        if any(char not in WSP for char in tstr):
+            last_word_is_ew = False


Suggested change

-        if any(char not in WSP for char in tstr):
-            last_word_is_ew = False
+        last_word_is_ew = not tstr.strip(_WSP) and last_word_is_ew

bitdancer · 2026-02-12T18:19:22Z

Lib/email/_header_value_parser.py

+        # Therefore, we encode all to-be-displayed whitespace in the second
+        # encoded word.
+        leading_whitespace = _steal_all_trailing_WSP_if_exists(lines)
+        to_encode = leading_whitespace + to_encode


Suggested change

-        to_encode = leading_whitespace + to_encode
+        len_without_wsp = len(lines[-1].rstrip(_WSP))
+        leading_whitespace = lines[-1][len_without_wsp:]
+        lines[-1] = lines[-1][:len_without_wsp] + (
+            ' ' if leading_whitespace else '')
+        to_encode = leading_whitespace + to_encode

This avoids that complicated single use function. We should never be producing lines that contain only blanks, they get encoded.

You can also move this up to be the first elif and drop the 'and not last_word_is_ew'.
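
To make the slicing in that suggestion easier to follow, here is a standalone sketch of the same steps. The wrapper `move_trailing_wsp` is hypothetical (in the suggestion the statements sit inline in the folding code), and `_WSP` is assumed to be the space-and-tab string described earlier in this review.

```python
_WSP = ' \t'  # assumption: equivalent to ''.join(WSP) in the parser module


def move_trailing_wsp(lines, to_encode):
    """Move trailing whitespace of the last line into the text to encode.

    A single space is left behind as the separator between encoded words.
    Hypothetical helper; it mutates ``lines`` in place.
    """
    len_without_wsp = len(lines[-1].rstrip(_WSP))
    leading_whitespace = lines[-1][len_without_wsp:]
    lines[-1] = lines[-1][:len_without_wsp] + (
        ' ' if leading_whitespace else '')
    return leading_whitespace + to_encode


lines = ['Subject: =?utf-8?q?abc?=  \t']
to_encode = move_trailing_wsp(lines, 'xyz')
print(repr(lines[-1]))   # last line now ends in exactly one space
print(repr(to_encode))   # the stolen whitespace is prepended to the text
```
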

> We should never be producing lines that contain only blanks, they get encoded.

While testing, I also noticed that wrapped spaces are always encoded. TBH, I didn't like that and wanted a solution that stays robust if this behavior changes in the future. I didn't like it because, reading the RFC and the rest of this code, not encoding words generally seems preferable for human readability (and, to a lesser extent, for mail clients that don't support encoded words). Further, I thought that relying on an invariant enforced a few hundred lines away in a different function is a nice recipe for spooky action at a distance.

If you still think the fix should assume this invariant, I'll change this part.

bitdancer · 2026-02-12T18:19:35Z

Lib/email/_header_value_parser.py

+        text_space = remaining_space - chrome_len
        if text_space <= 0:
            lines.append(' ')
            continue


One of my new tests revealed a pre-existing problem here:

Suggested change

-            continue
+        if text_space <= 0:
+            newline = _steal_trailing_WSP_if_exists(lines)
+            lines.append(newline or ' ')
+            new_last_ew = len(lines[-1])
+            continue

bitdancer · 2026-02-12T18:19:40Z

Lib/test/test_email/test_generator.py

+        g = BytesGenerator(s)
+        g.flatten(msg)
+        self.assertEqual(s.getvalue(), expected)
+


Here is a test harness I used to verify the code:

    def test_ew_folding_round_trip_random(self):
        from random import randint
        c = 30
        b = 30
        w = 30
        source = ''
        #source += ' ' * randint(0, b)
        source += ('a' * randint(1, c), 'ф' * randint(1, c))[randint(0, 1)]
        for i in range(randint(1, w)):
            source += ' ' * randint(1, b)
            source += ('a' * randint(1, c), 'ф' * randint(1, c))[randint(0, 1)]
        source += ' ' * randint(0, b)
        with open('temp', 'a') as f:
            print(source, file=f)
        msg = EmailMessage()
        msg['Subject'] = source
        s = io.BytesIO()
        g = BytesGenerator(s, maxheaderlen=30)
        g.flatten(msg)
        flat = s.getvalue()
        reparsed = message_from_bytes(flat, policy=policy.default)['Subject']
        self.assertMultiLineEqual(reparsed, source)

I'm not sure if we should include a variation on this in the PR or not, but you might want to run it a bunch, maybe tweaking the constants, to see if I missed anything. I ran it for a while with the above parameters and saw no failures.

Here are two tests that the above produced, that located the pre-existing bug I mentioned above:

    def test_ew_folding_round_trip_1(self):
        print()
        source = "aaaaaaaaa фффффффф "
        msg = EmailMessage()
        msg['Subject'] = source
        s = io.BytesIO()
        g = BytesGenerator(s, maxheaderlen=30)
        g.flatten(msg)
        flat = s.getvalue()
        reparsed = message_from_bytes(flat, policy=policy.default)['Subject']
        self.assertMultiLineEqual(reparsed, source)

    def test_ew_folding_round_trip_2(self):
        print()
        source = "aaa aaaaaaa aaa ффф фффф "
        msg = EmailMessage()
        msg['Subject'] = source
        s = io.BytesIO()
        g = BytesGenerator(s, maxheaderlen=30)
        g.flatten(msg)
        flat = s.getvalue()
        reparsed = message_from_bytes(flat, policy=policy.default)['Subject']
        self.assertMultiLineEqual(reparsed, source)
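
For trying individual subjects outside the test suite, the same round trip can be sketched as a standalone script. The function name `subject_round_trip` is made up for illustration, and whether the two sample subjects come back unchanged depends on whether the interpreter in use already contains this fix, so the script only reports the outcome rather than asserting it.

```python
import io
from email import message_from_bytes, policy
from email.generator import BytesGenerator
from email.message import EmailMessage


def subject_round_trip(source, maxheaderlen=30):
    """Flatten a message with the given Subject and parse it back.

    Returns the flattened bytes and the reparsed Subject as a str.
    """
    msg = EmailMessage()
    msg['Subject'] = source
    buf = io.BytesIO()
    BytesGenerator(buf, maxheaderlen=maxheaderlen).flatten(msg)
    flat = buf.getvalue()
    reparsed = message_from_bytes(flat, policy=policy.default)['Subject']
    return flat, str(reparsed)


if __name__ == '__main__':
    # Sample subjects taken from the two tests above.
    for source in ("aaaaaaaaa фффффффф ", "aaa aaaaaaa aaa ффф фффф "):
        flat, reparsed = subject_round_trip(source)
        status = 'ok' if reparsed == source else 'MISMATCH'
        print(status, repr(source), flat)
```
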

bitdancer · 2026-02-12T18:19:43Z

Misc/NEWS.d/next/Core_and_Builtins/2026-02-10-22-05-51.gh-issue-144156.UbrC7F.rst

@@ -0,0 +1,3 @@
+Fix folding of email headers violating `RFC 2047`_ with two consecutive encoded words without separating linear-white-space.
+
+.. _RFC 2047: https://www.rfc-editor.org/rfc/rfc2047


Suggested change

-.. _RFC 2047: https://www.rfc-editor.org/rfc/rfc2047
+Fix the folding of headers by the :mod:`email` library when :rfc:`2047` encoded words are used. Now whitespace is correctly preserved and also correctly added between adjacent encoded words. The latter property was broken by the fix for gh-92081, which mostly fixed previous failures to preserve whitespace.

Out of curiosity I checked, and according to the changelogs the buggy fix was released in 3.14 alpha 1, 3.13 beta 2 and 3.12.5. Unfortunately the latter is now final. Maybe we should include that in the news item? I'm not sure what the policy is on that.

> Maybe we should include that in the news item?

I'm all for documenting known issues and shortcomings. However, I doubt people would find this info in the release notes unless they're doing "software archeology", i.e., they already know (roughly) what info they're looking for.

> I'm not sure what the policy is on that.

Neither am I. I'm pretty much a first-time contributor 😅

bedevere-app · 2026-02-12T18:37:12Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Co-authored-by: R. David Murray <[email protected]>

Fix email header wrapping omitting white space

b5925e0

bedevere-app bot mentioned this pull request Feb 10, 2026

Missing 'linear-white-space' between encoded-word in RFC2047 header #144156

Open

robsdedude changed the title ~~gh-144156: Fix email header wrapping omitting white space~~ gh-144156: Fix email header folding concatenating encoded words Feb 10, 2026

blurb-it bot and others added 3 commits February 10, 2026 22:05

📜🤖 Added by blurb_it.

1a7824e

fixup! Fix email header wrapping omitting white space

fb23a6e

fixup! Fix email header wrapping omitting white space

050491c

robsdedude marked this pull request as ready for review February 11, 2026 13:32

robsdedude requested a review from a team as a code owner February 11, 2026 13:32

bedevere-app bot added the awaiting review label Feb 11, 2026

aisk reviewed Feb 11, 2026

Lib/email/_header_value_parser.py

aisk reviewed Feb 11, 2026

Lib/test/test_email/test_generator.py (outdated)

Formatting

ed6f197

bitdancer requested changes Feb 12, 2026


bedevere-app bot added awaiting changes and removed awaiting review labels Feb 12, 2026

Extend news fragment

91b4b3b

Co-authored-by: R. David Murray <[email protected]>
