Conversation
There was a problem hiding this comment.
Hi, @jangjangji welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!
Thanks for reviewing! |
- Add buffer.seek(0) after truncate to reset file pointer position - Add isinstance and len check before accessing item[16] to prevent IndexError - Improves robustness when processing malformed CSV log entries
5d9ea28 to
81e2de2
Compare
| if isinstance(item, list) and len(item) > 16: | ||
| item[16] = item[16] + ": " |
There was a problem hiding this comment.
When is the condition false? Is it correct to ignore the else branch?
There was a problem hiding this comment.
### When is the condition false?
The condition is false when len(item) <= 16, which occurs when the CSV reader incorrectly parses lines containing multi-line strings (e.g., SQL
statements with newlines).
While our CSV files follow the standard 23-column PostgreSQL csvlog format, some log entries contain multi-line quoted strings that cause the CSV
reader to return incomplete rows with fewer than 17 columns.
Is it correct to ignore the else branch?
Yes. The item[16] + ": " transformation is optional formatting to help regex pattern matching - not a required operation.
Without this check, a single malformed line causes IndexError, which terminates the entire iterator silently (treated as StopIteration),
resulting in 0 output despite reading thousands of lines.
By skipping the transformation for malformed lines:
- Valid lines (23 columns) → processed normally with the transformation
- Malformed lines (< 17 columns) → passed through unchanged
- Downstream filters naturally handle/discard non-matching entries
This allows processing to continue instead of failing silently on the first malformed line.
There was a problem hiding this comment.
@gfphoenix78 Sorry for the late reply. I've just added my response above. Please take a look when you have time.
Fixes #ISSUE_Number
What does this PR do?
This PR fixes two issues in the
CsvFlattenclass ingplogfilter:buffer.seek(0)aftertruncate(0)to properly reset the StringIO file pointer position.item[16]to preventIndexErroron malformed CSV rows.Type of Change
Test Plan
gplogfiltercorrectly filters CSV logs with time range and pattern matchingmake installcheckmake -C src/test installcheck-cbdb-parallelImpact
Performance:
No measurable impact expected.
User-facing changes:
None (internal logic improvement only).
Dependencies:
No changes.
Checklist
Additional Context
N/A
CI Skip Instructions