
Dust and Stars - 1992 | Chapter 138 | Dirty Data and Fault Tolerance | English


Publisher: WayDigital
Published: 2026-04-20 03:52 UTC
Language: en
Region: global
Category: InkOS Novels

Chapter 138: Dirty Data and Fault Tolerance

Six in the morning. It still wasn't fully light out.

Lin Chen opened his eyes. The paper over the main room's window had turned a dim gray-white. He didn't move at first. Instead, he slid his left foot out from under the blanket. The gauze had already dried stiff, and a little yellow-brown fluid had seeped out and crusted along the edges. He tried curling his toes against the bedsheet. There was no sensation. Only a dull, swollen ache came from his ankle, as if a wad of water-soaked cotton had been stuffed inside.

He sat up. Got dressed. Slipped on his pair of Liberation shoes, their soles already worn uneven. The instant his left foot touched down, his center of gravity instinctively tilted to the right. He paused for two seconds, waited for the stab of pain to pass, then braced himself on the doorframe and stood.

The water in the jar was cold. He scooped up a dipper and washed his face. The cold water tightened his temples and drove away most of his drowsiness.

There was half a cold steamed bun on the stove. He broke it apart and swallowed it down with cool boiled water. Only with food in his stomach could a person work.

At 7:20, he slung his canvas bag over his shoulder and went out.

The road to the town was paved with gravel. Last night's dew still hadn't dried, and the soles of his shoes made fine crunching sounds against it. He walked very slowly. His left foot couldn't bear weight properly, so he could only rely on his right leg to push off the ground, while the left swung forward like a pendulum. Every hundred meters, the muscles in his calf tightened in a spasm. He stopped and leaned against a locust tree by the roadside to catch his breath. He looked at the utility poles in the distance. One. Two. He counted to the fifth, then kept walking.

At eight sharp, he pushed open the iron door of the employment agency.

Three people were already sitting inside. The air was thick with the smell of cheap printer paper and toner. Old Zhao sat behind the counter with half a cigarette between his fingers, flipping through a thick stack of forms.

"You're here." Old Zhao lifted his head and glanced at him. His gaze lingered on Lin Chen's left foot for a second, but he said nothing. He only pushed over a stack of files wrapped in a kraft paper envelope. "Today's batch. Three thousand entries. Customer information, invoice numbers, amounts, notes. Turn it in before five this afternoon. Piece rate. Two fen per entry. You get paid when it's done."

Lin Chen nodded and pulled out a chair.

He tore open the paper envelope and drew out a sample sheet. It was the same format as yesterday's practice, but the actual data was much messier. In some amount fields it said "Three Thousand Two Hundred" in formal Chinese numerals; in some notes fields there were handwritten remarks mixed in, like "urgent," "returned," and "customer unreachable." There were even a few lines where the fields were directly misaligned, with the invoice number shifted into the date column. The edges of the paper still held the photocopier's warmth, and the smell of ink was sharp and acrid.

He pulled the keyboard toward him and started entering data.

The keyboard was the old membrane kind, the keys sticky and sluggish on the rebound. He typed very slowly. Eyes fixed on the screen, fingers moving mechanically. Ctrl+C, Ctrl+V. Switch windows. Cross-check. Enter.

Nine o'clock. Eighty entries processed.

Eleven o'clock. Two hundred entries processed.

His wrists were starting to ache. His shoulders had gone stiff. The swelling pain in his left foot climbed up along his calf. He stopped and rubbed his temples. But inside his head, a different line of logic was already running.

The bottleneck in manual entry wasn't typing speed. It was judgment. Every single record had to be visually parsed by a human, the format recognized, the misalignments corrected, invalid characters stripped out. It was repetitive labor. And repetitive labor had patterns. If there were patterns, they could be abstracted into rules.

He opened the hardbound notebook he carried with him and wrote on a blank page:
Rule 1: Amount field. May contain formal Chinese numerals, thousands separators, or plain digits. Must be normalized into a float.
Rule 2: Invoice number. 12 digits. If it contains letters or hyphens, treat as invalid and mark to skip.
Rule 3: Notes. Unstructured text. Preserve as-is, but filter special symbols (such as *, #, newline characters).
Rule 4: Misaligned rows. If the date column contains a numeric string and its length > 6, treat the invoice number as having shifted into the wrong column. Shift the entire row one column to the right.
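Rules 1 through 3 can be sketched roughly as follows. This is a minimal Python 3 illustration, not the script Lin Chen writes later in the chapter. The helper names, the numeral table, and the sample characters 叁仟贰佰 (standing in for the "Three Thousand Two Hundred" on the sample sheet) are all assumptions made for the example.

```python
import re

# Formal (banker's) numerals. The table and every helper name here are
# illustrative assumptions, not taken from the story's script.
FORMAL_DIGITS = {"零": 0, "壹": 1, "贰": 2, "叁": 3, "肆": 4,
                 "伍": 5, "陆": 6, "柒": 7, "捌": 8, "玖": 9}
FORMAL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}


def parse_formal_numeral(text):
    """Convert a formal numeral such as 叁仟贰佰 to an int (below 100 million)."""
    total = section = digit = 0
    for ch in text:
        if ch in FORMAL_DIGITS:
            digit = FORMAL_DIGITS[ch]
        elif ch in FORMAL_UNITS:
            # A bare unit (e.g. 拾 alone) counts as one of that unit.
            section += (digit or 1) * FORMAL_UNITS[ch]
            digit = 0
        elif ch == "万":
            total += (section + digit) * 10000
            section = digit = 0
        else:
            raise ValueError("unexpected character: %r" % ch)
    return total + section + digit


def normalize_amount(raw):
    """Rule 1: return the amount as a float, whatever notation it arrived in."""
    text = raw.strip().rstrip("元圆整")  # drop trailing currency markers
    if text and all(ch in FORMAL_DIGITS or ch in FORMAL_UNITS or ch == "万"
                    for ch in text):
        return float(parse_formal_numeral(text))
    return float(text.replace(",", ""))  # raises ValueError on anything else


def is_valid_invoice(number):
    """Rule 2: exactly 12 ASCII digits; letters or hyphens make it invalid."""
    return re.fullmatch(r"[0-9]{12}", number.strip()) is not None


def clean_notes(notes):
    """Rule 3: keep the text but drop *, # and line breaks."""
    return re.sub(r"[*#\r\n]", "", notes)
```

Under these assumptions, `normalize_amount` turns both "叁仟贰佰元整" and "3,200.50" into floats. An amount it cannot parse raises `ValueError`, so the caller decides whether to skip the row or log it.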

When he was done writing, he closed the notebook and went back to the keyboard.

At 4:50 in the afternoon, he hit Enter on the last line.

He stood up and dragged the cleaned-up Excel file into the shared folder. Old Zhao walked over and opened it for inspection. The mouse wheel scrolled. He spot-checked twenty entries. No errors came up.

"All right." Old Zhao counted out forty yuan from the drawer and slapped it on the table. "There'll be more tomorrow. Maybe a bigger batch. Can you take it?"

"I can." Lin Chen took the money. The edges of the bills were a little rough. He folded them carefully in half and slipped them into the pocket closest to his body.

When he walked out of the agency, the sky was overcast. The wind was stronger than it had been in the morning.

He first went to the town clinic. He bought two new rolls of gauze and a bottle of iodine. It cost three yuan two jiao. That left him with thirty-six yuan eight jiao. He walked to the bun stall by the roadside and bought two meat buns. He stood under the eaves and finished them. The heat rising from them made his eyes sting.

The road back to the village was even harder than it had been that morning. His left foot had already gone completely numb, and he could only force himself onward with his right leg. His knee was taking double the strain. By the time he reached the village entrance, the muscles in his calf had begun cramping uncontrollably. He braced himself against an earthen wall and crouched down. Took deep breaths. Waited for the sharp cramp to pass.

At 7:40 in the evening, he pushed open the iron grille of the library's side door.

The old computer area was in the basement. The lighting was dim. Only six machines were on. The fans hummed. He picked the one in the farthest corner. Booted it up. Waited.

The screen lit up. The Windows XP desktop. He opened Python 2.7 IDLE. Created a new file.

He began writing V2.0.

First he imported the modules. import re. import codecs.

Then he defined the cleaning function. def clean_row(line):

He used a regex to match digits. re.findall(r'[\d,\.]+', amount_str). Replaced commas. Converted to float.

He added exception handling. try...except ValueError:. If the conversion failed, log it and return None.

He handled the misalignment logic. if len(date_col) > 6 and date_col.isdigit(): Swap the columns.
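Put together, the function he has described so far might look something like the sketch below. It is written for Python 3 rather than the Python 2.7 on the library machine. The tab separator, the four-column order (date, invoice number, amount, notes), and the logger name are assumptions for illustration. The misalignment test also assumes that real dates contain separators such as hyphens, so a long all-digit value can only be an invoice number.

```python
import logging
import re

log = logging.getLogger("clean_v2_sketch")  # hypothetical logger name


def clean_row(line, sep="\t"):
    """Return [date, invoice, amount, notes], or None if the row is unusable.

    The separator and the column order are assumptions made for this sketch.
    """
    fields = [f.strip() for f in line.rstrip("\r\n").split(sep)]
    if len(fields) != 4:
        log.warning("unexpected column count: %r", line)
        return None
    date_col, invoice_col, amount_str, notes = fields

    # A long all-digit "date" is taken to be an invoice number that landed
    # in the wrong column, so the two columns are swapped back.
    if len(date_col) > 6 and date_col.isdigit():
        date_col, invoice_col = invoice_col, date_col

    try:
        matches = re.findall(r"[\d,\.]+", amount_str)
        amount = float(matches[0].replace(",", ""))
    except (IndexError, ValueError):
        # No numeric run at all, or one that float() rejects (e.g. "1.2.3").
        log.warning("bad amount: %r", amount_str)
        return None
    return [date_col, invoice_col, round(amount, 2), notes]
```

Returning `None` instead of raising lets the calling loop count a bad row as skipped and carry on with the rest of the batch.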

He wrote very slowly. Every line had to pass through the edge cases in his mind once. How should blank lines be handled? What about full-width characters? What if the file stopped halfway through reading?

At nine o'clock, the code was finished. He saved it as clean_v2.py.

He replaced yesterday's successfully tested sample data with the raw text he had actually entered today. Clicked Run.

A command-line window popped up. The progress bar started scrolling.

Processing line 1... OK
Processing line 100... OK
Processing line 500... OK

It was fast. A hundred times faster than doing it by hand.

But when it got to line 1247, the screen suddenly froze.

Traceback (most recent call last):
  File "clean_v2.py", line 42, in clean_row
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa8 in position 12: illegal multibyte sequence

An error.

Lin Chen stared at the screen. He wasn't irritated. Only calm.

He opened the raw text and jumped to line 1247. In the notes field was a rare character. The customer's name contained the character "㙓." The GBK character set didn't include that character. Decoding failed. The program crashed.

He leaned back in his chair and tapped the tabletop lightly with his fingers.

Code could run logic, but it couldn't run reality. Real-world data was dirty. It came with historical encoding chaos, typos from human entry, gibberish from system exports. No matter how airtight the rules you wrote were, they couldn't stop a single rare character. Commercial projects didn't care how elegant your algorithm was. They only cared whether you could deliver on time. Miss one line, lose one payment. Get one number wrong, trigger a client complaint.

He sat up straight and modified the code.

He could change the decode mode in codecs.open to errors='ignore'. Or force everything through utf-8. But that would lose information.

He thought for a moment, then added a branch to the exception handling: except UnicodeDecodeError: return line.decode('gbk', 'replace'). Use replacement characters as a fallback. No crash. Just mark it.
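The trade-off between the two error modes can be shown in a few lines. The following is a Python 3 sketch in which a raw line is a `bytes` object. The function name and the sample bytes are hypothetical. The byte 0xFF is used only because it is never a valid GBK lead byte; it is not the byte from the story's traceback.

```python
def decode_line(raw, encoding="gbk"):
    """Decode one raw line and report whether any bytes had to be replaced."""
    try:
        return raw.decode(encoding), False
    except UnicodeDecodeError:
        # errors="replace" puts U+FFFD where decoding failed, so the damage
        # stays visible in the output instead of silently disappearing.
        return raw.decode(encoding, errors="replace"), True


sample = b"Zhang \xff San"  # hypothetical line containing one undecodable byte
text, flagged = decode_line(sample)

# errors="ignore" also avoids the crash, but leaves no trace of the loss.
silently_trimmed = sample.decode("gbk", errors="ignore")
```

Returning the flag alongside the text lets the caller record which lines were patched, so they can be checked by hand before delivery.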

He ran it again.

The progress bar continued scrolling. Line 1247 skipped over. 1500. 2000. 3000.

Processed 3000 lines. Success: 2985. Skipped: 15. Errors: 0.

The output file was generated. He opened it. The data was neat. Amounts were uniformly formatted to two decimal places. Misaligned rows had been corrected. The rare character had turned into a question mark.

He closed the file. Cleared the cache. Shut the computer down.

When he walked out of the library, the night wind was cold. He made his slow way back to the village. His steps were still heavy. But his mind was very clear.

The script was working now. But that was only the first step.

Old Zhao would give him an even bigger batch tomorrow. Five thousand entries. Ten thousand. If the data sources weren't standardized, the script would need its regex adjusted every time. If the client demanded real-time exports, Python running locally wouldn't be fast enough. If the boss found out he could automate this, would he cut the rate? Or just make him do the work of three people by himself?

Technology was leverage. But the fulcrum of that leverage wasn't in the code. It was in the requirements. In negotiation. In whether you could turn "cleaning data" into "delivery standards."

He pushed open the courtyard gate. The light in the main room was still on.

Xiaoman was already asleep. Her picture book lay spread out on the table. On it she had drawn a computer. On the screen were lots of gears. Meshing together.

Wang Guiying coughed twice in the inner room. She didn't come out.

Lin Chen sat down and opened the ledger.

Date: August 7.
Expenditure: gauze and iodine, 3.2 yuan. Meat buns, 1.5 yuan.
Income: data-entry settlement, 40 yuan.
Balance: 35.3 yuan.
Foot injury: sensation not restored. Gait dependent on right leg.
Progress: script V2.0 running. Exception handling added. Rare-character fallback implemented.

He stopped writing and looked at the balance. Thirty-five yuan three jiao. Enough to buy a copy of Essential Regular Expressions. Enough to cover next week's bus fare. But not enough to buy a copy of Design Patterns. Not enough to deal with the client suddenly changing requirements.

He closed the ledger and switched off the desk lamp.

Darkness fell. The pain in his left foot became clearer in the middle of the night. Like a fine needle, pricking at the nerves over and over. He closed his eyes. In his mind, the code was still running. try...except. Regular expressions. Fault tolerance.

Tomorrow. Eight in the morning. Report in. Evening. The library. Script V3.0. Wrap it into a class. Add a logging module. Try reading Excel files with the xlrd library.

The road was still long. But the gears had already engaged. The margin for error was shrinking. Time would not wait.

The wind outside the window fell still. A dog barked twice in the distance.

He closed his eyes. His heartbeat was steady.

Tomorrow. Eight o'clock. Keyboard. Code. Fault tolerance.

The road was still long. But every step landed on solid ground.

The old mobile phone by his pillow vibrated once. The screen lit up with a faint glow.

A text message. From Old Zhao.

"Tomorrow's batch goes up to eight thousand. Deliver by three in the afternoon. If you can take it, reply."

Lin Chen opened his eyes and looked at the line of text. He didn't reply immediately.

Eight thousand entries. Entered by hand, that would take two people a full day. With the script, two hours. But if the source data was still this kind of chaotic TXT, V2.0's fault tolerance wasn't enough. It would miss orders. Missed orders meant pay deductions. Pay deductions meant working for nothing.

He sat up and switched on the desk lamp in the dark. The halo of light was dim yellow.

He picked up his pen and wrote on the back of the ledger:
Requirement change: scale x2.6. Deadline shortened by 1/3.
Risk: missed-order rate. Encoding chaos.
Countermeasure: V3.0 needs a validation layer. Secondary comparison after the run.
Negotiation: tiered pricing by volume. Or require a standardized CSV source file.
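One way a "validation layer" of this kind could work is a reconciliation pass after the run: every non-blank source line must end up either cleaned or explicitly skipped, and every cleaned row must have the expected shape. The sketch below is a Python 3 illustration only. The function name, its parameters, and the four-field row shape are assumptions, and the story does not say what V3.0 will actually check.

```python
def validate_run(source_lines, cleaned_rows, skipped, expected_fields=4):
    """Return a list of problems; an empty list means the run reconciles.

    All names and the four-field row shape are hypothetical.
    """
    problems = []
    total = sum(1 for line in source_lines if line.strip())  # ignore blank lines
    if len(cleaned_rows) + len(skipped) != total:
        problems.append("count mismatch: %d in, %d cleaned, %d skipped"
                        % (total, len(cleaned_rows), len(skipped)))
    for i, row in enumerate(cleaned_rows):
        if len(row) != expected_fields:
            problems.append("cleaned row %d has %d fields" % (i, len(row)))
    return problems


# The shape of the evening's run: 3000 lines in, 2985 cleaned, 15 skipped.
source = ["row\n"] * 3000
cleaned = [["1992-08-07", "123456789012", 1.0, ""]] * 2985
report = validate_run(source, cleaned, skipped=list(range(15)))
```

A run that silently drops a line then shows up as a count mismatch rather than as a missed entry discovered by the client.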

When he finished, he put down the pen.

The phone screen went dark.

He picked up the phone and pressed reply.

"I can take it. But the source file format needs to be standardized. Otherwise settlement will be based on the actual number of cleaned entries."

Sent.

The screen showed: delivered.

He set the phone down and lay back again.

His heartbeat was still steady. But the gears in his mind had already shifted into the next position.

Tomorrow, it wouldn't just be about writing code. It would be about negotiating rules.

The wind outside rose again, rustling the window paper with a soft shh-shh sound.

He closed his eyes and waited for dawn.
