Wondering what R-NER is? It may sound like jargon meant to grab your attention, but it’s more than a buzzword; it’s a practical solution.
Our team at Rosterly has been working on an efficient algorithm to tackle the long-standing challenge of manual timesheet entry.
If you're in the IT consulting industry, this problem will sound familiar. Finance teams often spend hours manually entering employee timesheet information, just to create invoices and send them to customers.
That's why a solution that handles any timesheet type and converts it into a machine-readable format can greatly reduce manual entry time. Now you may wonder why this is a challenging problem.
OCR (Optical Character Recognition) tools have existed since the 1950s, and the field of automation is quite mature.
However, the challenge isn’t simply in extracting data from timesheet documents—it’s in categorizing and annotating key fields like hours worked each day, off days, employee names, and the timesheet start and end periods.
The difficulty lies in creating a solution that can generalize across different consulting firms and their varied timesheet formats.
Hence, a proposed solution should be intelligent enough to identify the parameters mentioned above in practically any kind of timesheet.
This brings us to the highlight of today’s discussion: R-NER, Rosterly Named Entity Recognizer, a well-tested framework designed to accurately extract all critical information from timesheets.
Beyond OCR: How R-NER Acquires Intelligence
Modern computer vision has given us powerful Deep Neural Networks (DNNs) for extracting text from documents.
We started with PaddleOCR, a sophisticated framework that not only extracts text but also understands where information is positioned on the page.
This positional awareness is crucial when dealing with diverse timesheet formats. But extracting text is just the beginning.
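To make the value of positional information concrete, here is a minimal sketch of how text boxes might be grouped back into table rows. It assumes the OCR output has already been converted into (four corner points, text) pairs; that shape, the sample values, and the `y_tolerance` threshold are illustrative assumptions, not R-NER's actual data model.

```python
from statistics import mean

# Hypothetical detections: (four corner points, recognized text).
# The real OCR output shape depends on the library version; this is an assumption.
detections = [
    ([(300, 10), (360, 10), (360, 30), (300, 30)], "Hours"),
    ([(20, 12), (80, 12), (80, 32), (20, 32)], "Date"),
    ([(20, 50), (110, 50), (110, 70), (20, 70)], "2024-01-08"),
    ([(300, 52), (330, 52), (330, 72), (300, 72)], "8.0"),
    ([(300, 90), (330, 90), (330, 110), (300, 110)], "7.5"),
    ([(20, 91), (110, 91), (110, 111), (20, 111)], "2024-01-09"),
]


def center(box):
    """Return the (x, y) center of a four-point bounding box."""
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return mean(xs), mean(ys)


def group_into_rows(detections, y_tolerance=10):
    """Group boxes whose vertical centers lie within y_tolerance pixels
    of the first box in the row, then order each row left to right."""
    items = sorted(
        ((center(box), text) for box, text in detections),
        key=lambda item: item[0][1],  # sort top to bottom
    )
    rows = []
    for (cx, cy), text in items:
        if rows and abs(cy - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((cx, text))
        else:
            rows.append({"y": cy, "cells": [(cx, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


rows = group_into_rows(detections)
for row in rows:
    print(row)
```

Even though the boxes arrive in arbitrary order, their coordinates are enough to recover which hour value belongs to which date.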
To truly understand what the text means, we integrated advanced language models like BERT and RoBERTa. These AI models are particularly good at understanding context – they can tell whether "8:00" refers to start time, end time, or break time based on surrounding information.
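Token classifiers such as BERT typically emit one tag per token, often in the BIO scheme (B = beginning of an entity, I = inside, O = outside). The sketch below shows how such tags could be merged into labeled fields. The tags are written by hand to stand in for model predictions, and the label names (`EMPLOYEE`, `START_TIME`, and so on) are hypothetical, not R-NER's actual label set.

```python
def bio_to_entities(tokens, tags):
    """Merge BIO-tagged tokens into a list of (label, text) entities."""
    entities = []
    current_label, current_tokens = None, []

    def flush():
        # Close the entity being built, if any.
        if current_label:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" tags and stray "I-" tags end the current entity.
            flush()
            current_label, current_tokens = None, []
    flush()
    return entities


# Hand-written tags standing in for a fine-tuned model's predictions.
tokens = ["Jane", "Doe", "Start", "8:00", "End", "17:00", "Break", "0:30"]
tags = ["B-EMPLOYEE", "I-EMPLOYEE", "O", "B-START_TIME",
        "O", "B-END_TIME", "O", "B-BREAK"]

entities = bio_to_entities(tokens, tags)
print(entities)
```

Note that the three time values look alike on their own; it is the neighboring tokens ("Start", "End", "Break") that let a context-aware model assign them different labels.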
By fine-tuning these models on timesheet-specific data, we achieved about 60% accuracy in parsing timesheets correctly. However, 60% accuracy meant we still needed human intervention 40% of the time – not good enough!
We needed our system to think more like a human operator. The breakthrough came when we integrated Bidirectional LSTM (BiLSTM) technology, which adds a crucial reasoning layer to our system. Unlike simpler models that just label data, BiLSTM can "think forward and backward" through the information, catching inconsistencies and ensuring the extracted data makes logical sense.
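The BiLSTM layer is a trained network and is not reproduced here. As a much simpler, rule-based stand-in, the sketch below shows the kind of logical inconsistency the paragraph refers to: a reported hour total that does not match the extracted start, end, and break values. The field names and the tolerance are assumptions made for illustration.

```python
from datetime import datetime


def worked_hours(start, end, break_minutes=0):
    """Hours between two HH:MM times on the same day, minus the break."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600 - break_minutes / 60


def check_entry(entry, tolerance=0.01):
    """Return a list of human-readable problems found in one extracted entry."""
    problems = []
    computed = worked_hours(entry["start"], entry["end"], entry["break_minutes"])
    if computed <= 0:
        problems.append("end time is not after start time")
    elif abs(computed - entry["hours"]) > tolerance:
        problems.append(
            f"reported {entry['hours']}h but times imply {computed:.2f}h"
        )
    return problems


# Hypothetical extracted entries.
ok = {"start": "08:00", "end": "17:00", "break_minutes": 30, "hours": 8.5}
swapped = {"start": "17:00", "end": "08:00", "break_minutes": 30, "hours": 8.5}
mismatch = {"start": "08:00", "end": "17:00", "break_minutes": 30, "hours": 9.0}

for entry in (ok, swapped, mismatch):
    print(check_entry(entry))
```

A learned sequence model can pick up on such patterns from both the preceding and following tokens, rather than relying on hand-written rules like these.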
Evolution to R-NER v2: Tackling Structural Challenges
While BiLSTM brought significant improvements, we encountered a new challenge: handling the complex structure of timesheet data. Timesheets typically contain both structured elements (like tables and grids) and semi-structured data (such as mixed text, numbers, and special characters).
Traditional LSTMs, designed primarily for sequential text processing, struggled with these varied formats. The next breakthrough in our journey came with the integration of Large Language Models (LLMs) and Chain of Thought (CoT) prompting.
LLMs bring human-like comprehension to the task, while CoT prompting enables the system to break down complex parsing decisions into logical steps – much like how a human processor would approach a new timesheet format.
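As a rough illustration of step-by-step prompting, the sketch below assembles a prompt that asks a model to work through a timesheet in explicit stages. The wording, the steps, and the function name are assumptions for illustration, not the prompts R-NER uses, and no model is called.

```python
def build_cot_prompt(rows):
    """Build a step-by-step extraction prompt from OCR rows (lists of cell text)."""
    table = "\n".join(" | ".join(cells) for cells in rows)
    # Illustrative reasoning steps; a real prompt would be tuned on actual timesheets.
    steps = [
        "Identify the header row and what each column represents.",
        "Find the employee name and the start and end dates of the period.",
        "For each day, decide whether it is a working day or an off day.",
        "Extract the hours worked for each working day.",
        "Check that the daily hours add up to any total shown on the sheet.",
    ]
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "You are extracting fields from a timesheet.\n\n"
        f"Timesheet text (one row per line):\n{table}\n\n"
        f"Work through these steps in order, showing your reasoning:\n{numbered}\n\n"
        "Finish with the extracted fields as JSON."
    )


rows = [["Date", "Hours"], ["2024-01-08", "8.0"], ["2024-01-09", "7.5"]]
prompt = build_cot_prompt(rows)
print(prompt)
```

Spelling out the intermediate steps gives the model a fixed order to reason in, which is the core idea behind this style of prompting.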
R-NER v2: A Quantum Leap in Accuracy
The combination of these advanced technologies has transformed R-NER into a truly intelligent system. Let's look at how our journey progressed through various technological iterations:
With these progressive improvements, R-NER v2 is now capable of:
- Understanding complex table structures and relationships
- Processing mixed data formats with contextual awareness
- Applying sophisticated reasoning to ambiguous entries
- Maintaining consistency across different timesheet sections
- Adapting to new timesheet formats without additional training
The results speak for themselves: R-NER v2 now achieves an impressive 85% accuracy in automated timesheet processing. But what does this mean for employees and organizations? Let's look at the concrete improvements:
Employee Timesheet Submission: Manual vs R-NER
As the metrics demonstrate, R-NER transforms the timesheet submission experience:
- Submission Time: Reduced from 15-20 minutes to 30 seconds per timesheet
- Data Entry Errors: 90% reduction in common input errors through automated extraction
- Employee Time Spent: Minimal time investment - just verify and submit
- On-time Submission: 40% improvement in timely submissions due to simplified process
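To put the submission-time figures in perspective, here is a small worked calculation. The 15-20 minute and 30 second values come from the list above; the monthly volume of 200 timesheets is a purely hypothetical number chosen for illustration.

```python
def time_saved(manual_minutes, automated_seconds, timesheets_per_month):
    """Compute the per-sheet reduction and the total hours saved per month."""
    automated_minutes = automated_seconds / 60
    saved_per_sheet = manual_minutes - automated_minutes
    return {
        "reduction_pct": round(100 * saved_per_sheet / manual_minutes, 1),
        "hours_saved_per_month": round(saved_per_sheet * timesheets_per_month / 60, 1),
    }


# 200 timesheets per month is a hypothetical volume, not a figure from the article.
low = time_saved(15, 30, timesheets_per_month=200)
high = time_saved(20, 30, timesheets_per_month=200)
print(low)
print(high)
```

Under these assumptions, the per-sheet time drops by roughly 97%, which for 200 timesheets a month works out to somewhere between 48 and 65 hours.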
For employees, this means:
- No more manual data entry - just upload your existing timesheet
- Immediate feedback on successful processing
- Significantly less time spent on administrative tasks
- Flexibility to use their preferred timesheet format
For organizations, the benefits include:
- Faster billing cycles through prompt timesheet submissions
- Higher data accuracy for better project time tracking
- Improved employee satisfaction with administrative processes
- Reduced back-and-forth between employees and finance teams
This self-service approach means employees can complete a timesheet submission in about 30 seconds rather than 15-20 minutes, with significantly reduced error rates and minimal intervention needed.
References
PaddleOCR: Multi-language and Efficient OCR Toolkits Based on PaddlePaddle
GitHub Repository. Available at: https://github.com/PaddlePaddle/PaddleOCR
Devlin, J., et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
Available at: https://arxiv.org/abs/1810.04805
Liu, Y., et al. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach.
Available at: https://arxiv.org/abs/1907.11692
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.
Available at: https://www.bioinf.jku.at/publications/older/2604.pdf
Graves, A., & Schmidhuber, J. (2005). Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures.
Neural Networks, 18(5-6), 602-610.
Available at: https://www.sciencedirect.com/science/article/abs/pii/S0893608005001206
Wei, J., et al. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models.
Available at: https://arxiv.org/abs/2201.11903