The Institute for the Protection of Terrestrial Infrastructures is dedicated to enhancing the resilience of critical infrastructure. Digital twins enable monitoring and response to crises and attacks, as well as analysis and optimization of the resilience of these infrastructures.
What to expect
The Institute for the Protection of Terrestrial Infrastructures aims to ensure that society remains reliably supplied with essential services. Digital twins enable monitoring and response to crises and attacks, as well as analysis and optimization of the resilience of terrestrial infrastructures. A major bottleneck in building them is the conversion of engineering documentation, especially Piping & Instrumentation Diagrams (P&IDs) and electrical schematics, into structured, machine-readable data. In this project you will research, prototype, and evaluate methods that exploit textual annotations (labels, notes, part numbers) by combining large language model (LLM) features with the object detectors used to predict graphs.
Your tasks
- implement state-of-the-art OCR on P&IDs and electrical drawings and build a curated (synthetic + real) dataset for training and evaluation.
- create a semi-automatic annotation tool to tag text strings and their associated graphical symbols, producing ground-truth relation tables (text ↔ object).
- compare and evaluate several approaches to fusing text features with symbol detection:
  - design rule-based methods that use geometric proximity, alignment, and domain-specific cues (e.g., “text placed above a symbol usually describes it”).
  - extract semantic embeddings from the detected text with a pre-trained LLM and combine them with CNN features of the symbols.
  - build a classifier that receives text features + object features as input and predicts a binary “related / not related” output.
- documentation, scientific analysis, and presentation of the investigation results
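To give a feel for the rule-based baseline named in the task list, here is a minimal sketch of how geometric proximity and the "text above a symbol" cue could be combined. It is an illustration only, not the institute's method: the `Box` type, the function names, the `above_bonus` factor, and the `max_score` threshold are all hypothetical choices, and the bounding boxes are assumed to come from an OCR engine and a symbol detector that are not shown.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relation_score(text: Box, symbol: Box, above_bonus: float = 0.5) -> float:
    """Lower is better: centre distance, discounted if the text sits above the symbol."""
    (tx, ty), (sx, sy) = text.center, symbol.center
    distance = hypot(tx - sx, ty - sy)
    # Domain cue (assumed weighting): text placed above a symbol usually describes it.
    overlaps_horizontally = text.x_min < symbol.x_max and text.x_max > symbol.x_min
    is_above = text.y_max <= symbol.y_min and overlaps_horizontally
    return distance * (above_bonus if is_above else 1.0)


def match_texts_to_symbols(texts, symbols, max_score=100.0):
    """Map each text label to its best-scoring symbol label, or None if none is close."""
    relations = {}
    for text in texts:
        best = min(symbols, key=lambda s: relation_score(text, s), default=None)
        if best is not None and relation_score(text, best) <= max_score:
            relations[text.label] = best.label
        else:
            relations[text.label] = None
    return relations


# Hypothetical detections: a tag sitting between a pump (above it) and a valve (below it),
# plus a free-standing note far away from both symbols.
symbols = [Box("valve", 100, 100, 140, 140), Box("pump", 100, 20, 140, 60)]
texts = [Box("V-101", 100, 70, 140, 90), Box("NOTE 3", 500, 500, 560, 520)]
relations = match_texts_to_symbols(texts, symbols)
print(relations)
```

In this toy layout the tag is equally far from both symbols, so only the "above" cue decides the assignment. A table of such pairs is the kind of text ↔ object relation that the learned approaches would be compared against.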
Your profile
- ongoing studies in Computer Science, Mechatronics, Electrical Engineering, or a related field
- willingness to work on complex topics
- basic knowledge of machine learning
- good Python programming skills
- ability to work independently, combined with good communication and teamwork skills
- proficiency in English (written and spoken) for documentation and presentations
Desired qualifications
- experience with computer vision and deep learning libraries such as OpenCV, PyTorch, or TensorFlow
- knowledge of LLMs
- experience in scientific writing or presenting research results
We look forward to getting to know you!
If you have any questions about this position (Vacancy-ID 4254), please contact:
Tobias Koch
Tel.: +49 2241 20148 55