Universität zu Lübeck – Thesis System

AI‑Enhanced FMaaS for Copilot

Bachelor / Master
Topics: AI, SE

AI‑Enhanced Formal‑Methods‑as‑a‑Service: Designing, Implementing, and Evaluating an MCP‑Based Backend that Powers a VS Code Copilot Extension for Automated Program Analysis and Verification.

1. Motivation & Vision

Formal methods (model checking, static analysis, theorem proving) provide mathematically rigorous guarantees about software, but they remain isolated, heavyweight tools that are rarely used inside everyday IDEs.
Recent advances in large language models (LLMs) and generative AI (e.g., GitHub Copilot, ChatGPT) have shown that AI can guide developers, suggest code, and even write specifications.

The vision of this thesis is a single, AI‑augmented, cloud‑hosted service (an MCP – “Multi‑Component Platform”) that:

  1. Wraps a portfolio of open‑source verification tools (e.g., SPIN, CBMC, Frama‑C, Why3, Z3). 
  2. Enriches the raw tool output with AI‑driven explanations, bug‑localisation, and repair suggestions. 
  3. Exposes the whole capability through a lightweight VS Code extension that behaves like a Copilot “assistant” – the user types, the assistant offers formal‑method insights in real time.

If successful, the system would demonstrate that AI can make formal methods practical for everyday programming while preserving their soundness.


2. Research Questions

RQ1 – How can AI be combined with traditional formal‑method tools to produce human‑readable diagnostics and repair suggestions without sacrificing soundness?
RQ2 – What is the most extensible architecture for hosting multiple verification back‑ends (MCP) while supporting low‑latency, on‑demand queries from an IDE?
RQ3 – Does the AI‑augmented Copilot assistant improve developer productivity and confidence compared to using the raw tools or plain Copilot alone?
RQ4 – What are the security and privacy trade‑offs of running code analyses on a remote server, and how can they be mitigated?

3. Scope & Deliverables

D1 – Conceptual Architecture: A detailed design (UML, data‑flow) of the MCP server, AI‑enhancement pipeline, and VS Code extension.
D2 – MCP Server Implementation: Containerised micro‑services for at least three verification tools (e.g., CBMC for C, SPIN for Promela, Z3 via Why3 for functional languages).
D3 – AI‑Assistance Layer: Fine‑tuned LLM (or prompt‑engineering on a hosted model) that parses tool output, generates natural‑language explanations, and proposes code patches.
D4 – VS Code Copilot‑Style Extension: An extension that intercepts the user’s cursor context, forwards a request to the MCP, and renders AI‑augmented suggestions inline (similar to Copilot).
D5 – Evaluation Suite: Benchmarks (real‑world open‑source projects), quantitative metrics (latency, precision/recall of bug detection, suggestion acceptance rate), and a small user study (≈10‑15 participants).
D6 – Thesis Document: Full academic write‑up (≈80 pages) covering background, methodology, results, discussion, and future work.

4. Methodology

4.1. Background Study (Weeks 1‑4)

  • Survey formal‑methods tools and their APIs. 
  • Review state‑of‑the‑art AI‑assisted programming assistants (Copilot, Tabnine, CodeBERT). 
  • Identify gaps where AI can add value (explanation, repair, prioritisation).

4.2. Concept & Architecture (Weeks 5‑8)

  1. MCP Core – REST‑/gRPC‑based gateway that receives JSON requests of the form { "language": "c", "file": "…", "cursor": 123, "action": "verify" } (a minimal gateway sketch follows this list).
  2. Tool Plugins – Docker‑based micro‑services wrapping each verifier. Each plugin:
    • Accepts source code and optional specifications. 
    • Returns machine‑readable results (e.g., JSON of counterexample traces).
  3. AI Layer – A prompt‑engine that sends verifier output to an LLM (OpenAI, Azure AI, or a locally hosted LLaMA). The prompt instructs the LLM to:
    • Summarise the bug in natural language. 
    • Point to the offending line(s). 
    • Propose a minimal code change (patch).
  4. VS Code Extension – Built with TypeScript, using the VS Code Language Server Protocol (LSP). The extension:
    • Sends a “verification request” when the user saves or invokes a command. 
    • Receives AI‑augmented suggestions and displays them as Copilot‑style inline completions or as diagnostics.
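To make the gateway contract concrete, here is a minimal sketch of the MCP Core endpoint. It assumes FastAPI and pydantic; the PLUGINS registry and the run_tool dispatch helper are illustrative placeholders for this proposal, not a fixed API.

```python
# Minimal MCP gateway sketch (FastAPI). PLUGINS and run_tool are
# illustrative placeholders, not part of any existing library.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Maps a source language to the container/service hosting the matching verifier.
PLUGINS = {"c": "cbmc", "promela": "spin", "whyml": "why3"}

class VerificationRequest(BaseModel):
    language: str
    file: str          # source text (or a path inside the sandbox)
    cursor: int = 0    # cursor offset used to focus diagnostics
    action: str = "verify"

def run_tool(plugin: str, source: str, action: str) -> dict:
    # Placeholder: the real system would call the plugin's container
    # (e.g., an HTTP request to its service) and parse its JSON output,
    # including counterexample traces.
    return {"status": "not-implemented", "plugin": plugin, "action": action}

@app.post("/analyze")
def analyze(req: VerificationRequest) -> dict:
    plugin = PLUGINS.get(req.language)
    if plugin is None:
        raise HTTPException(status_code=400, detail=f"unsupported language: {req.language}")
    result = run_tool(plugin, req.file, req.action)
    return {"tool": plugin, "result": result}
```

The routing table keeps the gateway agnostic of individual verifiers, so adding a fourth tool only means registering another plugin service.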

4.3. Implementation (Weeks 9‑20)

MCP server
  • Tasks: Scaffold the FastAPI (Python) or NestJS (Node) gateway; containerise CBMC, SPIN, and Why3+Z3; implement request routing & authentication.
  • Expected outcome: A functional HTTP endpoint that can analyse a snippet with any enabled tool.

AI layer
  • Tasks: Choose an LLM (GPT‑4‑Turbo, LLaMA‑2‑13B); build the prompt template and test it on sample verifier outputs; implement caching & rate‑limiting.
  • Expected outcome: A service that returns JSON {explanation, patch, confidence} (a prompt‑engine sketch follows this overview).

VS Code extension
  • Tasks: Set up scaffolding with Yeoman; implement an LSP client that talks to the MCP; render suggestions as CompletionItems and as Diagnostics.
  • Expected outcome: A usable extension that shows AI‑augmented verification hints.

Integration tests
  • Tasks: End‑to‑end scripts that compile a project, send it to the MCP, retrieve the suggestion, and auto‑apply the patch in a sandbox.
  • Expected outcome: Confidence that the pipeline works on real code bases.
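As a sketch of the AI-layer contract, the snippet below turns raw verifier output into the {explanation, patch, confidence} JSON. It assumes the OpenAI Python SDK (>= 1.0) and the gpt‑4‑turbo model; the prompt wording and the fallback handling are illustrative choices, not a finished design.

```python
# Sketch of the AI-assistance layer: raw verifier output -> structured advice.
import json
from openai import OpenAI  # assumes the OpenAI Python SDK, version >= 1.0

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """You are a formal-verification assistant.
Verifier: {tool}
Raw output:
{raw_output}

Summarise the bug in plain language, name the offending line(s),
and propose a minimal patch. Answer as JSON with the keys
"explanation", "patch" and "confidence" (0.0-1.0)."""

def explain(tool: str, raw_output: str, model: str = "gpt-4-turbo") -> dict:
    prompt = PROMPT_TEMPLATE.format(tool=tool, raw_output=raw_output)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content
    try:
        return json.loads(answer)
    except json.JSONDecodeError:
        # Fall back to a low-confidence, unstructured answer instead of failing,
        # so the extension can still show the raw explanation text.
        return {"explanation": answer, "patch": None, "confidence": 0.0}
```

Caching and rate-limiting would wrap around explain(), keyed on the hash of the verifier output, so repeated analyses of an unchanged file never hit the LLM twice.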

4.4. Evaluation (Weeks 21‑28)

  1. Quantitative Benchmarks (a small metrics‑aggregation sketch follows this list)
    • Dataset: 30 open‑source C/C++ projects (e.g., from the SAMATE test suite, GitHub). 
    • Metrics:
      • Detection Rate (true positives / all known bugs). 
      • Precision of AI explanations (human rating on a 5‑point Likert scale). 
      • Patch Correctness (does the suggested fix compile & pass tests?). 
      • Latency (average round‑trip time < 2 s).
  2. User Study
    • Recruit 10‑15 master students / junior developers. 
    • Two conditions: (a) “Plain Copilot only”, (b) “Copilot + AI‑formal‑method assistant”. 
    • Tasks: fix injected bugs, add missing assertions. 
    • Capture time‑to‑completion, subjective confidence, and acceptance rate of AI suggestions.
  3. Security & Privacy Review
    • Threat modelling (code leakage, malicious payloads). 
    • Implement optional client‑side encryption & policy‑based code sanitisation. 
    • Evaluate overhead and impact on latency.
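The quantitative metrics above can be aggregated from per-project benchmark records as sketched below; the record fields (known_bugs, reported, true_positives, patch_ok, latency_s) are assumptions for illustration and would come from the benchmark harness.

```python
# Sketch of metric aggregation over benchmark runs. The record layout is an
# assumed format for this proposal, not prescribed by any tool.
from statistics import mean

def aggregate(records: list[dict]) -> dict:
    known = sum(r["known_bugs"] for r in records)
    reported = sum(r["reported"] for r in records)
    tp = sum(r["true_positives"] for r in records)
    return {
        "detection_rate": tp / known if known else 0.0,    # TP / all known bugs
        "precision": tp / reported if reported else 0.0,   # TP / all reports
        "patch_correctness": mean(r["patch_ok"] for r in records),  # share of fixes that compile & pass tests
        "mean_latency_s": mean(r["latency_s"] for r in records),    # average round-trip time
    }

# Example with two hypothetical projects:
print(aggregate([
    {"known_bugs": 5, "reported": 6, "true_positives": 4, "patch_ok": 0.5, "latency_s": 1.4},
    {"known_bugs": 3, "reported": 3, "true_positives": 3, "patch_ok": 1.0, "latency_s": 1.1},
]))
```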

4.5. Writing & Dissemination (Weeks 29‑34)

  • Draft each thesis chapter (introduction → related work → methodology → results → discussion). 
  • Prepare a 15‑minute conference‑style presentation. 
  • Publish the source code (MIT license) on GitHub and write a short technical blog post.

5. Expected Contributions

Academic
  • C1 – A novel AI‑enhanced formal‑methods‑as‑a‑service architecture that cleanly separates verification kernels from AI post‑processing.
  • C2 – Empirical evidence (benchmark + user study) on the usefulness of AI‑generated explanations & patches for formal‑method results.
  • C3 – Guidelines for secure, privacy‑respecting remote program analysis services.

Practical
  • P1 – An open‑source VS Code extension that brings formal‑method feedback directly into the developer’s coding workflow, mimicking Copilot’s UI.
  • P2 – A reusable Docker‑based MCP framework that can be extended with additional analysis tools (e.g., symbolic execution, abstract interpretation).
  • P3 – A prompt‑engineering recipe for turning raw verifier output into actionable natural‑language suggestions.

6. Technologies & Toolchain

Server (MCP)
  • FastAPI (Python) or NestJS (Node) for the REST/gRPC gateway.
  • Docker Compose for micro‑service orchestration.
  • NGINX as reverse proxy & TLS terminator.
Verification Tools
  • CBMC (bounded model checking for C).
  • SPIN (model checking for Promela).
  • Why3 + Z3 (SMT‑based verification for functional languages).
AI Layer
  • OpenAI API (GPT‑4‑Turbo) or locally hosted LLaMA‑2‑13B with vLLM.
  • Prompt templates stored in a JSON/YAML config.
IDE Extension
  • VS Code Extension API (TypeScript).
  • Language Server Protocol for diagnostics.
  • GitHub Copilot SDK (preview) if available, otherwise mimic inline completions.
Testing & CI
  • GitHub Actions (build, test, Docker push).
  • pytest / Jest for unit tests.
Evaluation
  • BenchExec for reproducible tool benchmarking.
  • Qualtrics / Google Forms for user‑study questionnaires.

7. Risks & Mitigation Strategies

R1 – LLM hallucinations (incorrect explanations or patches).
  • Impact: Low trust, false sense of safety.
  • Mitigation: Keep soundness in the back‑end; present AI suggestions only as advice and provide a “View raw verifier output” button.
R2 – High latency due to remote verification or LLM calls.
  • Impact: Poor UX; users may disable the extension.
  • Mitigation: Cache results per file version; run lightweight static analysis locally and fall back to the server only for heavy checks.
R3 – Code‑privacy concerns when uploading proprietary code.
  • Impact: Legal/ethical obstacles.
  • Mitigation: Offer a self‑hosted mode (Docker Compose) where the whole stack runs behind the user’s firewall; provide a clear data‑handling policy.
R4 – Integration complexity (different tool output formats).
  • Impact: Development delays.
  • Mitigation: Define a common JSON schema for all tool plugins (see the schema sketch after this table); write adapters early and use automated tests for each plugin.
R5 – Limited evaluation participants (hard to recruit).
  • Impact: Threat to validity of RQ3.
  • Mitigation: Leverage university courses (e.g., a programming lab) where participation can be incentivised with extra credit.
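One way to realise the common JSON schema from R4’s mitigation is a shared result model that every plugin adapter maps its native output onto. The field names below are a proposal for this thesis, not an existing standard.

```python
# Sketch of a common result schema for all tool plugins. Field names and
# structure are a proposal for illustration, not an established format.
from pydantic import BaseModel

class Finding(BaseModel):
    rule: str                 # e.g., "array-bounds", "deadlock"
    message: str              # the verifier's original message
    file: str
    line: int
    trace: list[str] = []     # optional counterexample trace, one step per entry

class ToolResult(BaseModel):
    tool: str                 # "cbmc", "spin", "why3"
    verdict: str              # "safe", "unsafe", "unknown", "error"
    findings: list[Finding] = []
    runtime_s: float = 0.0

# Each plugin adapter parses its verifier's native output and returns a
# ToolResult, so the AI layer and the VS Code extension only ever see one format.
```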

8. Proposed Timeline (30 weeks)

Weeks 1‑4: Literature review, define research questions, finalise tool selection.
Weeks 5‑8: Architecture design (UML, data flows), write the first chapter.
Weeks 9‑12: Implement the MCP gateway and Dockerised verification plugins.
Weeks 13‑16: Build the AI prompt layer; iterate on explanation/patch quality.
Weeks 17‑20: Develop the VS Code extension, integrate with the MCP, initial end‑to‑end tests.
Weeks 21‑24: Benchmark suite, collect quantitative metrics.
Weeks 25‑27: Conduct the user study, analyse results, write the evaluation chapter.
Weeks 28‑30: Security/privacy analysis, final writing, proofreading, submission.

9. Why This Thesis Is Feasible

  • Existing building blocks – All verification tools and AI APIs are publicly available and can be containerised. 
  • Clear scope – Limiting the MCP to three representative tools keeps the implementation manageable while still showcasing heterogeneity. 
  • Incremental integration – The VS Code extension can be built step‑by‑step (first static diagnostics, then AI‑augmented suggestions). 
  • Measurable outcomes – Both quantitative (precision/recall, latency) and qualitative (user confidence) metrics are well‑defined, allowing a robust evaluation. 
  • Academic relevance – The work sits at the intersection of formal methods, AI‑assisted programming, and software‑engineering tools, a hot topic for conferences such as ASE, ICSE, CAV, ESEC/FSE.

10. Take‑away Summary

  • Concept: An AI‑enhanced, cloud‑hosted verification platform (MCP) that packages multiple formal‑method tools, enriches their output with LLM‑generated explanations & repairs, and delivers the result directly in VS Code via a Copilot‑style extension. 
  • Implementation: Dockerised micro‑services for CBMC, SPIN, Why3/Z3; a FastAPI gateway; an LLM prompt‑engine; a TypeScript VS Code extension using LSP. 
  • Evaluation: Benchmark on 30 open‑source projects + a controlled user study; measure detection rates, explanation quality, patch correctness, latency, and user productivity. 
  • Outcome: A reusable research artefact, a peer‑reviewable thesis, and evidence that AI can make formal methods approachable without compromising their rigor.

If you adopt this topic, you will produce a concrete, open‑source system that bridges the gap between theory (formal verification) and practice (AI‑driven developer assistance)—a contribution that is both academically publishable and directly useful to software engineers.
