Dawei's Homepage

Developing with MCP: A Developer's Perspective

Dawei Gao

As a developer working on AgentScope, a multi-agent platform, I have been integrating Model Context Protocol (MCP) servers into our library. Though I was impressed by the simple and elegant idea behind MCP, I ran into some practical challenges, and I thought it would be valuable to share my experience.

My Journey with MCP

Start with a Simple Server

My journey started with a simple map service MCP server (like Google Maps) with the following basic functions:

| Function | Arguments | Description |
| --- | --- | --- |
| text_around_search | keywords, location, radius | Search POIs around a given location within a specified radius |
| maps_text_search | keywords, city | Search POIs by the given keywords |
| maps_geo | address, city | Look up a location by the given structured address |

The test query is to find the nearest cafe around a specific location in Hangzhou, China. Easy peasy, or so I thought.

Find the nearest cafe around Ali Cloud Valley

Here’s how my debugging journey went:

  • First Try: I started with a ReAct agent using the simple system prompt below. Given the query, the LLM confidently called text_around_search with coordinates from its prior knowledge, which were of course incorrect.
You're a helpful assistant named Friday.

# Target
Your target is to finish the given task with the provided tools.
  • Second Try: Okay, let’s be more specific by adding explicit instructions about coordinate handling.
You're a helpful assistant named Friday.

# Target
Your target is to finish the given task with the provided tools.

# Note
1. DON'T make any assumptions! All the coordinates you use should be obtained from the `maps_geo` function.

Better! The LLM now called maps_geo first, with “Ali Cloud Valley” as input. But it still failed, because maps_geo needs a structured address and “Ali Cloud Valley” isn’t exactly that. This highlighted the need for an intermediate step: using maps_text_search to get the formatted address before asking for coordinates.

  • Final Solution: After identifying the key issues, I added more specific guidance in the system prompt:
You're a helpful assistant named Friday.

# Target
Your target is to finish the given task with the provided tools.

# Note
1. DON'T make any assumptions! All the coordinates you use should be obtained from the `maps_geo` function.
2. The input locations may not be specific, so you MUST first use `maps_text_search` to get the complete and accurate address. After that, you can use `maps_geo` to get the coordinates.
3. Sometimes there may be multiple locations with the same name. Once you feel the search result is not accurate, you should search again with different keywords or use the `generate_response` function to ask for more information.

This process reminded me of RAG (Retrieval-Augmented Generation)’s standard operating procedure:

Unstructured query ==> Query rewrite ==> Structured query ==> Retrieve.
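To make the mapping concrete, here is a minimal sketch of the same procedure written as plain code instead of prompt instructions. The three tool functions are stubs that return made-up data; their return types, the `Poi` class, and `find_nearest` are assumptions for illustration, not the real server's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Poi:
    name: str
    address: str
    distance_m: float = 0.0


# Stubs standing in for the MCP tools; the data below is invented.
def maps_text_search(keywords: str, city: str) -> list[Poi]:
    """Query rewrite: turn a loose place name into a structured address."""
    return [Poi(name=keywords, address=f"{city}, example district, {keywords}")]


def maps_geo(address: str, city: str) -> tuple[float, float]:
    """Structured query: resolve a structured address to (longitude, latitude)."""
    return (120.0, 30.0)  # placeholder coordinates, not a real geocode


def text_around_search(
    keywords: str, location: tuple[float, float], radius: int
) -> list[Poi]:
    """Retrieve: search POIs around a coordinate within `radius` meters."""
    return [
        Poi("Cafe B", "example address B", 420.0),
        Poi("Cafe A", "example address A", 150.0),
    ]


def find_nearest(
    keywords: str, place: str, city: str, radius: int = 500
) -> Optional[Poi]:
    # Step 1: unstructured name -> structured address
    candidates = maps_text_search(place, city)
    if not candidates:
        return None  # the agent would ask the user for more information here
    # Step 2: structured address -> coordinates (never guessed by the LLM)
    location = maps_geo(candidates[0].address, city)
    # Step 3: coordinates -> nearby POIs, then pick the closest one
    results = text_around_search(keywords, location, radius)
    return min(results, key=lambda poi: poi.distance_m, default=None)


nearest = find_nearest("cafe", "Ali Cloud Valley", "Hangzhou")
```

Written this way the order of calls is fixed by the program; in the agent setting, the system prompt has to carry the same ordering in natural language.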

It’s unrealistic to expect most current LLMs to learn this SOP and execute it correctly and reliably without explicit guidance.

However, even with this optimized prompt, I ran into new accuracy issues. Want to find something within 500 meters? Well, the error of some coordinates returned by maps_geo can be more than 1 km. You may end up with a cafe that’s 1.5 km away, and no documentation will tell you that.
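The 1.5 km figure is just the search radius plus the geocoding error. A small sketch of that arithmetic, assuming the error is a simple upper bound on how far the returned coordinate can be from the true location (the function names are mine, for illustration only):

```python
def worst_case_distance_m(search_radius_m: float, geocode_error_m: float) -> float:
    """Upper bound on the true distance to a returned POI.

    By the triangle inequality, a POI within `search_radius_m` of the
    geocoded point is at most `search_radius_m + geocode_error_m` away
    from the real location.
    """
    return search_radius_m + geocode_error_m


def guaranteed_radius_m(target_radius_m: float, geocode_error_m: float) -> float:
    """Largest search radius that still guarantees the target distance.

    Returns 0.0 when the geocoding error alone exceeds the target.
    """
    return max(0.0, target_radius_m - geocode_error_m)


bound = worst_case_distance_m(500, 1000)
safe_radius = guaranteed_radius_m(500, 1000)
```

With a 1 km error bound, no search radius can guarantee a result within 500 meters, which is why the limitation matters so much in practice.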

While many articles portray MCP as a “USB interface” for agents - suggesting a simple plug-and-play solution - the reality is more complex. Developers must still invest significant effort in debugging, understanding undocumented limitations, and implementing workarounds. In fact, for simpler functionalities, implementing direct API calls might be more straightforward than dealing with MCP’s additional complexity.

What about More Servers?

So far, I have built an agent with one MCP server, but what about more? The system prompt will be overwhelmed with too many instructions.

The first idea that came to me was “multi-agent”, which is actually what I’m working on. For example, each agent holds one server, and a routing agent is responsible for decomposing the task and routing sub-tasks to the corresponding agent.

However, this is not a good solution. I cannot guarantee that all sub-tasks can be completed with only one MCP server. Handling the communication between different agents flexibly and elegantly is also a challenge.

Here is my solution: allow the agent to reset its MCP servers according to the current task. When the MCP servers are reset, the corresponding tools are equipped, and the system prompt is updated by adding or deleting the corresponding tool instructions. This way, the agent is more flexible and powerful, and can serve as a basic agent in a more complex multi-agent system.

Specifically, the reset_equipped_tools function receives one boolean argument per MCP server. If an argument is set to True, the tools of that MCP server are added to the agent, along with the corresponding instructions.

The following is the signature of reset_equipped_tools:

from agentscope.service import ServiceResponse


class ReActAgentV2:
    # ...
    def reset_equipped_tools(
        self,
        gaode_map: bool = False,        # Map service server
        webpage_fetch: bool = False,    # Webpage fetching server
        webpage_deploy: bool = False,   # Webpage deployment server
        multimodal: bool = False,       # Multimodal server (text to image/audio)
    ) -> ServiceResponse:
        """Reset your equipped tool functions. Use this function when your 
        current tools are not enough to solve the task. Note this function is 
        not incremental, it will first disable all the current tools, and then 
        enable the new ones.
    
        Args:
            gaode_map (`bool`, defaults to `False`):
                The gaode map related tools for geography related tasks
            webpage_fetch (`bool`, defaults to `False`):
                The webpage fetching tools
            webpage_deploy (`bool`, defaults to `False`):
                The webpage deploying tools, which can be used to deploy the given webpage
            multimodal (`bool`, defaults to `False`):
                The multimodal tools, including text to image/audio tools
        """
        ...
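The body is elided above. As a rough illustration of the non-incremental behavior the docstring describes, here is a minimal, self-contained sketch. It is not the actual AgentScope implementation: `ToyAgent`, `ServerSpec`, the `SERVERS` registry, and the tool names and instructions in it are all hypothetical, and it returns a plain string instead of a `ServiceResponse`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BASE_PROMPT = "You're a helpful assistant named Friday."


@dataclass
class ServerSpec:
    tools: list[str]     # tool names exposed by the server
    instructions: str    # usage notes to inject into the system prompt


# Hypothetical registry; entries are illustrative only.
SERVERS = {
    "gaode_map": ServerSpec(
        ["maps_text_search", "maps_geo"],
        "All coordinates must come from `maps_geo`.",
    ),
    "webpage_fetch": ServerSpec(
        ["fetch_markdown"],
        "Prefer Markdown output when fetching pages.",
    ),
}


@dataclass
class ToyAgent:
    tools: list[str] = field(default_factory=list)
    sys_prompt: str = BASE_PROMPT

    def reset_equipped_tools(self, **enabled: bool) -> str:
        unknown = set(enabled) - set(SERVERS)
        if unknown:
            raise ValueError(f"Unknown servers: {sorted(unknown)}")

        # Not incremental: drop everything first, then enable the new set.
        self.tools = []
        notes = []
        for name, spec in SERVERS.items():
            if enabled.get(name, False):
                self.tools.extend(spec.tools)
                notes.append(spec.instructions)

        # Rebuild the prompt so that stale instructions never linger.
        self.sys_prompt = BASE_PROMPT
        if notes:
            numbered = "\n".join(f"{i}. {n}" for i, n in enumerate(notes, 1))
            self.sys_prompt += "\n\n# Note\n" + numbered
        return f"Equipped tools: {', '.join(self.tools) or 'none'}"


agent = ToyAgent()
agent.reset_equipped_tools(gaode_map=True)
first_tools = list(agent.tools)
agent.reset_equipped_tools(webpage_fetch=True)
```

Rebuilding the prompt from the base text on every call, rather than patching it, keeps the tool list and the instructions in sync by construction.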

Reflections and Concerns

I’m not saying MCP is bad. It’s a great idea and has a lot of potential, but as an agent developer, integrating MCP into my code is quite challenging.

MCP Selection

The challenge starts with selecting the right MCP server. Browse any MCP marketplace and you will find:

  • Proliferation of similar servers with unclear differentiation
  • Individual servers offering multiple overlapping functions, with no documentation on how to choose between them (e.g. three functions that fetch a webpage in Markdown, HTML, and JSON format respectively)

While flexibility is valuable, clearer guidelines on function selection and use cases would significantly improve developer experience.

Quality Control & Evaluation

The lack of standardized quality metrics is a significant concern for both agent and MCP server developers. Currently, there’s no reliable way to:

  • Evaluate which server best suits specific needs and compare performance across similar servers (for agent developers)
  • Determine if a server meets minimum quality standards (for MCP developers)

I know it’s difficult to establish a universal standard for all MCP servers, but building a channel for agent and MCP developers to communicate and share feedback would be a great start, similar to mobile app stores, where user feedback drives improvements and helps maintain quality standards.

Post-Processing Complexity

When using multiple MCP servers, the inconsistent return types and formats lead to a cumbersome post-processing experience. To render the results from MCP servers in a frontend, I have to post-process their results one by one. Markdown, JSON, HTML, and plain text are all mixed together, making it a nightmare to handle.
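To give a feel for this glue code, below is a minimal sketch of a normalizer that tags each raw string result with a guessed format so the frontend can pick a renderer. The detection rules are rough heuristics of my own, not part of MCP or any server, and `Normalized` and `normalize` are hypothetical names.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Normalized:
    kind: str     # one of "json", "html", "markdown", "text"
    data: object  # parsed object for JSON, otherwise the stripped string


HTML_TAG = re.compile(r"<(html|body|div|p|span|table|a)\b[^>]*>", re.IGNORECASE)
# Headings, bullet/numbered list items at line start, or an inline link.
MARKDOWN_HINT = re.compile(
    r"^(#{1,6} |[-*] |\d+\. )|\[[^\]]+\]\([^)]+\)", re.MULTILINE
)


def normalize(raw: str) -> Normalized:
    text = raw.strip()

    # JSON first: only objects and arrays count, so bare numbers stay text.
    try:
        parsed = json.loads(text)
        if isinstance(parsed, (dict, list)):
            return Normalized("json", parsed)
    except json.JSONDecodeError:
        pass

    if HTML_TAG.search(text):
        return Normalized("html", text)
    if MARKDOWN_HINT.search(text):
        return Normalized("markdown", text)
    return Normalized("text", text)


kinds = [
    normalize(s).kind
    for s in ['{"a": 1}', "<div>hi</div>", "# Title\n- item", "plain words"]
]
```

Heuristics like these misclassify edge cases (Markdown that embeds HTML, for instance), which is exactly why a declared, consistent return format from the servers themselves would be preferable.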

In the end, all these issues significantly undermine the “plug-and-play” promise of MCP, often requiring substantial additional development effort to achieve consistent functionality.