Developing a blog system with Python's asynchronous FastAPI framework (3): asynchronous features

Posted Jun 16, 2020 (9 min read)

Project address

Blog address

This article on asynchronous features is the one closest to Frodo's original purpose. The ideas about communication and data are the same as with a traditional framework; the asynchronous approach only changes how a few scenarios are implemented.

Asynchronous programming is not a new concept, but it does not come with a clearly defined set of technical characteristics or a standard route. The related concepts are fuzzy, and very few articles explain in detail blocking/non-blocking, asynchronous/synchronous, parallel/concurrent, distributed systems, IO multiplexing and coroutines. These concepts may be covered in the OS and distributed-systems courses of a CS degree, but the concrete implementation level is rarely touched. For Python specifically, I have read many articles written by engineers in industry and by Python enthusiasts (Pythonistas). The following two are the most worth reading:

Xiaobai's asyncio: principle, source code to realization (1) - article after chatting - Zhihu. The "Xiaobai" (beginner) in the title is just the author being modest. The article combines the asyncio source code in the CPython standard library with the source code for function stack frames and Python function contexts to explain the design principles of Python's asynchronous model, and hand-writes a simple version of the event loop and of the asyncio Future object.

In-depth understanding of asynchronous programming in Python (Part 1). This article was written in 2017, when asyncio had only recently become a stable part of the standard library. It uses Python and the Linux epoll interface to build single-threaded asynchronous IO step by step, and finally arrives at the asyncio event loop, which shows how convenient asyncio is. The author planned a middle and a final part on the principles of asyncio, but they have not appeared yet. The repository holding the article's code has accumulated dozens of issues urging the author to continue.

Fundamental issues

Remember the sequence diagram we drew in "Communication"? It is fine for representing the logic of one user's request, but can we really write the code like that in an actual implementation? There are two basic problems here:

  • Concurrent access: how can multiple people access your blog's web process at the same time?
  • How can IO blocking be avoided so that CPU time slices are fully used?

The first problem is familiar to anyone in web development, and it has many solutions, because it is a problem that software development has to face:

  • At the OS level: IO multiplexing, such as the mature epoll mechanism on Linux; nginx builds on this to achieve concurrent access.
  • At the programming-language level, with multi-threading. Taking Flask as an example, thread-local objects are used to solve thread-safety issues.
  • At the programming-language level, with asynchronous programming. Node.js, for example, uses promises and callbacks; in Python it is the asynchronous ecosystem represented by asyncio.

The second question is actually the same as the first, with the CPU as the object instead. Frodo solves the first problem with uvloop, an event loop similar to the asyncio one. uvloop is packaged into uvicorn, a web server based on the ASGI protocol, which can launch apps written to the ASGI standard and has a built-in event loop to achieve concurrent access.

uvicorn main:app --reload --host 0.0.0.0 --port 8001
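To make "an app written to the ASGI standard" more concrete, here is a minimal sketch of a bare ASGI application, using only the standard library. The call_once helper is a hypothetical stand-in for the server side and is not how uvicorn is implemented; it only shows that the app is a coroutine the event loop can drive for each request.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: one coroutine call per HTTP request."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({
        "type": "http.response.body",
        "body": b"hello from " + scope["path"].encode(),
    })


async def call_once(asgi_app, path):
    """Hypothetical stand-in for the server: drive the app for one request."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await asgi_app(scope, receive, send)
    return sent


messages = asyncio.run(call_once(app, "/blog"))
print(messages[0]["status"], messages[1]["body"])
```

A real server does the same thing for many connections at once, which is where the event loop earns its keep.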

The point is that Frodo's solution to the second problem shows up in the details of the program.

Problem analysis: where does IO block?

We take the CRUD communication logic from "Communication" as an example. We first mark the places where IO blocks, then map them to the corresponding parts of the program design, and then think about how to handle them in the implementation.

Three types of IO scenarios are marked in the figure; some must run serially, and some can run concurrently. Let me explain them separately:

  • _First category:_ Network connection and disconnection. HTTP runs on top of TCP, a reliable transport protocol, and establishing a connection is itself a time-consuming IO operation. A database connection is either a network connection or a link that reads and writes a socket file, which is also time-consuming. This code is mainly in the web endpoint functions, in Frodo's views directory.
  • _Second category:_ Asynchronous communication refers to the period in which the client has sent a request and is waiting for the data to be ready. This waiting time is really the back end's data IO, and the CPU should not be tied up during it. This part of the code is in Frodo's models directory.
  • _Third category:_ Asynchronous data access refers to the time spent waiting for a database operation to return its data. This time should also be handed back to the CPU.

Many of the above scenarios must be done serially, such as establishing a database connection, then operating on the data, then disconnecting. There are also some scenarios (mainly those that do not involve data consistency) that can run concurrently, such as cache updates and deletions: a key-value store has no relationships between entries to maintain, so keys can be deleted concurrently.

Solution

First category: time-consuming connections

When connecting to and disconnecting from the database, you will want to use the connection pool together with the with keyword. In asynchronous code, to be able to "wait" for the connection process, that is, to hand execution back to the main program while it completes, you need to wrap it with the async keyword and implement the asynchronous context manager methods __aenter__ and __aexit__.

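import traceback

import databases

# DB_URL is defined in the project's configuration.

class AioDataBase:
    async def __aenter__(self):
        # Drop the synchronous driver suffix from the URL before connecting
        db = databases.Database(DB_URL.replace('+pymysql', ''))
        await db.connect()
        self.db = db
        return db

    async def __aexit__(self, exc_type, exc, tb):
        if exc:
            traceback.print_exc()
        # Always release the connection, even when the body raised
        await self.db.disconnect()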
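The same pattern can be tried without a database. The following is a minimal sketch in which FakeConnection and ManagedConnection are hypothetical names invented for illustration; asyncio.sleep(0) stands in for the network round-trip. It shows that __aexit__ runs on both the normal and the error path.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self, events):
        self.events = events

    async def connect(self):
        await asyncio.sleep(0)  # pretend to wait for the network
        self.events.append("connect")

    async def disconnect(self):
        await asyncio.sleep(0)
        self.events.append("disconnect")


class ManagedConnection:
    """Async context manager that opens on entry and closes on exit."""

    def __init__(self, events):
        self.conn = FakeConnection(events)

    async def __aenter__(self):
        await self.conn.connect()
        return self.conn

    async def __aexit__(self, exc_type, exc, tb):
        await self.conn.disconnect()
        return False  # do not swallow exceptions from the body


async def demo():
    events = []
    async with ManagedConnection(events):
        events.append("query")
    try:
        async with ManagedConnection(events):
            raise RuntimeError("boom")
    except RuntimeError:
        events.append("error propagated")
    return events


events = asyncio.run(demo())
print(events)
```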

In fact, aiomysql already provides a similar facility, but unfortunately it does not plug directly into the SQLAlchemy models used here. databases is a simple asynchronous database layer that can execute the SQL generated by SQLAlchemy.

Second category: time-consuming communication

How well this part is made asynchronous determines the response speed of the web application. An asynchronous endpoint function is itself a coroutine defined with async def, which is then scheduled by uvloop. The requirement for such functions is to use await for blocking operations. See an example:

from datetime import timedelta

from fastapi import Form, HTTPException, Request

# app, config, schemas, user, User and verify_password come from the Frodo project.

@app.post('/auth')
async def login(req: Request, username: str = Form(...), password: str = Form(...)):
    # Functions involving IO need to be awaited
    user_auth: schemas.User = await user.authenticate_user(username, password)
    if not user_auth:
        raise HTTPException(status_code=400,
                            detail='Incorrect User Auth.')
    access_token_expires = timedelta(
        minutes=int(config.ACCESS_TOKEN_EXPIRE_MINUTES)
    )
    access_token = await user.create_access_token(
        data={'sub': user_auth.name},
        expires_delta=access_token_expires)
    return {...}

async def authenticate_user(
        username: str, password: str) -> schemas.User:
    user = await User.async_first(name=username)
    # Check for a missing row before unpacking it into the schema
    if not user:
        return False
    user = schemas.UserAuth(**user)
    if not verify_password(password, user.password):
        return False
    return user

You may have noticed that some functions, such as verify_password, are not awaited: it is a computing task, not an IO operation, so there is nothing to wait for. We only need to await the time-consuming IO operations that the logic calls for.

Third category: time-consuming data operations

This is reflected in the design of the asynchronous ORM methods. An implementation example with databases + SQLAlchemy is as follows:
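The split between awaited IO and plain computation can be sketched with the standard library alone. In this sketch fetch_user is a hypothetical stand-in for a database query, and verify_password is a toy SHA-256 comparison chosen only to have some CPU work; it is not Frodo's function and not a recommended way to store passwords.

```python
import asyncio
import hashlib


def verify_password(plain, hashed):
    """CPU-only work: an ordinary function, called without await."""
    return hashlib.sha256(plain.encode()).hexdigest() == hashed


async def fetch_user(name, log):
    """IO-bound work: a coroutine, so the caller must await it."""
    log.append(f"{name}: query sent")
    await asyncio.sleep(0.01)  # stands in for the database round-trip
    log.append(f"{name}: row received")
    return {"name": name, "password": hashlib.sha256(b"secret").hexdigest()}


async def login(name, password, log):
    user = await fetch_user(name, log)                  # IO: await
    return verify_password(password, user["password"])  # CPU: plain call


async def main():
    log = []
    results = await asyncio.gather(
        login("alice", "secret", log),
        login("bob", "wrong", log),
    )
    return results, log


results, log = asyncio.run(main())
print(results)
print(log)
```

Both queries are sent before either row comes back, because each await hands control to the event loop; the password check itself simply runs to completion.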

@classmethod
async def asave(cls, *args, **kwargs):
    '''Update a row, then refresh the cached object.'''
    table = cls.__table__
    id = kwargs.pop('id')
    async with AioDataBase() as db:
        query = table.update().\
                        where(table.c.id == id).\
                        values(**kwargs)
        # Wait 1: execute the SQL statement
        rv = await db.execute(query=query)
    # Wait 2: fetch the data to construct the object
    obj = cls(**(await cls.async_first(id=id)))
    # Wait 3: clear the cache entries that involve the object
    await cls.__flush__(obj)
    return rv

Take updating data as an example, which involves several waits. A synchronous driver such as pymysql cannot be awaited in a call like db.execute(...); it simply blocks. The asynchronous version awaits the result, and the benefit is that during the wait, execution is handed back to the main program (the event loop) so that it can handle other work.
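The difference between blocking and awaiting can be observed directly. In this minimal sketch, time.sleep plays the role of a synchronous driver call and asyncio.sleep the role of an asynchronous one; the function names are hypothetical. A second task, ticker, only gets to run during the wait in the awaited version.

```python
import asyncio
import time


async def ticker(log):
    """Another task that would like to run while the query is in flight."""
    log.append("ticker ran")


async def blocking_query(log):
    time.sleep(0.01)           # synchronous call: the whole event loop is stuck
    log.append("query done")


async def awaited_query(log):
    await asyncio.sleep(0.01)  # asynchronous call: control returns to the loop
    log.append("query done")


async def run(query):
    log = []
    query_task = asyncio.create_task(query(log))
    ticker_task = asyncio.create_task(ticker(log))
    await asyncio.gather(query_task, ticker_task)
    return log


print(asyncio.run(run(blocking_query)))
print(asyncio.run(run(awaited_query)))
```

With the blocking call the ticker has to wait until the query has finished; with the awaited call it runs while the query is still pending.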

Parallel implementation

Parallelism in asynchronous code means that many IO operations do not involve data consistency and can be processed at the same time, such as deleting unrelated data, querying certain data or updating unrelated data. Asynchronous code allows this too, with the help of asyncio.gather(*coros). This function places the coroutines it is given on the event loop, which advances them one by one with something like coro.send(None). Because each coroutine hands control back as soon as it reaches an awaited IO operation, all of them can be started and waited on "simultaneously", which gives a parallel effect.
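As a minimal sketch of this, the following deletes several cache keys concurrently with asyncio.gather. The dict-based cache and the delete_key and flush helpers are hypothetical stand-ins for a real key-value store, and asyncio.sleep stands in for the network round-trip.

```python
import asyncio


async def delete_key(cache, key, log):
    log.append(f"start {key}")
    await asyncio.sleep(0.01)  # stands in for the round-trip to the KV store
    cache.pop(key, None)
    log.append(f"done {key}")
    return key


async def flush(cache, keys):
    log = []
    # gather starts every coroutine and returns results in argument order
    deleted = await asyncio.gather(*(delete_key(cache, k, log) for k in keys))
    return deleted, log


cache = {"post:1": "...", "post:2": "...", "tags": "..."}
deleted, log = asyncio.run(flush(cache, ["post:1", "post:2"]))
print(deleted, log, cache)
```

Both deletions are started before either one finishes, so the total wait is roughly one round-trip rather than two.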

Tricks used in class design

This section collects some tips for using asynchronous Python that can help us arrive at a better design.

Serializing the @property attributes of a class

Serializing objects is common, especially if you want to store them in a cache. Some properties of an object are implemented as asynchronous @property methods. Unlike other properties, they have to be accessed in a special way:

class Post(BaseModel):
    ...
    @property
    async def html_content(self):
        content = await self.content
        if not content:
            return ''
        return markdown(content)

This property is asynchronous: every time you use it you need content = await post.html_content, whereas a property without async and await can be read directly with content = post.html_content.

This complicates our serialization method. We want the class to know which asynchronous properties it has, so that a unified serialization method can be implemented in BaseModel (implementing serialization separately in every subclass is unrealistic).

So we let the class carry an attribute that records the properties that need this treatment. In Python, controlling the behavior of a class (note: the creation of the class, not the creation of its instances) means changing its metaclass. We design a metaclass called PropertyHolder and let it control the creation of all data classes:
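A minimal sketch of the difference, with a hypothetical Post class that is independent of Frodo's models: reading an async property yields a coroutine object, which only produces a value once it is awaited.

```python
import asyncio


class Post:
    def __init__(self, title, body):
        self.title = title
        self._body = body

    @property
    def slug(self):
        # Ordinary property: attribute access returns the value itself
        return self.title.lower().replace(" ", "-")

    @property
    async def content(self):
        # Async property: attribute access returns a coroutine object
        await asyncio.sleep(0)  # stands in for a database or cache read
        return self._body


async def demo():
    post = Post("Hello World", "some *markdown*")
    pending = post.content  # nothing has run yet
    is_coroutine = asyncio.iscoroutine(pending)
    body = await pending    # now the property body executes
    return post.slug, is_coroutine, body


print(asyncio.run(demo()))
```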

class PropertyHolder(type):
    """
    We want to give our classes some useful properties
    and filter the private properties.
    """
    def __new__(cls, name, bases, attrs):
        new_cls = type.__new__(cls, name, bases, attrs)
        new_cls.property_fields = []

        for attr in list(attrs) + sum([list(vars(base))
                                       for base in bases], []):
            if attr.startswith('_') or attr in IGNORE_ATTRS:
                continue
            if isinstance(getattr(new_cls, attr), property):
                new_cls.property_fields.append(attr)
        return new_cls

Its job is to pick out the @property attributes we need and attach their names directly to the class as an attribute.

The next step is to change the metaclass used to create BaseModel:
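To show how such a list of property names can drive a unified serializer, here is a simplified, self-contained sketch. PropertyCollector is a reduced stand-in for the metaclass above, and Model, Post and to_dict are hypothetical names; the actual serialization method in Frodo may differ. The serializer awaits a property only when its value turns out to be awaitable.

```python
import asyncio
import inspect


class PropertyCollector(type):
    """Simplified stand-in: record the names of all public properties."""

    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, attrs)
        cls.property_fields = [
            attr for attr in dir(cls)
            if not attr.startswith("_")
            and isinstance(getattr(cls, attr), property)
        ]
        return cls


class Model(metaclass=PropertyCollector):
    async def to_dict(self):
        data = dict(vars(self))
        for name in self.property_fields:
            value = getattr(self, name)
            if inspect.isawaitable(value):  # async property: wait for it
                value = await value
            data[name] = value
        return data


class Post(Model):
    def __init__(self, id, content):
        self.id = id
        self.content = content

    @property
    def url(self):
        return f"/post/{self.id}/"

    @property
    async def html_content(self):
        await asyncio.sleep(0)  # stands in for rendering or a cache read
        return f"<p>{self.content}</p>"


print(Post.property_fields)
print(asyncio.run(Post(1, "hi").to_dict()))
```

Subclasses need no serialization code of their own: the metaclass fills in property_fields when each class is created.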

from sqlalchemy.ext.declarative import as_declarative, declared_attr

@as_declarative()
class Base():
    __name__:str
    @declared_attr
    def __tablename__(cls) -> str:
        return cls.__name__.lower()

    @property
    def url(self):
        return f'/{self.__class__.__name__.lower()}/{self.id}/'

    @property
    def canonical_url(self):
        pass

class ModelMeta(Base.__class__, PropertyHolder):
    ...


class BaseModel(Base, metaclass=ModelMeta):
    ...

Base is the ORM base class, and its own metaclass has already been changed (meaning it is no longer type). If we replace it directly, our data classes lose their ORM functionality. The best way is to create a new class that inherits from both Base's metaclass and PropertyHolder, making it a new, mixed metaclass. (_It feels rather ridiculous; I don't want this nesting-doll situation here, and I will look for a better solution over time..._)

Tip: how do you get the metaclass of a class? Call cls.__class__ to get the metaclass it was created from. Remember, classes in Python are themselves objects, and their creation is controlled too.

About FastAPI

Well, the core design ideas of the first version of Frodo have now been introduced. In the discussion so far I rarely mentioned FastAPI, because asynchronous web development itself is not tied to the framework: the same content applies if you swap in sanic, aiohttp, tornado or even Django, and only the specific means of implementation differ. For example, Django's asynchronous support is based on its own Channels project.

But FastAPI also has its special features. Its design is accommodating and, I think, well thought out. During development, there are several things I strongly recommend using:
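The metaclass conflict and its resolution can be reproduced without an ORM. In this minimal sketch OrmMeta is a hypothetical stand-in for the metaclass that the ORM gives to Base, and PropertyHolder is reduced to setting a single marker attribute.

```python
class OrmMeta(type):
    """Hypothetical stand-in for the ORM's own metaclass."""


class PropertyHolder(type):
    """Reduced stand-in for the metaclass defined earlier."""

    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, attrs)
        cls.tagged = True
        return cls


class Base(metaclass=OrmMeta):
    pass


# Using PropertyHolder on its own clashes with the metaclass Base already has.
try:
    class Broken(Base, metaclass=PropertyHolder):
        pass
    conflict = None
except TypeError as err:
    conflict = str(err)


# A metaclass derived from both resolves the conflict.
class ModelMeta(type(Base), PropertyHolder):
    pass


class BaseModel(Base, metaclass=ModelMeta):
    pass


print(conflict)
print(type(BaseModel).__name__, BaseModel.__class__ is ModelMeta, BaseModel.tagged)
```

type(Base) and Base.__class__ return the same object here, so either spelling can be used to look up the metaclass.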

  • The design of the data schemas: together with pydantic's type checking, it makes Python, a dynamic language, more readable, easier to debug and more disciplined in style. I believe this is the future trend.
  • The design of Depends: we had thought about encapsulating reused logic in classes, functions and decorators, but FastAPI puts it directly on the parameters, which surprised me. It uses parameters to replace the context, as well as form parameters, authentication parameters and so on.
  • Compatibility with synchronous code, including WSGI: using synchronous libraries with FastAPI is completely fine, and it allows synchronous functions to exist. The reason is that it is based on ASGI, which is a superset of WSGI and should therefore be compatible with both styles.
  • Built-in Swagger docs, a real benefit for back-end developers: without spending time learning OpenAPI syntax, you get a debugging platform and documentation that both front-end and back-end staff can use and understand, saving time and effort.

Frodo's three introductory articles are now complete. A project built in the gaps outside class and research time is inevitably full of holes, but after a month of battle the first version is finally finished. The future goal is the sea of stars: adding new languages, splitting into multiple services and virtualized deployment will all take time to try out. Work hard~!