Python: One-Click Scraping of Movie Information from a Top 100 List
Posted May 25, 2020 • 3 min read
Recently, I saw a video made by a UP master, using a visual dynamic image to list the most popular UP masters one by one. As a result, the first place is Bilibili, and the first place is the second. Nearly 10 times.
The number of fan plays at station B is also relatively large compared to other platforms, and the quality is not bad. To be honest, when I first started using Bilibili, I was just watching the drama. As a pk brother who likes to watch fan dramas, I decided to crawl some Japanese anime movies TOP100 with reptiles? I looked at it online, and Time Network has this ranking list, and the information is relatively complete.
So I decided to use a crawler to save all the movie information of the Top100 on this list as a csv file and put it locally to see if there are any classic anime movies I have missed before.
The saved file has these columns: movie title, director/screenwriter, distributor, other titles, rating, opening-day box office, and total box office. For movies without rating or box-office data, those cells are simply left empty.
Get the movie IDs
This crawler project has three main parts. In the first part we get each movie's ID, because all the information we want to save is tied to it. Where do we find it? Open the source of the ranking page: the ID appears at the end of each movie's link.
To narrow the search, note that these links sit inside the element with class="top_nlist". We use the BeautifulSoup library to extract every element with class="top_nlist", then use a regular expression to pull the movie IDs out of the links on each page.
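As a sketch of this step, the code below collects the links inside the top_nlist element and pulls a numeric ID out of each one. To stay standard-library only, it swaps in Python's html.parser for BeautifulSoup. The link format movie.mtime.com/<id>/ is an assumption, so check it against the real page source before relying on the regex.

```python
import re
from html.parser import HTMLParser

# Assumed link format for a movie page: http://movie.mtime.com/<numeric id>/
MOVIE_ID_RE = re.compile(r"movie\.mtime\.com/(\d+)")


class TopListLinkParser(HTMLParser):
    """Collect href values of <a> tags inside the element whose class includes top_nlist."""

    def __init__(self):
        super().__init__()
        self.container_tag = None  # tag name of the top_nlist element (e.g. "ul")
        self.depth = 0  # nesting level of container_tag while we are inside it
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            classes = (attrs.get("class") or "").split()
            if "top_nlist" in classes:
                self.container_tag = tag
                self.depth = 1
            return
        if tag == self.container_tag:
            self.depth += 1  # a nested element with the same tag name
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.depth and tag == self.container_tag:
            self.depth -= 1


def extract_movie_ids(html):
    """Return the movie IDs found in one list page, in order, without duplicates."""
    parser = TopListLinkParser()
    parser.feed(html)
    ids = []
    for href in parser.links:
        match = MOVIE_ID_RE.search(href)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

Deduplication matters because a list entry often links to the same movie twice (poster and title). Running extract_movie_ids on every page and extending one list with the results gives the full ID list.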
The first page needs special treatment. Pages 2 through 10 have the page number appended directly to the URL, but page 1 does not: appending -1 to build its URL returns a 404 error. So the first page's URL is handled separately, and then the IDs from every page are added to one list that starts out empty.
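A minimal sketch of the page-URL logic, assuming a hypothetical base URL whose later pages take the form index-2.html through index-10.html. The exact suffix on the real site may differ, so treat the pattern as an assumption.

```python
def build_page_urls(base_url, last_page=10):
    """Return the list-page URLs.

    Page 1 is the base URL itself, because appending "-1" gives a 404.
    Pages 2..last_page use a hypothetical "index-N.html" suffix.
    """
    urls = [base_url]
    urls.extend(f"{base_url}index-{page}.html" for page in range(2, last_page + 1))
    return urls


# Placeholder URL for illustration only, not the real ranking page.
for url in build_page_urls("https://example.com/top100/"):
    print(url)
```

Keeping page 1 as a special case in one place means the loop over pages 2..10 never needs an if-branch.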
Extract ratings and box office information
With the IDs in hand, we use them to get each movie's rating and box-office figures. Debugging with the browser's developer tools (F12) shows that the rating and box-office data are loaded through a JavaScript request.
Only the movie ID changes in that request URL; the rest stays the same.
With a little processing, we turn the returned text into JSON. After that, we can read each value directly by its key. The fields extracted here are the rating, the opening-day box office, and the total box office.
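Here is a sketch of this step under several assumptions: the endpoint URL is a placeholder, the response is assumed to be a JavaScript assignment such as var result = {...}; wrapping a single JSON object, and the key names (value, movieRating, RatingFinal, boxOffice, FirstDayBox, TotalBox) are hypothetical. Confirm all of them in the F12 network panel.

```python
import json

# Hypothetical endpoint: only the movie ID changes between requests.
RATING_URL_TEMPLATE = "https://example.com/rating-api?movieId={movie_id}"


def rating_url(movie_id):
    """Build the request URL for one movie by substituting its ID."""
    return RATING_URL_TEMPLATE.format(movie_id=movie_id)


def parse_js_payload(text):
    """Strip a JavaScript wrapper such as 'var result = {...};' and parse the object inside."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the response")
    return json.loads(text[start:end + 1])


def extract_rating_and_box_office(payload):
    """Read rating and box-office fields; missing fields come back as None."""
    # The key names below are assumptions; confirm them in the F12 network panel.
    value = payload.get("value") or {}
    rating = (value.get("movieRating") or {}).get("RatingFinal")
    box_office = value.get("boxOffice") or {}
    return {
        "rating": rating,
        "first_day_box_office": box_office.get("FirstDayBox"),
        "total_box_office": box_office.get("TotalBox"),
    }
```

Slicing from the first "{" to the last "}" only works when the response holds one top-level object and no braces appear in the surrounding JavaScript. Returning None for missing keys is what leaves those CSV cells empty later.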
Extract other movie details
Next, we use each ID to get details such as the movie's title, director, and screenwriter. This information is in the page source of the movie's detail page, so it can be extracted directly with regular expressions.
Regular expressions only work once we have found the pattern the information follows in the HTML. With that pattern identified, regex extraction is fast and accurate.
After extracting this information, we store it in a list, ready to be written to the CSV file later.
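A minimal sketch of regex-based extraction. The markup patterns are hypothetical (an h1 title and English labels like "Director:" followed by a link); the real detail page will use its own markup and Chinese labels, so each pattern has to be rewritten after inspecting the source.

```python
import re

# Hypothetical patterns: each assumes a <strong>Label:</strong> followed by a link.
FIELD_PATTERNS = {
    "title": re.compile(r"<h1[^>]*>\s*(.*?)\s*</h1>", re.S),
    "director": re.compile(r"<strong>Director:</strong>\s*<a[^>]*>(.*?)</a>", re.S),
    "screenwriter": re.compile(r"<strong>Screenwriter:</strong>\s*<a[^>]*>(.*?)</a>", re.S),
    "distributor": re.compile(r"<strong>Distributor:</strong>\s*<a[^>]*>(.*?)</a>", re.S),
}


def extract_details(html):
    """Return one value per field, or an empty string when a pattern is not found."""
    details = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(html)
        details[field] = match.group(1).strip() if match else ""
    return details


def details_to_row(details):
    """Turn the dict into a plain list in field order, ready for csv.writer."""
    return [details[field] for field in FIELD_PATTERNS]
```

The non-greedy (.*?) groups stop at the first closing tag, and re.S lets a match span line breaks in the page source.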
Save as a CSV file
Once each movie's information is collected, we append it to the CSV file, so every movie becomes one new row after the previous ones. To avoid garbled characters when the saved CSV is opened, we set encoding='utf-8' when opening the file. If the CSV will be opened in Excel, encoding='utf-8-sig' is the safer choice, because it writes the byte-order mark Excel uses to recognize UTF-8.
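A sketch of the append step using Python's csv module. The column names are my own labels for the fields listed at the start of the article, and the header is written only when the file is new or empty. Passing encoding="utf-8-sig" instead of the default is the Excel-friendly option.

```python
import csv
import os

# Illustrative column labels for the fields saved per movie.
COLUMNS = ["title", "director", "screenwriter", "distributor", "other_titles",
           "rating", "first_day_box_office", "total_box_office"]


def append_movie_row(path, row, encoding="utf-8"):
    """Append one movie's row; write the header only when the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" stops csv from inserting blank lines on Windows.
    with open(path, "a", newline="", encoding=encoding) as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(COLUMNS)
        writer.writerow(row)  # None values are written as empty cells
```

Opening in append mode for each movie means a crash partway through still leaves every row saved so far on disk.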
With these three steps, the information on every anime movie in this Top 100 list is saved to a local CSV file, which makes the movies much easier to browse and helps decide what to watch next. The full code for this article is available by replying "Anime Movie" to the public account "Python Knowledge Circle".