doc: GitBook - No subject

alikia2x (寒寒) 2025-03-15 13:42:19 +00:00 committed by gitbook-bot
parent 4dc4dfcc5d
commit 35b84787ad
8 changed files with 39 additions and 19 deletions

@ -12,8 +12,7 @@
* [Overview](architecure/overview.md) * [Overview](architecure/overview.md)
* [Database Structure](architecure/database-structure/README.md) * [Database Structure](architecure/database-structure/README.md)
* [Type of Song](architecure/database-structure/type-of-song.md) * [Type of Song](architecure/database-structure/type-of-song.md)
* [Message Queue](architecure/message-queue/README.md) * [Message Queue](architecure/message-queue.md)
* [VideoTagsQueue](architecure/message-queue/videotagsqueue.md)
* [Artificial Intelligence](architecure/artificial-intelligence.md) * [Artificial Intelligence](architecure/artificial-intelligence.md)
## API Doc ## API Doc

@ -6,11 +6,23 @@ For a **song**, it must meet the following conditions to be included in CVSA:
### Category 30 ### Category 30
In principle, the songs featured in CVSA must be included in a video categorized under VOCALOID·UTAU (ID 30) that is posted on Bilibili. In some special cases, this rule may not be enforced.  In principle, a song must be featured in a video categorized under the VOCALOID·UTAU (ID 30) category on [Bilibili](https://en.wikipedia.org/wiki/Bilibili) in order to be observed by our [automation program](../architecure/overview.md#crawler). We welcome editors to manually add songs that have not been uploaded to Bilibili or are not categorized under this category.
### At Least One Line of Chinese #### NEWS
The lyrics of the song must contain at least one line in Chinese. This means that even if a voicebank that only supports Chinese is used, if the lyrics of the song do not contain Chinese, it will not be included in the CVSA. Recently, Bilibili appears to be taking this sub-category offline. This means the VOCALOID·UTAU category can no longer be entered from the frontend, and producers can no longer upload videos to this category (instead, they can only choose the parent category "Music"). 
According to our experiments, Bilibili still retains the code logic of sub-categories in the backend, and newly published songs may still be in the VOCALOID·UTAU sub-category, and the related APIs can still work normally. However, there are [reports](https://www.bilibili.com/opus/1041223385394184199) that some of the new songs have been placed under the "Music General" sub-category.\
We are still waiting for Bilibili's follow-up actions, and in the future, we may adjust the scope of our automated program's crawling.
### At Least One Line of Chinese / Chinese Virtual Singer
The lyrics of the song must contain at least one line in Chinese. If they do not, the song will be included in CVSA only if a Chinese virtual singer has been used.
We define a **Chinese virtual singer** as follows:
1. The singer primarily uses a Chinese voicebank (i.e. the most widely used voicebank for the singer is Chinese)
2. The singer is operated by a company, organization, individual or group located in Mainland China, Hong Kong, Macau or Taiwan.
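
The lyric and singer rules above can be read as a simple predicate. The following is a minimal TypeScript sketch, not CVSA's actual code: the `Singer` and `Song` shapes and every field name are hypothetical, and it assumes that both conditions of the definition must hold for a singer to count as a Chinese virtual singer.

```typescript
// Hypothetical shapes for illustration only; CVSA's real schema may differ.
interface Singer {
  name: string;
  // Language of the singer's most widely used voicebank (condition 1).
  primaryVoicebankLanguage: string;
  // Where the operating company, organization, individual or group is located (condition 2).
  operatorRegion: string;
}

interface Song {
  hasChineseLyricLine: boolean;
  singers: Singer[];
}

const CHINESE_REGIONS = new Set(["Mainland China", "Hong Kong", "Macau", "Taiwan"]);

// Assumption: a singer qualifies only if both conditions of the definition hold.
function isChineseVirtualSinger(singer: Singer): boolean {
  return (
    singer.primaryVoicebankLanguage === "Chinese" &&
    CHINESE_REGIONS.has(singer.operatorRegion)
  );
}

// Covers only the lyric/singer rule: at least one Chinese lyric line,
// or otherwise at least one Chinese virtual singer.
function meetsLanguageRule(song: Song): boolean {
  return song.hasChineseLyricLine || song.singers.some(isChineseVirtualSinger);
}
```

Note that this covers only the language rule; the other conditions for inclusion would need their own checks.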
### Using Vocal Synthesizer ### Using Vocal Synthesizer

@ -11,3 +11,7 @@ Located at `/filter/` under project root dir, it classifies a video in the [cate
* 0: Not related to Chinese vocal synthesis * 0: Not related to Chinese vocal synthesis
* 1: An original song with Chinese vocal synthesis * 1: An original song with Chinese vocal synthesis
* 2: A cover/remix song with Chinese vocal synthesis * 2: A cover/remix song with Chinese vocal synthesis
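
For code that consumes these labels, a small typed mapping can make the three values explicit. This is only an illustrative TypeScript sketch: the names `FILTER_LABELS`, `isFilterLabel` and `isChineseVocalSynthesis` are hypothetical and do not come from the CVSA codebase.

```typescript
// The three label values listed above, keyed by their numeric code.
const FILTER_LABELS = {
  0: "Not related to Chinese vocal synthesis",
  1: "An original song with Chinese vocal synthesis",
  2: "A cover/remix song with Chinese vocal synthesis",
} as const;

type FilterLabel = keyof typeof FILTER_LABELS; // 0 | 1 | 2

// Type guard for raw numbers, e.g. values read from a database column.
function isFilterLabel(value: number): value is FilterLabel {
  return value === 0 || value === 1 || value === 2;
}

// Labels 1 and 2 both involve Chinese vocal synthesis; label 0 does not.
function isChineseVocalSynthesis(label: FilterLabel): boolean {
  return label !== 0;
}
```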
### The Predictor
Located at `/pred/` under the project root dir, it predicts the future views of a video. This is a regression model that takes historical view trends of a video, other contextual information (such as the current time), and the future time points to be predicted as feature inputs, and outputs the increment in the video's view count from "now" to the specified future time point.
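
To make the input/output contract above more concrete, here is a minimal TypeScript sketch of how such features and the regression target could be assembled. It is not the code in `/pred/`: the `ViewSample` type, the function names and the particular four features are all assumptions chosen for illustration, and no model is trained here.

```typescript
// Hypothetical sample of a video's view count at a point in time.
interface ViewSample {
  timestampMs: number;
  views: number;
}

const HOUR_MS = 3_600_000;

// Builds an illustrative feature vector:
// [latest views, average views gained per hour, UTC hour of "now", hours until the target time].
function buildFeatures(history: ViewSample[], nowMs: number, targetMs: number): number[] {
  const first = history[0];
  const latest = history[history.length - 1];
  if (first === undefined || latest === undefined) {
    throw new Error("history must contain at least one sample");
  }
  const spanHours = (latest.timestampMs - first.timestampMs) / HOUR_MS;
  const growthPerHour = spanHours > 0 ? (latest.views - first.views) / spanHours : 0;
  const hourOfDay = new Date(nowMs).getUTCHours(); // contextual information
  const horizonHours = (targetMs - nowMs) / HOUR_MS; // future time point to predict
  return [latest.views, growthPerHour, hourOfDay, horizonHours];
}

// The regression target: the increment in views from "now" to the target time.
function viewIncrement(viewsNow: number, viewsAtTarget: number): number {
  return viewsAtTarget - viewsNow;
}

// Turns a predicted increment back into an absolute view count.
function predictedViews(viewsNow: number, predictedIncrement: number): number {
  return viewsNow + predictedIncrement;
}
```

Predicting the increment rather than the absolute count keeps the target on a comparable scale for videos with very different current view counts, which is one plausible reason for the design described above.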

@ -8,4 +8,6 @@ All public data of CVSA (excluding users' personal data) is stored in a database
* bili\_user: stores snapshots of Bilibili user information * bili\_user: stores snapshots of Bilibili user information
* all\_data: metadata of all videos in [category 30](../../about/scope-of-inclusion.md#category-30). * all\_data: metadata of all videos in [category 30](../../about/scope-of-inclusion.md#category-30).
* labelling\_result: Contains labels of videos in `all_data` tagged by our [AI system](../artificial-intelligence.md#the-filter). * labelling\_result: Contains labels of videos in `all_data` tagged by our [AI system](../artificial-intelligence.md#the-filter).
* video\_snapshot: Statistical data of videos (e.g., number of views) that is fetched regularly. We call this fetch process a "snapshot".
* snapshot\_schedule: The scheduling information for video snapshots.

@ -0,0 +1,7 @@
# Message Queue
We rely on message queues to manage the various tasks that [the crawler](overview.md#crawler) needs to perform.
### Code Path
Currently, the code related to message queues is located at `lib/mq` and `src`.

@ -1,2 +0,0 @@
# Message Queue

@ -1,11 +0,0 @@
# VideoTagsQueue
### Jobs
The VideoTagsQueue contains two jobs: `getVideoTags` and `getVideosTags`. The former fetches the tags of a video, and the latter is responsible for scheduling the former.
### Return value
The return values of the two jobs are described in the following table:
<table><thead><tr><th width="168">Return Value</th><th>Description</th></tr></thead><tbody><tr><td>0</td><td>In <code>getVideoTags</code>: the tags were successfully fetched<br>In <code>getVideosTags</code>: every video with null tags has a corresponding job successfully queued.</td></tr><tr><td>1</td><td>Used in <code>getVideoTags</code>: a <code>fetch</code> error occurred during the job</td></tr><tr><td>2</td><td>Used in <code>getVideoTags</code>: the rate limit set in NetScheduler has been reached</td></tr><tr><td>3</td><td>Used in <code>getVideoTags</code>: no <code>aid</code> was provided in the job data</td></tr><tr><td>4</td><td>Used in <code>getVideosTags</code>: there is no video with NULL as <code>tags</code></td></tr><tr><td>1xx</td><td>Used in <code>getVideosTags</code>: the number of tasks in the queue has exceeded the limit, so <code>getVideosTags</code> stops adding tasks. <code>xx</code> is the number of jobs added to the queue during execution.</td></tr></tbody></table>
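
As an illustration of how a caller might interpret these values, the following TypeScript sketch maps a return value to a short description. The function name `describeReturnValue` and the `TagsJob` type are hypothetical, and the sketch assumes that a `1xx` value is encoded numerically as `100 + xx`, which the table does not state explicitly.

```typescript
type TagsJob = "getVideoTags" | "getVideosTags";

// Maps a job's return value to a human-readable description, following the table above.
function describeReturnValue(job: TagsJob, code: number): string {
  if (code === 0) {
    return job === "getVideoTags"
      ? "tags fetched successfully"
      : "a job was queued for every video with null tags";
  }
  if (job === "getVideoTags") {
    if (code === 1) return "fetch error during the job";
    if (code === 2) return "rate limit set in NetScheduler reached";
    if (code === 3) return "aid missing from the job data";
  } else {
    if (code === 4) return "no video has NULL tags";
    // Assumption: 1xx is stored as 100 + (number of jobs added during execution).
    if (code >= 100 && code <= 199) {
      return `queue limit exceeded after adding ${code - 100} jobs`;
    }
  }
  return "unknown return value";
}
```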

@ -1,5 +1,4 @@
--- ---
icon: globe-pointer
layout: layout:
title: title:
visible: true visible: true
@ -15,4 +14,14 @@ layout:
# Overview # Overview
The whole CVSA system can be separated into three parts:
* Frontend
* API
* Crawler
The frontend is driven by [Astro](https://astro.build/) and is used to display the final CVSA page. The API is driven by [Hono](https://hono.dev) and is used to query the database and provide REST/GraphQL APIs that can be called by our website, applications, or third parties. The crawler is our automatic data collector, used to automatically collect new songs from Bilibili, track their statistics, etc.
### Crawler
Automation is the biggest highlight of CVSA's technical design. To achieve this, we use a message queue powered by [BullMQ](https://bullmq.io/) to concurrently process various tasks in the data collection life cycle. Automation is the biggest highlight of CVSA's technical design. To achieve this, we use a message queue powered by [BullMQ](https://bullmq.io/) to concurrently process various tasks in the data collection life cycle.