doc: GitBook - No subject
Parent: 4dc4dfcc5d
Commit: 35b84787ad
@ -12,8 +12,7 @@
|
|||||||
* [Overview](architecure/overview.md)
|
* [Overview](architecure/overview.md)
|
||||||
* [Database Structure](architecure/database-structure/README.md)
|
* [Database Structure](architecure/database-structure/README.md)
|
||||||
* [Type of Song](architecure/database-structure/type-of-song.md)
|
* [Type of Song](architecure/database-structure/type-of-song.md)
|
||||||
* [Message Queue](architecure/message-queue/README.md)
|
* [Message Queue](architecure/message-queue.md)
|
||||||
* [VideoTagsQueue](architecure/message-queue/videotagsqueue.md)
|
|
||||||
* [Artificial Intelligence](architecure/artificial-intelligence.md)
|
* [Artificial Intelligence](architecure/artificial-intelligence.md)
|
||||||
|
|
||||||
## API Doc
|
## API Doc
|
||||||
|
@ -6,11 +6,23 @@ For a **song**, it must meet the following conditions to be included in CVSA:
|
|||||||
|
|
||||||
### Category 30
|
### Category 30
|
||||||
|
|
||||||
In principle, the songs featured in CVSA must be included in a video categorized under VOCALOID·UTAU (ID 30) that is posted on Bilibili. In some special cases, this rule may not be enforced. 
|
In principle, a song must be featured in a video categorized under VOCALOID·UTAU (ID 30) on [Bilibili](https://en.wikipedia.org/wiki/Bilibili) in order to be observed by our [automation program](../architecure/overview.md#crawler). Editors are welcome to manually add songs that have not been uploaded to Bilibili or are not categorized under this category.
|
||||||
|
|
||||||
### At Least One Line of Chinese
|
#### NEWS
|
||||||
|
|
||||||
The lyrics of the song must contain at least one line in Chinese. This means that even if a voicebank that only supports Chinese is used, if the lyrics of the song do not contain Chinese, it will not be included in the CVSA.
|
Recently, Bilibili appears to be taking this sub-category offline. The VOCALOID·UTAU category can no longer be reached from the frontend, and producers can no longer upload videos to it (they can only choose the parent category "Music").
|
||||||
|
|
||||||
|
According to our experiments, Bilibili still retains the sub-category logic in its backend: newly published songs may still be placed in the VOCALOID·UTAU sub-category, and the related APIs still work normally. However, there are [reports](https://www.bilibili.com/opus/1041223385394184199) that some new songs have been placed under the "Music General" sub-category.\
|
||||||
|
We are waiting for Bilibili's follow-up actions and may adjust the crawling scope of our automation program in the future.
|
||||||
|
|
||||||
|
### At Least One Line of Chinese / Chinese Virtual Singer
|
||||||
|
|
||||||
|
The lyrics of the song must contain at least one line in Chinese. If they do not, the song is included in CVSA only if a Chinese virtual singer is used.
|
||||||
|
|
||||||
|
We define a **Chinese virtual singer** as follows:
|
||||||
|
|
||||||
|
1. The singer primarily uses a Chinese voicebank (i.e. the most widely used voicebank for the singer is Chinese).
|
||||||
|
2. The singer is operated by a company, organization, individual or group located in Mainland China, Hong Kong, Macau or Taiwan.
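The language rule above can be sketched as a small predicate. This is only an illustration: the type and field names are hypothetical, the language codes are placeholders, and it assumes that both numbered conditions must hold for a singer to count as a Chinese virtual singer, which the text does not state explicitly.

```typescript
// Hypothetical shapes for illustration; these are not CVSA's actual schema.
interface VirtualSinger {
  primaryVoicebankLanguage: string; // language of the most widely used voicebank, e.g. "zh"
  operatorRegion: string; // where the operating company/organization/individual is located
}

interface SongCandidate {
  lyricLineLanguages: string[]; // one language code per lyric line
  singers: VirtualSinger[];
}

const CHINESE_REGIONS = ["Mainland China", "Hong Kong", "Macau", "Taiwan"];

// Assumption: conditions 1 and 2 are combined with AND.
function isChineseVirtualSinger(singer: VirtualSinger): boolean {
  return (
    singer.primaryVoicebankLanguage === "zh" &&
    CHINESE_REGIONS.indexOf(singer.operatorRegion) !== -1
  );
}

// At least one Chinese lyric line, or else at least one Chinese virtual singer.
function meetsLanguageRule(song: SongCandidate): boolean {
  const hasChineseLine = song.lyricLineLanguages.some((lang) => lang === "zh");
  return hasChineseLine || song.singers.some(isChineseVirtualSinger);
}
```

A song with only Japanese lyrics would pass this check when sung by a singer satisfying both conditions, and fail otherwise.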
|
||||||
|
|
||||||
### Using Vocal Synthesizer
|
### Using Vocal Synthesizer
|
||||||
|
|
||||||
|
@ -11,3 +11,7 @@ Located at `/filter/` under project root dir, it classifies a video in the [cate
|
|||||||
* 0: Not related to Chinese vocal synthesis
|
* 0: Not related to Chinese vocal synthesis
|
||||||
* 1: An original song with Chinese vocal synthesis
|
* 1: An original song with Chinese vocal synthesis
|
||||||
* 2: A cover/remix song with Chinese vocal synthesis
|
* 2: A cover/remix song with Chinese vocal synthesis
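For code that consumes the filter's output, the three labels listed above could be modelled as follows. The numeric values come from the list; the constant and function names are hypothetical and are not taken from the `/filter/` code.

```typescript
// Label values as listed above; the names are illustrative only.
const FilterLabel = {
  Unrelated: 0,
  Original: 1,
  CoverOrRemix: 2,
} as const;

type FilterLabelValue = (typeof FilterLabel)[keyof typeof FilterLabel];

const LABEL_DESCRIPTIONS: Record<FilterLabelValue, string> = {
  0: "Not related to Chinese vocal synthesis",
  1: "An original song with Chinese vocal synthesis",
  2: "A cover/remix song with Chinese vocal synthesis",
};

// True for labels 1 and 2, i.e. any song that uses Chinese vocal synthesis.
function isChineseVocalSynthesisSong(label: FilterLabelValue): boolean {
  return label === FilterLabel.Original || label === FilterLabel.CoverOrRemix;
}

function describeLabel(label: FilterLabelValue): string {
  return LABEL_DESCRIPTIONS[label];
}
```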
|
||||||
|
|
||||||
|
### The Predictor
|
||||||
|
|
||||||
|
Located at `/pred/` under the project root directory, it predicts the future views of a video. It is a regression model whose feature inputs are the video's historical view trend, other contextual information (such as the current time), and the future time point to predict; it outputs the increment in the video's view count from "now" to that future time point.
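To make the predictor's inputs and output concrete, here is a minimal sketch of the interface it describes. The field names are hypothetical, and the function body is a naive linear extrapolation that merely stands in for the trained regression model; it is not the model used in `/pred/`.

```typescript
// Hypothetical input shape; names are not taken from the CVSA codebase.
interface ViewSample {
  timestamp: number; // Unix time in milliseconds
  views: number; // cumulative view count at that time
}

interface PredictorInput {
  history: ViewSample[]; // historical view trend, ordered oldest to newest
  now: number; // contextual information: the current time (ms)
  target: number; // the future time point to predict (ms)
}

// Placeholder for the regression model: extrapolates the average view rate
// observed between the first and last samples. Returns the predicted
// increment in views from `now` to `target`, not an absolute view count.
function predictViewIncrement(input: PredictorInput): number {
  const { history, now, target } = input;
  const first = history[0];
  const last = history[history.length - 1];
  if (!first || !last || history.length < 2 || target <= now) return 0;
  const elapsed = last.timestamp - first.timestamp;
  if (elapsed <= 0) return 0;
  const ratePerMs = (last.views - first.views) / elapsed;
  return Math.max(0, ratePerMs * (target - now));
}
```

Returning an increment rather than an absolute count mirrors the description above: the caller adds the result to the latest known view count.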
|
||||||
|
@ -8,4 +8,6 @@ All public data of CVSA (excluding users' personal data) is stored in a database
|
|||||||
* bili\_user: stores snapshots of Bilibili user information
|
* bili\_user: stores snapshots of Bilibili user information
|
||||||
* all\_data: metadata of all videos in [category 30](../../about/scope-of-inclusion.md#category-30).
|
* all\_data: metadata of all videos in [category 30](../../about/scope-of-inclusion.md#category-30).
|
||||||
* labelling\_result: contains the labels of videos in `all_data` tagged by our [AI system](../artificial-intelligence.md#the-filter).
|
* labelling\_result: contains the labels of videos in `all_data` tagged by our [AI system](../artificial-intelligence.md#the-filter).
|
||||||
|
* video\_snapshot: statistical data of videos (e.g., number of views) that is fetched regularly; we call this fetch process a "snapshot".
|
||||||
|
* snapshot\_schedule: The scheduling information for video snapshots.
|
||||||
|
|
||||||
|
7
doc/en/architecure/message-queue.md
Normal file
@ -0,0 +1,7 @@
|
|||||||
|
# Message Queue
|
||||||
|
|
||||||
|
We rely on message queues to manage the various tasks that [the crawler](overview.md#crawler) needs to perform.
|
||||||
|
|
||||||
|
### Code Path
|
||||||
|
|
||||||
|
Currently, the code related to message queues is located at `lib/mq` and `src`.
|
@ -1,2 +0,0 @@
|
|||||||
# Message Queue
|
|
||||||
|
|
@ -1,11 +0,0 @@
|
|||||||
# VideoTagsQueue
|
|
||||||
|
|
||||||
### Jobs
|
|
||||||
|
|
||||||
The VideoTagsQueue contains two jobs: `getVideoTags` and `getVideosTags`. The former fetches the tags of a video, and the latter is responsible for scheduling the former.
|
|
||||||
|
|
||||||
### Return value
|
|
||||||
|
|
||||||
The return values of the two jobs are listed in the following table:
|
|
||||||
|
|
||||||
<table><thead><tr><th width="168">Return Value</th><th>Description</th></tr></thead><tbody><tr><td>0</td><td>In <code>getVideoTags</code>: the tags were successfully fetched.<br>In <code>getVideosTags</code>: every video with null tags has a corresponding job successfully queued.</td></tr><tr><td>1</td><td>Used in <code>getVideoTags</code>: a <code>fetch</code> error occurred during the job.</td></tr><tr><td>2</td><td>Used in <code>getVideoTags</code>: the rate limit set in NetScheduler was reached.</td></tr><tr><td>3</td><td>Used in <code>getVideoTags</code>: no <code>aid</code> was provided in the job data.</td></tr><tr><td>4</td><td>Used in <code>getVideosTags</code>: there is no video with NULL as <code>tags</code>.</td></tr><tr><td>1xx</td><td>Used in <code>getVideosTags</code>: the number of tasks in the queue has exceeded the limit, so <code>getVideosTags</code> stops adding tasks. <code>xx</code> is the number of jobs added to the queue during execution.</td></tr></tbody></table>
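A caller reading these return values might decode them along the following lines. This is a sketch only: the function and type names are hypothetical, and it assumes that `1xx` is encoded as `100 + xx`, which the table implies but does not spell out.

```typescript
// Hypothetical decoder for the return values in the table above.
interface DecodedReturn {
  job: "getVideoTags" | "getVideosTags" | "both" | "unknown";
  meaning: string;
  jobsAdded?: number; // only set for 1xx codes
}

function decodeReturnValue(code: number): DecodedReturn {
  // Assumption: 1xx means 100 + (number of jobs added before the limit was hit).
  if (code >= 100 && code < 200) {
    return { job: "getVideosTags", meaning: "queue limit exceeded", jobsAdded: code - 100 };
  }
  switch (code) {
    case 0:
      return { job: "both", meaning: "success" };
    case 1:
      return { job: "getVideoTags", meaning: "fetch error" };
    case 2:
      return { job: "getVideoTags", meaning: "rate limit reached" };
    case 3:
      return { job: "getVideoTags", meaning: "aid missing from job data" };
    case 4:
      return { job: "getVideosTags", meaning: "no video with NULL tags" };
    default:
      return { job: "unknown", meaning: "unrecognized return value" };
  }
}
```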
|
|
@ -1,5 +1,4 @@
|
|||||||
---
|
---
|
||||||
icon: globe-pointer
|
|
||||||
layout:
|
layout:
|
||||||
title:
|
title:
|
||||||
visible: true
|
visible: true
|
||||||
@ -15,4 +14,14 @@ layout:
|
|||||||
|
|
||||||
# Overview
|
# Overview
|
||||||
|
|
||||||
|
The whole CVSA system can be separated into three parts:
|
||||||
|
|
||||||
|
* Frontend
|
||||||
|
* API
|
||||||
|
* Crawler
|
||||||
|
|
||||||
|
The frontend is driven by [Astro](https://astro.build/) and is used to display the final CVSA page. The API is driven by [Hono](https://hono.dev) and is used to query the database and provide REST/GraphQL APIs that can be called by our website, applications, or third parties. The crawler is our automatic data collector, used to automatically collect new songs from Bilibili, track their statistics, etc.
|
||||||
|
|
||||||
|
### Crawler
|
||||||
|
|
||||||
Automation is the biggest highlight of CVSA's technical design. To achieve this, we use a message queue powered by [BullMQ](https://bullmq.io/) to concurrently process various tasks in the data collection life cycle.
|
Automation is the biggest highlight of CVSA's technical design. To achieve this, we use a message queue powered by [BullMQ](https://bullmq.io/) to concurrently process various tasks in the data collection life cycle.
|
||||||
|