sparkastML

This repository contains the machine learning components for the sparkast project.

The main goal of this project is to improve the search functionality of sparkast, enabling users to receive real-time answers as they type their queries.

Intention Classification

The model in the /intention-classify directory is designed to categorize user queries into predefined classes.

We use a Convolutional Neural Network (CNN) architecture combined with an Energy-based Model for open-set recognition.

The model is kept lightweight so that it can run on a wide range of devices, including in the browser.
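As a rough illustration of the open-set step, the sketch below applies an energy score to a classifier's output logits and rejects a query as "unknown" when the energy is too high. It is a minimal sketch, not the repository's implementation: it assumes the common formulation E(x) = -T * log(sum_i exp(logit_i / T)), and the function names, label list, threshold, and logit values are all hypothetical. In practice the logits would come from the CNN and the threshold would be tuned on validation data.

```python
import math


def energy_score(logits, temperature=1.0):
    """Energy of a logit vector: E(x) = -T * log(sum_i exp(logit_i / T)).

    Lower energy means the classifier is more confident that the input
    belongs to one of the known classes.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


def classify_open_set(logits, labels, threshold, temperature=1.0):
    """Return the predicted label, or None if the query looks out-of-set.

    `threshold` is a hypothetical cut-off on the energy score.
    """
    if energy_score(logits, temperature) > threshold:
        return None  # energy too high: treat the query as an unknown intent
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


# Hypothetical intent labels and logits, for illustration only.
LABELS = ["weather", "calculator", "translate"]

print(classify_open_set([8.0, 0.5, -1.0], LABELS, threshold=-3.0))
print(classify_open_set([0.2, 0.1, 0.0], LABELS, threshold=-3.0))
```

With these made-up numbers, the first query has one dominant logit and is assigned a known class, while the second has uniformly low logits, so its energy exceeds the threshold and it is rejected.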

For a detailed explanation of how it works, refer to this blog post.

Translation

Language barriers are one of the biggest obstacles to communication between civilizations. In modern times, with the development of computer science and artificial intelligence, machine translation is bridging this gap and building a modern Tower of Babel.

Unfortunately, many machine translation systems are proprietary and controlled by commercial companies, which restricts user freedom and hinders innovation.

sparkastML therefore sets out to challenge commercial machine translation. We decided to tackle translation between Chinese and English first. Both languages have a long history and a large number of speakers, and their writing systems and idioms differ greatly, which makes the task challenging.

For more details, visit this page.

Dataset

To support the development of Libre Intelligence, we have made a series of datasets publicly available. You can access them here.