Build a CovidQA search with short code using Jina 2.0

Jina with short code (CovidQA)

Are you interested in Jina?
Jina is a neural search framework based on deep learning. This post is an easy-to-understand example for those who are new to it.

Why Jina?
:milky_way: All data types - Large-scale indexing and querying of any kind of unstructured data: video, image, long/short text, music, source code, PDF, etc.

:cloud_with_lightning: Fast & cloud-native - Distributed architecture from day one, scalable & cloud-native by design: enjoy containerization, streaming, parallelization, sharding, async scheduling, and HTTP/gRPC/WebSocket protocols.

:stopwatch: Save time - The design pattern of neural search systems, from zero to a production-ready system in minutes.


from jina import Flow
from jina.types.document.generators import from_csv

# Open our data CSV
with open("data/news.csv") as file:
    # Create a list of Documents from the CSV, mapping the "question" column to each Document's text (the field to encode and index)
    docs = list(from_csv(file, field_resolver={"question": "text"}))

# Create a Flow. This is a pipeline that takes a DocumentArray as input, processes it, and returns a different DocumentArray as output
flow = (
    Flow(port_expose=45678, protocol="http") # Set up REST gateway for searching
    .add(
        uses="jinahub://TransformerTorchEncoder", # Add an encoder. This is the neural net that "understands" your data, downloaded from Jina Hub
        name="encoder",
    )
    .add(
        uses="jinahub://SimpleIndexer", # Add an indexer. This creates a searchable index of the encodings and metadata
        name="indexer",
    )
)

# Start the Flow
with flow:
    flow.post(on="/index", inputs=docs) # Send the Documents to the /index endpoint to encode and index them
    flow.block() # Keep the Flow open, ready for the user to search

This is all the code used for this project. It looks simple, doesn’t it?
The TransformerTorchEncoder from Jina Hub uses the distilbert-base-nli-stsb-mean-tokens model, so you can find similar questions and answers even if your query contains a typo. SimpleIndexer returns the top 5 matches (top_k = 5).
Both the encoder and the indexer are text-to-text Executors published on Jina Hub, so you can swap in other ones or write your own as needed.
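
To make the encoder and indexer roles more concrete, here is a minimal sketch of what a top_k lookup over embeddings does conceptually. It is not SimpleIndexer's actual implementation: the cosine-similarity scoring, the toy three-dimensional vectors, and the helper names `cosine` and `top_k` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=5):
    """Return the k (score, text) pairs most similar to query_vec.

    `indexed` is a list of (text, vector) pairs, standing in for what
    an encoder would produce and an indexer would store.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values; a real encoder outputs hundreds of dimensions)
index = [
    ("Can pets catch the virus?", [0.9, 0.1, 0.0]),
    ("How long is the incubation period?", [0.1, 0.9, 0.1]),
    ("Are masks effective?", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # pretend this encodes "Do animals get coronavirus?"

results = top_k(query, index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Because matching happens on vectors rather than on exact words, a query with different wording can still land close to the right question.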

Flow is how Jina streamlines and scales Executors.


Let’s take a look


Ask whatever you want to ask.
As an example, I’ll search for the question “Do animals get coronavirus?”
The result looks like the image below.

You can also check the output in JSON format.
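
If you want to look at the JSON yourself, a sketch like the following could send a query to the running Flow. The port 45678 comes from the Flow definition; the `/search` path, the request body shape, and the `data -> docs -> matches` response layout are assumptions about the Jina 2.0 HTTP gateway, so compare them with your own output. The helpers `build_payload`, `extract_matches`, and `search` are hypothetical names.

```python
import json
import urllib.request

# Port taken from the Flow above; the /search path is an assumption
SEARCH_URL = "http://localhost:45678/search"


def build_payload(question):
    """Wrap a question in the request body the gateway is assumed to expect."""
    return {"data": [{"text": question}]}


def extract_matches(response):
    """Collect the text of every match, assuming a data -> docs -> matches layout."""
    docs = response.get("data", {}).get("docs", [])
    return [match.get("text", "") for doc in docs for match in doc.get("matches", [])]


def search(question, url=SEARCH_URL):
    """POST the question to the Flow and return the matched texts."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_matches(json.load(reply))


# Usage (requires the Flow to be running):
# for text in search("Do animals get coronavirus?"):
#     print(text)
```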


Recommendations

This data set was posted over a year ago, so it may not return the results you want. I recommend preparing a somewhat larger data set. The news data used here has about 400 to 500 questions and answers (about 3000 lines).
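
When you prepare your own data, a quick check before indexing can help. The sketch below counts the usable and unique questions in a CSV. It only assumes the "question" column used in the code above; the "answer" column, the sample rows, and the helper name `summarize_questions` are made up for illustration.

```python
import csv
import io


def summarize_questions(file, column="question"):
    """Count non-empty and unique values in the given CSV column."""
    reader = csv.DictReader(file)
    if column not in (reader.fieldnames or []):
        raise ValueError(f"CSV has no '{column}' column")
    # Skip rows where the column is missing or blank
    questions = [
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]
    return {"rows": len(questions), "unique": len(set(questions))}


# Hypothetical sample standing in for data/news.csv
sample = io.StringIO(
    "question,answer\n"
    "Do animals get coronavirus?,Some can.\n"
    "Do animals get coronavirus?,Duplicate row.\n"
    "Are masks effective?,Yes.\n"
)
summary = summarize_questions(sample)
print(summary)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` sample.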

Wrap up

We’ve looked at a minimal example using the Jina framework, and now it’s up to you to decide what to build. Use Jina to create more diverse search engines. It will be a good experience. There are various examples here for more information.

Demo page: Demo
Data set used: Covid-QA from kaggle
