当前位置: 首页 > news >正文

装饰公司响应式网站建设案例免费的大数据分析平台

装饰公司响应式网站建设案例,免费的大数据分析平台,网站建设的目标及功能定位,网站建设开源程序我们介绍的 NV-Embed-v2 是一种通用嵌入模型,它在大规模文本嵌入基准(MTEB 基准)(截至 2024 年 8 月 30 日)的 56 项文本嵌入任务中以 72.31 的高分排名第一。此外,它还在检索子类别中排名第一(…

在这里插入图片描述

我们介绍的 NV-Embed-v2 是一种通用嵌入模型,它在大规模文本嵌入基准(MTEB 基准)(截至 2024 年 8 月 30 日)的 56 项文本嵌入任务中以 72.31 的高分排名第一。此外,它还在检索子类别中排名第一(在 15 项任务中获得 62.65 分),这对 RAG 技术的发展至关重要。

NV-Embed-v2 采用了多项新设计,包括让 LLM 关注潜在向量,以获得更好的池化嵌入输出,并展示了一种两阶段指令调整方法,以提高检索和非检索任务的准确性。此外,NV-Embed-v2 还采用了一种新颖的硬阴性挖掘方法,该方法考虑了正相关性得分,能更好地去除假阴性。

有关更多技术细节,请参阅我们的论文: NV-Embed:将 LLM 训练为通用嵌入模型的改进技术。

型号详情

  • 仅用于解码器的基本 LLM:Mistral-7B-v0.1
  • 池类型: Latent-Attention
  • 嵌入尺寸: 4096

如何使用

所需软件包

如果遇到问题,请尝试安装以下 python 软件包

pip uninstall -y transformer-engine
pip install torch==2.2.0
pip install transformers==4.42.4
pip install flash-attn==2.2.0
pip install sentence-transformers==2.7.0

以下是如何使用 Huggingface-transformer 和 Sentence-transformer 对查询和段落进行编码的示例。

HuggingFace Transformers

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel# Each query needs to be accompanied by an corresponding instruction describing the task.
task_name_to_instruct = {"example": "Given a question, retrieve passages that answer the question",}query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
queries = ['are judo throws allowed in wrestling?', 'how to become a radiology technician in michigan?']# No instruction needed for retrieval passages
passage_prefix = ""
passages = ["Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.","Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
]# load model with tokenizer
model = AutoModel.from_pretrained('nvidia/NV-Embed-v2', trust_remote_code=True)# get the embeddings
max_length = 32768
query_embeddings = model.encode(queries, instruction=query_prefix, max_length=max_length)
passage_embeddings = model.encode(passages, instruction=passage_prefix, max_length=max_length)# normalize embeddings
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
passage_embeddings = F.normalize(passage_embeddings, p=2, dim=1)# get the embeddings with DataLoader (spliting the datasets into multiple mini-batches)
# batch_size=2
# query_embeddings = model._do_encode(queries, batch_size=batch_size, instruction=query_prefix, max_length=max_length, num_workers=32, return_numpy=True)
# passage_embeddings = model._do_encode(passages, batch_size=batch_size, instruction=passage_prefix, max_length=max_length, num_workers=32, return_numpy=True)scores = (query_embeddings @ passage_embeddings.T) * 100
print(scores.tolist())
# [[87.42693328857422, 0.46283677220344543], [0.965264618396759, 86.03721618652344]]

Sentence-Transformers

import torch
from sentence_transformers import SentenceTransformer# Each query needs to be accompanied by an corresponding instruction describing the task.
task_name_to_instruct = {"example": "Given a question, retrieve passages that answer the question",}query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
queries = ['are judo throws allowed in wrestling?', 'how to become a radiology technician in michigan?']# No instruction needed for retrieval passages
passages = ["Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.","Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
]# load model with tokenizer
model = SentenceTransformer('nvidia/NV-Embed-v2', trust_remote_code=True)
model.max_seq_length = 32768
model.tokenizer.padding_side="right"def add_eos(input_examples):input_examples = [input_example + model.tokenizer.eos_token for input_example in input_examples]return input_examples# get the embeddings
batch_size = 2
query_embeddings = model.encode(add_eos(queries), batch_size=batch_size, prompt=query_prefix, normalize_embeddings=True)
passage_embeddings = model.encode(add_eos(passages), batch_size=batch_size, normalize_embeddings=True)scores = (query_embeddings @ passage_embeddings.T) * 100
print(scores.tolist())

MTEB 基准的指令模板

对于检索、STS 和摘要的 MTEB 子任务,请使用 instructions.json 中的指令前缀模板。 对于分类、聚类和重排,请使用 NV-Embed 论文表 7 中提供的说明。 7 中提供的说明。

instructions.json

{"ClimateFEVER":{"query": "Given a claim about climate change, retrieve documents that support or refute the claim","corpus": ""},"HotpotQA":{"query": "Given a multi-hop question, retrieve documents that can help answer the question","corpus": ""},"FEVER":{"query": "Given a claim, retrieve documents that support or refute the claim","corpus": ""},"MSMARCO":{"query": "Given a web search query, retrieve relevant passages that answer the query","corpus": ""},"DBPedia":{"query": "Given a query, retrieve relevant entity descriptions from DBPedia","corpus": ""},"NQ":{"query": "Given a question, retrieve passages that answer the question","corpus": ""},"QuoraRetrieval":{"query": "Given a question, retrieve questions that are semantically equivalent to the given question","corpus": "Given a question, retrieve questions that are semantically equivalent to the given question"},"SCIDOCS":{"query": "Given a scientific paper title, retrieve paper abstracts that are cited by the given paper","corpus": ""},"TRECCOVID":{"query": "Given a query on COVID-19, retrieve documents that answer the query","corpus": ""},"Touche2020":{"query": "Given a question, retrieve passages that answer the question","corpus": ""},"SciFact":{"query": "Given a scientific claim, retrieve documents that support or refute the claim","corpus": ""},"NFCorpus":{"query": "Given a question, retrieve relevant documents that answer the question","corpus": ""},"ArguAna":{"query": "Given a claim, retrieve documents that support or refute the claim","corpus": ""},"FiQA2018":{"query": "Given a financial question, retrieve relevant passages that answer the query","corpus": ""},"STS":{"text": "Retrieve semantically similar text"},"SUMM":{"text": "Given a news summary, retrieve other semantically similar summaries"}
}

如何启用多 GPU(注意,这是 HuggingFace Transformers的情况)

from transformers import AutoModel
from torch.nn import DataParallelembedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v2")
for module_key, module in embedding_model._modules.items():embedding_model._modules[module_key] = DataParallel(module)
http://www.ds6.com.cn/news/59785.html

相关文章:

  • 免费视频素材下载的网站站长之家点击进入
  • 信息分类网站好建吗长沙弧度seo
  • 查看一个网站的源代码做评价seo 什么意思
  • 太原专业设计网页公司上海seo服务外包公司
  • 西安网站建设哪家公司好宁波seo搜索引擎优化公司
  • 昆明市住房和城乡建设局网站推广赚佣金的软件排名
  • wordpress旅游类网站模板广州抖音seo
  • 网站建设维护学什么百度seo推广
  • 做视频网站注意什么问题河北网站建设制作
  • 深圳做网站建设的公司深圳互联网公司排行榜
  • 十句经典广告语想找搜索引擎优化
  • 在线图片编辑免费版seo刷排名公司
  • 免费建设网站bt搜索引擎下载
  • 中小企业如何建设网站360站长工具
  • 中国建设银行官方网站汇率最近国际新闻大事20条
  • 申请建设工作网站的函资阳市网站seo
  • 安吉做企业网站深圳知名网络优化公司
  • 网站快速排名方法中国软文网官网
  • 广州做网站信科网络seo的五个步骤
  • 网站建设价格多少企业网络搭建
  • 网站域名设计负面口碑营销案例
  • 网站子域名怎么设置百度指数在哪里看
  • 百度网站两两学一做心得体会营销案例100例小故事及感悟
  • 西藏做网站怎么做好网站方式推广
  • 二手交易平台网站的建设网站推广网络推广
  • 哪一个景区网站做的最成熟外贸seo网站建设
  • 在哪里做网站好软文写手兼职
  • 北京网站建设哪家公司好灰色seo关键词排名
  • 北京网站建设方案品牌公司域名查询注册信息查询
  • 集团公司网站欣赏内部优化