pytorch

OSSForks/pytorch


mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Commit Graph

Author	SHA1	Message	Date


Orion Reblitz-Richardson

1d5780d42c

Remove Apache headers from source.

* The LICENSE file contains the details, so the headers are removed from individual source files.

2018-03-27 13:10:18 -07:00

Yangqing Jia

8286ce1e3a

Re-license to Apache

Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902

2017-09-28 16:22:00 -07:00

Curtis Huang

c9238671ee

Use char-ngram embedding for out-of-vocabulary words

Summary:
**Description**

Provide the DeepText model with the ability to load a secondary index (a pre-trained char-ngram embedding, e.g. FastText) during training/test. Embeddings of out-of-vocabulary words are computed on the fly during training/test by averaging the char-ngram embeddings.
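The averaging step can be illustrated with a minimal sketch. It is not DeepText's implementation: it assumes FastText-style n-grams (the word wrapped in `<` and `>` markers, lengths 3 to 6), uses a plain dict as a stand-in for the pre-trained char-ngram index, and skips FastText's hashing of n-grams into buckets. `char_ngrams` and `oov_embedding` are hypothetical names.

```python
from typing import Dict, List, Optional


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return FastText-style char-ngrams (assumed scheme): the word is wrapped
    in '<' and '>' boundary markers and every substring of length
    min_n..max_n is kept."""
    padded = f"<{word}>"
    grams: List[str] = []
    for n in range(min_n, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def oov_embedding(word: str, ngram_table: Dict[str, List[float]], dim: int) -> Optional[List[float]]:
    """Average the vectors of the word's char-ngrams that appear in the
    (hypothetical) ngram_table. Returns None if no n-gram is found."""
    vecs = [ngram_table[g] for g in char_ngrams(word) if g in ngram_table]
    if not vecs:
        return None
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


# Toy n-gram table with made-up values.
table = {"<ca": [1.0, 0.0], "cat": [0.0, 1.0]}
print(char_ngrams("cat"))              # ['<ca', 'cat', 'at>', '<cat', 'cat>', '<cat>']
print(oov_embedding("cat", table, 2))  # [0.5, 0.5]
```

Only `<ca` and `cat` match the table, so the out-of-vocabulary vector is their element-wise mean.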

**Approach**

This diff provides two custom operators to accomplish this task: ConditionalOp and IndexCharNgramGetOp. We first use IndexCharNgramGetOp to perform a char-ngram index lookup, which returns a sparse tensor segmented by lengths for each token. The sparse tensor is then used to compute the average embedding from the char-ngram index. Finally, during the feature apply stage, we use ConditionalOp to replace the embeddings of tokens that were not found in the original index. Please refer to the code documentation for more details.
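The three stages of this approach can be sketched with plain Python lists. `ngram_index_lookup`, `lengths_mean`, and `conditional` are hypothetical stand-ins that only mimic the roles described for IndexCharNgramGetOp, the averaging step, and ConditionalOp; they are not Caffe2's operator signatures. The lookup returns a flat list of n-gram ids plus one length per token, i.e. the lengths-segmented sparse layout the commit describes. The FastText-style n-gram scheme and the zero vector for tokens with no matching n-gram are assumptions.

```python
from typing import Dict, List, Tuple

Vec = List[float]


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    # Assumed FastText-style scheme: boundary markers plus n-grams of length min_n..max_n.
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(padded) - n + 1)]


def ngram_index_lookup(tokens: List[str], ngram_ids: Dict[str, int]) -> Tuple[List[int], List[int]]:
    # Hypothetical stand-in for the IndexCharNgramGetOp role:
    # flat n-gram row ids for all tokens, plus how many belong to each token.
    ids: List[int] = []
    lengths: List[int] = []
    for tok in tokens:
        tok_ids = [ngram_ids[g] for g in char_ngrams(tok) if g in ngram_ids]
        ids.extend(tok_ids)
        lengths.append(len(tok_ids))
    return ids, lengths


def lengths_mean(table: List[Vec], ids: List[int], lengths: List[int], dim: int) -> List[Vec]:
    # Average the table rows in each length-delimited segment.
    # Empty segments get a zero vector (an assumption for this sketch).
    out: List[Vec] = []
    pos = 0
    for n in lengths:
        seg = ids[pos:pos + n]
        pos += n
        if n == 0:
            out.append([0.0] * dim)
        else:
            out.append([sum(table[i][d] for i in seg) / n for d in range(dim)])
    return out


def conditional(cond: List[bool], data_t: List[Vec], data_f: List[Vec]) -> List[Vec]:
    # Row-wise select, mirroring the role described for ConditionalOp:
    # take data_t[i] where cond[i] is True, otherwise data_f[i].
    return [t if c else f for c, t, f in zip(cond, data_t, data_f)]


# Toy data; all values are made up for illustration.
vocab = {"cat": [9.0, 9.0]}                        # primary word index
ngram_ids = {"<ca": 0, "cat": 1, "<do": 2}         # char-ngram -> row id
ngram_table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
tokens = ["cat", "dog"]

found = [t in vocab for t in tokens]
primary = [vocab.get(t, [0.0, 0.0]) for t in tokens]
ids, lengths = ngram_index_lookup(tokens, ngram_ids)
fallback = lengths_mean(ngram_table, ids, lengths, dim=2)
embeddings = conditional(found, primary, fallback)
print(embeddings)  # [[9.0, 9.0], [2.0, 2.0]]
```

Here `cat` is in the primary index and keeps its own vector, while the out-of-vocabulary `dog` falls back to the average of its matched n-grams (only `<do` in this toy table).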

Reviewed By: jamesr66a

Differential Revision: D5666924

fbshipit-source-id: f76605d093154a014d5b9ebf9510de9d79874eee

2017-09-01 19:16:49 -07:00

3 Commits