Download files using pattern python

This is the eighth article in my series of articles on Python for NLP. In my previous article, I explained how Python's TextBlob library can be used to perform a variety of NLP tasks ranging from tokenization to POS tagging, and text classification to sentiment analysis. In this article, we will explore Python's Pattern library, which is another extremely useful Natural Language Processing library.

The Pattern library is a multipurpose library capable of handling the following tasks:

  • Natural Language Processing: Performing tasks such as tokenization, stemming, POS tagging, sentiment analysis, etc.
  • Data Mining: It contains APIs to mine data from sites like Twitter, Facebook, Wikipedia, etc.
  • Machine Learning: Contains machine learning models such as SVM, KNN, and perceptron, which can be used for classification, regression, and clustering tasks.

In this article, we will see the first two applications of the Pattern library from the above list. We will explore the use of the Pattern Library for NLP by performing tasks such as tokenization, stemming and sentiment analysis. We will also see how the Pattern library can be used for web mining.

Installing the Library

To install the library, you can use the pip command pip install pattern.

Otherwise, if you are using the Anaconda distribution of Python, you can install the library with the corresponding conda command instead.

Pattern Library Functions for NLP

In this section, we will see some of the NLP applications of the Pattern Library.

Tokenizing, POS Tagging, and Chunking

In the NLTK and spaCy libraries, we have separate functions for tokenizing, POS tagging, and finding noun phrases in text documents. On the other hand, in the Pattern library there is the all-in-one parse() method that takes a text string as an input parameter and returns the corresponding tokens in the string, along with their POS tags.

The parse() method also tells us if a token belongs to a noun phrase or verb phrase, and whether it is a subject or object. You can also retrieve lemmatized tokens by setting the lemmata parameter to True. The syntax of the method, along with the default values for its parameters, is as follows:

    parse(string, tokenize=True, tags=True, chunks=True, relations=False, lemmata=False, encoding='utf-8', tagset=None)

Let's see the method in action:
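A minimal sketch of such a call is given here, assuming the Pattern package is installed and that parse() and pprint() are imported from pattern.en. The helper names tag_text and show_tags are illustrative and are not taken from the original script; the imports sit inside the functions so that the sketch can be loaded even where Pattern is not available.

```python
def tag_text(text):
    """Tokenize, tag, and chunk `text` with Pattern (hypothetical helper).

    Requires the Pattern package; the import is deferred until the call.
    """
    from pattern.en import parse

    # relations=True adds subject/object roles, lemmata=True adds lemmas.
    return parse(text, relations=True, lemmata=True)


def show_tags(text):
    """Print the tagged tokens as a table using Pattern's pprint()."""
    from pattern.en import pprint

    pprint(tag_text(text))
```

Calling show_tags() with any English sentence should print one row per token, with columns for the tag, chunk, role, and lemma.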

To use the parse() method, you have to import the en module from the pattern library. The en module contains English language NLP functions. If you use the pprint() method to print the output of the parse() method on the console, you should see the following output:

In the output, you can see the tokenized words along with their POS tag, the chunk that the tokens belong to, and the role. You can also see the lemmatized form of the tokens.

If you call the split() method on the object returned by the parse() method, the output will be a list of sentences, where each sentence is a list of tokens and each token is a list containing the word along with the tags associated with it.

For instance, look at the following script:

The output of the script above looks like this:

Pluralizing and Singularizing the Tokens

The pluralize() and singularize() methods are used to convert singular words to plurals and vice versa, respectively.

The output looks like this:

Converting Adjective to Comparative and Superlative Degrees

You can retrieve the comparative and superlative degrees of an adjective using the comparative() and superlative() functions. For instance, the comparative degree of good is better and the superlative degree of good is best. Let's see this in action:

Output:

Finding N-Grams

N-grams are contiguous sequences of n words in a sentence. For instance, for the sentence "He goes to hospital", the 2-grams would be (He goes), (goes to) and (to hospital). N-grams can play a crucial role in text classification and language modeling.

In the Pattern library, the ngrams() method is used to find all the n-grams in a text string. The first parameter to the method is the text string. The size of the n-grams is passed to the n parameter of the method. Look at the following example:
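The idea can be sketched in plain Python. This is an illustration only, not Pattern's implementation: the helper name simple_ngrams is hypothetical, and it splits on whitespace without any of the punctuation handling a real tokenizer would apply.

```python
def simple_ngrams(text, n=2):
    """Return every run of n consecutive words in `text` as a tuple."""
    words = text.split()
    # Slide a window of width n over the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = simple_ngrams("He goes to hospital", n=2)
print(bigrams)
# [('He', 'goes'), ('goes', 'to'), ('to', 'hospital')]
```

With Pattern installed, the corresponding call would be ngrams() from pattern.en with the same text and n=2.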

Output:

Finding Sentiments

Sentiment refers to an opinion or feeling towards a certain thing. The Pattern library offers functionality to find sentiment from a text string.

In Pattern, the sentiment object is used to find the polarity (positivity or negativity) of a text along with its subjectivity.

Depending upon the most commonly occurring positive (good, best, excellent, etc.) and negative (bad, awful, pathetic, etc.) adjectives, a sentiment score between 1 and -1 is assigned to the text. This sentiment score is also called the polarity.

In addition to the sentiment score, subjectivity is also returned. The subjectivity value can be between 0 and 1. Subjectivity quantifies the amount of personal opinion relative to factual information contained in the text. A higher subjectivity means that the text contains personal opinion rather than factual information.
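A minimal sketch of a sentiment lookup follows, assuming the Pattern package is installed and that sentiment is imported from pattern.en. The helper name polarity_and_subjectivity is illustrative; the import is deferred so the sketch loads without Pattern.

```python
def polarity_and_subjectivity(text):
    """Return (polarity, subjectivity) for `text` using Pattern.

    Hypothetical helper; requires the Pattern package at call time.
    """
    from pattern.en import sentiment

    # sentiment() yields a pair: polarity in [-1, 1], subjectivity in [0, 1].
    polarity, subjectivity = sentiment(text)
    return polarity, subjectivity
```

For example, the helper could be called with the sentence "This is an excellent movie to watch. I really love it" and the two returned values printed.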

When you run the above script, you should see the following output:

The sentence "This is an excellent movie to watch. I really love it" receives a high polarity score, which shows that it is highly positive. Similarly, its high subjectivity score reflects the fact that the sentence is a personal opinion of the user.

Checking if a Statement is a Fact

The modality() function from the Pattern library can be used to find the degree of certainty in a text string. The function returns a value between -1 and 1. For facts, the function returns a value greater than 0.5.

Here is an example of it in action:
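One way such a check could look is sketched here, assuming the Pattern package is installed and that parse, Sentence, and modality are all available from pattern.en. The helper name certainty is hypothetical, and the imports are deferred so the sketch loads without Pattern.

```python
def certainty(text):
    """Return Pattern's modality score for `text` (hypothetical helper).

    Requires the Pattern package at call time.
    """
    from pattern.en import parse, Sentence
    from pattern.en import modality

    # parse() produces the tagged text that Sentence expects.
    sentence = Sentence(parse(text, lemmata=True))
    return modality(sentence)
```

Calling certainty("Paris is the capital of France") would return the modality of that statement.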

In the script above we first import the parse() method along with the Sentence class. On the second line, we import the modality() function. The parse() method takes text as input and returns a tokenized form of the text, which is then passed to the Sentence class constructor. The modality() method takes the Sentence class object and returns the modality of the sentence.

Since the text string "Paris is the capital of France" is a fact, in the output, you will see a value of 1.

Similarly, for a sentence which is not certain, the value returned by the method is considerably lower. Look at the following script:

Since the string in the above example is not very certain, its modality will be well below the value returned for the factual sentence.

Spelling Corrections

The suggest() method can be used to find if a word is spelled correctly or not. The method returns 1 if a word is 100% correctly spelled. Otherwise the method returns the possible corrections for the word along with their probability of correctness.

Look at the following example:

In the script above we have a word which is incorrectly spelled. In the output, you will see possible suggestions for this word.

According to the method, the most probable correction for the word is "While", followed by "White", and so on.

Now let's spell a word correctly:

Output:

From the output, you can see that there is a 100% chance that the word is spelled correctly.

Working with Numbers

The Pattern library contains functions that can be used to convert numbers in the form of text strings into their numeric counterparts and vice versa. To convert from text to numeric representation, the number() function is used. Similarly, to convert back from numbers to their corresponding text representation, the numerals() function is used. Look at the following script:
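A short sketch of both conversions follows, assuming the Pattern package is installed and that number and numerals are imported from pattern.en. The wrapper names text_to_number and number_to_text are illustrative, and the imports are deferred so the sketch loads without Pattern.

```python
def text_to_number(text):
    """Convert a spelled-out number to its numeric value via Pattern."""
    from pattern.en import number

    return number(text)


def number_to_text(value, decimals=2):
    """Spell out `value`, rounded to `decimals` places, via Pattern."""
    from pattern.en import numerals

    # numerals() takes the rounding precision through its round parameter.
    return numerals(value, round=decimals)
```

For instance, text_to_number() could be called with "one hundred and twenty-two" and number_to_text() with 256.39.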

Output:

In the output, you will see 122, which is the numeric representation of the text "one hundred and twenty-two". Similarly, you should see "two hundred and fifty-six point thirty-nine", which is the text representation of the number 256.39.

Remember, for the numerals() function we have to provide the integer number of decimal places that we want our number to be rounded off to.

The quantify() function is used to get a word count estimation of the items in a list, which provides a phrase for referring to the group. If a list has a few similar items, the function will quantify them as "several". Two items are quantified as a "couple".

In the list, we have three apples, three bananas, and two mangoes. The output of the function for this list looks like this:

Similarly, the following example demonstrates the other word count estimations.

Output:

Pattern Library Functions for Data Mining

In the previous section, we saw some of the most commonly used functions of the Pattern library for NLP. In this section, we will see how the Pattern library can be used to perform a variety of data mining tasks.

The web module of the Pattern library is used for web mining tasks.

Accessing Web Pages

The URL object is used to retrieve contents from webpages. It has several methods that can be used to open a webpage, download the contents of a webpage, and read a webpage.

You can directly use the download() method to download the HTML contents of any webpage. The following script downloads the HTML source code for the Wikipedia article on artificial intelligence.

You can also download files from webpages, for example images, using the URL object:
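A file download of this kind might be sketched as follows, assuming the Pattern package is installed and that URL and extension are imported from pattern.web. The helper name save_remote_file and its parameters are hypothetical; the caller supplies the file URL, and the import is deferred so the sketch loads without Pattern.

```python
def save_remote_file(file_url, stem="football"):
    """Download `file_url` and save it as `stem` plus the remote extension.

    Hypothetical helper; requires the Pattern package and network access
    at call time. Returns the local path that was written.
    """
    from pattern.web import URL, extension

    page_url = URL(file_url)
    # extension() extracts the suffix (e.g. ".jpg") from the page name.
    path = stem + extension(page_url.page)
    with open(path, "wb") as handle:
        handle.write(page_url.download())
    return path
```

The file is written to the current working directory, so calling the helper with the URL of a JPEG image would produce football.jpg there.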

In the script above we first make a connection with the webpage using the URL object. Next, we call the extension() method on the opened page, which returns the file extension. The file extension is appended to the end of the string "football". The open() method is called to open this path for writing and finally, the download() method downloads the image, which is written to the default execution path.

Finding URLs within Text

You can use the find_urls() method to extract URLs from text strings. Here is an example:
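The underlying idea can be illustrated with a simplified regular expression from the standard library. This is a rough sketch and not Pattern's implementation: the helper name simple_find_urls and the pattern itself are assumptions, and real-world URL detection handles many more cases.

```python
import re

# Matches strings starting with http://, https://, or www. up to whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+")


def simple_find_urls(text):
    """Return URL-like substrings of `text`, minus trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


found = simple_find_urls("To search anything, go to www.google.com.")
print(found)
# ['www.google.com']
```

With Pattern installed, find_urls() from pattern.web would be called on the same string.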

In the output, you will see the URL for the Google website as shown below:

Making Asynchronous Requests for Webpages

Webpages can be very large and it can take quite a bit of time to download the complete contents of a webpage, which can block a user from performing any other task in the application until the download finishes. However, the web module of the Pattern library contains a function, asynchronous(), which downloads the contents of a webpage in parallel. The method runs in the background so that the user can interact with the application while the webpage is being downloaded.

Let's take a very simple example of the method:
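The same polling pattern can be sketched offline with the standard library. This is a stand-in, not Pattern's asynchronous(): the search call is replaced by a hypothetical slow_fetch function, and a ThreadPoolExecutor future plays the role of the request object, with done() and result() standing in for the done and value attributes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(query, delay=0.2):
    """Hypothetical stand-in for a slow web request."""
    time.sleep(delay)
    return "results for " + query


def fetch_in_background(func, *args, poll=0.05):
    """Run func(*args) in a worker thread and poll until it finishes.

    Returns the result and the number of polling iterations, which shows
    that the caller kept running while the work was in progress.
    """
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(func, *args)
        while not future.done():
            ticks += 1          # the application is free to do other work here
            time.sleep(poll)
        return future.result(), ticks


value, ticks = fetch_in_background(slow_fetch, "artificial intelligence")
print(value)
```

The loop body is where an application would update its interface or handle other tasks while waiting.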

In the above script, we retrieve the Google search results of page 1 for the search query "artificial intelligence". You can see that while the page downloads, we execute a while loop in parallel. Finally, the results retrieved by the query are printed using the value attribute of the object returned by the asynchronous() function. Next, we extract the URLs from the search results, which are then printed on the screen.

Getting Search Engine Results with APIs

The Pattern library contains the SearchEngine class, which is the base class for the classes that can be used to call the APIs of different search engines and websites such as Google, Bing, Facebook, Wikipedia, Twitter, etc. The SearchEngine constructor accepts three parameters:

  • license: The developer license key for the corresponding search engine or website
  • throttle: Corresponds to the time difference between successive requests to the server
  • language: Specifies the language for the results

The search() method of the SearchEngine class is used to make a request to the search engine for a certain search query. The method can take the following parameters:

  • query: The search string
  • type: The type of data you want to search; it can take three values: SEARCH, NEWS and IMAGE
  • start: The page from which you want to start the search
  • count: The number of results per page

The search engine classes that inherit the SearchEngine class along with its search() method are: Google, Bing, Twitter, Facebook, Wikipedia, and Flickr.

The search query returns a Result object for each item. The Result object can then be used to retrieve information about the search result. The attributes of the Result object are url, title, text, language, author, and date.

Now let's see a very simple example of how we can search something on Google via pattern library. Remember, to make this example work, you will have to use your developer license key for the Google API.
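A minimal sketch of such a search follows, assuming the Pattern package is installed, that Google is imported from pattern.web, and that you hold a valid license key. The helper name google_results is hypothetical, and the import is deferred so the sketch loads without Pattern.

```python
def google_results(query, license_key):
    """Return (url, text) pairs for `query` from Pattern's Google class.

    Hypothetical helper; requires the Pattern package, network access,
    and a developer license key at call time.
    """
    from pattern.web import Google

    engine = Google(license=license_key)
    # search() yields result objects exposing url and text attributes.
    return [(result.url, result.text) for result in engine.search(query)]
```

Iterating over the returned pairs and printing them displays the URL and text of each result.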

In the script above, we create an object of the Google class. In the constructor of Google, pass your own license key to the license parameter. Next, we pass the search string to the search() method. By default, the first 10 results from the first page are returned; these are then iterated over, and the url and text of each result are displayed on the screen.

The process is similar for the Bing search engine; you only have to replace the Google class with Bing in the script above.

Let's now search Twitter for the three latest tweets that contain the text "artificial intelligence". Execute the following script:

In the script above we first import the Twitter class from the pattern.web module. Next, we iterate over the tweets returned by the Twitter class and display the text of each tweet on the console. You do not need any license key to run the above script.

Converting HTML Data to Plain Text

The download() method of the URL class returns data in the form of HTML. However, if you want to do a semantic analysis of the text, for instance sentiment classification, you need cleaned data without HTML tags. You can clean the data with the plaintext() method. The method takes as a parameter the HTML content returned by the download() method, and returns cleaned text.

Look at the following script:
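What this kind of cleaning involves can be illustrated with the standard library's HTMLParser. This is a simplified sketch and not Pattern's plaintext() implementation: the names simple_plaintext and _TextExtractor are hypothetical, and only tag removal, script and style skipping, and whitespace collapsing are covered.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def simple_plaintext(html):
    """Return the visible text of `html` with whitespace collapsed."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(" ".join(extractor.parts).split())


sample = ("<html><head><style>p {color: red}</style></head><body>"
          "<h1>Hello</h1><script>var x = 1;</script>"
          "<p>world &amp; more</p></body></html>")
print(simple_plaintext(sample))
# Hello world & more
```

With Pattern installed, the downloaded HTML would instead be passed to plaintext() from pattern.web.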

In the output, you should see the cleaned text from the webpage:

It is important to remember that if you are using Python 3, you will need to call the decode() method to convert the data from byte to string format.

Parsing PDF Documents

The Pattern library contains a PDF object that can be used to parse a PDF document. PDF (Portable Document Format) is a cross-platform file format which contains images, text, and fonts in a stand-alone document.

Let's see how a PDF document can be parsed with the PDF object:

In the script we download a document using the download() function. Next, the downloaded document is passed to the PDF class, and the parsed result is finally printed on the console.

Clearing the Cache

The results returned by methods such as SearchEngine.search() and URL.download() are, by default, stored in the local cache. To clear the cache after downloading an HTML document, we can use the clear() method of the cache class, as shown below:

Conclusion

The Pattern library is one of the most useful natural language processing libraries in Python. Although it is not as well-known as spaCy or NLTK, it contains functionalities such as finding superlatives and comparatives, and fact and opinion detection, which distinguish it from the other NLP libraries.

In this article, we studied the application of the Pattern library for natural language processing, and data mining and web scraping. We saw how to perform basic NLP tasks such as tokenization, lemmatization and sentiment analysis with the Pattern library. Finally, we also saw how to use Pattern for making search engine queries, mining online tweets and cleaning HTML documents.

Programmer | Blogger | Data Science Enthusiast | PhD To Be | Arsenal FC for Life