Beginning Docker
I’m writing this article as a means of tracking commonly used docker commands in a place where I won’t forget them. If you find it useful or have additional suggestions let me know in the comments. ...
A Review of the Coursera Data Science Specialization
I recently completed the 10th and final course in the Data Science Specialization offered by Coursera in conjunction with Johns Hopkins University. My background is as a computer scientist and programmer looking to learn more about statistical analysis and machine learning. I have always had an interest in data analysis and machine learning but never actually studied it. I used the Data Science Specialization as a starting point to learn more about the field and become familiar with typical problems and solutions that data scientists encounter. This article describes my experience with the specialization and answers the question of whether or not it is worth the time. ...
Counting n-grams is a common pre-processing step for computing sentence and word probabilities over a corpus. Thankfully, this task is embarrassingly parallel and is a natural fit for distributed processing frameworks like Cloud Dataflow. This article provides an implementation of n-gram counting using Cloud Dataflow that is able to efficiently compute n-grams in parallel over massive datasets.
The Algorithm
Cloud Dataflow uses a programming abstraction called PCollections, which are collections of data that can be operated on in parallel (Parallel Collections). When programming for Cloud Dataflow you treat each operation as a transformation of a parallel collection that returns another parallel collection for further processing. This style of development is similar to the traditional Unix philosophy of piping the output of one command to another for further processing. ...
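To make the "chain of transformations" idea concrete, here is a minimal single-machine sketch in plain Python. It does not use the Cloud Dataflow SDK and is not the implementation from the article: each function simply takes a collection and returns a new one, in the spirit of a transformation over a parallel collection. The names `tokenize`, `extract_ngrams`, and `count_ngrams` are hypothetical, and whitespace tokenization is an assumption made for brevity.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def tokenize(lines: Iterable[str]) -> Iterator[List[str]]:
    """Transform a collection of lines into a collection of token lists.

    Assumption: lowercase, whitespace-separated tokens are good enough here.
    """
    for line in lines:
        yield line.lower().split()


def extract_ngrams(token_lists: Iterable[List[str]], n: int) -> Iterator[Tuple[str, ...]]:
    """Transform a collection of token lists into a collection of n-grams."""
    for tokens in token_lists:
        # A line with fewer than n tokens contributes no n-grams.
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_ngrams(lines: Iterable[str], n: int) -> Counter:
    """Pipe the transformations together and count each distinct n-gram."""
    return Counter(extract_ngrams(tokenize(lines), n))


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat ran"]
    for ngram, count in count_ngrams(corpus, 2).items():
        print(" ".join(ngram), count)
```

Because each line is processed independently and the per-line counts can be summed in any order, the same structure is what makes the task easy to distribute across workers.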
A common method of reducing the complexity of n-gram modeling is using the Markov Property. The Markov Property states that the probability of future states depends only on the present state, not on the sequence of events that preceded it. This concept can be elegantly implemented using a Markov Chain storing the probabilities of transitioning to a next state. ...
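As an illustration of the idea above, the following is a small sketch of a first-order Markov chain in Python, assuming that states are words and that transition probabilities are estimated from observed counts. The class name `MarkovChain` and its methods are hypothetical and are not taken from the article.

```python
import random
from collections import Counter, defaultdict
from typing import List, Optional


class MarkovChain:
    """Stores counts of transitions from the present state to the next state."""

    def __init__(self) -> None:
        # Maps each state to a Counter of the states observed right after it.
        self.transitions = defaultdict(Counter)

    def add(self, tokens: List[str]) -> None:
        """Record every adjacent (current, next) pair in a token sequence."""
        for current, nxt in zip(tokens, tokens[1:]):
            self.transitions[current][nxt] += 1

    def probability(self, current: str, nxt: str) -> float:
        """Estimate P(next | current) from the recorded counts."""
        counts = self.transitions.get(current)
        if not counts:
            return 0.0  # the current state was never seen with a successor
        return counts[nxt] / sum(counts.values())

    def next_state(self, current: str) -> Optional[str]:
        """Sample a next state using only the present state."""
        counts = self.transitions.get(current)
        if not counts:
            return None
        states = list(counts)
        weights = [counts[s] for s in states]
        return random.choices(states, weights=weights, k=1)[0]


if __name__ == "__main__":
    chain = MarkovChain()
    chain.add("a b a c".split())
    print(chain.probability("a", "b"))
    print(chain.next_state("b"))
```

Note that `probability` and `next_state` look only at the current state, which is exactly the simplification the Markov Property allows: no earlier history needs to be stored.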
One of the most widely used methods in natural language processing is n-gram modeling. This article explains what an n-gram model is, how it is computed, and what the probabilities of an n-gram model tell us. ...
Early pioneers in object-oriented programming paved the path towards using Model View Controller (MVC) for graphical user interfaces as early as 1970 and web applications have continued using the pattern to separate business logic from display. This article attempts to clarify the use of Model View Controller within web applications — giving consideration to the fact that most developers will be building their application using an existing web framework. ...
I’ve been thinking about the transition of App Engine to Python 3 and have come to the conclusion that it will never happen: App Engine will eventually be deprecated in favour of Managed VMs. Let’s break this apart to see why. First, consider the effort required by Google to develop App Engine. The Python runtime environment was modified to enforce the sandbox of the App Engine environment. To provide a Python 3 environment for App Engine as we know it, the Python 3 runtime would need to be modified with the same restrictions. Even imagining that this would happen for Python 3.4, upgrading to Python 3.5 would require Google to modify the runtime all over again. ...
A common problem with Python development for large-scale teams is sharing internal libraries. At Vendasta we’ve been solving this problem using a private PyPI installation running on Google App Engine with Python eggs and wheels being served by Google Cloud Storage. Today, we are announcing the open source version of this tool — CloudPyPI. CloudPyPI is a modification of pypiserver for running on Google App Engine. We’ve also introduced a simple user management system to allow authenticated access to your Python packages. Together, we’ve found this to be a robust tool for distributing private Python libraries internally. If this is a problem you’ve been trying to solve, give CloudPyPI a try — contributions and feature requests are always welcome. ...
View all articles in the Pipeline API Series. This article will serve as a reminder of the Pipeline UI as much for the writer as for the reader. The Pipeline UI requires the MapReduce library to be installed. If you are not familiar with MapReduce please refer to the MapReduce API Series of articles. Once MapReduce is installed you will need to add a few indices to index.yaml to properly query for pipeline records for display in the UI. ...
View all articles in the Pipeline API Series. This article will cover fully asynchronous pipelines. The term ‘asynchronous’ is misleading here: all pipelines are asynchronous in the sense that yielding a pipeline is a non-blocking operation. An asynchronous pipeline is one that remains in a RUN state until outside action is taken, for example, a button is clicked or a task is executed. Marking a pipeline as an asynchronous pipeline is as simple as setting the async class property to True. ...